mapudungun-nllb-1.3B-es-arn-morfessor

Fine-tuned NLLB-200 distilled 1.3B for Spanish→Mapudungun translation using the Morfessor tokenization condition.

In this condition, Mapudungun tokens are pre-segmented into morphs with Morfessor 2.0 (Virpioja et al. 2013), an unsupervised morphological segmenter, as a preprocessing step.
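As a rough illustration of what pre-segmentation does to a line of text, here is a minimal sketch. It is not the paper's pipeline: the lookup table stands in for a trained Morfessor model, the `@@` continuation marker is an assumption chosen for illustration, and `toy_segment_word` and `presegment` are hypothetical names.

```python
from typing import Callable, List

# Toy stand-in for a trained Morfessor model (hypothetical, not real output).
# With the `morfessor` package, a comparable per-word call would be
# `model.viterbi_segment(word)[0]` on a trained model.
TOY_SEGMENTS = {
    "mapudungun": ["mapu", "dungun"],
    "mapuche": ["mapu", "che"],
}


def toy_segment_word(word: str) -> List[str]:
    """Return the morphs of a word, or the word itself if it is unknown."""
    return TOY_SEGMENTS.get(word, [word])


def presegment(
    sentence: str,
    segment_word: Callable[[str], List[str]],
    marker: str = "@@",
) -> str:
    """Segment each whitespace token and mark every non-final morph.

    The "@@" marker is an assumption; the boundary convention actually
    used for this model is not stated in this card.
    """
    out: List[str] = []
    for token in sentence.split():
        morphs = segment_word(token)
        out.extend(m + marker for m in morphs[:-1])  # non-final morphs
        out.append(morphs[-1])                       # final morph, unmarked
    return " ".join(out)


print(presegment("mapuche mapudungun", toy_segment_word))
# prints: mapu@@ che mapu@@ dungun
```

Passing the segmenter in as a callable keeps the joining logic independent of where the morphs come from, so the toy lookup could be swapped for a trained model without touching `presegment`.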

Part of the paper: Bringing Mapudungun into the Modern MT Ecosystem: Morphology-Aware Tokenization for NLLB-200 Fine-Tuning (AmericasNLP 2026 @ ACL).

Usage

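from transformers import pipeline

# Load the fine-tuned checkpoint as a translation pipeline.
pipe = pipeline(
    "translation",
    model="byumatrixlab/mapudungun-nllb-1.3B-es-arn-morfessor",
    src_lang="spa_Latn",  # source: Spanish
    tgt_lang="arn_Latn",  # target: Mapudungun
)

# The pipeline returns a list with one dict per input.
result = pipe("your text here", max_length=256)
print(result[0]["translation_text"])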
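To translate several sentences at once, the pipeline can be called with a list of strings, and it returns one dict per input. The helper below is a small sketch of that pattern. `translate_batch` is a hypothetical name, and `fake_pipe` is a stand-in that only mimics the pipeline's return shape so the snippet runs without downloading the model.

```python
from typing import Callable, Dict, List


def translate_batch(
    pipe: Callable[..., List[Dict[str, str]]],
    texts: List[str],
    max_length: int = 256,
) -> List[str]:
    """Translate a list of sentences and return plain strings."""
    results = pipe(texts, max_length=max_length)
    return [r["translation_text"] for r in results]


def fake_pipe(texts: List[str], max_length: int = 256) -> List[Dict[str, str]]:
    """Stand-in for the real pipeline: wraps each input in angle brackets."""
    return [{"translation_text": f"<{t}>"} for t in texts]


print(translate_batch(fake_pipe, ["hola", "buenos días"]))
# prints: ['<hola>', '<buenos días>']
```

With the real model, pass the `pipe` object from the Usage snippet in place of `fake_pipe`.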

Citation

@inproceedings{thompson2026mapudungun,
  title     = {Bringing {Mapudungun} into the Modern {MT} Ecosystem: Morphology-Aware Tokenization for {NLLB}-200 Fine-Tuning},
  author    = {Thompson, Isaac},
  booktitle = {Proceedings of the 5th Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP 2026)},
  year      = {2026},
}