Mapudungun NLLB
Fine-tuned NLLB-200 distilled 1.3B for Spanish→Mapudungun translation using the Morfessor tokenization condition.
In this condition, unsupervised morphological segmentation with Morfessor 2.0 (Virpioja et al. 2013) is pre-applied to Mapudungun tokens.
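To make the pre-segmentation step concrete, here is a minimal sketch of applying a word-level segmenter to each whitespace token of a sentence. It is not the paper's pipeline: the `presegment` helper, the `MORPH_SEP` boundary marker, and the toy lookup table standing in for a trained Morfessor model are all assumptions made for illustration, and the example segmentation is hand-written rather than Morfessor output.

```python
from typing import Callable, List

# Hypothetical morph-boundary marker; the separator actually used for this
# model is not stated in this card.
MORPH_SEP = " @@"


def presegment(
    sentence: str,
    segment_word: Callable[[str], List[str]],
    sep: str = MORPH_SEP,
) -> str:
    """Split on whitespace, segment each token, and join its morphs with sep."""
    return " ".join(sep.join(segment_word(tok)) for tok in sentence.split())


# Toy stand-in for a trained segmenter (illustrative, hand-written entry).
TOY_SEGMENTATIONS = {"amuan": ["amu", "a", "n"]}


def toy_segment(word: str) -> List[str]:
    """Return the known morphs for a word, or the word itself if unknown."""
    return TOY_SEGMENTATIONS.get(word, [word])


segmented = presegment("iñche amuan", toy_segment)
print(segmented)
```

With the `morfessor` Python package, `toy_segment` could plausibly be replaced by a function that loads a trained model through `morfessor.MorfessorIO().read_binary_model_file(...)` and returns the first element of `model.viterbi_segment(word)`; the model file and its training setup would have to come from the paper's released resources.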
This model is part of the paper "Bringing Mapudungun into the Modern MT Ecosystem: Morphology-Aware Tokenization for NLLB-200 Fine-Tuning" (AmericasNLP 2026 @ ACL).
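from transformers import pipeline

# Load the fine-tuned checkpoint as a Spanish -> Mapudungun translation pipeline.
pipe = pipeline(
    "translation",
    model="byumatrixlab/mapudungun-nllb-1.3B-es-arn-morfessor",
    src_lang="spa_Latn",  # source: Spanish
    tgt_lang="arn_Latn",  # target: Mapudungun
)

# Replace the placeholder with Spanish input text.
print(pipe("your text here", max_length=256))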
@inproceedings{thompson2026mapudungun,
  title     = {Bringing {Mapudungun} into the Modern {MT} Ecosystem: Morphology-Aware Tokenization for {NLLB}-200 Fine-Tuning},
  author    = {Thompson, Isaac},
  booktitle = {Proceedings of the 5th Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP 2026)},
  year      = {2026},
}