NLLB-200 distilled 600M, fine-tuned for Mapudungun→Spanish translation under the Joint-5K BPE tokenization condition.
In this condition, a single BPE vocabulary is learned jointly over Mapudungun and Spanish text with 5K merge operations (Duan et al. 2020).
Part of the paper: Bringing Mapudungun into the Modern MT Ecosystem: Morphology-Aware Tokenization for NLLB-200 Fine-Tuning (AmericasNLP 2026 @ ACL).
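To make the "joint BPE with N merge operations" idea concrete, here is a minimal, self-contained sketch of BPE merge learning over a combined Mapudungun+Spanish word list. It is a toy illustration only, not the preprocessing used in the paper: the helper names (`merge_pair`, `learn_bpe`, `apply_bpe`), the `</w>` end-of-word marker, the tie-breaking rule, and the tiny placeholder sentences are all assumptions made for this example, and it learns 10 merges rather than 5K.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge operations from a list of sentences."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter()
    for line in corpus:
        for word in line.split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself (an arbitrary but deterministic choice).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def apply_bpe(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

# Toy placeholder text, NOT taken from the paper's training data.
mapudungun_toy = ["mari mari pu che", "mapuche mapu"]
spanish_toy = ["hola gente", "gente de la tierra"]

# "Joint" means the merges are learned on both languages together,
# so source and target share one subword inventory.
joint_corpus = mapudungun_toy + spanish_toy
merges = learn_bpe(joint_corpus, num_merges=10)  # the model card's condition uses 5K

print(merges)
print(apply_bpe("mapuche", merges))
```

Because the merge table is learned on the concatenated corpus, subword units that are frequent in either language end up in the same vocabulary. For real data, an established BPE implementation is a better choice than this sketch.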
from transformers import pipeline
pipe = pipeline(
    "translation",
    model="byumatrixlab/mapudungun-nllb-600M-arn-es-joint-bpe",
    src_lang="arn_Latn",
    tgt_lang="spa_Latn",
)
print(pipe("your text here", max_length=256))
@inproceedings{thompson2026mapudungun,
title = {Bringing {Mapudungun} into the Modern {MT} Ecosystem: Morphology-Aware Tokenization for {NLLB}-200 Fine-Tuning},
author = {Thompson, Isaac},
booktitle = {Proceedings of the 5th Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP 2026)},
year = {2026},
}