mapudungun-nllb-600M-es-arn-joint-bpe

Fine-tuned NLLB-200 distilled 600M for Spanish→Mapudungun translation using the Joint-5K BPE tokenization condition.

The Joint-5K condition is a single BPE model learned jointly over Mapudungun and Spanish text with 5K merge operations (Duan et al. 2020).
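
As a rough illustration of what "joint BPE with N merge operations" means, the sketch below learns merges from the concatenation of a Spanish and a Mapudungun word list, so both languages share one merge table. It is a toy, pure-Python version of the general BPE procedure, not the training pipeline used for this model. The word lists, the `learn_bpe` and `apply_bpe` helpers, and the choice of 10 merges (instead of 5K) are assumptions made for the example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


# Illustrative toy word lists (assumptions, not the training data).
spanish = ["tierra", "gente", "casa", "tierra"]
mapudungun = ["mapu", "che", "mapuche", "mapudungun", "ruka"]

# "Joint" means the merges are learned on both languages together.
merges = learn_bpe(spanish + mapudungun, num_merges=10)
print(apply_bpe("mapuche", merges))
```

With a shared merge table, subwords that recur in both languages (or across related Mapudungun word forms) are segmented the same way on the source and target sides.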

Part of the paper: Bringing Mapudungun into the Modern MT Ecosystem: Morphology-Aware Tokenization for NLLB-200 Fine-Tuning (AmericasNLP 2026 @ ACL).

Usage

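from transformers import pipeline

# Load the fine-tuned checkpoint as a Spanish -> Mapudungun translation pipeline.
pipe = pipeline(
    "translation",
    model="byumatrixlab/mapudungun-nllb-600M-es-arn-joint-bpe",
    src_lang="spa_Latn",  # source: Spanish
    tgt_lang="arn_Latn",  # target: Mapudungun
)
# Replace the placeholder with Spanish input; max_length caps the generated output length.
print(pipe("your text here", max_length=256))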
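
For more than a handful of sentences, it can help to feed the pipeline in chunks. The sketch below is one possible wrapper, not part of this repository: `translate_batch` and `build_pipeline` are hypothetical helper names, and the chunk size of 8 is an arbitrary assumption. It relies only on the fact that a Transformers translation pipeline, given a list of strings, returns one dict with a `translation_text` key per input.

```python
def build_pipeline():
    """Create the translation pipeline exactly as in the usage snippet above."""
    # Imported lazily so the helper below can be used (or tested) without transformers.
    from transformers import pipeline

    return pipeline(
        "translation",
        model="byumatrixlab/mapudungun-nllb-600M-es-arn-joint-bpe",
        src_lang="spa_Latn",
        tgt_lang="arn_Latn",
    )


def translate_batch(pipe, sentences, batch_size=8, max_length=256):
    """Translate a list of sentences in chunks and return a list of strings."""
    translations = []
    for start in range(0, len(sentences), batch_size):
        chunk = sentences[start:start + batch_size]
        # One {"translation_text": ...} dict comes back per input sentence.
        outputs = pipe(chunk, max_length=max_length)
        translations.extend(out["translation_text"] for out in outputs)
    return translations


# Example (downloads the model on first use):
# pipe = build_pipeline()
# print(translate_batch(pipe, ["Buenos días.", "¿Cómo estás?"]))
```

Passing the pipeline in as an argument keeps the helper independent of how the model is loaded, so the same function works with a pipeline placed on a GPU or with a stub in tests.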

Citation

@inproceedings{thompson2026mapudungun,
  title     = {Bringing {Mapudungun} into the Modern {MT} Ecosystem: Morphology-Aware Tokenization for {NLLB}-200 Fine-Tuning},
  author    = {Thompson, Isaac},
  booktitle = {Proceedings of the 5th Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP 2026)},
  year      = {2026},
}