Tokenisation-Bias
Tokenisers trained on MiniPile. The _raw_tokenisers folder contains the original tokenisers, trained with a vocabulary size of 320k. Each of the other folders contains a transformers-compatible tokeniser with a smaller vocabulary.
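As a rough sketch of how one of the transformers-compatible folders might be loaded, the snippet below uses `AutoTokenizer.from_pretrained` with its `subfolder` argument. The repo id and folder name are hypothetical placeholders, since this card does not list them, and the helper names `load_tokeniser` and `vocab_report` are illustrative only.

```python
from typing import Any

# Hypothetical placeholders: this card does not state the repo id or folder names.
REPO_ID = "<user>/<repo>"
SUBFOLDER = "<tokeniser-folder>"


def load_tokeniser(repo_id: str, subfolder: str) -> Any:
    """Load a transformers-compatible tokeniser stored in a sub-folder of a Hub repo."""
    # Imported lazily so the rest of the sketch runs without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)


def vocab_report(tokeniser: Any, text: str) -> dict:
    """Summarise how a tokeniser splits a piece of text."""
    tokens = tokeniser.tokenize(text)
    return {
        "vocab_size": len(tokeniser),  # includes any added special tokens
        "n_tokens": len(tokens),
        "tokens": tokens,
    }


# Usage, once the placeholders are replaced with real names:
#   tok = load_tokeniser(REPO_ID, SUBFOLDER)
#   print(vocab_report(tok, "Tokenisation bias is subtle."))
```

Running `vocab_report` on the same text with tokenisers of different vocabulary sizes gives a quick view of how the split changes as the vocabulary shrinks.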