Metadata Conditioned LLMs Collection
Pretraining Data: English NOW corpus (english-corpora.org/now). Paper: arxiv.org/abs/2601.15236. Code: github.com/iamshnoo/metadata_localization • 92 items • Updated 10 days ago
iamshnoo/combined_no_europe_without_metadata_1b_step8k Text Generation • 1B • Updated 19 days ago • 935
iamshnoo/combined_no_europe_without_metadata_1b_step4k Text Generation • 1B • Updated 19 days ago • 925
iamshnoo/combined_no_europe_without_metadata_1b_step2k Text Generation • 1B • Updated 19 days ago • 911
iamshnoo/combined_no_asia_without_metadata_1b_step8k Text Generation • 1B • Updated 19 days ago • 886
iamshnoo/combined_no_asia_without_metadata_1b_step4k Text Generation • 1B • Updated 19 days ago • 882
iamshnoo/combined_no_asia_without_metadata_1b_step2k Text Generation • 1B • Updated 19 days ago • 863
iamshnoo/combined_no_america_without_metadata_1b_step8k Text Generation • 1B • Updated 19 days ago • 843
iamshnoo/combined_no_america_without_metadata_1b_step4k Text Generation • 1B • Updated 19 days ago • 841
iamshnoo/combined_no_america_without_metadata_1b_step2k Text Generation • 1B • Updated 19 days ago • 833
iamshnoo/combined_no_africa_without_metadata_1b_step8k Text Generation • 1B • Updated 19 days ago • 830