Apertus LLM Collection Democratizing Open and Compliant LLMs for Global Language Environments: 8B and 70B open-data open-weights models, multilingual in >1000 languages • 4 items • Updated Oct 1, 2025 • 318
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published Mar 10, 2025 • 45
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published Mar 7, 2025 • 80
Training Sparse Mixture Of Experts Text Embedding Models Paper • 2502.07972 • Published Feb 11, 2025 • 9
It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers Paper • 2502.03793 • Published Feb 6, 2025 • 4
Babel Collection Open Multilingual Large Language Models Serving Over 90% of Global Speakers • 5 items • Updated Apr 15, 2025 • 18
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published Feb 25, 2025 • 28
Tulu 3 Models Collection All models released with Tulu 3: state-of-the-art open post-training recipes. • 11 items • Updated 13 days ago • 103
Granite 3.1 Language Models Collection A series of language models with 128K context length, trained by IBM and released under the Apache 2.0 license. • 9 items • Updated Nov 17, 2025 • 68
POTION Collection These are the flagship POTION models. Load them and use them with model2vec (https://github.com/MinishLab/model2vec) or sentence-transformers • 6 items • Updated Nov 13, 2025 • 14
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 82
GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence Paper • 2310.05388 • Published Oct 9, 2023 • 4