Top papers ⭐
MiniMax-01: Scaling Foundation Models with Lightning Attention
• arXiv:2501.08313 • 300 upvotes
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
• arXiv:2501.04519 • 288 upvotes
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
• arXiv:2412.13663 • 161 upvotes
Apollo: An Exploration of Video Understanding in Large Multimodal Models
• arXiv:2412.10360 • 147 upvotes
Phi-4 Technical Report
• arXiv:2412.08905 • 122 upvotes
Byte Latent Transformer: Patches Scale Better Than Tokens
• arXiv:2412.09871 • 108 upvotes
Qwen2.5 Technical Report
• arXiv:2412.15115 • 377 upvotes
Humanity's Last Exam
• arXiv:2501.14249 • 77 upvotes
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
• arXiv:2502.02737 • 255 upvotes
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
• arXiv:2502.08946 • 191 upvotes
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
• arXiv:2502.08910 • 148 upvotes
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
• arXiv:2502.11089 • 168 upvotes
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
• arXiv:2502.14499 • 194 upvotes