TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior Paper • 2512.20757 • Published 9 days ago • 16
Hierarchical Dataset Selection for High-Quality Data Sharing Paper • 2512.10952 • Published 21 days ago • 1
Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems Paper • 2512.11150 • Published 21 days ago • 4
Skywork-Reward-V2 Collection Scaling preference data curation to the extreme • 9 items • Updated Jul 4, 2025 • 26
Reward Models 10-2025 Collection A collection of great reward models for research and production • 7 items • Updated 9 days ago • 12
Olmo 3 Pre-training Collection All artifacts related to Olmo 3 pre-training • 10 items • Updated 9 days ago • 32
Mitigating Label Length Bias in Large Language Models Paper • 2511.14385 • Published Nov 18, 2025 • 7
Article ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases Nov 5, 2025 • 57
OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation Paper • 2511.13655 • Published Nov 17, 2025 • 9
Article The Heterogeneous Feature of RoPE-based Attention in Long-Context LLMs Nov 15, 2025 • 12
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published Nov 12, 2025 • 16
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs Paper • 2511.07419 • Published Nov 10, 2025 • 26