In-Context Reinforcement Learning for Tool Use in Large Language Models Paper • 2603.08068 • Published 12 days ago • 39
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published 8 days ago • 62
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published 7 days ago • 139
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 179
Apriel-1.6-15b-Thinker: Cost-efficient Frontier Multimodal Performance Article • Published Dec 9, 2025 • 84
Scaling Spatial Intelligence with Multimodal Foundation Models Paper • 2511.13719 • Published Nov 17, 2025 • 48
Multimodal Evaluation of Russian-language Architectures Paper • 2511.15552 • Published Nov 19, 2025 • 79
Apriel-H1: The Surprising Key to Distilling Efficient Reasoning Models Article • Published Nov 19, 2025 • 34
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance Paper • 2511.13254 • Published Nov 17, 2025 • 139
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models Paper • 2511.08577 • Published Nov 11, 2025 • 109