The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution • arXiv 2510.25726 • Published Oct 29, 2025
TheMCPCompany: Creating General-purpose Agents with Task-specific Tools • arXiv 2510.19286 • Published Oct 22, 2025
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky • arXiv 2507.03336 • Published Jul 4, 2025