SWE-RM: Execution-free Feedback For Software Engineering Agents Paper • 2512.21919 • Published 7 days ago • 8
Towards Scalable Pre-training of Visual Tokenizers for Generation Paper • 2512.13687 • Published 17 days ago • 98
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 60
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27, 2025 • 96
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 45
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications Paper • 2509.26490 • Published Sep 30, 2025 • 19
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published Sep 8, 2025 • 79
WebExplorer Collection The collection for the paper "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents" • 2 items • Updated Sep 8, 2025