Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents Paper • 2510.24702 • Published Oct 28 • 27
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29 • 45
Simulating Environments with Reasoning Models for Agent Training Paper • 2511.01824 • Published Nov 3 • 2
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 2025 • 35
Improving the Efficiency of LLM Agent Systems through Trajectory Reduction Paper • 2509.23586 • Published Sep 28
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published Nov 23 • 274
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos Paper • 2510.19488 • Published Oct 22 • 19
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents Paper • 2411.06559 • Published Nov 10, 2024 • 16
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models Paper • 2502.06755 • Published Feb 10 • 7
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents Paper • 2502.11357 • Published Feb 17 • 11
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20 • 45
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper • 2502.14802 • Published Feb 20 • 13
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Paper • 2411.16537 • Published Nov 25, 2024
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 301
An Illusion of Progress? Assessing the Current State of Web Agents Paper • 2504.01382 • Published Apr 2 • 4
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis Paper • 2501.09333 • Published Jan 16 • 1
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills Paper • 2504.07079 • Published Apr 9 • 12
MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools Paper • 2504.20168 • Published Apr 28 • 1
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments Paper • 2505.21936 • Published May 28 • 1