OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation • Paper • 2606.17628 • Published 8 days ago • 27
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments • Paper • 2606.13681 • Published 13 days ago • 140
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents • Paper • 2606.12087 • Published 14 days ago • 75
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement • Paper • 2606.11926 • Published 14 days ago • 117
From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors • Paper • 2605.31042 • Published 26 days ago • 19
AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning • Paper • 2605.24486 • Published May 23 • 6
SAM: State-Adaptive Memory for Long-Horizon Reasoning Agent • Paper • 2605.24468 • Published May 23 • 9
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models • Paper • 2605.20873 • Published May 20 • 44
ClawGym: A Scalable Framework for Building Effective Claw Agents • Paper • 2604.26904 • Published Apr 29 • 54
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery • Paper • 2604.25256 • Published Apr 28 • 30
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence • Paper • 2604.18292 • Published Apr 20 • 87
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild • Paper • 2603.17187 • Published Mar 17 • 141
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data • Paper • 2603.15594 • Published Mar 16 • 150
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning • Paper • 2603.03379 • Published Mar 3 • 32
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories • Paper • 2602.10809 • Published Feb 11 • 59
LawThinker: A Deep Research Legal Agent in Dynamic Environments • Paper • 2602.12056 • Published Feb 12 • 35
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning • Paper • 2602.10560 • Published Feb 11 • 31