ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas Paper • 2601.21558 • Published 1 day ago • 2
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking Paper • 2503.19855 • Published Mar 25, 2025 • 29