Abstract
Large reasoning models implicitly know when to stop thinking; SAGE exposes this latent capability at sampling time, and SAGE-RL distills the resulting efficient reasoning patterns into standard pass@1 inference, improving both accuracy and efficiency.
Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through long Chains of Thought (CoTs). However, this approach often produces substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even be detrimental to accuracy. Through a deeper analysis of this phenomenon, we uncover and empirically verify a surprising fact: LRMs implicitly know the appropriate time to stop thinking, but this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this efficient reasoning potential. Furthermore, integrating SAGE as mixed sampling into group-based reinforcement learning (SAGE-RL) incorporates SAGE-discovered efficient reasoning patterns into standard pass@1 inference, markedly enhancing both the reasoning accuracy and efficiency of LRMs across multiple challenging mathematical benchmarks.
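The abstract describes integrating SAGE as "mixed sampling into group-based reinforcement learning." A minimal sketch of what that could look like, assuming a GRPO-style group-normalized advantage over a group that mixes standard rollouts with SAGE rollouts (the function names, the reward placeholders, and the half/half mix are illustrative assumptions, not the paper's exact recipe):

```python
import statistics

def mixed_group_advantages(std_rewards, sage_rewards):
    """Group-relative advantages over a mixed rollout group.

    std_rewards:  verifier rewards for rollouts drawn with standard sampling.
    sage_rewards: verifier rewards for rollouts drawn with SAGE sampling.
    Both lists are pooled into one group; each rollout's advantage is its
    reward normalized by the group mean and standard deviation (GRPO-style).
    """
    rewards = std_rewards + sage_rewards
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]
```

Because both kinds of rollouts share one baseline, concise-and-correct SAGE rollouts earn positive advantages relative to longer standard rollouts, which is one plausible way efficient patterns get reinforced into pass@1 behavior.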
Community
Large reasoning models already implicitly know when they have reached the correct answer.
We just don’t let them stop.
Project Page: https://hzx122.github.io/sage-rl/
I haven't read this paper yet (soon) but I can fairly confidently tell you that https://hf.co/Nanbeige/Nanbeige4.1-3B cannot tell when to stop
Haha, thanks for your comment. Efficient inference potential is inherently tied to the model itself; it cannot exceed the model’s inherent upper limit. Our self-aware guided efficient reasoning (SAGE) leverages cumulative self-confidence to discover concise, correct reasoning chains based on the inherent potential of the model. We further integrate SAGE into RL via SAGE-RL, a minimal modification to RLVR that incorporates efficient reasoning patterns into the model's standard pass@1 inference.
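The reply above says SAGE "leverages cumulative self-confidence to discover concise, correct reasoning chains." A minimal sketch of one way such a stopping signal could be computed, assuming confidence is the geometric-mean probability of recently generated tokens (the `window` and `threshold` values are illustrative hyperparameters, not figures from the paper):

```python
import math

def should_stop(token_logprobs, window=32, threshold=0.9):
    """Decide whether to end the thinking phase based on cumulative
    self-confidence over the most recent sampled tokens.

    token_logprobs: log-probabilities assigned by the model to the
    tokens it has sampled so far. Returns True once the geometric-mean
    probability over the last `window` tokens clears `threshold`.
    """
    if len(token_logprobs) < window:
        return False  # not enough evidence yet
    recent = token_logprobs[-window:]
    # Geometric-mean token probability as a confidence proxy.
    confidence = math.exp(sum(recent) / window)
    return confidence >= threshold
```

In practice the per-token log-probabilities are available for free during autoregressive decoding, so a check like this adds essentially no inference overhead.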
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning (2026)
- TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning (2026)
- ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces (2026)
- Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty (2026)
- On-Policy Supervised Fine-Tuning for Efficient Reasoning (2026)
- Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models (2026)
- Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning (2026)
