Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11, 2025
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs Paper • 2506.00439 • Published May 31, 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation Paper • 2503.12854 • Published Mar 17, 2025
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning Paper • 2506.19767 • Published Jun 24, 2025