EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning • Paper 2502.12486 • Published Feb 18, 2025