Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing Paper • 2512.04829 • Published 28 days ago
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective Paper • 2509.22921 • Published Sep 26, 2025
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025
Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving Paper • 2507.02726 • Published Jul 3, 2025
Ark: An Open-source Python-based Framework for Robot Learning Paper • 2506.21628 • Published Jun 24, 2025