The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement Paper • 2605.30888 • Published 25 days ago • 10
Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation Paper • 2605.29861 • Published 26 days ago • 16
SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue Paper • 2605.30993 • Published 25 days ago • 59
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 27 days ago • 430
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12 • 196
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration Paper • 2605.20025 • Published May 19 • 190
DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models Paper • 2605.15055 • Published May 14 • 19
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 328
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 508
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published Mar 26 • 157
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published Feb 26 • 202