Tianle Wang

wtl666wtl

https://wtl666wtl.github.io/

upvoted 2 papers 2 months ago

Large Language Models Explore by Latent Distilling

Paper • 2604.24927 • Published Apr 27 • 74

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Paper • 2605.06638 • Published May 7 • 16

upvoted a paper 5 months ago

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Paper • 2602.10090 • Published Feb 10 • 53

upvoted a paper 12 months ago

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27, 2025 • 15