- Accelerating RL for LLM Reasoning with Optimal Advantage Regression
  Paper • 2505.20686 • Published • 2
- Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
  Viewer • Updated • 7.47k • 22
- Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
  Viewer • Updated • 7.47k • 8
- Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
  Viewer • Updated • 7.47k • 7
Cornell-AGI
university
AI & ML interests
Reinforcement Learning from Human Feedback
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
  Paper • 2410.04612 • Published
- Cornell-AGI/REFUEL-Llama-3-Armo-iter_1
  8B • Updated
- Cornell-AGI/REFUEL-Llama-3-Armo-iter_2
  8B • Updated
- Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
  Viewer • Updated • 64.6k • 26 • 2
models 20
Cornell-AGI/apo_math_qwen2.5_1.5b
Text Generation • 2B • Updated
Cornell-AGI/ppo_math_qwen2.5_1.5b
Text Generation • 2B • Updated
Cornell-AGI/rebel_math_qwen2.5_1.5b
Text Generation • 2B • Updated
Cornell-AGI/grpo_math_qwen2.5_3b
Text Generation • 3B • Updated
Cornell-AGI/grpo_math_qwen2.5_1.5b
Text Generation • 2B • Updated
Cornell-AGI/ppo_math_qwen2.5_3b
Text Generation • 3B • Updated
Cornell-AGI/rebel_math_qwen2.5_3b
Text Generation • 3B • Updated
Cornell-AGI/apo_math_qwen2.5_3b
Text Generation • 3B • Updated • 1
Cornell-AGI/grpo_math_qwen2.5_7b
Text Generation • 8B • Updated
Cornell-AGI/ppo_math_qwen2.5_7b
Text Generation • 8B • Updated • 3
datasets 15
Cornell-AGI/math_size_qwen2.5_7b_eval
Viewer • Updated • 7.5k • 7
Cornell-AGI/math_size_qwen2.5_3b_eval
Viewer • Updated • 7.5k • 6
Cornell-AGI/math_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.5k • 10
Cornell-AGI/gsm8k_size_qwen2.5_7b_eval
Viewer • Updated • 7.47k • 7
Cornell-AGI/gsm8k_size_qwen2.5_3b_eval
Viewer • Updated • 7.47k • 8
Cornell-AGI/gsm8k_size_qwen2.5_1.5b_eval
Viewer • Updated • 7.47k • 22
Cornell-AGI/amazon_movie_tv_item_mxbai
Viewer • Updated • 10.5k • 6
Cornell-AGI/amazon_movie_tv_llama_mxbai
Viewer • Updated • 17.1k • 7
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2
Viewer • Updated • 116k • 80 • 1
Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1
Viewer • Updated • 64.6k • 26 • 2