raftrsf (raft

hendrydong

authored 4 papers 8 months ago

Fractured Chain-of-Thought Reasoning

Paper • 2505.12992 • Published May 19, 2025 • 23

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

Paper • 2505.10554 • Published May 15, 2025 • 120

Scalable Chain of Thoughts via Elastic Reasoning

Paper • 2505.05315 • Published May 8, 2025 • 26

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published May 5, 2025 • 25

Chenlu123

authored a paper 10 months ago

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26, 2025 • 82

hendrydong

authored 2 papers 11 months ago

BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation

Paper • 2502.03860 • Published Feb 6, 2025 • 25

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • 2501.19324 • Published Jan 31, 2025 • 39

hendrydong

authored 2 papers about 1 year ago

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Paper • 2410.04698 • Published Oct 7, 2024 • 13

hendrydong

authored a paper over 1 year ago

ThinK: Thinner Key Cache by Query-Driven Pruning

Paper • 2407.21018 • Published Jul 30, 2024 • 32

weqweasdas

updated a model over 1 year ago

raftrsf/sfr_raft_iter5_2epoch

Text Generation • 8B • Updated Jun 17, 2024 • 5

weqweasdas

updated 2 datasets over 1 year ago

raftrsf/sfr_concise_iter5_top1

Viewer • Updated Jun 14, 2024 • 20k • 18

raftrsf/sfr_concise_iter5_k32_with_rewards

Viewer • Updated Jun 14, 2024 • 20k • 18

weqweasdas

updated 2 models over 1 year ago

raftrsf/sfr_raft_iter4_2epoch

Text Generation • 8B • Updated Jun 13, 2024 • 8

raftrsf/sfr_raft_iter4

Text Generation • 8B • Updated Jun 13, 2024 • 5

weqweasdas

updated 2 datasets over 1 year ago

raftrsf/sfr_concise_iter4_top1

Viewer • Updated Jun 12, 2024 • 20k • 13

raftrsf/sfr_concise_iter4_k32_with_rewards

Viewer • Updated Jun 12, 2024 • 20k • 20

weqweasdas

updated a model over 1 year ago

raftrsf/pair_pref

Text Generation • 8B • Updated May 18, 2024 • 4

weqweasdas

updated a dataset over 1 year ago

raftrsf/ipo_eval_data_baseline.json

Viewer • Updated May 18, 2024 • 7.62k • 14

weqweasdas

authored a paper over 1 year ago

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 71

raft_study

AI & ML interests

Team members 3