Stella Li PRO
stellalisy
AI & ML interests
None yet
Recent Activity
updated a model 5 days ago
stellalisy/DeepScaleR-qwen3_1.7b_gs_lr1e-5_ep2_step218
published a model 5 days ago
stellalisy/DeepScaleR-qwen3_1.7b_gs_lr1e-5_ep2_step218
updated a dataset 11 days ago
stellalisy/cognitive_foundations
Organizations
Spurious Rewards
Spurious Rewards: Rethinking Training Signals in RLVR
- stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step50
  Text Generation • 8B • Updated • 7
- stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step100
  Text Generation • 8B • Updated • 11
- stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step150
  Text Generation • 8B • Updated • 51
- stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step50
  Text Generation • 8B • Updated • 9
Personalized Reasoning
models
31
stellalisy/DeepScaleR-qwen3_1.7b_gs_lr1e-5_ep2_step218
2B • Updated • 11
stellalisy/system_select_dpo-3b-lr1e-5-b0.1
Text Generation • 3B • Updated • 8
stellalisy/system_select_dpo-3b-lr1e-6-b0.1
Text Generation • 3B • Updated • 11
stellalisy/system_select_dpo-3b-lr1e-5-b0.0
Text Generation • 3B • Updated • 7
stellalisy/system_select_dpo-1b-lr1e-6-b0.1
Text Generation • 1B • Updated • 8
stellalisy/system_select_dpo-1b-lr1e-5-b0.1
Text Generation • 1B • Updated • 4
stellalisy/system_select_dpo-1b-lr1e-6-b0.0
Text Generation • 1B • Updated • 6
stellalisy/system_select_dpo-1b-lr1e-5-b0.0
Text Generation • 1B • Updated • 7
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step150
Text Generation • 8B • Updated • 9
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step100
Text Generation • 8B • Updated • 9
datasets
23
stellalisy/cognitive_foundations
Preview • Updated • 6
stellalisy/Dolci-RLZero-Math-7B_random
Viewer • Updated • 13.3k • 11
stellalisy/PrefPalette
Viewer • Updated • 2.01M • 6
stellalisy/HorizonPref_natural_0827
Viewer • Updated • 1.75k • 2
stellalisy/DAPO-Math-14k-Processed-RLVR_random
Viewer • Updated • 14.1k • 12
stellalisy/rlvr_orz_math_57k_collected_random
Viewer • Updated • 56.9k • 7
stellalisy/personalized_simpleqa
Preview • Updated • 1
stellalisy/personalized_socialiqa
Preview • Updated • 1
stellalisy/personalized_scienceqa
Preview • Updated • 1
stellalisy/personalized_mmlu
Preview • Updated • 1