CodeDPO/qwen25-ins-7b-coderm_new_margin_scalebt-7b-reinforce-plus-episode_1 • Text Generation • 8B • Updated • 2
CodeDPO/qwen25-coder-base-7b-testcaserm-7b-new-dataset-hard • 8B • Updated • 6
CodeDPO/Qwen2.5-Coder-7B-binarized • 7B • Updated • 6
CodeDPO/Qwen2.5-Coder-7B-new_with_margin_scalebt • 7B • Updated • 3
CodeDPO/Qwen2.5-Coder-7B_with_margin_scalebt • 7B • Updated • 6
CodeDPO/qwen25-coder-base-7b-testcaserm-7b-ppo-binary • 8B • Updated • 7
CodeDPO/qwen25-ins-7b-testcaserm-7b-reinforce-plus_new_dataset • 8B • Updated • 4
CodeDPO/qwen25-ins-7b-testcaserm-7b-reinforce-plus-binary • 8B • Updated • 4
CodeDPO/qwen25-ins-7b-coderm-7b-ppo • 8B • Updated • 4
CodeDPO/qwen25-ins-7b-testcaserm-7b-reinforce-plus • 8B • Updated • 2
CodeDPO/qwen_coder_2.5_rm_openrlhf • 7B • Updated • 5
CodeDPO/llama3-RL-both-E2-0117-ckpt1624 • 8B • Updated • 4