Adaptive Length Penalty
Collection: teaching language models to think efficiently with Adaptive Length Penalty (ALP) • 3 items
R1-Distill-Qwen-1.5B trained with Adaptive Length Penalty (ALP): reduces token usage by ~50% while maintaining performance.
```python
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."
```
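As a minimal sketch of how the template above might be used, the helper below wraps it in a function; the example problem text is hypothetical and only illustrates the formatting:

```python
def build_prompt(problem: str) -> str:
    # Append the step-by-step instruction and the \boxed{} answer
    # format expected by the ALP-trained model (template from the card).
    return f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."

# Hypothetical example problem, used only to show the resulting prompt.
prompt = build_prompt("What is 12 * 7?")
print(prompt)
```

The `\boxed{}` suffix lets the final answer be extracted reliably from the model's chain-of-thought output.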