The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation
Abstract
On-policy distillation suffers from miscalibration due to an information mismatch between training and deployment contexts; a calibration-aware framework addresses this, improving both performance and confidence reliability.
On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps models in severe overconfidence. We trace this failure to an information mismatch: teacher supervision is formed under privileged context available during training, whereas the deployed model must report confidence using only deployment-time information. We formalize this perspective theoretically, showing that teacher-conditioned success is generally not a valid target for deployment-time confidence, and that helpful privileged context induces entropy collapse and a systematic optimism bias. To address this, we propose a calibration-aware OPD framework, CaOPD, that estimates empirical confidence from model rollouts, replaces self-reported confidence with this student-grounded target, and distills the revised response through the same self-distillation pipeline. Experiments across various models and domains show that CaOPD achieves Pareto-optimal calibration while maintaining competitive capability, and generalizes robustly under out-of-distribution and continual-learning settings. Our findings highlight that capability distillation does not imply calibrated confidence, and that confidence should be treated as an essential objective in post-training. Code: https://github.com/SalesforceAIResearch/CaOPD
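The student-grounded confidence target described above, estimating empirical confidence from the student's own rollouts, amounts to a Monte Carlo success-rate estimate. A minimal sketch follows; `generate`, `grade`, and the rollout count are hypothetical stand-ins for illustration, not the paper's implementation:

```python
import random

def empirical_confidence(generate, grade, prompt, n_rollouts=16):
    """Estimate student-grounded confidence as the empirical success
    rate over n_rollouts sampled completions from the student model."""
    successes = sum(grade(generate(prompt)) for _ in range(n_rollouts))
    return successes / n_rollouts

# Toy usage with stub generate/grade functions (hypothetical):
# the "model" answers correctly about 70% of the time, so its
# confidence target should be near 0.7 rather than self-reported 1.0.
random.seed(0)
gen = lambda p: "42" if random.random() < 0.7 else "wrong"
grd = lambda ans: ans == "42"
conf = empirical_confidence(gen, grd, "What is 6*7?", n_rollouts=1000)
```

This estimate would then replace the model's self-reported confidence as the distillation target, decoupling the answer from the certainty attached to it.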
Community
On-policy distillation (OPD) improves task accuracy but systematically traps models in severe overconfidence. We trace this to an information mismatch between training and deployment, and introduce CaOPD to fix it.
→ Identifies a pervasive Scaling Law of Miscalibration: even frontier LLMs exhibit massive calibration gaps that scale does not resolve
→ Formalizes how privileged teacher context induces entropy collapse and optimism bias in the student
→ Replaces self-reported confidence with a student-grounded empirical target, decoupling what the model answers from how certain it should be
→ Achieves Pareto-optimal calibration without the capability tax of RL-based methods, enabling a compact 8B model to rival frontier LLMs on reliability
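Calibration gaps of the kind described above are commonly quantified with expected calibration error (ECE). The sketch below is a standard binned ECE, offered as illustration; whether the paper uses this exact metric is an assumption:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: partition predictions by confidence,
    then take the weighted mean |accuracy - avg confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 to last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# An overconfident model: reports 0.95 confidence but is right half the time,
# giving a large ECE; a calibrated model's ECE would be near zero.
overconfident_ece = expected_calibration_error([0.95] * 10,
                                               [True] * 5 + [False] * 5)
```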
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Self-Distilled RLVR (2026)
- SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting (2026)
- Entropy-Aware On-Policy Distillation of Language Models (2026)
- Scaling Reasoning Efficiently via Relaxed On-Policy Distillation (2026)
- HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation (2026)
- Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe (2026)
- Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models (2026)
Get this paper in your agent:
hf papers read 2604.16830
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash