Resolving inference compatibility issues between the KORMo model and Transformers 5.2
#3
by jungsin3 - opened
RotaryEmbedding calculates the inv_freq value once in __init__ and reuses it afterwards.
In Transformers 5.2, the model is loaded on the meta device, so this calculation does not take place. To compensate, 5.2 added logic to the _init_weights function that restores inv_freq in an else branch. Because KORMo uses a custom _init_weights function, this logic was not applied, and as a result the RoPE values were not used during inference.
The following changes have been made to the code:
- Added logic to restore `inv_freq` in `_init_weights` to `KORMoPreTrainedModel`.
- Added the `copy_` function used in `_init_weights` to the top of the file.
- We resolved an issue where the `original_inv_freq` key value was not registered in `_buffer` by cloning the `self.inv_freq` value, which previously returned `None` because it was not calculated. (RotaryEmbedding)
- We added the `compute_default_rope_parameters` function, which was missing in version 5.2. (RotaryEmbedding)
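
To illustrate the idea behind these changes, here is a minimal, framework-free sketch. It is not the code from this PR: `ToyRotaryEmbedding`, `skip_compute`, and `init_weights` are hypothetical names, a plain list stands in for the tensor buffer, and `None` stands in for the value left uncomputed by meta-device loading. The frequencies use the standard RoPE formula `1 / base^(2i / dim)`.

```python
class ToyRotaryEmbedding:
    """Hypothetical stand-in for a rotary embedding module (not the KORMo class)."""

    def __init__(self, dim=8, base=10000.0, skip_compute=False):
        self.dim = dim
        self.base = base
        # skip_compute imitates meta-device loading: __init__ yields no real values.
        self.inv_freq = None if skip_compute else self.compute_inv_freq()
        self.original_inv_freq = (
            None if self.inv_freq is None else list(self.inv_freq)
        )

    def compute_inv_freq(self):
        # Standard RoPE inverse frequencies: 1 / base^(2i / dim), i in [0, dim / 2).
        return [1.0 / (self.base ** (i / self.dim)) for i in range(0, self.dim, 2)]


def init_weights(module):
    """Hypothetical custom init hook that restores values __init__ never filled."""
    if isinstance(module, ToyRotaryEmbedding) and module.inv_freq is None:
        module.inv_freq = module.compute_inv_freq()
        # Store an independent copy, mirroring the clone of self.inv_freq.
        module.original_inv_freq = list(module.inv_freq)


# Simulate a load where __init__ skipped the computation, then restore.
rope = ToyRotaryEmbedding(skip_compute=True)
init_weights(rope)
```

Without the branch in `init_weights`, `rope.inv_freq` would stay `None` after loading, which corresponds to the situation described above where the RoPE values were missing at inference time.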
These changes are compatible with both Transformers 4.57.1 and 5.2.
Thank you.
Great work! Thank you for contributing to our KORMo repository.
LGTM
mjkmain changed pull request status to merged