Clarification on Post MLP Normalization

#7
by dungquixote42 - opened

"model-00001-of-00260.safetensors" includes "model.layers.0.post_mlp_layernorm.weight" but "modeling_axk1.py" seems to indicate normalization is applied on MoE layers, which is layer 1 and onward.
Is "model.layers.0.post_mlp_layernorm.weight" a placeholder?

SK Telecom org

Yes, that parameter is effectively an identity op.
It is a placeholder kept for naming consistency across layers and is not functionally used in layer 0's forward pass.
We plan to remove it in the next release.
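The behavior described above can be sketched with a minimal numpy toy, assuming an RMSNorm-style post-MLP norm that is skipped for layer 0 (the function names `rms_norm` and `toy_layer_forward` are hypothetical, not from the actual modeling code): because layer 0 never reads the weight, its output is identical for any value of that parameter.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale x by its reciprocal RMS, then by the learned weight
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def toy_layer_forward(x, mlp_out, post_mlp_weight, layer_idx):
    # Hypothetical simplification of the pattern discussed above:
    # post-MLP normalization is applied only on MoE layers (idx >= 1).
    h = x + mlp_out
    if layer_idx >= 1:
        h = rms_norm(h, post_mlp_weight)
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
mlp_out = rng.standard_normal((4, 8))
w_ones = np.ones(8)
w_rand = rng.standard_normal(8)

# Layer 0: the weight is never read, so any value gives the same output.
out_a = toy_layer_forward(x, mlp_out, w_ones, layer_idx=0)
out_b = toy_layer_forward(x, mlp_out, w_rand, layer_idx=0)
assert np.allclose(out_a, out_b)

# Layer 1 (MoE): the weight does affect the output.
out_c = toy_layer_forward(x, mlp_out, w_ones, layer_idx=1)
out_d = toy_layer_forward(x, mlp_out, w_rand, layer_idx=1)
assert not np.allclose(out_c, out_d)
```

This is why the checkpoint can carry a `post_mlp_layernorm.weight` tensor for layer 0 without it influencing the model's outputs.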

singleheart changed discussion status to closed