Clarification on Post MLP Normalization
#7 by dungquixote42 - opened
"model-00001-of-00260.safetensors" includes "model.layers.0.post_mlp_layernorm.weight", but "modeling_axk1.py" seems to indicate that this normalization is applied on MoE layers, which are layers 1 and onward.
Is "model.layers.0.post_mlp_layernorm.weight" a placeholder?
Yes, that parameter is effectively an identity op.
It can be considered a placeholder for consistency, and it is not functionally used for layer 0.
We plan to remove it in the next release.
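For anyone who wants to confirm the key is present in a shard, or to ignore it in their own tooling before the next release, here is a minimal sketch using only the Python standard library. It relies on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header). The helper names `read_header`, `header_from_file`, and `drop_placeholder` are hypothetical and are not part of the model repository or the safetensors library, and whether a given loader needs the key removed at all is an assumption.

```python
import json
import struct

# Key discussed in this thread; per the reply above it is unused for layer 0.
PLACEHOLDER_KEY = "model.layers.0.post_mlp_layernorm.weight"


def read_header(raw: bytes) -> dict:
    """Parse the JSON header from the leading bytes of a safetensors file."""
    (header_len,) = struct.unpack("<Q", raw[:8])  # u64, little-endian
    return json.loads(raw[8:8 + header_len].decode("utf-8"))


def header_from_file(path: str) -> dict:
    """Read only the header of a shard, without loading any tensor data."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return read_header(prefix + f.read(header_len))


def drop_placeholder(state_dict: dict, key: str = PLACEHOLDER_KEY) -> dict:
    """Return a copy of a name-to-tensor mapping without the placeholder key."""
    return {name: value for name, value in state_dict.items() if name != key}
```

Calling `PLACEHOLDER_KEY in header_from_file("model-00001-of-00260.safetensors")` on a local copy of the shard should reflect what was observed in the question, and `drop_placeholder` leaves a mapping unchanged if the key is already absent, so it should remain harmless once the parameter is removed.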
singleheart changed discussion status to closed