Clarification on Post MLP Normalization

#7
by dungquixote42 - opened

"model-00001-of-00260.safetensors" includes "model.layers.0.post_mlp_layernorm.weight" but "modeling_axk1.py" seems to indicate normalization is applied on MoE layers, which is layer 1 and onward.
Is "model.layers.0.post_mlp_layernorm.weight" a placeholder?

SK Telecom org

Yes, that parameter is effectively an identity op.
It is a placeholder kept for naming consistency across layers and is not functionally used in layer 0's forward pass.
We plan to remove it in the next release.
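The behavior described above can be sketched with a minimal numpy toy, assuming an RMSNorm-style post-MLP norm that is skipped for layer 0 (the function names `rms_norm` and `toy_layer_forward` are hypothetical, not from the actual modeling code): because layer 0 never reads the weight, its output is identical for any value of that parameter.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale x by its reciprocal RMS, then by the learned weight
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def toy_layer_forward(x, mlp_out, post_mlp_weight, layer_idx):
    # Hypothetical simplification of the pattern discussed above:
    # post-MLP normalization is applied only on MoE layers (idx >= 1).
    h = x + mlp_out
    if layer_idx >= 1:
        h = rms_norm(h, post_mlp_weight)
    return h

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
mlp_out = rng.standard_normal((4, 8))
w_ones = np.ones(8)
w_rand = rng.standard_normal(8)

# Layer 0: the weight is never read, so any value gives the same output.
out_a = toy_layer_forward(x, mlp_out, w_ones, layer_idx=0)
out_b = toy_layer_forward(x, mlp_out, w_rand, layer_idx=0)
assert np.allclose(out_a, out_b)

# Layer 1 (MoE): the weight does affect the output.
out_c = toy_layer_forward(x, mlp_out, w_ones, layer_idx=1)
out_d = toy_layer_forward(x, mlp_out, w_rand, layer_idx=1)
assert not np.allclose(out_c, out_d)
```

This is why the checkpoint can carry a `post_mlp_layernorm.weight` tensor for layer 0 without it influencing the model's outputs.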

singleheart changed discussion status to closed