Distilled with Distily library using teacher model gpt2 on dataset wikimedia/wikipedia.
Architecture: GPT2LMHeadModel
Teacher -> student architecture: GPT2LMHeadModel -> GPT2LMHeadModel
Trained on 145,744,973 tokens from the wikimedia/wikipedia dataset.
- Samples: 247,500
- Subset: 20231101.en
- Split: train

Distillation objective:
DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl), attn_loss_component=LossComponent(label=attn, weight=25.0, loss_fn=cos, layer_mapper=layer-2, projector=identity))
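The objective above combines a KL term on the logits (weight 1) with a cosine term on attention (weight 25.0). The following is a minimal, framework-free sketch of how such a weighted sum could be computed for a single position. It is not Distily's implementation: the function names are hypothetical, the KL direction (teacher relative to student) and the reading of `cos` as one minus cosine similarity are assumptions, and the `layer_mapper=layer-2` pairing of layers is left out entirely.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one vocabulary distribution (assumed direction)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def cosine_distance(a, b):
    """1 - cosine similarity between two flattened feature vectors (assumed form of `cos`)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distillation_loss(teacher_logits, student_logits, teacher_attn, student_attn,
                      logits_weight=1.0, attn_weight=25.0):
    """Weighted sum of the two components, using the weights from the objective string."""
    logits_loss = kl_divergence(teacher_logits, student_logits)
    attn_loss = cosine_distance(teacher_attn, student_attn)
    return logits_weight * logits_loss + attn_weight * attn_loss
```

With a weight of 25.0, the attention term can dominate the total whenever student and teacher attention features disagree, even if the logits already match closely.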
The following hyperparameters were used during training:
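- Learning rate: 0.0001
- Train batch size: 4
- Eval batch size: 8
- Seed: 42
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- LR scheduler type: cosine_with_min_lr
- LR scheduler warmup ratio: 0.5
- Number of epochs: 1.0
- Distillation objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl), attn_loss_component=LossComponent(label=attn, weight=25.0, loss_fn=cos, layer_mapper=layer-2, projector=identity))
- Unlabeled value: True
- LR scheduler: <torch.optim.lr_scheduler.LambdaLR object at 0x7fb8899c3c40>
- Unlabeled values: None, None, None, None, [('lm_head', False)], True, None
- Teacher model: gpt2
- Unlabeled values: False, False
- Dataset: wikimedia/wikipedia
- Dataset subset: 20231101.en
- Dataset split: train
- Dataset text column: text
- Dataset sample size: 250000
- Dataset test fraction: 0.01
- Unlabeled values: 1, 0.0, 1.0, 0.5, 0, True

Base model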
openai-community/gpt2
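The hyperparameters list a `cosine_with_min_lr` schedule with a warmup ratio of 0.5 and a learning rate of 0.0001. As a rough illustration of what that shape looks like, the sketch below computes the learning rate at a given step: linear warmup over the first half of training, then cosine decay down to a floor. The function name and the `min_lr_rate` value are hypothetical; the card does not state the minimum learning rate, and the exact rounding of warmup steps in the training library may differ.

```python
import math


def lr_at_step(step, total_steps, peak_lr=1e-4, warmup_ratio=0.5, min_lr_rate=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay to a floor.

    min_lr_rate is an assumed placeholder: the floor is peak_lr * min_lr_rate.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Rescale the cosine factor so it bottoms out at min_lr_rate instead of 0.
    return peak_lr * (min_lr_rate + (1.0 - min_lr_rate) * cosine)
```

Calling `lr_at_step(step, total_steps)` for each step of a run traces the full curve; with a warmup ratio of 0.5 the peak is reached exactly at the midpoint of training.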