This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the princeton-nlp/llama3-ultrafeedback dataset. Its evaluation-set results are reported in the training results table below (the Validation Loss column and the Rewards/, Logps/, and Logits/ columns).
The following hyperparameters were used during training:

Training results:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.2474 | 0.8550 | 400 | 1.2446 | -0.3293 | -0.3880 | 0.5813 | 0.0587 | -0.3880 | -0.3293 | 0.0487 | 0.0578 |
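The reward columns in this table are related to one another. The card does not name the training framework, so the following assumes the convention used by TRL-style preference trainers: Rewards/margins is the mean of (chosen reward minus rejected reward) over pairs, and Rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected one. Under that assumption, the reported row is consistent: -0.3293 - (-0.3880) = 0.0587. The sketch below shows this aggregation. The function `summarize_rewards` and its input values are hypothetical illustrations, not the actual evaluation code or data.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class PreferenceMetrics:
    rewards_chosen: float
    rewards_rejected: float
    rewards_margins: float
    rewards_accuracies: float


def summarize_rewards(chosen: list[float], rejected: list[float]) -> PreferenceMetrics:
    """Aggregate per-pair implicit rewards into batch-level metrics.

    Assumption (TRL-style convention, not confirmed by this card):
    margin = mean(chosen - rejected); accuracy = share of pairs with chosen > rejected.
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("chosen and rejected must be non-empty and of equal length")
    # Per-pair reward gap between the preferred and the dispreferred response.
    margins = [c - r for c, r in zip(chosen, rejected)]
    return PreferenceMetrics(
        rewards_chosen=fmean(chosen),
        rewards_rejected=fmean(rejected),
        rewards_margins=fmean(margins),
        # A pair counts as "correct" when the chosen reward is strictly higher.
        rewards_accuracies=fmean(1.0 if m > 0 else 0.0 for m in margins),
    )


# Hypothetical per-pair rewards for four preference pairs (not from the evaluation set).
m = summarize_rewards([-0.2, -0.5, -0.3, -0.1], [-0.4, -0.3, -0.6, -0.2])
print(m)
```

Because the mean of differences equals the difference of means, Rewards/margins can always be checked against Rewards/chosen minus Rewards/rejected. Rewards/accuracies cannot be recovered from the means alone, since it depends on the per-pair distribution.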
Base model: meta-llama/Meta-Llama-3-8B-Instruct