How to use sileod/deberta-v3-large-tasksource-rlhf-reward-model with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="sileod/deberta-v3-large-tasksource-rlhf-reward-model")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("sileod/deberta-v3-large-tasksource-rlhf-reward-model")
model = AutoModelForSequenceClassification.from_pretrained("sileod/deberta-v3-large-tasksource-rlhf-reward-model")
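
Once loaded, the model can be used to score candidate replies. A minimal sketch, not from the model card, assuming the classification head's last logit serves as the reward score (inspect model.config.id2label to confirm for this checkpoint) and using the "\n\nHuman: ...\n\nAssistant: ..." format of hh-rlhf conversations:

import torch

def reward(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumption: the head's last logit acts as the reward score;
    # check model.config.id2label for this checkpoint.
    return logits[0, -1].item()

prompt = "\n\nHuman: How do I bake bread?\n\nAssistant: "
better = reward(prompt + "Mix flour, water, salt and yeast, knead, let it rise, then bake.")
worse = reward(prompt + "I have no idea, figure it out yourself.")
print(better > worse)  # the preferred reply should get the higher score

A higher score means the model judges the reply more helpful and harmless.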
deberta-v3-large-tasksource-nli fine-tuned on Anthropic/hh-rlhf for 1 epoch with a learning rate of 1e-5.
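
The exact training script is not given here. One plausible reconstruction, assuming each hh-rlhf pair is turned into a positive (chosen) and a negative (rejected) classification example starting from the deberta-v3-large-tasksource-nli checkpoint; the binary head, batch size, and max length below are illustrative assumptions:

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("sileod/deberta-v3-large-tasksource-nli")
# Replace the 3-way NLI head with a fresh binary head (assumed setup).
base = AutoModelForSequenceClassification.from_pretrained(
    "sileod/deberta-v3-large-tasksource-nli",
    num_labels=2, ignore_mismatched_sizes=True,
)

raw = load_dataset("Anthropic/hh-rlhf")

def to_examples(batch):
    # Each pair yields two rows: chosen -> label 1, rejected -> label 0.
    texts = batch["chosen"] + batch["rejected"]
    enc = tokenizer(texts, truncation=True, max_length=512)
    enc["labels"] = [1] * len(batch["chosen"]) + [0] * len(batch["rejected"])
    return enc

train = raw["train"].map(to_examples, batched=True,
                         remove_columns=raw["train"].column_names)

args = TrainingArguments(
    output_dir="rlhf-reward-model",
    num_train_epochs=1,             # 1 epoch, as stated above
    learning_rate=1e-5,             # 1e-5, as stated above
    per_device_train_batch_size=4,  # assumption
)
Trainer(model=base, args=args, train_dataset=train, tokenizer=tokenizer).train()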
The data are described in the paper: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
Its validation accuracy, 75.16%, is currently the best publicly reported (vs. 69.25% for OpenAssistant/reward-model-deberta-v3-large-v2).
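
That figure presumably measures pairwise accuracy: the fraction of hh-rlhf test pairs where the chosen reply outscores the rejected one. A sketch using the reward helper defined above:

from datasets import load_dataset

test = load_dataset("Anthropic/hh-rlhf", split="test")
correct = sum(reward(ex["chosen"]) > reward(ex["rejected"]) for ex in test)
print(f"pairwise accuracy: {correct / len(test):.2%}")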