Instructions to use PKU-DS-LAB/FairyR1-32B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use PKU-DS-LAB/FairyR1-32B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="PKU-DS-LAB/FairyR1-32B") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("PKU-DS-LAB/FairyR1-32B") model = AutoModelForCausalLM.from_pretrained("PKU-DS-LAB/FairyR1-32B") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use PKU-DS-LAB/FairyR1-32B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "PKU-DS-LAB/FairyR1-32B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "PKU-DS-LAB/FairyR1-32B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/PKU-DS-LAB/FairyR1-32B
- SGLang
How to use PKU-DS-LAB/FairyR1-32B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "PKU-DS-LAB/FairyR1-32B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "PKU-DS-LAB/FairyR1-32B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "PKU-DS-LAB/FairyR1-32B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "PKU-DS-LAB/FairyR1-32B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use PKU-DS-LAB/FairyR1-32B with Docker Model Runner:
docker model run hf.co/PKU-DS-LAB/FairyR1-32B
Update README.md
Browse files
README.md
CHANGED
|
@@ -26,7 +26,7 @@ The FairyR1 model represents a further exploration of our earlier work [TinyR1](
|
|
| 26 |
|
| 27 |
In this effort, we overhauled the distillation data pipeline: raw examples from datasets such as AIMO/NuminaMath-1.5 for mathematics and OpenThoughts-114k for code were first passed through multiple 'teacher' models to generate candidate answers. These candidates were then carefully selected, restructured, and refined, especially for the chain-of-thought(CoT). Subsequently, we applied multi-stage filtering—including automated correctness checks for math problems and length-based selection (2K–8K tokens for math samples, 4K–8K tokens for code samples). This yielded two focused training sets of roughly 6.6K math examples and 3.8K code examples.
|
| 28 |
|
| 29 |
-
On the modeling side, rather than training three separate specialists as before, we limited our scope to just two domain experts (math and code), each trained independently under identical hyperparameters (e.g., learning rate and batch size) for about five epochs. We then fused these experts into a single 32B-parameter model using the [
|
| 30 |
|
| 31 |
## Result Analysis and Key Contributions:
|
| 32 |
|
|
|
|
| 26 |
|
| 27 |
In this effort, we overhauled the distillation data pipeline: raw examples from datasets such as AIMO/NuminaMath-1.5 for mathematics and OpenThoughts-114k for code were first passed through multiple 'teacher' models to generate candidate answers. These candidates were then carefully selected, restructured, and refined, especially for the chain-of-thought(CoT). Subsequently, we applied multi-stage filtering—including automated correctness checks for math problems and length-based selection (2K–8K tokens for math samples, 4K–8K tokens for code samples). This yielded two focused training sets of roughly 6.6K math examples and 3.8K code examples.
|
| 28 |
|
| 29 |
+
On the modeling side, rather than training three separate specialists as before, we limited our scope to just two domain experts (math and code), each trained independently under identical hyperparameters (e.g., learning rate and batch size) for about five epochs. We then fused these experts into a single 32B-parameter model using the [ArceeFusion](https://arxiv.org/pdf/2403.13257) tool. By streamlining both the data distillation workflow and the specialist-model merging process, FairyR1 achieves task-competitive results with only a fraction of the parameters and computational cost of much larger models.
|
| 30 |
|
| 31 |
## Result Analysis and Key Contributions:
|
| 32 |
|