QuantFactory/MagpieLM-4B-Chat-v0.1-GGUF
This is a quantized version of Magpie-Align/MagpieLM-4B-Chat-v0.1, created using llama.cpp.
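Because this repository ships GGUF files, one way to run it from Python is llama-cpp-python. The sketch below is a minimal example, not an official recipe: it assumes `llama-cpp-python` and `huggingface_hub` are installed, and it assumes the Q2_K file name shown in the code; other quantization levels would use a different file name. The function name `chat_with_gguf` and the `n_ctx` default are illustrative choices.

```python
# Minimal sketch: chat with a GGUF quantization through llama-cpp-python.
# Assumptions: the file name below exists in the repository, and
# `pip install llama-cpp-python huggingface_hub` has been run.
REPO_ID = "QuantFactory/MagpieLM-4B-Chat-v0.1-GGUF"
GGUF_FILE = "MagpieLM-4B-Chat-v0.1.Q2_K.gguf"  # assumed file name; pick the quantization you need


def chat_with_gguf(user_prompt, n_ctx=4096, max_tokens=256):
    """Download the GGUF file (cached after the first call) and return one reply."""
    # Imported lazily so the constants above can be used without the dependency.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(repo_id=REPO_ID, filename=GGUF_FILE, n_ctx=n_ctx)
    result = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=max_tokens,
    )
    # The response follows the OpenAI chat-completion layout.
    return result["choices"][0]["message"]["content"]
```

Calling `chat_with_gguf("Who are you?")` downloads the file on first use and returns the assistant's reply as a string.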
Original Model Card
🐦 MagpieLM-4B-Chat-v0.1
🧐 About This Model
Model full name: Llama3.1-MagpieLM-4B-Chat-v0.1
This model is an aligned version of Llama-3.1-Minitron-4B-Width, which achieves state-of-the-art performance among open-aligned SLMs. It even outperforms larger open-weight models including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct and Qwen-2-7B-Instruct.
We apply the following standard alignment pipeline with two carefully crafted synthetic datasets. Feel free to use these datasets and reproduce our model, or make your own friendly chatbots :)
We first perform SFT using Magpie-Align/MagpieLM-SFT-Data-v0.1.
- SFT Model Checkpoint: Magpie-Align/MagpieLM-4B-SFT-v0.1
We then perform DPO on the Magpie-Align/MagpieLM-DPO-Data-v0.1 dataset.
A more powerful 8B version is also available.
🔥 Benchmark Performance
Greedy Decoding
- Alpaca Eval 2: 40.99 (LC), 45.19 (WR)
- Arena Hard: 24.6
- WildBench WB Score (v2.0625): 32.37
Benchmark Performance Compared to Other SOTA SLMs
👀 Other Information
License: Please follow the NVIDIA Open Model License Agreement.
Conversation Template: Please use the Llama 3 chat template for the best performance.
Limitations: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or reflect biases present in the training data. While the model aims to improve instruction-following and helpfulness, it isn't specifically designed for complex reasoning tasks, potentially leading to suboptimal performance in these areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training was implemented during the alignment process.
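To make the conversation template recommendation concrete, here is a rough sketch of how the Llama 3 chat format lays out a conversation as plain text. It is an illustration only: the helper name `render_llama3_prompt` is hypothetical, and in practice the tokenizer's own `apply_chat_template` (or the template embedded in the GGUF file) should be treated as the authoritative source.

```python
# Sketch of the Llama 3 chat layout. `render_llama3_prompt` is a hypothetical
# helper for illustration; real code should rely on the tokenizer's template.
def render_llama3_prompt(messages, add_generation_prompt=True):
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header tells the model to write the next turn.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(render_llama3_prompt(messages))
```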
🧐 How to use it?
Please update transformers to the latest version by running pip install git+https://github.com/huggingface/transformers.
You can then run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.
import transformers
import torch

# Full Hub repository ID of the original (unquantized) model
model_id = "Magpie-Align/MagpieLM-4B-Chat-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The last entry of the returned conversation is the assistant's reply
print(outputs[0]["generated_text"][-1])
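The text above also mentions the Auto classes with generate(). A minimal sketch of that route follows, assuming the original model is published as `Magpie-Align/MagpieLM-4B-Chat-v0.1` and that the tokenizer ships a chat template. The helper names `build_messages` and `generate_reply` are illustrative, and greedy decoding is chosen here only to mirror the benchmark setting.

```python
# Sketch: the same conversation using the Auto classes and generate().
# Helper names are illustrative; the model ID is assumed from the model card.
def build_messages(user_prompt, system_prompt="You are Magpie, a friendly AI assistant."):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(user_prompt, model_id="Magpie-Align/MagpieLM-4B-Chat-v0.1", max_new_tokens=256):
    # Heavy dependencies are imported lazily, only when generation is requested.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Apply the tokenizer's chat template and append the assistant header.
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("Who are you?")` loads the weights and returns the reply as a string; unlike the pipeline, this route exposes the token IDs for custom post-processing.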
Alignment Pipeline
The detailed alignment pipeline is as follows.
Stage 1: Supervised Fine-tuning
We use Axolotl for SFT. Please refer to the model card of the SFT checkpoint and the configuration below for details.
See axolotl config
axolotl version: 0.4.1
base_model: nvidia/Llama-3.1-Minitron-4B-Width-Base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: llama3
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
    type: sharegpt
    conversation: llama3
dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/MagpieLM-4B-SFT-v0.1
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: Llama3.1-MagpieLM-4B-SFT-v0.1
wandb_log_model:
hub_model_id: Magpie-Align/MagpieLM-4B-SFT-v0.1
gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
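The config above combines `learning_rate: 2e-5`, `lr_scheduler: cosine`, and `warmup_ratio: 0.1`. The sketch below shows the usual shape of such a schedule (linear warmup followed by cosine decay to zero). It is an approximation of the common implementation, not Axolotl's exact code, and `total_steps=1000` is a made-up value since the real step count is not stated here.

```python
import math


# Sketch of a cosine schedule with linear warmup. `total_steps` is a
# hypothetical value; the actual SFT step count is not given in the card.
def lr_at_step(step, total_steps=1000, peak_lr=2e-5, warmup_ratio=0.1):
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step))
```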
Stage 2: Direct Preference Optimization
We use alignment handbook for DPO.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1.5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
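The total batch sizes in this list follow from the per-device values. As a quick check, the effective batch size is the per-device batch size times the number of devices times the gradient accumulation steps; the function name below is illustrative, and evaluation is assumed to use no accumulation.

```python
# Arithmetic check of the reported batch sizes (function name is illustrative).
def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    return per_device * num_devices * grad_accum_steps


train_total = effective_batch_size(per_device=2, num_devices=4, grad_accum_steps=16)
eval_total = effective_batch_size(per_device=4, num_devices=4)  # assumes no accumulation at eval
print(train_total, eval_total)
```

The results match the reported total_train_batch_size of 128 and total_eval_batch_size of 16.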
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6911 | 0.0653 | 100 | 0.6912 | -0.0026 | -0.0066 | 0.5640 | 0.0041 | -502.9037 | -510.6042 | -1.7834 | -1.7781 |
| 0.6703 | 0.1306 | 200 | 0.6713 | -0.1429 | -0.1981 | 0.6380 | 0.0552 | -522.0521 | -524.6394 | -1.7686 | -1.7593 |
| 0.6306 | 0.1959 | 300 | 0.6347 | -0.6439 | -0.8210 | 0.6840 | 0.1770 | -584.3356 | -574.7375 | -1.7536 | -1.7436 |
| 0.5831 | 0.2612 | 400 | 0.5932 | -1.5155 | -1.8774 | 0.7070 | 0.3619 | -689.9788 | -661.8920 | -1.6963 | -1.6877 |
| 0.5447 | 0.3266 | 500 | 0.5645 | -2.1858 | -2.7052 | 0.7110 | 0.5195 | -772.7636 | -728.9221 | -1.6249 | -1.6207 |
| 0.5896 | 0.3919 | 600 | 0.5453 | -2.3771 | -2.9747 | 0.7180 | 0.5976 | -799.7122 | -748.0584 | -1.5836 | -1.5847 |
| 0.5342 | 0.4572 | 700 | 0.5305 | -2.6231 | -3.3063 | 0.7350 | 0.6832 | -832.8744 | -772.6592 | -1.5454 | -1.5524 |
| 0.511 | 0.5225 | 800 | 0.5177 | -3.0517 | -3.8393 | 0.7400 | 0.7876 | -886.1714 | -815.5145 | -1.5160 | -1.5273 |
| 0.5007 | 0.5878 | 900 | 0.5088 | -3.0925 | -3.9197 | 0.7540 | 0.8273 | -894.2120 | -819.5908 | -1.5007 | -1.5144 |
| 0.485 | 0.6531 | 1000 | 0.5033 | -3.1305 | -3.9863 | 0.7630 | 0.8558 | -900.8680 | -823.3940 | -1.4834 | -1.4997 |
| 0.4307 | 0.7184 | 1100 | 0.4989 | -3.1387 | -4.0097 | 0.7610 | 0.8710 | -903.2113 | -824.2159 | -1.4728 | -1.4911 |
| 0.5403 | 0.7837 | 1200 | 0.4964 | -3.3418 | -4.2574 | 0.7620 | 0.9156 | -927.9747 | -844.5242 | -1.4641 | -1.4822 |
| 0.5182 | 0.8490 | 1300 | 0.4952 | -3.3255 | -4.2430 | 0.7600 | 0.9175 | -926.5396 | -842.8945 | -1.4601 | -1.4788 |
| 0.5165 | 0.9144 | 1400 | 0.4943 | -3.3308 | -4.2525 | 0.7600 | 0.9217 | -927.4913 | -843.4282 | -1.4610 | -1.4799 |
| 0.5192 | 0.9797 | 1500 | 0.4942 | -3.3377 | -4.2603 | 0.7620 | 0.9226 | -928.2655 | -844.1144 | -1.4591 | -1.4783 |
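To help read the reward columns, here is a sketch of the standard DPO sigmoid loss for a single preference pair. It assumes the default loss type and uses `beta = 0.01` from the alignment handbook config in this card; the log-probabilities in the example are invented for illustration. In this formulation the margin is the chosen reward minus the rejected reward, which matches the Rewards/margins column up to rounding, and a zero margin gives a loss of ln 2 ≈ 0.6931, close to the first logged loss.

```python
import math


# Sketch of the standard DPO sigmoid loss for one preference pair.
# Assumption: default sigmoid loss; the example log-probs are made up.
def dpo_sigmoid_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.01):
    # Implicit rewards: scaled log-ratio between policy and reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return {
        "loss": loss,
        "chosen_reward": chosen_reward,
        "rejected_reward": rejected_reward,
        "margin": margin,
    }


example = dpo_sigmoid_loss(-510.0, -530.0, -500.0, -505.0)
print(example)
```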
Framework versions
- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
See alignment handbook configs
# Customized Configs
model_name_or_path: Magpie-Align/MagpieLM-4B-SFT-v0.1
hub_model_id: Magpie-Align/MagpieLM-4B-Chat-v0.1
output_dir: alignment_handbook_out/MagpieLM-4B-Chat-v0.1
run_name: MagpieLM-4B-Chat-v0.1
dataset_mixer:
  Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 24
# DPOTrainer arguments
bf16: true
beta: 0.01
learning_rate: 1.5e-7
gradient_accumulation_steps: 16
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
num_train_epochs: 1
max_length: 2048
max_prompt_length: 1800
warmup_ratio: 0.1
logging_steps: 1
lr_scheduler_type: cosine
optim: adamw_torch
torch_dtype: null
# use_flash_attention_2: true
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
log_level: info
push_to_hub: true
save_total_limit: 0
seed: 42
report_to:
- wandb
📚 Citation
If you find the model, data, or code useful, please cite:
@article{xu2024magpie,
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Contact
Questions? Contact:
- Zhangchen Xu [zxu9 at uw dot edu], and
- Bill Yuchen Lin [yuchenlin1995 at gmail dot com]
Model tree for QuantFactory/MagpieLM-4B-Chat-v0.1-GGUF
Base model
nvidia/Llama-3.1-Minitron-4B-Width-Base
