Razorback 12B v0.2 ExLlamaV2 4.0bpw Quant
UnslopNemo with Vision!

A more robust attempt at merging TheDrummer's UnslopNemo v3 into Pixtral 12B.
It has been really stable in my testing so far, but needs more testing to see which samplers it does and doesn't like.
It seems to be the best of both worlds: less sloppy, more engaging content, with decent intelligence and visual understanding.
Merging Approach
First, I loaded up Pixtral 12B Base and Mistral Nemo Base to compare their parameter differences. Looking at the L2 norm / relative difference values, I was able to isolate which parts of Pixtral 12B deviate significantly from Mistral Nemo. This matters because, while the language model architecture is the same between the two, a lot of vision understanding has been trained into Pixtral's language model and it can break very easily.
Then I calculated merging weights for each parameter using an exponential falloff. The smaller the difference, the higher the weight.
I then applied this recipe to Pixtral Instruct (Pixtral-12B-2409) and TheDrummer's UnslopNemo-12B-v3. The goal is to infuse as much Drummer goodness as possible without breaking vision input, and it looks like it worked!
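The recipe above can be sketched roughly as follows. This is a toy illustration in plain Python, not the actual merge script: the exact falloff formula, the `falloff` constant, the linear interpolation step, and all function and parameter names here are assumptions made for the example, and real weights would be tensors rather than short lists.

```python
import math

def relative_difference(base, tuned):
    """L2 norm of (tuned - base), divided by the L2 norm of base."""
    diff = math.sqrt(sum((t - b) ** 2 for b, t in zip(base, tuned)))
    norm = math.sqrt(sum(b * b for b in base))
    return diff / norm if norm > 0 else float("inf")

def merge_weight(rel_diff, falloff=10.0, max_weight=1.0):
    """Exponential falloff: the smaller the difference, the higher the weight.
    `falloff` is a hypothetical tuning constant."""
    return max_weight * math.exp(-falloff * rel_diff)

def merge_param(target, donor, weight):
    """Linear interpolation: weight=0 keeps target, weight=1 takes donor."""
    return [(1.0 - weight) * t + weight * d for t, d in zip(target, donor)]

# Toy stand-ins for the four checkpoints (parameter name -> flat values).
nemo_base    = {"attn": [1.0, 2.0, 2.0], "mlp": [0.0, 3.0, 4.0]}
pixtral_base = {"attn": [1.0, 2.0, 2.0], "mlp": [0.0, 0.0, 0.0]}
pixtral_inst = {"attn": [1.0, 2.0, 2.0], "mlp": [0.5, 0.5, 0.5]}
unslop_nemo  = {"attn": [2.0, 2.0, 2.0], "mlp": [9.0, 9.0, 9.0]}

merged = {}
for name in pixtral_inst:
    # Step 1: how far did Pixtral's base drift from Nemo's base here?
    rel = relative_difference(nemo_base[name], pixtral_base[name])
    # Step 2: turn that drift into a per-parameter merge weight.
    w = merge_weight(rel)
    # Step 3: blend the instruct/fine-tuned checkpoints with that weight.
    merged[name] = merge_param(pixtral_inst[name], unslop_nemo[name], w)
```

In this toy setup, `attn` is identical between the two base models, so it takes the donor weights in full, while `mlp` differs heavily and stays almost entirely Pixtral's. That is the intended behaviour: parameters that carry vision understanding are left nearly untouched.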
Usage
Needs more testing to identify the best sampling params, but so far a temperature of ~0.7 with a min-p of 0.03 has been rock solid.
Use the included chat template (Mistral). ChatML is not supported yet.
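For readers unfamiliar with what these two settings do, here is a minimal, backend-independent sketch of temperature scaling followed by min-p filtering. It is an illustration only: real inference backends may apply samplers in a different order or with different edge-case handling, and the function names are made up for this example.

```python
import math
import random

def min_p_filter(logits, temperature=0.7, min_p=0.03):
    """Return {token_index: probability} after temperature + min-p filtering."""
    # Temperature < 1 sharpens the distribution before the softmax.
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract peak for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Min-p: drop tokens whose probability is below min_p * (top probability).
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}

def sample(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = min_p_filter(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]

# Toy logits for a 4-token vocabulary: the two unlikely tokens get pruned.
dist = min_p_filter([5.0, 4.0, 0.0, -3.0])
token = sample([5.0, 4.0, 0.0, -3.0], rng=random.Random(0))
```

A low min-p such as 0.03 only trims the long tail of very unlikely tokens, so it tends to pair well with a moderate temperature.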
Credits
- Mistral for mistralai/Pixtral-12B-2409
- Unsloth for the unsloth/Pixtral-12B-2409 Transformers conversion
- TheDrummer for TheDrummer/UnslopNemo-12B-v3
Available Sizes
| Repo | Bits | Head Bits | Size |
|---|---|---|---|
| nintwentydo/Razorback-12B-v0.2-exl2-4.0bpw | 4.0 | 6.0 | 8.19 GB |
| nintwentydo/Razorback-12B-v0.2-exl2-5.0bpw | 5.0 | 6.0 | 9.54 GB |
| nintwentydo/Razorback-12B-v0.2-exl2-6.0bpw | 6.0 | 8.0 | 11.1 GB |
| nintwentydo/Razorback-12B-v0.2-exl2-8.0bpw | 8.0 | 8.0 | 13.7 GB |
Base model: nintwentydo/Razorback-12B-v0.2