Instructions for using arcee-ai/Arcee-Blitz-GGUF with libraries, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use arcee-ai/Arcee-Blitz-GGUF with Transformers:
```python
# Load model directly (GGUF loading in Transformers requires: pip install gguf)
from transformers import AutoModel

# A GGUF repository contains several quantization files; pass gguf_file to
# pick one (Arcee-Blitz-IQ2_M.gguf is the file used in the examples below).
model = AutoModel.from_pretrained(
    "arcee-ai/Arcee-Blitz-GGUF",
    gguf_file="Arcee-Blitz-IQ2_M.gguf",
    dtype="auto",
)
```
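To generate text rather than just load weights, here is a minimal sketch using the standard Transformers generation API (the quantization file mirrors the snippet above; the prompt is illustrative):

```python
# Minimal generation sketch; the gguf_file choice is an example, not a requirement.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arcee-ai/Arcee-Blitz-GGUF"
gguf = "Arcee-Blitz-IQ2_M.gguf"

tokenizer = AutoTokenizer.from_pretrained(repo, gguf_file=gguf)
model = AutoModelForCausalLM.from_pretrained(repo, gguf_file=gguf, dtype="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
- llama-cpp-python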
How to use arcee-ai/Arcee-Blitz-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="arcee-ai/Arcee-Blitz-GGUF",
    filename="Arcee-Blitz-IQ2_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello! Who are you?"}
    ]
)
```
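The response is an OpenAI-style dict; a sketch of a fuller call and of reading the reply (the system prompt, user message, and sampling parameters here are illustrative):

```python
# Illustrative chat call; the reply text lives under
# choices[0]["message"]["content"] in llama-cpp-python's response dict.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a GGUF file is in one sentence."},
    ],
    max_tokens=128,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```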
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use arcee-ai/Arcee-Blitz-GGUF with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf arcee-ai/Arcee-Blitz-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf arcee-ai/Arcee-Blitz-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf arcee-ai/Arcee-Blitz-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf arcee-ai/Arcee-Blitz-GGUF:Q4_K_M
```
Use pre-built binary
```sh
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf arcee-ai/Arcee-Blitz-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf arcee-ai/Arcee-Blitz-GGUF:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf arcee-ai/Arcee-Blitz-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf arcee-ai/Arcee-Blitz-GGUF:Q4_K_M
```
Use Docker
```sh
docker model run hf.co/arcee-ai/Arcee-Blitz-GGUF:Q4_K_M
```
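However you start `llama-server`, it exposes an OpenAI-compatible HTTP API. A minimal sketch of querying it from Python, assuming the server's default address `http://localhost:8080` (the prompt is illustrative):

```python
# Query a running llama-server instance over its OpenAI-compatible API.
# Assumes the default bind address http://localhost:8080.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello! Who are you?"}],
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```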
- LM Studio
- Jan
- Ollama
How to use arcee-ai/Arcee-Blitz-GGUF with Ollama:
```sh
ollama run hf.co/arcee-ai/Arcee-Blitz-GGUF:Q4_K_M
```
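Ollama also serves a local HTTP API; a minimal sketch of chatting with the pulled model from Python, assuming Ollama's default endpoint `http://localhost:11434` (the prompt is illustrative):

```python
# Chat with the model through Ollama's local REST API (/api/chat).
# Assumes Ollama is running on its default port, 11434.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/arcee-ai/Arcee-Blitz-GGUF:Q4_K_M",
        "messages": [{"role": "user", "content": "Hello! Who are you?"}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```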
- Unsloth Studio
How to use arcee-ai/Arcee-Blitz-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for arcee-ai/Arcee-Blitz-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```sh
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for arcee-ai/Arcee-Blitz-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for arcee-ai/Arcee-Blitz-GGUF to start chatting
```
- Docker Model Runner
How to use arcee-ai/Arcee-Blitz-GGUF with Docker Model Runner:
```sh
docker model run hf.co/arcee-ai/Arcee-Blitz-GGUF:Q4_K_M
```
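Docker Model Runner can also expose an OpenAI-compatible endpoint on the host. A hedged sketch of querying it from Python; the port (12434) and `/engines/v1` path are assumptions based on Docker's Model Runner documentation and require host-side TCP access to be enabled, so verify them against your Docker version:

```python
# Query Docker Model Runner's OpenAI-compatible endpoint.
# ASSUMPTION: host TCP access enabled; port 12434 and the /engines/v1
# path follow Docker's Model Runner docs -- verify locally.
import requests

resp = requests.post(
    "http://localhost:12434/engines/v1/chat/completions",
    json={
        "model": "hf.co/arcee-ai/Arcee-Blitz-GGUF:Q4_K_M",
        "messages": [{"role": "user", "content": "Hello! Who are you?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```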
- Lemonade
How to use arcee-ai/Arcee-Blitz-GGUF with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull arcee-ai/Arcee-Blitz-GGUF:Q4_K_M
```
Run and chat with the model
```sh
lemonade run user.Arcee-Blitz-GGUF-Q4_K_M
```
List all available models
```sh
lemonade list
```
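Lemonade serves models through an OpenAI-compatible server as well; a hedged sketch of querying it from Python. The base URL, port 8000, and `/api/v1` path are assumptions drawn from Lemonade's documentation, so check them against your install:

```python
# Query a running Lemonade server.
# ASSUMPTION: OpenAI-compatible endpoint at http://localhost:8000/api/v1,
# per Lemonade's docs -- verify locally before relying on this.
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={
        "model": "user.Arcee-Blitz-GGUF-Q4_K_M",
        "messages": [{"role": "user", "content": "Hello! Who are you?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```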
GGUF Quantizations for Arcee-Blitz
Arcee-Blitz (24B) is a new Mistral-based model distilled from DeepSeek-V3, designed to be both fast and efficient. We view it as a practical "workhorse" model that can tackle a range of tasks without the overhead of larger architectures.
Model Details
- Architecture Base: Mistral-Small-24B-Instruct-2501
- Parameter Count: 24B
- Distillation Data:
- Merged the Virtuoso pipeline with the Mistral architecture, hot-starting training with over 3B tokens of pretraining distillation from DeepSeek-V3 logits
- Fine-Tuning and Post-Training:
- After capturing core logits, we performed additional fine-tuning and distillation steps to enhance overall performance.
- License: Apache-2.0
Improving World Knowledge
Arcee-Blitz shows a large improvement on MMLU-Pro over the original Mistral-Small-3 (44.70 vs. 60.20 in the table below), reflecting a dramatic increase in world knowledge.
Data contamination checking
We carefully examined our training data and pipeline to avoid contamination. While weโre confident in the validity of these gains, we remain open to further community validation and testing (one of the key reasons we release these models as open-source).
Benchmark Comparison
| Benchmark | mistral-small-3 | arcee-blitz |
|---|---|---|
| MixEval | 81.6% | 85.1% |
| GPQA-Diamond | 42.4% | 43.1% |
| BigCodeBench Complete | 44.4% | 45.5% |
| BigCodeBench Instruct | 34.7% | 35.9% |
| BigCodeBench Complete-hard | 16.2% | 19.6% |
| BigCodeBench Instruct-hard | 15.5% | 15.5% |
| IFEval | 77.44 | 80.60 |
| BBH | 64.46 | 65.00 |
| GPQA | 33.90 | 36.70 |
| MMLU Pro | 44.70 | 60.20 |
| MuSR | 40.90 | 50.00 |
| Math Level 5 | 12.00 | 38.60 |
Limitations
- Context Length: 32k tokens (may vary depending on the final tokenizer settings and system resources).
- Knowledge Cut-off: Training data may not reflect the latest events or developments beyond June 2024.
Ethical Considerations
- Content Generation Risks: Like any language model, Arcee-Blitz can generate potentially harmful or biased content if prompted in certain ways.
License
Arcee-Blitz (24B) is released under the Apache-2.0 License. You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license.
If you have questions or would like to share your experiences using Arcee-Blitz (24B), please connect with us on social media. Weโre excited to see what you buildโand how this model helps you innovate!
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit.
Model tree for arcee-ai/Arcee-Blitz-GGUF
- Base model: mistralai/Mistral-Small-24B-Base-2501