Instructions to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="EnlistedGhost/Devstral-Small-2507-Vision-GGUF")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("EnlistedGhost/Devstral-Small-2507-Vision-GGUF", dtype="auto")

llama-cpp-python

How to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="EnlistedGhost/Devstral-Small-2507-Vision-GGUF",
	filename="Devstral-Small-24B-2507-Vision-BF16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)
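
For a local image, OpenAI-style chat payloads commonly accept a base64 `data:` URI in place of the remote URL used above. The following is a minimal sketch under that assumption; `image_to_data_uri` and `build_messages` are hypothetical helper names, not part of llama-cpp-python. Depending on the llama-cpp-python version, image input may also require a multimodal chat handler loaded with one of the mmproj files, which is not shown here.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_messages(prompt: str, image_ref: str) -> list:
    """Build a single-turn user message in the same shape as the example above."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # image_ref may be a remote URL or a data URI (assumption)
                {"type": "image_url", "image_url": {"url": image_ref}},
            ],
        }
    ]
```

The list returned by `build_messages(...)` can then be passed as the `messages` argument of `llm.create_chat_completion`.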

Local Apps

llama.cpp

How to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M

Use Docker

docker model run hf.co/EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M


vLLM

How to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EnlistedGhost/Devstral-Small-2507-Vision-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EnlistedGhost/Devstral-Small-2507-Vision-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
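
The same request can be issued from Python. This sketch uses only the standard library and assumes the server started above is listening on `localhost:8000`; `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the payload easy to inspect before a server is running.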

Use Docker

docker model run hf.co/EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M

SGLang

How to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EnlistedGhost/Devstral-Small-2507-Vision-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EnlistedGhost/Devstral-Small-2507-Vision-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EnlistedGhost/Devstral-Small-2507-Vision-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EnlistedGhost/Devstral-Small-2507-Vision-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
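
A 24B model can take a while to load after the container starts, so a request sent too early may fail with a connection error. One way to handle this is to poll the server until it answers. The sketch below assumes the server exposes the `/v1/models` route that OpenAI-compatible servers usually provide; `wait_until_ready` is a hypothetical helper, and the timeout values are arbitrary.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url, timeout=600.0, interval=5.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll base_url + /v1/models until it returns HTTP 200 or the timeout expires.

    `opener` and `sleep` are injectable so the loop can be exercised without a server.
    """
    url = base_url.rstrip("/") + "/v1/models"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_until_ready("http://localhost:30000")` before the first chat request is one way to avoid racing the model load.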

Ollama
How to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with Ollama:
```
ollama run hf.co/EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M
```
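
Once the model has been pulled, it should also be reachable through Ollama's local REST API. A minimal sketch, assuming Ollama is listening on its default address (`localhost:11434`) and that this build accepts image input through the `images` field of a chat message; `build_ollama_chat` and `chat` are hypothetical helper names.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address
MODEL = "hf.co/EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M"


def build_ollama_chat(prompt, image_bytes=None):
    """Build (but do not send) a non-streaming chat request for Ollama."""
    message = {"role": "user", "content": prompt}
    if image_bytes is not None:
        # Ollama expects images as a list of base64 strings on the message
        message["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    payload = {"model": MODEL, "messages": [message], "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, image_bytes=None, timeout=300.0):
    """Send the request and return the assistant's reply text."""
    request = build_ollama_chat(prompt, image_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["message"]["content"]
```

With `"stream": False`, the reply arrives as a single JSON object rather than a sequence of chunks, which keeps the parsing simple.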

Unsloth Studio

How to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EnlistedGhost/Devstral-Small-2507-Vision-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EnlistedGhost/Devstral-Small-2507-Vision-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EnlistedGhost/Devstral-Small-2507-Vision-GGUF to start chatting

Docker Model Runner
How to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with Docker Model Runner:
```
docker model run hf.co/EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M
```

Lemonade

How to use EnlistedGhost/Devstral-Small-2507-Vision-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull EnlistedGhost/Devstral-Small-2507-Vision-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Devstral-Small-2507-Vision-GGUF-Q4_K_M

List all available models

lemonade list

-----------------------------------------------
- Model Details and Specifications: -
-----------------------------------------------

Devstral Small 2507 Vision GGUF (Ollama & Llama.cpp)

This release contains:
GGUF model files, converted and quantized for use with both Ollama and Llama.cpp
(More information and updates to the ModelCard (this page) coming soon!)

Quantized GGUF version of:

EnlistedGhost/Devstral-Small-2507-Vision
(by EnlistedGhost)

Original Model Link:

EnlistedGhost/Devstral-Small-2507-Vision

---------------------------------------------------
- Conversion and GGUF Quantization: -
---------------------------------------------------

Software used to convert Safetensors to GGUF:

llama.cpp

Software used to create Quantized GGUF Files:

llama.cpp

Specific GitHub Commit Point:

b7266

Converted to GGUF and Quantized by:

EnlistedGhost

-------------------------------
---- Updates & News ----
-------------------------------

Model Updates (as of: December 28th, 2025)

Uploaded: All remaining GGUF Converted and Quantized model files (Q5_K, Q6_K, Q8_0)
Updated: ModelCard (this page)

--------------------------------------
---- How to run this Model ----
--------------------------------------

Compatible Software (Required to use this Model)
You can run this model with either Ollama or Llama.cpp.
(Below are instructions for running these GGUF files with Ollama.)

How to run this Model using Ollama
You can run this model with the "ollama run" command.
Simply copy and paste one of the commands from the list below into
your console, terminal, or PowerShell window.

Quant Type	File Size	Command
QX_X	0.00 GB	Run/Pull Command (Coming Soon)

Vision Projector (Files)
mmproj (Vision Projector) Files

Quant Type	File Size	Download Link
Q8_0	465 MB
F16	870 MB
F32	1.74 GB
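
The three projector sizes line up with the per-weight storage cost of each format: 4 bytes for F32, 2 bytes for F16, and 8.5 bits for Q8_0 (blocks of 32 int8 values plus one 16-bit scale). The sketch below checks that arithmetic, assuming "MB" means 10^6 bytes and that every tensor in a file uses the same type, which is a simplification; the helper names are hypothetical.

```python
# Nominal storage cost per weight for each GGUF tensor type
BITS_PER_WEIGHT = {"F32": 32.0, "F16": 16.0, "Q8_0": 8.5}

# Sizes from the table above, in MB (1.74 GB written as 1740 MB)
LISTED_MB = {"Q8_0": 465, "F16": 870, "F32": 1740}


def estimate_param_count(size_mb, quant):
    """Infer an approximate weight count from a file size."""
    return size_mb * 1e6 * 8 / BITS_PER_WEIGHT[quant]


def estimate_size_mb(param_count, quant):
    """Estimate the file size for a given weight count and type."""
    return param_count * BITS_PER_WEIGHT[quant] / 8 / 1e6


# Use the F16 file as the reference point
params = estimate_param_count(LISTED_MB["F16"], "F16")
for quant, listed in LISTED_MB.items():
    estimate = estimate_size_mb(params, quant)
    print(f"{quant}: listed {listed} MB, estimated {estimate:.0f} MB")
```

The F32 estimate matches the listed size exactly, and the Q8_0 estimate comes out at about 462 MB against the listed 465 MB; the small gap would be consistent with metadata or a few tensors kept at higher precision, though that is a guess.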

---------------------------
---- Original Info ----
---------------------------

(Crossposted from the link in the above section: "Model Details"):

Devstral Small 1.1

Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-Bench, which positions it as the #1 open-source model on this benchmark.

It is fine-tuned from Mistral-Small-3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed before fine-tuning from Mistral-Small-3.1.

For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Devstral in our blog post.

Updates compared to Devstral Small 1.0:

Improved performance; please refer to the benchmark results.
Devstral Small 1.1 is still great when paired with OpenHands. This new version also generalizes better to other prompts and coding environments.
Supports Mistral's function calling format.

Key Features:

Agentic coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
Lightweight: With its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it an appropriate model for local deployment and on-device use.
Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
Context Window: A 128k context window.
Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.
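
To put the "single RTX 4090 or a Mac with 32GB RAM" claim in context, a rough weights-only estimate is parameters times bits per weight divided by 8. The sketch below is a back-of-the-envelope check, not a measurement: it uses nominal bit widths (real GGUF quant types use slightly more bits per weight), and the 4 GB headroom for the KV cache and runtime buffers is an arbitrary assumption.

```python
PARAMS = 24e9  # 24 billion parameters, as stated above


def weights_gb(bits_per_weight, params=PARAMS):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits(bits_per_weight, memory_gb, headroom_gb=4.0):
    """True if weights plus an assumed headroom fit in memory_gb."""
    return weights_gb(bits_per_weight) + headroom_gb <= memory_gb


# 24 GB is the VRAM of an RTX 4090
for bits in (16, 8, 6, 5, 4):
    size = weights_gb(bits)
    print(f"{bits:>2}-bit: ~{size:.0f} GB of weights, fits in 24 GB: {fits(bits, 24)}")
```

Under these assumptions the 16-bit weights (about 48 GB) do not fit on a 24 GB card, while a 4-bit quantization (about 12 GB) does, which suggests the single-GPU claim presumes a quantized file.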

Benchmark Results

SWE-Bench

Devstral Small 1.1 achieves a score of 53.6% on SWE-Bench Verified, outperforming Devstral Small 1.0 by +6.8% and the second-best state-of-the-art model by +11.4%.

Model	Agentic Scaffold	SWE-Bench Verified (%)
Devstral Small 1.1	OpenHands Scaffold	53.6
Devstral Small 1.0	OpenHands Scaffold	46.8
GPT-4.1-mini	OpenAI Scaffold	23.6
Claude 3.5 Haiku	Anthropic Scaffold	40.6
SWE-smith-LM 32B	SWE-agent Scaffold	40.2
Skywork SWE	OpenHands Scaffold	38.0
DeepSWE	R2E-Gym Scaffold	42.2
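
The two margins quoted above can be recomputed from the table. This short sketch copies the scores as listed and treats the differences as percentage points; the variable and function names are illustrative only.

```python
# SWE-Bench Verified scores (%) copied from the table above
SCORES = {
    "Devstral Small 1.1": 53.6,
    "Devstral Small 1.0": 46.8,
    "GPT-4.1-mini": 23.6,
    "Claude 3.5 Haiku": 40.6,
    "SWE-smith-LM 32B": 40.2,
    "Skywork SWE": 38.0,
    "DeepSWE": 42.2,
}

TARGET = "Devstral Small 1.1"
PREVIOUS = "Devstral Small 1.0"


def margin(model_a, model_b):
    """Score difference in percentage points, rounded to one decimal."""
    return round(SCORES[model_a] - SCORES[model_b], 1)


def best_other_model():
    """Highest-scoring model in the table that is not a Devstral release."""
    others = {name: score for name, score in SCORES.items()
              if name not in (TARGET, PREVIOUS)}
    return max(others, key=others.get)


print(f"vs {PREVIOUS}: +{margin(TARGET, PREVIOUS)}")
print(f"vs {best_other_model()}: +{margin(TARGET, best_other_model())}")
```

The best non-Devstral entry in the table is DeepSWE at 42.2, which gives the 11.4-point margin; the 6.8-point margin is against Devstral Small 1.0.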

When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as DeepSeek-V3-0324 and Qwen3 235B-A22B.

Usage

We recommend using Devstral with the OpenHands scaffold. You can use it either through our API or by running it locally.


GGUF

Model size: 24B params
Architecture: llama
Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for EnlistedGhost/Devstral-Small-2507-Vision-GGUF

Base model: mistralai/Mistral-Small-3.1-24B-Base-2503
Finetuned: mistralai/Mistral-Small-3.1-24B-Instruct-2503
Finetuned: EnlistedGhost/Devstral-Small-2507-Vision
Quantized (3): this model (EnlistedGhost/Devstral-Small-2507-Vision-GGUF)
