Instructions for using zlab-princeton/Vero-Qwen3T-8B with libraries and local apps.

Libraries

How to use zlab-princeton/Vero-Qwen3T-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="zlab-princeton/Vero-Qwen3T-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("zlab-princeton/Vero-Qwen3T-8B")
model = AutoModelForImageTextToText.from_pretrained("zlab-princeton/Vero-Qwen3T-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use zlab-princeton/Vero-Qwen3T-8B with vLLM:

Install from pip and serve the model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zlab-princeton/Vero-Qwen3T-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zlab-princeton/Vero-Qwen3T-8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
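The same request can also be sent from Python. The block below is a minimal stdlib-only sketch that assumes the vLLM server above is running on its default port 8000. The helper names `build_chat_payload` and `post_chat` are hypothetical, and the reply is read from `choices[0].message.content`, following the OpenAI chat-completions response format.

```python
import json
import urllib.request

MODEL_ID = "zlab-princeton/Vero-Qwen3T-8B"
IMAGE_URL = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"


def build_chat_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build the same body as the curl example: one user turn with text plus an image URL."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the assistant's text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style responses carry the generated text in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

For example, `print(post_chat(build_chat_payload("Describe this image in one sentence.", IMAGE_URL)))`. The SGLang server below should accept the same body when `base_url="http://localhost:30000"`.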

Use Docker

docker model run hf.co/zlab-princeton/Vero-Qwen3T-8B

SGLang

How to use zlab-princeton/Vero-Qwen3T-8B with SGLang:

Install from pip and serve the model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zlab-princeton/Vero-Qwen3T-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zlab-princeton/Vero-Qwen3T-8B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
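The curl examples point `image_url` at a public URL. To send a local file instead, OpenAI-compatible servers such as vLLM and SGLang generally accept a base64 `data:` URL in the same field. The block below is a minimal sketch with hypothetical helper names (`image_to_data_url`, `image_message`). It guesses the MIME type from the file extension, which assumes your image files carry a standard extension.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL usable in an "image_url" field."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(prompt: str, path: str) -> list:
    """Build a "messages" list in the curl example's layout, using a local image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
            ],
        }
    ]
```

The resulting list can replace the `"messages"` value in the JSON body of either curl request.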

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zlab-princeton/Vero-Qwen3T-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip example above (port 30000).


Vero-Qwen3T-8B

Vero is an open family of RL-trained models for general visual reasoning. The release includes models, data, evaluation, and training code, targeting broad multimodal reasoning across charts, STEM, spatial reasoning, knowledge, grounding, counting, and instruction following.

Models

| Model | HF repo | Base model | Params |
|---|---|---|---|
| `Vero-Qwen3I-8B` | `gsarch/Vero-Qwen3I-8B` | `Qwen3-VL-8B-Instruct` | 8B |
| `Vero-Qwen3T-8B` | `gsarch/Vero-Qwen3T-8B` | `Qwen3-VL-8B-Thinking` | 8B |
| `Vero-MiMo-7B` | `gsarch/Vero-MiMo-7B` | `MiMo-VL-7B-SFT-2508` | 7B |
| `Vero-Qwen25-7B` | `gsarch/Vero-Qwen25-7B` | `Qwen2.5-VL-7B-Instruct` | 7B |

Highlights

Fully open release of models, training code, evaluation, and the Vero-600K dataset.
600K curated RL samples from 59 datasets across 6 visual reasoning categories.
Trained for broad transfer across chart and OCR, STEM, spatial and action, knowledge and recognition, grounding and counting, and captioning and instruction following.
State-of-the-art results among 8B models on VeroEval, a 30-benchmark suite for general visual reasoning.
Improves performance across multiple base model families, including Qwen2.5-VL, Qwen3-VL, and MiMo-VL.

Usage

Example for gsarch/Vero-Qwen3T-8B:

from transformers import AutoProcessor, AutoModelForImageTextToText

model_path = "gsarch/Vero-Qwen3T-8B"

# Vero-Qwen3T-8B is built on Qwen3-VL-8B-Thinking, so load it through the auto
# class, which resolves the architecture from the checkpoint config
# (the Qwen2.5-VL class does not match this base model).
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_path)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/image.jpg"},
            {"type": "text", "text": "What is the x axis value with the largest population?"},
        ],
    }
]

# The processor's chat template loads the image and tokenizes text and image together.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Leave room for the full <think> trace before the <answer> block
# (16384 matches the recommended max_new_tokens below).
generated_ids = model.generate(**inputs, max_new_tokens=16384)
# Decode only the newly generated tokens, not the prompt.
output = processor.batch_decode(
    generated_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)[0]
print(output)

Vero models generate a reasoning trace in <think> tags followed by a final answer in <answer> tags. For downstream use, extract the final response from the <answer> block.
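A minimal parsing sketch, assuming outputs follow the layout described above. Depending on the chat template, the opening `<think>` tag may be part of the prompt rather than the generated text, so the function (hypothetically named `extract_answer`) only relies on the closing `</think>`: it discards everything up to the last `</think>` and then reads the first `<answer>` block. Returning `None` when no answer block is found (for example, when generation was cut off by `max_new_tokens`) is a design choice, not documented model behavior.

```python
import re
from typing import Optional

# Non-greedy match; DOTALL lets the answer span multiple lines.
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(output: str) -> Optional[str]:
    """Return the stripped text of the final <answer> block, or None if absent."""
    # Keep only the text after the last </think>; if the tag is missing,
    # rpartition returns the whole string as the tail.
    final = output.rpartition("</think>")[2]
    match = _ANSWER_RE.search(final)
    return match.group(1).strip() if match else None
```

Skipping the reasoning trace first avoids picking up a literal `<answer>` string that the model may mention while thinking.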

Recommended sampling parameters, following the Qwen3.5 defaults:

Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0, max_new_tokens=16384.
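These settings map differently onto the two inference paths shown on this page. The block below is a sketch under stated assumptions: as far as we know, Transformers' `generate()` has no `presence_penalty` option, so the sketch drops it and sets `do_sample=True` so the sampling parameters take effect. On the OpenAI-compatible servers, `max_new_tokens` becomes the standard `max_tokens`, while `top_k`, `min_p`, and `repetition_penalty` are non-standard extras that we assume vLLM and SGLang accept in the request body. The names `RECOMMENDED`, `hf_generate_kwargs`, and `openai_request_fields` are hypothetical.

```python
from collections.abc import Mapping

# Recommended thinking-mode sampling settings, copied from the list above.
RECOMMENDED = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "presence_penalty": 1.5,
    "repetition_penalty": 1.0,
    "max_new_tokens": 16384,
}


def hf_generate_kwargs(params: Mapping[str, float] = RECOMMENDED) -> dict:
    """Keyword arguments for transformers' model.generate().

    presence_penalty is dropped (assumed unsupported by generate());
    do_sample=True enables temperature/top_p/top_k/min_p sampling.
    """
    kwargs = {k: v for k, v in params.items() if k != "presence_penalty"}
    kwargs["do_sample"] = True
    return kwargs


def openai_request_fields(params: Mapping[str, float] = RECOMMENDED) -> dict:
    """Fields to merge into an OpenAI-compatible /v1/chat/completions body."""
    fields = {k: v for k, v in params.items() if k != "max_new_tokens"}
    # The OpenAI schema names the generation length limit "max_tokens".
    fields["max_tokens"] = params["max_new_tokens"]
    return fields
```

For example, `model.generate(**inputs, **hf_generate_kwargs())` in place of the fixed `max_new_tokens` in the Transformers example, or merging `openai_request_fields()` into the JSON body of the curl requests. Neither helper mutates `RECOMMENDED`, so the defaults stay reusable.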

Citation

@article{sarch2026vero,
    title   = {Vero: An Open RL Recipe for General Visual Reasoning},
    author  = {Sarch, Gabriel and Cai, Linrong and Wang, Qunzhong and Wu, Haoyang and Chen, Danqi and Liu, Zhuang},
    year    = {2026},
    journal = {arXiv preprint arXiv:2604.04917},
}

License

Vero is released under the Apache-2.0 license. Users should also review the licenses and usage terms of the underlying base models and any upstream datasets.