Zen VL 30B Agent
Zen VL 30B Agent - Frontier vision-language model with function calling (31B MoE)
Model Details
- Architecture: Zen
- Parameters: 30B
- Context Window: 256K tokens (expandable to 1M)
- License: Apache 2.0
- Training: Fine-tuned for the Zen identity and for function calling
Capabilities
- 🎨 Visual Understanding: Image analysis, video comprehension, spatial reasoning
- 📝 OCR: Text extraction in 32 languages
- 🧠 Multimodal Reasoning: STEM, math, code generation
- 🛠️ Function Calling: Tool use with visual context
- 🤖 Visual Agents: GUI interaction, parameter extraction
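On the caller's side, the function-calling capability listed above amounts to a parse-and-dispatch step. The sketch below is an illustration only, not the model's documented interface. It assumes the model emits tool calls as JSON wrapped in `<tool_call>` tags, a convention some chat templates use; check this model's own template before relying on it. The `click` tool, its parameters, and the sample output string are all hypothetical.

```python
import json
import re

# Hypothetical tool a visual agent might expose; name and parameters are illustrative.
def click(x: int, y: int) -> str:
    return f"clicked at ({x}, {y})"

TOOLS = {"click": click}

# Assumed output convention: a JSON object wrapped in <tool_call>...</tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in the generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls

def dispatch(call: dict) -> str:
    """Look up the requested tool and invoke it with the parsed arguments."""
    func = TOOLS.get(call["name"])
    if func is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return func(**call.get("arguments", {}))

# Made-up model output for a GUI screenshot, used only to exercise the parser.
sample_output = (
    "The Submit button is visible.\n"
    '<tool_call>{"name": "click", "arguments": {"x": 120, "y": 48}}</tool_call>'
)
results = [dispatch(call) for call in extract_tool_calls(sample_output)]
print(results)
```

In a real agent loop, each result would be appended to the conversation as a tool message and the model called again. Restricting dispatch to an explicit `TOOLS` registry means the model cannot trigger arbitrary functions.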
Usage
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
# Load the model and its processor
model = AutoModelForVision2Seq.from_pretrained(
    "zenlm/zen-vl-30b-agent",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("zenlm/zen-vl-30b-agent")
# Build a message with an image placeholder so the chat template
# inserts the image tokens alongside the text prompt
image = Image.open("example.jpg")
prompt = "What's in this image?"
messages = [{
    "role": "user",
    "content": [{"type": "image"}, {"type": "text", "text": prompt}],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
# Generate
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(generated, skip_special_tokens=True)
print(response)
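When a prompt refers to more than one image, the user message needs one image placeholder per image. The following is a minimal sketch, assuming this processor's chat template accepts the list-of-parts content format used by many transformers vision-language processors. `build_messages` is a hypothetical helper, not part of any library.

```python
def build_messages(prompt: str, num_images: int = 1) -> list[dict]:
    """Build a single-turn user message with one placeholder per image."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    # Image placeholders come first, in the order the images will be passed.
    content = [{"type": "image"} for _ in range(num_images)]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]

messages = build_messages("What changed between these two screenshots?", num_images=2)
print(messages)
```

The result can be passed to `processor.apply_chat_template`, with the images supplied to the processor as a list in the same order as the placeholders.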
Links
- 🌐 Website: zenlm.org
- 📚 GitHub: zenlm/zen-vl
- 📄 Paper: Coming soon
- 🤗 Model Family: zenlm
Citation
@misc{zenvl2025,
title={Zen VL: Vision-Language Models with Integrated Function Calling},
author={Hanzo AI Team},
year={2025},
publisher={Zen Language Models},
url={https://github.com/zenlm/zen-vl}
}
License
Apache 2.0
Created by Hanzo AI for the Zen model family.