Instructions to use NexaAI/qwen3vl-8B-Thinking-fp16-mlx with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use NexaAI/qwen3vl-8B-Thinking-fp16-mlx with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("NexaAI/qwen3vl-8B-Thinking-fp16-mlx") config = load_config("NexaAI/qwen3vl-8B-Thinking-fp16-mlx") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi new
How to use NexaAI/qwen3vl-8B-Thinking-fp16-mlx with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "NexaAI/qwen3vl-8B-Thinking-fp16-mlx"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "NexaAI/qwen3vl-8B-Thinking-fp16-mlx" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use NexaAI/qwen3vl-8B-Thinking-fp16-mlx with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "NexaAI/qwen3vl-8B-Thinking-fp16-mlx"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default NexaAI/qwen3vl-8B-Thinking-fp16-mlx
Run Hermes
hermes
Configuration Parsing Warning:Invalid JSON for config file config.json
Qwen3-VL-8B-Thinking
Run Qwen3-VL-8B-Thinking optimized for Apple Silicon on MLX with NexaSDK.
Quickstart
Install NexaSDK
Run the model locally with one line of code:
nexa infer NexaAI/qwen3vl-8B-Thinking-fp16-mlx
Model Description
Qwen3-VL-8B-Thinking is an 8-billion-parameter multimodal large language model from Alibaba Cloud’s Qwen team.
As part of the Qwen3-VL (Vision-Language) family, it is designed for deep multimodal reasoning — combining visual understanding, long-context comprehension, and structured chain-of-thought generation across text, images, and videos.
The Thinking variant focuses on advanced reasoning transparency and analytical precision. Compared to the Instruct version, it produces richer intermediate reasoning steps, enabling detailed explanation, planning, and multi-hop analysis across visual and textual inputs.
Features
- Deep Visual Reasoning: Interprets complex scenes, charts, and documents with multi-step logic.
- Chain-of-Thought Generation: Produces structured reasoning traces for improved interpretability and insight.
- Extended Context Handling: Maintains coherence across longer multimodal sequences.
- Multilingual Competence: Understands and generates in multiple languages for global applicability.
- High Accuracy at 8B Scale: Achieves strong benchmark performance in multimodal reasoning and analysis tasks.
Use Cases
- Research and analysis requiring visual reasoning transparency
- Complex multimodal QA and scientific problem solving
- Visual analytics and explanation generation
- Advanced agent systems needing structured thought or planning steps
- Educational tools requiring detailed, interpretable reasoning
Inputs and Outputs
Input:
- Text, image(s), or multimodal combinations (including sequential frames or documents)
- Optional context for multi-turn or multi-modal reasoning
Output:
- Structured reasoning outputs with intermediate steps
- Detailed answers, explanations, or JSON-formatted reasoning traces
License
Refer to the official Qwen license for usage and redistribution details.
- Downloads last month
- 9
Quantized
Model tree for NexaAI/qwen3vl-8B-Thinking-fp16-mlx
Base model
Qwen/Qwen3-VL-8B-Instruct