Fixed Chat Templates for Qwen 3.5 & 3.6
Drop-in Jinja templates that fix rendering errors, token waste, and missing features in the official Qwen chat templates. Works in LM Studio, llama.cpp, vLLM, MLX, oMLX, and any engine that supports HuggingFace Jinja templates.
Why you need this
The official Qwen templates have bugs that break real usage:
| Problem | Impact |
|---|---|
| Tool calls fail on C++ engines | ` |
developer role rejected |
Modern APIs send it; the official template raises an error |
| Empty thinking blocks spam context | Every past turn gets wrapped in tags, even with nothing inside |
| No way to toggle thinking | You're stuck with whatever the model defaults to |
Qwen 3.6: </thinking> hallucination |
Model sometimes generates the wrong closing tag; parser fails |
All five are fixed here, plus a clean <|think_on|> / <|think_off|> toggle you can drop into any message.
Quick install
LM Studio
- Open your Qwen model in the right-side panel
- Scroll to Prompt Template
- Replace the template with the contents of
qwen3.5/chat_template.jinjaorqwen3.6/chat_template.jinja - Save
llama.cpp / koboldcpp
--jinja --chat-templateFile qwen3.6/chat_template.jinja
vLLM / TextGen
Replace the chat_template string in your tokenizer_config.json with the file contents.
oMLX
Overwrite chat_template.jinja in your local model directory. Load with --jinja. Remove any chat_template_kwargs overrides β the template handles everything internally.
Which file do I use?
| File | For models |
|---|---|
qwen3.5/chat_template.jinja |
Qwen3.5-35B-A3B, Qwen3.5-32B, Qwen3.5-14B, and all Qwen 3.5 variants |
qwen3.6/chat_template.jinja |
Qwen3.6-27B, Qwen3.6-35B-A3B, and all Qwen 3.6 variants |
The 3.6 template is a superset β it additionally handles preserve_thinking, </thinking> hallucination recovery, and interrupted thought streams. If you're on 3.6, use the 3.6 file.
Thinking toggle
Drop <|think_on|> or <|think_off|> anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.
Fast answer, no reasoning:
System: You are a coding assistant. <|think_off|>
User: What's 2+2?
Deep reasoning:
System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.
The tag syntax (<|think_on|>, <|think_off|>) uses Qwen's control-token delimiters, so it will never collide with real text. Earlier community templates used /think, which broke legitimate paths like cd /mnt/project/think.
Pre-installed models
These templates are already bundled with:
- froggeric/Qwen3.6-27B-MLX-8bit
- froggeric/Qwen3.6-27B-MLX-4bit
- froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-8bit
- froggeric/Qwen3.5-35B-A3B-Uncensored-FernflowerAI-MLX-4bit
If you're using one of those, you already have the template. This repo is for everyone else.
Technical details β what exactly was fixed
Tool calls on C++ engines
The official template iterates tool call arguments with |items:
{%- for key, value in tool_call.arguments|items %}
Python's Jinja supports |items. C++ runtimes (LM Studio, llama.cpp, MLX) do not β the template produces a rendering error instead of output. This template uses direct dictionary key lookups instead:
{%- for args_name in tool_call.arguments %}
{%- set args_value = tool_call.arguments[args_name] %}
It also replaces is sequence with is iterable (stricter C++ runtimes require it), removes |safe wrappers (also Python-only), and handles arguments returned as raw strings instead of objects.
developer role
The OpenAI-compatible API spec sends message.role == "developer" for system-level instructions. The official Qwen template only checks for "system" and throws on anything else. Both templates here accept "developer" and map it to the system role.
Empty thinking blocks
The official template wraps every past assistant turn in thinking tags:
<|im_start|>assistant
<think/>
</think >
Here is the answer...
When there's no reasoning content, those tags are dead weight β they waste context tokens and break prefix caching. The Qwen 3.5 template checks reasoning_content before emitting. The Qwen 3.6 template goes further: it respects the preserve_thinking kwarg, checks reasoning_content|trim|length > 0, and ties history visibility to the <|think_off|> override.
</thinking> hallucination (Qwen 3.6 only)
The Qwen 3.6 model sometimes generates </thinking> instead of the expected closing tag. The official parser splits on </think > only and fails. The 3.6 template detects which closing tag was actually used and splits on that:
{%- if '</think >' in content %}
{%- set think_end_token = '</think >' %}
{%- elif '</thinking>' in content %}
{%- set think_end_token = '</thinking>' %}
It also handles interrupted generation (max tokens hit mid-thought) by rescuing incomplete streams instead of injecting broken tag pairs.
Arguments serialization
The official template serializes argument values with |tojson unconditionally, which turns Python True into JSON true correctly but fails when the value is already a string. The fixed templates check the type first β strings pass through as-is, everything else goes through |tojson.
Comparison β Qwen 3.5 templates
| Feature | Official | LuffyTheFox | mod-ellary | Pneuny | This |
|---|---|---|---|---|---|
| Tool arguments | Fails | Fixed | Missing | Fixed | Fixed |
|safe removed |
Fails | Fixed | Missing | Fixed | Fixed |
developer role |
Missing | Missing | Missing | Missing | Added |
| Thinking toggle | None | None | /think (system only) |
None | <|think_off|> anywhere |
| Empty think in history | Broken | Broken | Tags omitted | Broken | Fixed |
| Text safety | N/A | N/A | Breaks on /think in paths |
N/A | Safe |
| Clean instructions | Yes | Yes | Yes | Injects "I cannot call a tool" | Yes |
Comparison β Qwen 3.6 template
| Feature | Official | This |
|---|---|---|
| Tool arguments | Fails (|items) |
Fixed |
|safe removed |
Fails | Fixed |
developer role |
Missing | Added |
| Thinking toggle | None | <|think_off|> anywhere |
preserve_thinking |
Spams empty blocks | Dynamic length checks |
</thinking> hallucination |
Fails | Detected and handled |
| Interrupted streams | Broken tags | Rescued |
Authorship
| Role | Author |
|---|---|
| Original models | Alibaba Cloud (Qwen team) |
| Template fixes | froggeric |
License
Apache-2.0, inherited from Qwen.