Cross-layer transcoder for Qwen3-0.6B-Base
Blog | Technical Report | Feature Dashboard | BluelightAI
This is a cross-layer transcoder trained to interpret the activations of Qwen3-0.6B-Base. It can be used with the open source circuit-tracer library to build and interrogate attribution graphs for prompts. You can also explore the features on our dashboard.
What is a cross-layer transcoder?
A cross-layer transcoder is an interpreter model trained to extract sparsely-activating features from the activations of a Transformer model. Its encoder translates the input of each MLP layer of the Transformer into a high-dimensional but sparse feature vector. The decoder then reconstructs the output of each MLP from the features extracted at that layer and all previous layers. In formulas:

$$a^{\ell} = \sigma\left(W_{\mathrm{enc}}^{\ell}\, x^{\ell} + b_{\mathrm{enc}}^{\ell}\right), \qquad \hat{y}^{\ell} = \sum_{\ell' \le \ell} W_{\mathrm{dec}}^{\ell' \to \ell}\, a^{\ell'} + b_{\mathrm{dec}}^{\ell},$$

where $x^{\ell}$ is the input to layer $\ell$'s MLP, $\hat{y}^{\ell}$ is the reconstructed MLP output, $a^{\ell}$ is the sparse feature vector, and $\sigma$ is a sparsity-encouraging activation function.
The model is trained with a reconstruction loss alongside an auxiliary loss to encourage sparsity, roughly:

$$\mathcal{L} = \sum_{\ell} \bigl\lVert \hat{y}^{\ell} - y^{\ell} \bigr\rVert_2^2 + \lambda \sum_{\ell} \bigl\lVert a^{\ell} \bigr\rVert_0,$$

where $y^{\ell}$ is the true MLP output at layer $\ell$ and $\lambda$ trades off reconstruction fidelity against sparsity.
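To make the shapes concrete, here is a minimal PyTorch sketch of the forward pass and loss above. The names are ours for illustration and do not match the training or circuit-tracer code, and the L0 penalty is written literally, whereas actual JumpReLU training needs a differentiable surrogate (e.g. a straight-through estimator).

import torch

def jumprelu(z, threshold):
    # JumpReLU: keep a pre-activation only if it exceeds its learned
    # per-feature threshold; otherwise output zero.
    return z * (z > threshold)

def clt_forward(mlp_inputs, W_enc, b_enc, thresholds, W_dec, b_dec):
    # mlp_inputs[l]: (d_model,) input to layer l's MLP.
    # W_enc[l]: (d_latent, d_model).
    # W_dec[l]: (d_latent, n_layers - l, d_model), where W_dec[l][:, k]
    #   decodes layer l's features to the MLP output of layer l + k.
    n_layers = len(mlp_inputs)
    features = [
        jumprelu(W_enc[l] @ mlp_inputs[l] + b_enc[l], thresholds[l])
        for l in range(n_layers)
    ]
    recons = []
    for l in range(n_layers):
        y = b_dec[l].clone()
        for src in range(l + 1):  # features of layers 0..l feed layer l's output
            y = y + features[src] @ W_dec[src][:, l - src]
        recons.append(y)
    return features, recons

def clt_loss(mlp_outputs, recons, features, lam):
    # Reconstruction error plus an L0 sparsity penalty.
    recon = sum(((r - y) ** 2).sum() for r, y in zip(recons, mlp_outputs))
    l0 = sum((f != 0).float().sum() for f in features)
    return recon + lam * l0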
See our technical report for more details.
Model Details
This model is a cross-layer transcoder with 20480 features per layer (an expansion factor of 20x), using a JumpReLU activation function. It attains an L0 sparsity across layers of approximately 115, with about 23% of variance in MLP outputs unexplained.
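For reference, the two statistics above can be computed as follows (a sketch using common definitions of L0 and fraction of variance unexplained; see the technical report for the exact evaluation protocol):

import torch

def l0(features):
    # features: (n_tokens, d_latent) feature activations.
    # L0 = average number of nonzero features per token.
    return (features != 0).sum(dim=-1).float().mean()

def fraction_variance_unexplained(y_true, y_hat):
    # y_true, y_hat: (n_tokens, d_model) true and reconstructed MLP outputs.
    residual = (y_true - y_hat).pow(2).sum()
    total = (y_true - y_true.mean(dim=0)).pow(2).sum()
    return residual / total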
The model was trained on approximately 750 million tokens of text from a broad range of domains, including general web text, public domain books, scientific articles, and code.
Usage
This CLT can be used with the circuit-tracer library to generate attribution graphs.
Note: To load the CLT in circuit-tracer with the Qwen3-0.6B-Base model, you will need to install a patched version of Transformer Lens, as the current release does not support the Qwen3-Base models. Such a patched version is available here. The CLT can be used with an unpatched Transformer Lens if you use the Qwen/Qwen3-0.6B model instead.
You can load the CLT in circuit-tracer as follows:
import torch
from circuit_tracer import ReplacementModel

# The instruction-tuned "Qwen/Qwen3-0.6B" also works, and needs no
# Transformer Lens patch (see the note above).
model_name = "Qwen/Qwen3-0.6B-Base"
transcoder_name = "bluelightai/clt-qwen3-0.6b-base-20k"

# Load the base model together with the CLT; during attribution,
# circuit-tracer uses the CLT's features (plus error terms) in place
# of the MLP outputs.
clt = ReplacementModel.from_pretrained(model_name, transcoder_name, dtype=torch.bfloat16)
See this Colab notebook for a complete example.
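Once loaded, the replacement model can be passed to the library's attribution entry point to build a graph for a prompt. A minimal sketch, assuming circuit-tracer's attribute function with default settings (check the library documentation for the exact signature and options):

from circuit_tracer import attribute

# Builds an attribution graph tracing how active features (and error
# terms) contribute to the model's output logits on this prompt.
graph = attribute("The capital of France is", clt)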
Weight format
The model weights are sharded by layer across multiple files. The W_enc_{layer}.safetensors file contains the following named tensors:
- W_enc_{layer}: encoder weights, shape (d_latent, d_model)
- b_enc_{layer}: encoder biases, shape (d_latent,)
- threshold_{layer}: JumpReLU thresholds, shape (d_latent,)
- b_dec_{layer}: decoder biases, shape (d_model,)
The W_dec_{layer}.safetensors file has a single tensor named W_dec_{layer}, containing the weights decoding that layer's features to the outputs of all subsequent layers, with shape (d_latent, n_out_layers, d_model).
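The shards for a single layer can also be inspected directly with huggingface_hub and safetensors; a short sketch, assuming the file layout described above:

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

repo = "bluelightai/clt-qwen3-0.6b-base-20k"
layer = 0

# Encoder shard: weights, biases, and JumpReLU thresholds for one layer.
enc = load_file(hf_hub_download(repo, f"W_enc_{layer}.safetensors"))
W_enc = enc[f"W_enc_{layer}"]          # (d_latent, d_model)
b_enc = enc[f"b_enc_{layer}"]          # (d_latent,)
threshold = enc[f"threshold_{layer}"]  # (d_latent,)
b_dec = enc[f"b_dec_{layer}"]          # (d_model,)

# Decoder shard: maps this layer's features to the MLP outputs of this
# and all subsequent layers.
dec = load_file(hf_hub_download(repo, f"W_dec_{layer}.safetensors"))
W_dec = dec[f"W_dec_{layer}"]          # (d_latent, n_out_layers, d_model)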
Contact
You can reach us at [email protected] with any questions or inspiration.