Cross-layer transcoder for Qwen3-0.6B-Base

Blog | Technical Report | Feature Dashboard | BluelightAI

This is a cross-layer transcoder trained to interpret the activations of Qwen3-0.6B-Base. It can be used with the open source circuit-tracer library to build and interrogate attribution graphs for prompts. You can also explore the features on our dashboard.

What is a cross-layer transcoder?

A cross-layer transcoder is an interpreter model trained to extract sparsely-activating features from the activations of a Transformer model. Its encoder translates the input $x^{\text{in}}_{\ell}$ of each MLP layer of the Transformer into a high-dimensional but sparse feature vector $f_\ell$. The decoder then reconstructs the output $x^{\text{out}}_{\ell}$ of the MLP using the features extracted at that layer and all earlier layers. In formulas:

$$f_\ell = \sigma(W^{\text{enc}}_\ell x^{\text{in}}_{\ell} + b^{\text{enc}}_\ell), \qquad \hat{x}^{\text{out}}_\ell = \sum_{k \leq \ell} W^{\text{dec}}_{k \to \ell} f_k + b^{\text{dec}}_\ell$$

where $\sigma$ is a sparsity-encouraging activation function.
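
As a concrete illustration, here is a minimal sketch of the encode/decode computation in PyTorch. This is not the training code; the per-layer weight containers and their indexing are illustrative assumptions, and `sigma` stands in for the JumpReLU activation described under Model Details.

import torch

def clt_forward(x_in, W_enc, b_enc, W_dec, b_dec, sigma):
    # x_in: list of per-layer MLP inputs, each of shape (d_model,)
    # W_enc[l]: (d_latent, d_model), b_enc[l]: (d_latent,)
    # W_dec[k][l]: (d_latent, d_model), decoding layer k's features into layer l's output
    # b_dec[l]: (d_model,)
    n_layers = len(x_in)
    # Encoder: per-layer sparse feature vectors f_l
    f = [sigma(W_enc[l] @ x_in[l] + b_enc[l]) for l in range(n_layers)]
    # Decoder: reconstruct each layer's MLP output from features of all layers k <= l
    x_out_hat = []
    for l in range(n_layers):
        recon = b_dec[l].clone()
        for k in range(l + 1):
            recon = recon + f[k] @ W_dec[k][l]  # (d_latent,) @ (d_latent, d_model) -> (d_model,)
        x_out_hat.append(recon)
    return f, x_out_hat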

The model is trained with a reconstruction loss alongside an auxiliary loss to encourage sparsity, roughly:

$$\mathcal{L}(x^{\text{in}}, x^{\text{out}}) = \|\hat{x}^{\text{out}} - x^{\text{out}}\|_2^2 + \lambda \|\tanh(\alpha f)\|_1$$
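
In code, the objective might look roughly like the sketch below, summed over layers; $\lambda$ and $\alpha$ are hyperparameters not specified in this card.

import torch

def clt_loss(x_out_hat, x_out, f, lam, alpha):
    # Reconstruction term: squared error between reconstructed and true MLP outputs
    recon = sum(((xh - x) ** 2).sum() for xh, x in zip(x_out_hat, x_out))
    # Sparsity term: tanh-saturated L1 penalty on feature activations
    sparsity = sum(torch.tanh(alpha * fl).abs().sum() for fl in f)
    return recon + lam * sparsity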

See our technical report for more details.

Model Details

This model is a cross-layer transcoder with 20480 features per layer (an expansion factor of 20x), using a JumpReLU activation function. It attains an L0 sparsity across layers of approximately 115, with about 23% of variance in MLP outputs unexplained.
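For reference, JumpReLU zeroes out any pre-activation below a learned per-feature threshold (the threshold_{layer} tensors described under Weight format). A minimal sketch:

import torch

def jumprelu(pre_acts, threshold):
    # pre_acts, threshold: (d_latent,); a feature only fires if it exceeds its learned threshold
    return torch.where(pre_acts > threshold, pre_acts, torch.zeros_like(pre_acts))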

The model was trained on approximately 750 million tokens of text from a broad range of domains, including general web text, public domain books, scientific articles, and code.

Usage

This CLT can be used with the circuit-tracer library to generate attribution graphs.

Note: To load the CLT in circuit-tracer with the Qwen3-0.6B-Base model, you will need to install a patched version of TransformerLens, as the current release does not support the Qwen3 base models; such a patched version is available here. The CLT can be used with an unpatched TransformerLens if you use the Qwen/Qwen3-0.6B model instead.

You can load the CLT in circuit-tracer as follows:

import torch
from circuit_tracer import ReplacementModel
model_name = "Qwen/Qwen3-0.6B-Base" # Or just "Qwen/Qwen3-0.6B"
transcoder_name = "bluelightai/clt-qwen3-0.6b-base-20k"
clt = ReplacementModel.from_pretrained(model_name, transcoder_name, dtype=torch.bfloat16)

See this Colab notebook for a complete example.
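
Once the ReplacementModel is loaded, it can be used to build an attribution graph for a prompt. A hedged sketch based on circuit-tracer's attribute entry point; argument names and defaults may differ across library versions, so check the Colab above for the exact interface.

from circuit_tracer import attribute

prompt = "The capital of France is"  # example prompt, not from this card
# Runs the prompt through the replacement model and builds an attribution graph
graph = attribute(prompt, clt)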

Weight format

The model weights are sharded by layer across multiple files. The W_enc_{layer}.safetensors file contains the following named tensors:

  • W_enc_{layer}: encoder weights, shape (d_latent, d_model)
  • b_enc_{layer}: encoder biases, shape (d_latent,)
  • threshold_{layer}: JumpReLU thresholds, shape (d_latent,)
  • b_dec_{layer}: decoder biases, shape (d_model,)

The W_dec_{layer}.safetensors file has a single tensor named W_dec_{layer}, containing the weights decoding that layer's features to the outputs of that layer and all subsequent layers, with shape (d_latent, n_out_layers, d_model).
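
To inspect the raw weights outside circuit-tracer, the shards can be downloaded and read with the safetensors library. A sketch using layer 0 as an illustrative example; the repo id and file names follow the description above.

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

repo_id = "bluelightai/clt-qwen3-0.6b-base-20k"
layer = 0  # any layer index of the base model

enc_path = hf_hub_download(repo_id, f"W_enc_{layer}.safetensors")
dec_path = hf_hub_download(repo_id, f"W_dec_{layer}.safetensors")

enc = load_file(enc_path)
dec = load_file(dec_path)

W_enc = enc[f"W_enc_{layer}"]          # (d_latent, d_model)
b_enc = enc[f"b_enc_{layer}"]          # (d_latent,)
threshold = enc[f"threshold_{layer}"]  # (d_latent,)
b_dec = enc[f"b_dec_{layer}"]          # (d_model,)
W_dec = dec[f"W_dec_{layer}"]          # (d_latent, n_out_layers, d_model)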

Contact

You can reach us at [email protected] with any questions or inspiration.
