# Z-Image-Fun
In version 2.0, `control_layers` was used instead of `control_noise_refiner` to process the refiner latents during training. Although the model converged normally, inference was slow because the `control_layers` forward pass ran twice. Version 2.1 is an urgent fix, and the speed has returned to normal. [2025.12.17]
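
For intuition, here is a minimal sketch of the difference between 2.0 and 2.1. The module structure below is assumed purely for illustration; only the attribute names `control_layers` and `control_noise_refiner` come from the note above, and this is not the actual Z-Image-Fun architecture:

```python
import torch.nn as nn

class ControlBranchSketch(nn.Module):
    """Illustrative only -- not the real Z-Image-Fun model."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.control_layers = nn.Linear(dim, dim)         # main control branch
        self.control_noise_refiner = nn.Linear(dim, dim)  # dedicated refiner branch

    def forward_v2_0(self, latents, refiner_latents):
        # 2.0 bug: refiner latents were also routed through control_layers,
        # so control_layers ran twice per step (outputs fine, speed slow).
        return self.control_layers(latents) + self.control_layers(refiner_latents)

    def forward_v2_1(self, latents, refiner_latents):
        # 2.1 fix: refiner latents go through control_noise_refiner,
        # so each branch runs exactly once per step.
        return self.control_layers(latents) + self.control_noise_refiner(refiner_latents)
```
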
| Name | Description |
|---|---|
| Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors | Based on version 2.1, the model was distilled using an 8-step distillation algorithm. 8-step prediction is recommended. Compared to version 2.1, when using 8-step prediction, the images are clearer and the composition is more reasonable. |
| Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors | A Tile model trained on high-definition datasets that can be used for super-resolution, with a maximum training resolution of 2048x2048. The model was distilled using an 8-step distillation algorithm, and 8-step prediction is recommended. |
| Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors | A retrained model after fixing the typo in version 2.0, with faster single-step speed. Similar to version 2.0, the model lost some of its acceleration capability after training, thus requiring more steps. |
| Z-Image-Turbo-Fun-Controlnet-Union-2.0.safetensors | ControlNet weights for Z-Image-Turbo. Compared to version 1.0, it adds modifications to more layers and was trained for a longer time. However, due to a typo in the code, the layer blocks were forwarded twice, resulting in slower speed. The model supports multiple control conditions such as Canny, Depth, Pose, MLSD, etc. Additionally, the model lost some of its acceleration capability after training, thus requiring more steps. |
8 steps results:

*(Image grid: the same prompts generated with Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps and Z-Image-Turbo-Fun-Controlnet-Union-2.1, side by side.)*

*(Image grids: control input and output pairs for Pose + Inpaint, Pose, Canny, HED, and Depth conditions, plus a Low Resolution to High Resolution pair for the Tile model.)*
See the [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) repository for more details.
Please clone the VideoX-Fun repository and create the required directories:

```shell
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# Enter VideoX-Fun's directory
cd VideoX-Fun

# Create model directories
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
```

Then download the weights so that the layout matches the tree below: the Z-Image-Turbo base model goes into `models/Diffusion_Transformer`, and the ControlNet weights go into `models/Personalized_Model`.

```
📦 models/
├── 📂 Diffusion_Transformer/
│   └── 📂 Z-Image-Turbo/
├── 📂 Personalized_Model/
│   ├── 📦 Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors
│   ├── 📦 Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors
│   └── 📦 Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors
```

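
For example, the ControlNet weights can be fetched with the `huggingface_hub` client. This is a minimal sketch, assuming the files are hosted on the Hugging Face Hub; the `repo_id` below is a placeholder, not a confirmed repository name:

```python
from huggingface_hub import hf_hub_download

for filename in [
    "Z-Image-Turbo-Fun-Controlnet-Union-2.1.safetensors",
    "Z-Image-Turbo-Fun-Controlnet-Union-2.1-8steps.safetensors",
    "Z-Image-Turbo-Fun-Controlnet-Tile-2.1-8steps.safetensors",
]:
    hf_hub_download(
        repo_id="<org>/Z-Image-Fun",            # placeholder; replace with the real repo id
        filename=filename,
        local_dir="models/Personalized_Model",  # matches the tree above
    )
```
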
Then run the example scripts `examples/z_image_fun/predict_t2i_control_2.1.py` and `examples/z_image_fun/predict_i2i_inpaint_2.1.py`.
The grid below shows the generation results under different combinations of Diffusion Steps and Control Scale strength.

Parameter description:

- Diffusion Steps: number of iteration steps for the diffusion model (9, 10, 20, 30, 40)
- Control Scale: control strength coefficient (0.65–1.0)

*(Image grid: one output per Diffusion Steps × Control Scale combination.)*
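
As a reference for reproducing such a grid, here is a minimal sketch of the sweep. The intermediate Control Scale values are sample points within the 0.65–1.0 range stated above, and the `print` stands in for the pipeline call from the example scripts; neither is an actual VideoX-Fun API:

```python
from itertools import product

# Parameter ranges from the description above.
diffusion_steps = [9, 10, 20, 30, 40]      # iteration steps
control_scales = [0.65, 0.75, 0.85, 1.0]   # sample points in the 0.65-1.0 range

for steps, scale in product(diffusion_steps, control_scales):
    # Each (steps, scale) pair corresponds to one cell of the results grid.
    # Replace this print with the pipeline call used in
    # examples/z_image_fun/predict_t2i_control_2.1.py.
    print(f"num_inference_steps={steps}, control_scale={scale}")
```
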