domiso/SenseFlow

@@ -1,110 +1,125 @@
----
-license: apache-2.0
----
-# 🚀 SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
-[![arXiv](https://img.shields.io/badge/Arxiv-2506.00523-b31b1b)](https://arxiv.org/abs/2506.00523)
-[![GitHub Repo stars](https://img.shields.io/github/stars/XingtongGe/SenseFlow.svg?style=social&label=Star&maxAge=60)](https://github.com/XingtongGe/SenseFlow)
-[![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-md-dark.svg)](https://huggingface.co/domiso/SenseFlow)
-<!-- [🤗 HuggingFace Model](https://huggingface.co/domiso/SenseFlow)  -->
-[Xingtong Ge](https://xingtongge.github.io/)<sup>1,2</sup>, Xin Zhang<sup>2</sup>, [Tongda Xu](https://tongdaxu.github.io/)<sup>3</sup>, [Yi Zhang](https://zhangyi-3.github.io/)<sup>4</sup>, [Xinjie Zhang](https://xinjie-q.github.io/)<sup>1</sup>, [Yan Wang](https://yanwang202199.github.io/)<sup>3</sup>, [Jun Zhang](https://eejzhang.people.ust.hk/)<sup>1</sup>
-<sup>1</sup>HKUST, <sup>2</sup>SenseTime Research, <sup>3</sup>Tsinghua University, <sup>4</sup>CUHK MMLab
-## Abstract
-The Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models, such as SD 3.5 and FLUX. In this paper, we first analyze the issues when applying vanilla DMD on large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to constrain the divergence between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep denoising importance from the teacher model. With IDA alone, DMD converges for SD 3.5; employing both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Together with a scaled VFM-based discriminator, our final model, dubbed **SenseFlow**, achieves superior performance in distillation for both diffusion based text-to-image models such as SDXL, and flow-matching models such as SD 3.5 Large and FLUX.1 dev.
-## SenseFlow-FLUX.1 dev (supports 4–8-step generation)
-* `SenseFlow-FLUX/diffusion_pytorch_model.safetensors`: the DiT checkpoint.
-* `SenseFlow-FLUX/config.json`: the config of DiT using in our model.
-### Usage
-1. prepare the base checkpoint of FLUX.1 dev to `Path/to/FLUX`
-2. Use `SenseFlow-FLUX` to replace the transformer folder `Path/to/FLUX/transformer`, obtaining the `Path/to/SenseFlow-FLUX`.
-#### Using the Euler sampler
-```python
-import torch
-from diffusers import FluxPipeline
-from diffusers import FlowMatchEulerDiscreteScheduler
-pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
-prompt="A cat sleeping on a windowsill with white curtains fluttering in the breeze"
-images = pipe(
-    prompt,
-    height=1024,
-    width=1024,
-    num_inference_steps=4,
-    max_sequence_length=512,
-).images[0]
-images.save("output.png")
-```
-#### Using the x0 sampler (similar to the LCMScheduler in diffusers)
-```python
-import torch
-from diffusers import FluxPipeline
-from diffusers import FlowMatchEulerDiscreteScheduler
-from typing import Union, Tuple, Optional
-class FlowMatchEulerX0Scheduler(FlowMatchEulerDiscreteScheduler):
-    def step(
-        self,
-        model_output: torch.FloatTensor,
-        timestep: Union[float, torch.FloatTensor],
-        sample: torch.FloatTensor,
-        generator: Optional[torch.Generator] = None,
-        return_dict: bool = True,
-    ) -> Union[FlowMatchEulerDiscreteSchedulerOutput, Tuple]:
-        if self.step_index is None:
-            self._init_step_index(timestep)
-        sample = sample.to(torch.float32)  # Ensure precision
-        sigma = self.sigmas[self.step_index]
-        sigma_next = self.sigmas[self.step_index + 1]
-        # 1. Compute x0 from model output (assuming model predicts noise)
-        x0 = sample - sigma * model_output
-        # 2. Add noise to x0 to get the sample for the next step
-        noise = torch.randn_like(sample)
-        prev_sample = (1 - sigma_next) * x0 + sigma_next * noise
-        prev_sample = prev_sample.to(model_output.dtype)  # Convert back to original dtype
-        self._step_index += 1  # Move to next step
-        if not return_dict:
-            return (prev_sample,)
-        return FlowMatchEulerDiscreteSchedulerOutput(prev_sample=prev_sample)
-pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
-pipe.scheduler = FlowMatchEulerX0Scheduler.from_config(pipe.scheduler.config)
-prompt="A cat sleeping on a windowsill with white curtains fluttering in the breeze"
-images = pipe(
-    prompt,
-    height=1024,
-    width=1024,
-    num_inference_steps=4,
-    max_sequence_length=512,
-).images[0]
-images.save("output.png")
-```
-## DanceGRPO-SenseFlow (supports 4–8-step generation)
-comming soon!

+---
+license: apache-2.0
+library_name: diffusers
+pipeline_tag: text-to-image
+tags:
+- flow-matching
+- distillation
+- flux
+- stable-diffusion
+---
+# 🚀 SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
+[![arXiv](https://img.shields.io/badge/Arxiv-2506.00523-b31b1b)](https://arxiv.org/abs/2506.00523)
+[![GitHub Repo stars](https://img.shields.io/github/stars/XingtongGe/SenseFlow.svg?style=social&label=Star&maxAge=60)](https://github.com/XingtongGe/SenseFlow)
+[![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-md-dark.svg)](https://huggingface.co/domiso/SenseFlow)
+[Xingtong Ge](https://xingtongge.github.io/)<sup>1,2</sup>, Xin Zhang<sup>2</sup>, [Tongda Xu](https://tongdaxu.github.io/)<sup>3</sup>, [Yi Zhang](https://zhangyi-3.github.io/)<sup>4</sup>, [Xinjie Zhang](https://xinjie-q.github.io/)<sup>1</sup>, [Yan Wang](https://yanwang202199.github.io/)<sup>3</sup>, [Jun Zhang](https://eejzhang.people.ust.hk/)<sup>1</sup>
+<sup>1</sup>HKUST, <sup>2</sup>SenseTime Research, <sup>3</sup>Tsinghua University, <sup>4</sup>CUHK MMLab
+## Abstract
+Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models, such as SD 3.5 and FLUX. In this paper, we first analyze the issues that arise when applying vanilla DMD to large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to constrain the divergence between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep denoising importance from the teacher model. With IDA alone, DMD converges for SD 3.5; employing both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Together with a scaled VFM-based discriminator, our final model, dubbed **SenseFlow**, achieves superior performance in distillation for both diffusion-based text-to-image models such as SDXL and flow-matching models such as SD 3.5 Large and FLUX.1 dev.
+## SenseFlow-FLUX.1 dev (supports 4–8-step generation)
+* `SenseFlow-FLUX/diffusion_pytorch_model.safetensors`: the DiT checkpoint.
+* `SenseFlow-FLUX/config.json`: the config of the DiT used in our model.
+### Usage
+1. Download the base FLUX.1 dev checkpoint to `Path/to/FLUX`.
+2. Replace the transformer folder `Path/to/FLUX/transformer` with `SenseFlow-FLUX`; the result is referred to below as `Path/to/SenseFlow-FLUX`.
+#### Using the Euler sampler
+```python
+import torch
+from diffusers import FluxPipeline
+from diffusers import FlowMatchEulerDiscreteScheduler
+pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
+prompt="A cat sleeping on a windowsill with white curtains fluttering in the breeze"
+image = pipe(
+    prompt,
+    height=1024,
+    width=1024,
+    num_inference_steps=4,
+    max_sequence_length=512,
+).images[0]
+image.save("output.png")
+```
+#### Using the x0 sampler (similar to the LCMScheduler in diffusers)
+```python
+import torch
+from diffusers import FluxPipeline
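+from diffusers import FlowMatchEulerDiscreteScheduler
+from diffusers.schedulers.scheduling_flow_match_euler_discrete import FlowMatchEulerDiscreteSchedulerOutput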
+from typing import Union, Tuple, Optional
+class FlowMatchEulerX0Scheduler(FlowMatchEulerDiscreteScheduler):
+    def step(
+        self,
+        model_output: torch.FloatTensor,
+        timestep: Union[float, torch.FloatTensor],
+        sample: torch.FloatTensor,
+        generator: Optional[torch.Generator] = None,
+        return_dict: bool = True,
+    ) -> Union[FlowMatchEulerDiscreteSchedulerOutput, Tuple]:
+        if self.step_index is None:
+            self._init_step_index(timestep)
+        sample = sample.to(torch.float32)  # Ensure precision
+        sigma = self.sigmas[self.step_index]
+        sigma_next = self.sigmas[self.step_index + 1]
+        # 1. Compute x0 from the model output (assuming the model predicts the velocity v = noise - x0)
+        x0 = sample - sigma * model_output
+        # 2. Add noise to x0 to get the sample for the next step
+        noise = torch.randn_like(sample)
+        prev_sample = (1 - sigma_next) * x0 + sigma_next * noise
+        prev_sample = prev_sample.to(model_output.dtype)  # Convert back to original dtype
+        self._step_index += 1  # Move to next step
+        if not return_dict:
+            return (prev_sample,)
+        return FlowMatchEulerDiscreteSchedulerOutput(prev_sample=prev_sample)
+pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
+pipe.scheduler = FlowMatchEulerX0Scheduler.from_config(pipe.scheduler.config)
+prompt="A cat sleeping on a windowsill with white curtains fluttering in the breeze"
+image = pipe(
+    prompt,
+    height=1024,
+    width=1024,
+    num_inference_steps=4,
+    max_sequence_length=512,
+).images[0]
+image.save("output.png")
+```
+## DanceGRPO-SenseFlow (supports 4–8-step generation)
+Coming soon!
+## Citation
+```bibtex
+@article{ge2025senseflow,
+  title={SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation},
+  author={Ge, Xingtong and Zhang, Xin and Xu, Tongda and Zhang, Yi and Zhang, Xinjie and Wang, Yan and Zhang, Jun},
+  journal={arXiv preprint arXiv:2506.00523},
+  year={2025}
+}
+```
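
As a sanity check on the x0 sampler in the README above, the following minimal sketch reproduces its update rule on plain Python lists, without `torch` or `diffusers`. It assumes the interpolation `x_t = (1 - sigma) * x0 + sigma * noise` and a velocity-predicting model (`v = noise - x0`), which is what the line `x0 = sample - sigma * model_output` implies. The function name `x0_step` and the toy values are hypothetical and not part of the repository.

```python
import random


def x0_step(sample, model_output, sigma, sigma_next, rng):
    """One x0-sampler update on a flat list of floats (illustrative only)."""
    # Recover the clean-sample estimate, assuming velocity prediction v = noise - x0.
    x0 = [x - sigma * v for x, v in zip(sample, model_output)]
    # Re-noise the estimate to the next noise level with fresh Gaussian noise.
    return [(1 - sigma_next) * x + sigma_next * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
clean = [0.5, -1.0, 2.0]  # hypothetical "true" x0
noise = [rng.gauss(0.0, 1.0) for _ in clean]
sigma = 0.75

# Build the noisy sample and the output an ideal velocity model would produce.
sample = [(1 - sigma) * c + sigma * n for c, n in zip(clean, noise)]
velocity = [n - c for c, n in zip(clean, noise)]

# With sigma_next = 0 (the final step), no noise is re-added and x0 is returned.
final = x0_step(sample, velocity, sigma, 0.0, rng)

# With sigma_next > 0 (an intermediate step), the estimate is re-noised.
intermediate = x0_step(sample, velocity, sigma, 0.5, rng)
```

With an ideal velocity output, `final` matches `clean` up to floating-point error, while `intermediate` is a fresh noisy sample at the next noise level. This mirrors how the scheduler's last step (where the final sigma is 0) returns the x0 estimate directly.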