Add library_name and pipeline_tag metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +125 -110
README.md CHANGED
@@ -1,110 +1,125 @@
1
- ---
2
- license: apache-2.0
3
- ---
4
-
5
- # 🚀 SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
6
-
7
- [![arXiv](https://img.shields.io/badge/Arxiv-2506.00523-b31b1b)](https://arxiv.org/abs/2506.00523)
8
- [![GitHub Repo stars](https://img.shields.io/github/stars/XingtongGe/SenseFlow.svg?style=social&label=Star&maxAge=60)](https://github.com/XingtongGe/SenseFlow)
9
-
10
- [![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-md-dark.svg)](https://huggingface.co/domiso/SenseFlow)
11
-
12
- <!-- [🤗 HuggingFace Model](https://huggingface.co/domiso/SenseFlow) -->
13
-
14
- [Xingtong Ge](https://xingtongge.github.io/)<sup>1,2</sup>, Xin Zhang<sup>2</sup>, [Tongda Xu](https://tongdaxu.github.io/)<sup>3</sup>, [Yi Zhang](https://zhangyi-3.github.io/)<sup>4</sup>, [Xinjie Zhang](https://xinjie-q.github.io/)<sup>1</sup>, [Yan Wang](https://yanwang202199.github.io/)<sup>3</sup>, [Jun Zhang](https://eejzhang.people.ust.hk/)<sup>1</sup>
15
-
16
- <sup>1</sup>HKUST, <sup>2</sup>SenseTime Research, <sup>3</sup>Tsinghua University, <sup>4</sup>CUHK MMLab
17
-
18
- ## Abstract
19
-
20
- The Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models, such as SD 3.5 and FLUX. In this paper, we first analyze the issues when applying vanilla DMD on large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to constrain the divergence between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep denoising importance from the teacher model. With IDA alone, DMD converges for SD 3.5; employing both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Together with a scaled VFM-based discriminator, our final model, dubbed **SenseFlow**, achieves superior performance in distillation for both diffusion based text-to-image models such as SDXL, and flow-matching models such as SD 3.5 Large and FLUX.1 dev.
21
-
22
- ## SenseFlow-FLUX.1 dev (supports 4–8-step generation)
23
- * `SenseFlow-FLUX/diffusion_pytorch_model.safetensors`: the DiT checkpoint.
24
- * `SenseFlow-FLUX/config.json`: the DiT config used by our model.
25
-
26
-
27
- ### Usage
28
-
29
- 1. Prepare the base FLUX.1 dev checkpoint at `Path/to/FLUX`.
30
- 2. Replace the transformer folder `Path/to/FLUX/transformer` with `SenseFlow-FLUX` to obtain `Path/to/SenseFlow-FLUX`.
31
-
32
- #### Using the Euler sampler
33
- ```python
34
- import torch
35
- from diffusers import FluxPipeline
36
- from diffusers import FlowMatchEulerDiscreteScheduler
37
-
38
- pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
39
-
40
- prompt="A cat sleeping on a windowsill with white curtains fluttering in the breeze"
41
-
42
- images = pipe(
43
- prompt,
44
- height=1024,
45
- width=1024,
46
- num_inference_steps=4,
47
- max_sequence_length=512,
48
- ).images[0]
49
-
50
- images.save("output.png")
51
- ```
52
- #### Using the x0 sampler (similar to the LCMScheduler in diffusers)
53
- ```python
54
- import torch
55
- from diffusers import FluxPipeline
56
- from diffusers.schedulers.scheduling_flow_match_euler_discrete import FlowMatchEulerDiscreteScheduler, FlowMatchEulerDiscreteSchedulerOutput
57
- from typing import Union, Tuple, Optional
58
-
59
- class FlowMatchEulerX0Scheduler(FlowMatchEulerDiscreteScheduler):
60
-     def step(
61
-         self,
62
-         model_output: torch.FloatTensor,
63
-         timestep: Union[float, torch.FloatTensor],
64
-         sample: torch.FloatTensor,
65
-         generator: Optional[torch.Generator] = None,
66
-         return_dict: bool = True,
67
-     ) -> Union[FlowMatchEulerDiscreteSchedulerOutput, Tuple]:
68
-
69
-         if self.step_index is None:
70
-             self._init_step_index(timestep)
71
-
72
-         sample = sample.to(torch.float32)  # Ensure precision
73
-
74
-         sigma = self.sigmas[self.step_index]
75
-         sigma_next = self.sigmas[self.step_index + 1]
76
-
77
-         # 1. Compute x0 from model output (assuming the model predicts the flow velocity, noise - x0)
78
-         x0 = sample - sigma * model_output
79
-
80
-         # 2. Add noise to x0 to get the sample for the next step
81
-         noise = torch.randn_like(sample)
82
-         prev_sample = (1 - sigma_next) * x0 + sigma_next * noise
83
-
84
-         prev_sample = prev_sample.to(model_output.dtype)  # Convert back to original dtype
85
-         self._step_index += 1  # Move to next step
86
-
87
-         if not return_dict:
88
-             return (prev_sample,)
89
-
90
-         return FlowMatchEulerDiscreteSchedulerOutput(prev_sample=prev_sample)
91
-
92
- pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
93
- pipe.scheduler = FlowMatchEulerX0Scheduler.from_config(pipe.scheduler.config)
94
-
95
- prompt="A cat sleeping on a windowsill with white curtains fluttering in the breeze"
96
-
97
- images = pipe(
98
- prompt,
99
- height=1024,
100
- width=1024,
101
- num_inference_steps=4,
102
- max_sequence_length=512,
103
- ).images[0]
104
-
105
- images.save("output.png")
106
- ```
107
-
108
- ## DanceGRPO-SenseFlow (supports 4–8-step generation)
109
-
110
- Coming soon!
1
+ ---
2
+ license: apache-2.0
3
+ library_name: diffusers
4
+ pipeline_tag: text-to-image
5
+ tags:
6
+ - flow-matching
7
+ - distillation
8
+ - flux
9
+ - stable-diffusion
10
+ ---
11
+
12
+ # 🚀 SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
13
+
14
+ [![arXiv](https://img.shields.io/badge/Arxiv-2506.00523-b31b1b)](https://arxiv.org/abs/2506.00523)
15
+ [![GitHub Repo stars](https://img.shields.io/github/stars/XingtongGe/SenseFlow.svg?style=social&label=Star&maxAge=60)](https://github.com/XingtongGe/SenseFlow)
16
+ [![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-md-dark.svg)](https://huggingface.co/domiso/SenseFlow)
17
+
18
+ [Xingtong Ge](https://xingtongge.github.io/)<sup>1,2</sup>, Xin Zhang<sup>2</sup>, [Tongda Xu](https://tongdaxu.github.io/)<sup>3</sup>, [Yi Zhang](https://zhangyi-3.github.io/)<sup>4</sup>, [Xinjie Zhang](https://xinjie-q.github.io/)<sup>1</sup>, [Yan Wang](https://yanwang202199.github.io/)<sup>3</sup>, [Jun Zhang](https://eejzhang.people.ust.hk/)<sup>1</sup>
19
+
20
+ <sup>1</sup>HKUST, <sup>2</sup>SenseTime Research, <sup>3</sup>Tsinghua University, <sup>4</sup>CUHK MMLab
21
+
22
+ ## Abstract
23
+
24
+ The Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models, such as SD 3.5 and FLUX. In this paper, we first analyze the issues when applying vanilla DMD on large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to constrain the divergence between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep denoising importance from the teacher model. With IDA alone, DMD converges for SD 3.5; employing both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Together with a scaled VFM-based discriminator, our final model, dubbed **SenseFlow**, achieves superior performance in distillation for both diffusion based text-to-image models such as SDXL, and flow-matching models such as SD 3.5 Large and FLUX.1 dev.
25
+
26
+ ## SenseFlow-FLUX.1 dev (supports 4–8-step generation)
27
+ * `SenseFlow-FLUX/diffusion_pytorch_model.safetensors`: the DiT checkpoint.
28
+ * `SenseFlow-FLUX/config.json`: the DiT config used by our model.
29
+
30
+
31
+ ### Usage
32
+
33
+ 1. Prepare the base FLUX.1 dev checkpoint at `Path/to/FLUX`.
34
+ 2. Replace the transformer folder `Path/to/FLUX/transformer` with `SenseFlow-FLUX` to obtain `Path/to/SenseFlow-FLUX`.
35
+
36
+ #### Using the Euler sampler
37
+ ```python
38
+ import torch
39
+ from diffusers import FluxPipeline
40
+ from diffusers import FlowMatchEulerDiscreteScheduler
41
+
42
+ pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
43
+
44
+ prompt="A cat sleeping on a windowsill with white curtains fluttering in the breeze"
45
+
46
+ images = pipe(
47
+ prompt,
48
+ height=1024,
49
+ width=1024,
50
+ num_inference_steps=4,
51
+ max_sequence_length=512,
52
+ ).images[0]
53
+
54
+ images.save("output.png")
55
+ ```
56
+ #### Using the x0 sampler (similar to the LCMScheduler in diffusers)
57
+ ```python
58
+ import torch
59
+ from diffusers import FluxPipeline
60
+ from diffusers.schedulers.scheduling_flow_match_euler_discrete import FlowMatchEulerDiscreteScheduler, FlowMatchEulerDiscreteSchedulerOutput
61
+ from typing import Union, Tuple, Optional
62
+
63
+ class FlowMatchEulerX0Scheduler(FlowMatchEulerDiscreteScheduler):
64
+     def step(
65
+         self,
66
+         model_output: torch.FloatTensor,
67
+         timestep: Union[float, torch.FloatTensor],
68
+         sample: torch.FloatTensor,
69
+         generator: Optional[torch.Generator] = None,
70
+         return_dict: bool = True,
71
+     ) -> Union[FlowMatchEulerDiscreteSchedulerOutput, Tuple]:
72
+
73
+         if self.step_index is None:
74
+             self._init_step_index(timestep)
75
+
76
+         sample = sample.to(torch.float32)  # Ensure precision
77
+
78
+         sigma = self.sigmas[self.step_index]
79
+         sigma_next = self.sigmas[self.step_index + 1]
80
+
81
+         # 1. Compute x0 from model output (assuming the model predicts the flow velocity, noise - x0)
82
+         x0 = sample - sigma * model_output
83
+
84
+         # 2. Add noise to x0 to get the sample for the next step
85
+         noise = torch.randn_like(sample)
86
+         prev_sample = (1 - sigma_next) * x0 + sigma_next * noise
87
+
88
+         prev_sample = prev_sample.to(model_output.dtype)  # Convert back to original dtype
89
+         self._step_index += 1  # Move to next step
90
+
91
+         if not return_dict:
92
+             return (prev_sample,)
93
+
94
+         return FlowMatchEulerDiscreteSchedulerOutput(prev_sample=prev_sample)
95
+
96
+ pipe = FluxPipeline.from_pretrained("Path/to/SenseFlow-FLUX", torch_dtype=torch.bfloat16).to("cuda")
97
+ pipe.scheduler = FlowMatchEulerX0Scheduler.from_config(pipe.scheduler.config)
98
+
99
+ prompt="A cat sleeping on a windowsill with white curtains fluttering in the breeze"
100
+
101
+ images = pipe(
102
+ prompt,
103
+ height=1024,
104
+ width=1024,
105
+ num_inference_steps=4,
106
+ max_sequence_length=512,
107
+ ).images[0]
108
+
109
+ images.save("output.png")
110
+ ```
111
+
112
+ ## DanceGRPO-SenseFlow (supports 4–8-step generation)
113
+
114
+ Coming soon!
115
+
116
+ ## Citation
117
+
118
+ ```bibtex
119
+ @article{ge2025senseflow,
120
+ title={SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation},
121
+ author={Ge, Xingtong and Zhang, Xin and Xu, Tongda and Zhang, Yi and Zhang, Xinjie and Wang, Yan and Zhang, Jun},
122
+ journal={arXiv preprint arXiv:2506.00523},
123
+ year={2025}
124
+ }
125
+ ```
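
Reviewer note on the x0 sampler in the README: the arithmetic inside `FlowMatchEulerX0Scheduler.step` can be checked without a GPU. The sketch below is not part of the PR and not the authors' code. It replays the two steps on plain Python lists, assuming the rectified-flow convention `x_t = (1 - sigma) * x0 + sigma * noise` and an ideal model that predicts the velocity `noise - x0`. The helper name `x0_step` and the toy values are hypothetical.

```python
import random


def x0_step(sample, model_output, sigma, sigma_next, rng):
    """One x0-style sampler step on plain lists (a stand-in for tensors)."""
    # 1. Recover the clean-sample estimate from the velocity prediction.
    x0 = [x - sigma * v for x, v in zip(sample, model_output)]
    # 2. Re-noise the estimate to the next noise level with fresh Gaussian noise.
    noise = [rng.gauss(0.0, 1.0) for _ in sample]
    prev_sample = [(1 - sigma_next) * c + sigma_next * n for c, n in zip(x0, noise)]
    return x0, prev_sample


rng = random.Random(0)
clean = [0.5, -1.0, 2.0]                         # toy "image" values (hypothetical)
eps = [rng.gauss(0.0, 1.0) for _ in clean]       # noise used to build x_t
sigma = 0.75
x_t = [(1 - sigma) * c + sigma * e for c, e in zip(clean, eps)]
velocity = [e - c for c, e in zip(clean, eps)]   # assumed ideal model output

# With sigma_next = 0 the re-noised sample equals the x0 estimate.
x0, prev_sample = x0_step(x_t, velocity, sigma, 0.0, rng)
```

Under these assumptions `x0` matches `clean` up to floating-point error, which is why `sample - sigma * model_output` works as an x0 estimate only for a velocity-predicting model. Unlike the Euler update, each intermediate step draws fresh noise, so the output is stochastic even for a fixed initial latent.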