SkyAsl committed on
Commit c55a836 · verified · 1 Parent(s): 2081dec

Upload 4 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ pixel-art-result.png filter=lfs diff=lfs merge=lfs -text
Pixel-artist-train-loss-regression.png ADDED
Pixel-artist-train-loss.png ADDED
pixel-art-result.png ADDED

Git LFS Details

  • SHA256: 94ad2674ca236841a3b821b22bb36854c0d3b0e4e7a787d77ab6598237782797
  • Pointer size: 131 Bytes
  • Size of remote file: 389 kB
pixel_art_lo_ra_training_readme.md ADDED
@@ -0,0 +1,143 @@
+ ---
+ language: en
+ library_name: diffusers
+ pipeline_tag: text-to-image
+ license: apache-2.0
+ base_model: Tongyi-MAI/Z-Image-Turbo
+ tags:
+ - lora
+ - pixel-art
+ - diffusion
+ - text-to-image
+ - style-adaptation
+ ---
+
+ # 🎨 Pixel Art Character LoRA – Z-Image-Turbo
+
+ This repository hosts **LoRA adapter weights** fine-tuned on top of **Tongyi-MAI/Z-Image-Turbo** to improve **pixel art character generation** from text prompts.
+
+ The LoRA is optimized for prompts that start with or include:
+
+ > **"a pixel art character ..."**
+
+ ---
+
+ ## 🚀 Model Description
+
+ - **Base model**: `Tongyi-MAI/Z-Image-Turbo`
+ - **Fine-tuning method**: LoRA (Low-Rank Adaptation)
+ - **Task**: Text-to-image generation
+ - **Specialization**: Pixel art characters
+ - **Trainable parameters**: ~0.1% of base model
+
+ This model does **not** replace the base model. Instead, it injects lightweight LoRA adapters into the transformer layers.
+
+ ---
+
+ ## 🧠 Why Pixel Art?
+
+ Pixel art differs significantly from natural images:
+ - Sharp, discrete edges
+ - Limited color palettes
+ - Low-resolution spatial structure
+
+ Generic diffusion models often blur these characteristics. This LoRA improves:
+ - Structural sharpness
+ - Style consistency
+ - Prompt–image alignment for pixel art descriptions
+
+ ---
+
+ ## 🧩 How to Use
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+ from peft import PeftModel
+
+ # Load the base pipeline in bfloat16 on the GPU.
+ pipe = DiffusionPipeline.from_pretrained(
+     "Tongyi-MAI/Z-Image-Turbo",
+     torch_dtype=torch.bfloat16,
+     device_map="cuda",
+ )
+
+ # Wrap the transformer with the LoRA adapter weights.
+ # Replace the placeholder with the ID of this repository.
+ pipe.transformer = PeftModel.from_pretrained(
+     pipe.transformer,
+     "<your-username>/<repo-name>",
+ )
+
+ prompt = "a pixel art character with square orange glasses, a chef hat-shaped head and a purple-colored body on a cool background"
+
+ # Generate one image and save it to disk.
+ image = pipe(prompt).images[0]
+ image.save("pixel_art.png")
+ ```
+
+ ---
+
+ ## 🧪 Evaluation
+
+ ### CLIPScore (Prompt–Image Alignment)
+
+ | Model | Normalized CLIPScore (mean ± std) |
+ |------|----------------------------------|
+ | Base Z-Image-Turbo | **7.834 ± 2.577** |
+ | + Pixel Art LoRA | **8.856 ± 2.473** |
+
+ ➡️ **+1.02 mean CLIPScore improvement** (7.834 → 8.856), suggesting stronger alignment with pixel-art-specific prompts. The gain is smaller than one standard deviation (about 2.5) of either model's scores, so it should be read as indicative rather than conclusive.
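
The table reports a mean ± std per model and a gain between the two means. As a rough illustration of how such a summary can be computed from per-prompt scores, here is a minimal sketch. The helper names `summarize` and `mean_gain` and the toy score lists are hypothetical; the README does not publish per-prompt scores or say whether the sample or population standard deviation was used, so the sample version is assumed here.

```python
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-prompt scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def mean_gain(base_scores, lora_scores):
    """Difference of means, LoRA minus base, over the same prompt set."""
    return statistics.mean(lora_scores) - statistics.mean(base_scores)


# Gain implied by the two means reported in the table.
reported_gain = round(8.856 - 7.834, 3)

# Toy per-prompt scores, made up only to exercise the helpers.
toy_base = [5.0, 7.0, 9.0]
toy_lora = [6.0, 8.0, 10.0]
toy_summary = summarize(toy_base)
toy_gain = mean_gain(toy_base, toy_lora)
```

Evaluating both models on the same prompts keeps the difference of means directly comparable.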
+
+ ---
+
+ ## 📈 Training Details
+
+ - **Dataset**: `m1guelpf/nouns`
+ - **Image resolution**: 512×512
+ - **Epochs**: 1
+ - **Optimizer**: AdamW (`lr=1e-4`)
+ - **Precision**: bfloat16
+ - **Noise scheduler**: DDPM (300 steps)
+
+ ### LoRA Configuration
+ ```text
+ r = 16
+ lora_alpha = 32
+ lora_dropout = 0.05
+ target_modules = [to_q, to_k, to_v, to_out.0]
+ ```
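
To make the configuration above more concrete, the sketch below works out two quantities that follow from `r` and `lora_alpha` under the standard LoRA formulation: the scaling factor `lora_alpha / r` applied to the low-rank update, and the number of trainable parameters added to one linear layer. The function names are hypothetical, and the layer width of 1024 is an illustrative assumption, not the actual width of the Z-Image-Turbo attention projections.

```python
def lora_scaling(r, lora_alpha):
    """Scale applied to the low-rank update in standard LoRA: alpha / r."""
    return lora_alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


# Values from the configuration above.
scaling = lora_scaling(r=16, lora_alpha=32)

# Hypothetical square projection width, NOT taken from Z-Image-Turbo.
hidden = 1024
per_layer = lora_param_count(hidden, hidden, r=16)

# Fraction of that one layer's weights that the adapter adds.
fraction = per_layer / (hidden * hidden)
```

With these assumed dimensions the adapter adds about 3% of a single targeted layer's weights. Only the attention projections are targeted, which is in keeping with a whole-model trainable fraction far smaller than that.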
+
+ ---
+
+ ## 🖼️ Example Prompts
+
+ ```text
+ a pixel art character with a wizard hat and glowing blue eyes
+ a pixel art character holding a sword and wearing red armor
+ a pixel art character with a robot body and green visor
+ ```
+
+ ---
+
+ ## ⚠️ Limitations
+
+ - Optimized primarily for **pixel art characters**
+ - May not improve (or may slightly degrade) photorealistic prompts
+ - Trained on a relatively small dataset
+
+ ---
+
+ ## 🔮 Future Work
+
+ - Multi-epoch training with early stopping
+ - Broader pixel-art prompt coverage
+ - Palette-aware regularization
+
+ ---
+
+ ## 📜 License
+
+ This LoRA follows the license of the base model **Tongyi-MAI/Z-Image-Turbo**.
+ Please check the original repository for full license terms.
+
+ ---
+
+ **Status**: Research / Experimental
+