daily papers
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
Paper
• 2312.04557
• Published
• 13
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Paper
• 2312.04410
• Published
• 15
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Paper
• 2312.04461
• Published
• 62
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Paper
• 2401.02955
• Published
• 23
Denoising Vision Transformers
Paper
• 2401.02957
• Published
• 31
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Paper
• 2312.16272
• Published
• 7
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Paper
• 2312.16486
• Published
• 7
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
Paper
• 2411.07126
• Published
• 30
Motion Control for Enhanced Complex Action Video Generation
Paper
• 2411.08328
• Published
• 5
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper
• 2411.07975
• Published
• 31
Pyramidal Flow Matching for Efficient Video Generative Modeling
Paper
• 2410.05954
• Published
• 40
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Paper
• 2412.04432
• Published
• 16
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper
• 2412.04814
• Published
• 46
Mind the Time: Temporally-Controlled Multi-Event Video Generation
Paper
• 2412.05263
• Published
• 10
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Paper
• 2412.01169
• Published
• 13
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper
• 2410.13861
• Published
• 56
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
Paper
• 2412.15216
• Published
• 5
MotiF: Making Text Count in Image Animation with Motion Focal Loss
Paper
• 2412.16153
• Published
• 6
Large Motion Video Autoencoding with Cross-modal Video VAE
Paper
• 2412.17805
• Published
• 24
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation
Paper
• 2501.09503
• Published
• 14
Do generative video models learn physical principles from watching videos?
Paper
• 2501.09038
• Published
• 34
Small Models Struggle to Learn from Strong Reasoners
Paper
• 2502.12143
• Published
• 39
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Paper
• 2503.21755
• Published
• 33
Efficient Generative Model Training via Embedded Representation Warmup
Paper
• 2504.10188
• Published
• 12
Video-As-Prompt: Unified Semantic Control for Video Generation
Paper
• 2510.20888
• Published
• 50
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Paper
• 2511.09611
• Published
• 70
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
Paper
• 2511.13704
• Published
• 43
Back to Basics: Let Denoising Generative Models Denoise
Paper
• 2511.13720
• Published
• 69
DiP: Taming Diffusion Models in Pixel Space
Paper
• 2511.18822
• Published
• 29
Diffusion Transformers with Representation Autoencoders
Paper
• 2510.11690
• Published
• 166
PixelDiT: Pixel Diffusion Transformers for Image Generation
Paper
• 2511.20645
• Published
• 35