L&V Models
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • arXiv:2402.17177
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
Paper • arXiv:2403.13248
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • arXiv:2311.05437
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Paper • arXiv:2409.20551
Visual Question Decomposition on Multimodal Large Language Models
Paper • arXiv:2409.19339
Image Copy Detection for Diffusion Models
Paper • arXiv:2409.19952
FreeInit: Bridging Initialization Gap in Video Diffusion Models
Paper • arXiv:2312.07537