Collections
Discover the best community collections!
Collections including paper arxiv:2602.04804
- Linear representations in language models can change dramatically over a conversation
  Paper • 2601.20834 • Published • 21
- daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
  Paper • 2602.02619 • Published • 52
- OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
  Paper • 2602.04804 • Published • 48

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
  Paper • 2401.00849 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 42

- Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning
  Paper • 2602.01058 • Published • 42
- PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
  Paper • 2602.02493 • Published • 46
- Scaling Embeddings Outperforms Scaling Experts in Language Models
  Paper • 2601.21204 • Published • 102
- OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
  Paper • 2602.04804 • Published • 48

- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 172
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13