-
Learning A Universal Crime Predictor with Knowledge-guided Hypernetworks
Paper • 2511.02336 • Published -
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper • 2512.11253 • Published • 34 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 36
Collections
Discover the best community collections!
Collections including paper arxiv:2409.18839
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 139 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 111 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 36
-
SAM 3: Segment Anything with Concepts
Paper • 2511.16719 • Published • 125 -
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper • 2512.08765 • Published • 128 -
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Paper • 2512.04677 • Published • 167 -
LongCat-Image Technical Report
Paper • 2512.07584 • Published • 18
-
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper • 2412.20138 • Published • 15 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 36 -
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 139 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 122
-
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 36 -
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 29 -
Differential Transformer
Paper • 2410.05258 • Published • 180 -
UniMuMo: Unified Text, Music and Motion Generation
Paper • 2410.04534 • Published • 19
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 8 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 26 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 35 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30
-
Learning A Universal Crime Predictor with Knowledge-guided Hypernetworks
Paper • 2511.02336 • Published -
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Paper • 2512.11253 • Published • 34 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 36
-
SAM 3: Segment Anything with Concepts
Paper • 2511.16719 • Published • 125 -
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper • 2512.08765 • Published • 128 -
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Paper • 2512.04677 • Published • 167 -
LongCat-Image Technical Report
Paper • 2512.07584 • Published • 18
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 139 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 111 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 36
-
TradingAgents: Multi-Agents LLM Financial Trading Framework
Paper • 2412.20138 • Published • 15 -
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 36 -
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 139 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 122
-
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 36 -
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 29 -
Differential Transformer
Paper • 2410.05258 • Published • 180 -
UniMuMo: Unified Text, Music and Motion Generation
Paper • 2410.04534 • Published • 19
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 8 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 26 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 35 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30