-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 9 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39
Collections
Collections including paper arxiv:2503.15558
-
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 29 -
Magma: A Foundation Model for Multimodal AI Agents
Paper • 2502.13130 • Published • 58 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 49
-
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Paper • 2503.10615 • Published • 17 -
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Paper • 2503.10630 • Published • 6 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper • 2503.07536 • Published • 88
-
Cosmos World Foundation Model Platform for Physical AI
Paper • 2501.03575 • Published • 81 -
Intuitive physics understanding emerges from self-supervised pretraining on natural videos
Paper • 2502.11831 • Published • 20 -
PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations?
Paper • 2503.05333 • Published • 8 -
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 50
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 58 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 63
-
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
Paper • 2509.05739 • Published • 2 -
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper • 2509.03059 • Published • 24 -
Universal Deep Research: Bring Your Own Model and Strategy
Paper • 2509.00244 • Published • 13 -
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
Paper • 2509.08358 • Published • 13
-
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 50 -
Humanoid Policy ~ Human Policy
Paper • 2503.13441 • Published -
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Paper • 2503.16408 • Published • 42 -
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Paper • 2503.19757 • Published • 51
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 61 -
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper • 2503.14456 • Published • 153 -
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Paper • 2503.15265 • Published • 46 -
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 50
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 36 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
GRUtopia: Dream General Robots in a City at Scale
Paper • 2407.10943 • Published • 25 -
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
Paper • 2407.10973 • Published • 11 -
Cross Anything: General Quadruped Robot Navigation through Complex Terrains
Paper • 2407.16412 • Published • 6 -
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Paper • 2408.11048 • Published • 4