Collections
Discover the best community collections!
Collections including paper arxiv:2409.12917

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 59
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 45
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 64

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 24
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 152
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 444
- Training Language Models to Self-Correct via Reinforcement Learning
  Paper • 2409.12917 • Published • 140
- StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
  Paper • 2409.12576 • Published • 16
- Transformer Explainer: Interactive Learning of Text-Generative Models
  Paper • 2408.04619 • Published • 175

- Visual-RFT: Visual Reinforcement Fine-Tuning
  Paper • 2503.01785 • Published • 86
- When an LLM is apprehensive about its answers -- and when its uncertainty is justified
  Paper • 2503.01688 • Published • 21
- Predictive Data Selection: The Data That Predicts Is the Data That Teaches
  Paper • 2503.00808 • Published • 57
- Chain of Draft: Thinking Faster by Writing Less
  Paper • 2502.18600 • Published • 50

- Writing in the Margins: Better Inference Pattern for Long Context Retrieval
  Paper • 2408.14906 • Published • 144
- Training Language Models to Self-Correct via Reinforcement Learning
  Paper • 2409.12917 • Published • 140
- Towards a Unified View of Preference Learning for Large Language Models: A Survey
  Paper • 2409.02795 • Published • 72
- Attention Heads of Large Language Models: A Survey
  Paper • 2409.03752 • Published • 92

- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 7
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 23
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 15
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69

- GARDO: Reinforcing Diffusion Models without Reward Hacking
  Paper • 2512.24138 • Published • 30
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
  Paper • 2512.24165 • Published • 52
- Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
  Paper • 2512.24617 • Published • 65
- Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
  Paper • 2512.23447 • Published • 99

- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
  Paper • 2405.06682 • Published • 3
- Self-Refine: Iterative Refinement with Self-Feedback
  Paper • 2303.17651 • Published • 2
- Rethinking Chain-of-Thought from the Perspective of Self-Training
  Paper • 2412.10827 • Published
- Reflexion: Language Agents with Verbal Reinforcement Learning
  Paper • 2303.11366 • Published • 5