rubbyninja's Collections: advancing research
STaR: Bootstrapping Reasoning With Reasoning
Paper • 2203.14465 • Published • 9

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 59

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 25

Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Paper • 2311.04934 • Published • 32

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 79

Let's Verify Step by Step
Paper • 2305.20050 • Published • 11

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Paper • 2407.21787 • Published • 13

Solving math word problems with process- and outcome-based feedback
Paper • 2211.14275 • Published • 10

Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140

Aligning Machine and Human Visual Representations across Abstraction Levels
Paper • 2409.06509 • Published • 2

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Paper • 2410.05229 • Published • 22

nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Paper • 2410.01131 • Published • 10

Paper • 2303.01469 • Published • 8

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
Paper • 2410.11081 • Published • 18

Scaling Laws for Precision
Paper • 2411.04330 • Published • 7

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
Paper • 2411.07279 • Published • 4

Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
Paper • 1909.13231 • Published • 1

Better & Faster Large Language Models via Multi-token Prediction
Paper • 2404.19737 • Published • 81

O1 Replication Journey: A Strategic Progress Report -- Part 1
Paper • 2410.18982 • Published • 3

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper • 2411.16489 • Published • 45

ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 31

Paper • 2408.02666 • Published • 29

Paper • 2412.09764 • Published • 5

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 111

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Paper • 1901.02860 • Published • 4

Large Concept Models: Language Modeling in a Sentence Representation Space
Paper • 2412.08821 • Published • 17

Movie Gen: A Cast of Media Foundation Models
Paper • 2410.13720 • Published • 100

Titans: Learning to Memorize at Test Time
Paper • 2501.00663 • Published • 29

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 441

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64

s1: Simple test-time scaling
Paper • 2501.19393 • Published • 124

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Paper • 2501.09781 • Published • 27

Diffusion-LM Improves Controllable Text Generation
Paper • 2205.14217 • Published • 2

A Fingerprint for Large Language Models
Paper • 2407.01235 • Published • 1