rubbyninja's Collections: advancing research
STaR: Bootstrapping Reasoning With Reasoning
Paper • 2203.14465 • Published • 9

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 59

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 25

Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Paper • 2311.04934 • Published • 32

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 79

Let's Verify Step by Step
Paper • 2305.20050 • Published • 11

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Paper • 2407.21787 • Published • 13

Solving math word problems with process- and outcome-based feedback
Paper • 2211.14275 • Published • 10

Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140

Aligning Machine and Human Visual Representations across Abstraction Levels
Paper • 2409.06509 • Published • 2

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Paper • 2410.05229 • Published • 22

nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Paper • 2410.01131 • Published • 10

Paper • 2303.01469 • Published • 8

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
Paper • 2410.11081 • Published • 18

Scaling Laws for Precision
Paper • 2411.04330 • Published • 7

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
Paper • 2411.07279 • Published • 4

Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
Paper • 1909.13231 • Published • 1

Better & Faster Large Language Models via Multi-token Prediction
Paper • 2404.19737 • Published • 81

O1 Replication Journey: A Strategic Progress Report -- Part 1
Paper • 2410.18982 • Published • 3

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper • 2411.16489 • Published • 45

ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 31

Paper • 2408.02666 • Published • 29

Paper • 2412.09764 • Published • 5

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 111

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Paper • 1901.02860 • Published • 4

Large Concept Models: Language Modeling in a Sentence Representation Space
Paper • 2412.08821 • Published • 17

Movie Gen: A Cast of Media Foundation Models
Paper • 2410.13720 • Published • 100

Titans: Learning to Memorize at Test Time
Paper • 2501.00663 • Published • 29

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 441

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64

s1: Simple test-time scaling
Paper • 2501.19393 • Published • 124

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Paper • 2501.09781 • Published • 27

Diffusion-LM Improves Controllable Text Generation
Paper • 2205.14217 • Published • 2

A Fingerprint for Large Language Models
Paper • 2407.01235 • Published • 1