Andreas Stöffelbauer

andreasskyscanner

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 1 day ago

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Paper • 2606.18216 • Published 4 days ago • 50

upvoted a paper 3 days ago

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Paper • 2602.21103 • Published 18 days ago • 4

upvoted a paper 10 days ago

Trajectory-Refined Distillation

Paper • 2606.08432 • Published 13 days ago • 7

upvoted 2 papers 11 days ago

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Paper • 2605.28742 • Published 24 days ago • 4

Reinforcement Learning from Rich Feedback with Distributional DAgger

Paper • 2606.05152 • Published 17 days ago • 3

upvoted a paper 17 days ago

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Paper • 2606.02437 • Published 19 days ago • 231

upvoted a paper 18 days ago

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

Paper • 2605.29548 • Published 23 days ago • 11

upvoted a paper 25 days ago

Your Embedding Model is SMARTer Than You Think

Paper • 2605.24938 • Published 27 days ago • 25

upvoted a paper 26 days ago

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Paper • 2605.23904 • Published 29 days ago • 240

upvoted 2 papers about 1 month ago

Context Training with Active Information Seeking

Paper • 2605.13050 • Published May 13 • 7

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

Paper • 2605.13511 • Published May 13 • 33

upvoted a paper about 2 months ago

Predicting integers from continuous parameters

Paper • 2602.10751 • Published Apr 13 • 3

upvoted 6 papers 2 months ago

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Paper • 2604.11626 • Published Apr 13 • 102

You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

Paper • 2604.10966 • Published Apr 13 • 12

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Paper • 2604.13010 • Published Apr 14 • 18

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 111

p1: Better Prompt Optimization with Fewer Prompts

Paper • 2604.08801 • Published Apr 9 • 9

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

Paper • 2604.01591 • Published Apr 2 • 42

upvoted 2 papers 3 months ago

Embarrassingly Simple Self-Distillation Improves Code Generation

Paper • 2604.01193 • Published Apr 1 • 56

Terminal Agents Suffice for Enterprise Automation

Paper • 2604.00073 • Published Mar 31 • 97
