Meta-Reinforcement Learning with Self-Reflection for Agentic Search Paper • 2603.11327 • Published 27 days ago • 9
ActionParty: Multi-Subject Action Binding in Generative Video Games Paper • 2604.02330 • Published 5 days ago • 5
Meta-Harness: End-to-End Optimization of Model Harnesses Paper • 2603.28052 • Published 8 days ago • 14
FlashSampling: Fast and Memory-Efficient Exact Sampling Paper • 2603.15854 • Published 22 days ago • 9
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation Paper • 2603.10899 • Published 27 days ago • 7
EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models Paper • 2603.12252 • Published 26 days ago • 12
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion Paper • 2603.06577 • Published Mar 6 • 48
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published 28 days ago • 12
Adaptive Loops and Memory in Transformers: Think Harder or Know More? Paper • 2603.08391 • Published 27 days ago • 1
FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling Paper • 2603.05451 • Published Mar 5 • 1
Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned Paper • 2603.05344 • Published Mar 5 • 7
Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol Paper • 2603.11382 • Published 26 days ago • 1
AgentRx: Diagnosing AI Agent Failures from Execution Trajectories Paper • 2602.02475 • Published Feb 2 • 1
Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models Paper • 2603.10098 • Published 28 days ago • 3
RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation Paper • 2603.09723 • Published 28 days ago • 7