zhangBeiQing (beiqing) – Community Activity

commented 3 papers 3 months ago

StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding

Paper • 2508.15717 • Published Aug 21, 2025 • 1 • 1

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Paper • 2404.05726 • Published Apr 8, 2024 • 23 • 1

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263 • 50

commented 2 papers 4 months ago

Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions

Paper • 2505.00675 • Published May 1, 2025 • 3 • 1

Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models

Paper • 2508.09874 • Published Aug 13, 2025 • 10 • 2

commented 3 papers 5 months ago

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7, 2025 • 180 • 21

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5, 2025 • 133 • 22

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 158 • 9

commented 2 papers 6 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263 • 50

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30, 2025 • 277 • 9

commented 2 papers 7 months ago

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6, 2025 • 188 • 9

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6, 2025 • 188 • 9

commented a paper 8 months ago

Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21, 2025 • 88 • 6

beiqing

AI & ML interests

Organizations
