SpectralPO

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

PeterLauLukCh authored a paper 7 days ago

Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

PeterLauLukCh authored a paper 7 days ago

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

PeterLauLukCh submitted a paper 14 days ago

Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

PeterLauLukCh

authored 2 papers 7 days ago

Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Paper • 2512.16912 • Published 14 days ago • 10

GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Paper • 2512.19682 • Published 10 days ago • 15

PeterLauLukCh

submitted a paper to Daily Papers 14 days ago

Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Paper • 2512.16912 • Published 14 days ago • 10

ziniuli

authored 10 papers about 2 months ago

Why Transformers Need Adam: A Hessian Perspective

Paper • 2402.16788 • Published Feb 26, 2024 • 2

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

Paper • 2310.10505 • Published Oct 16, 2023 • 3

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24, 2025 • 80

Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving

Paper • 2508.09099 • Published Aug 12, 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment

Paper • 2505.04113 • Published May 7, 2025

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published Sep 30, 2025 • 47

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29, 2025 • 221

PeterLauLukCh

updated a Space 6 months ago

README

PeterLauLukCh

updated a collection 6 months ago

DeepSeek-R1-Distill-Qwen-32B

Collection

2 items • Updated Jul 19, 2025

PeterLauLukCh

updated a model 6 months ago

SpectralPO/DeepSeek-R1-Distill-Qwen-7B-SPO-QwQ-Ablation

8B • Updated Jul 19, 2025 • 9

PeterLauLukCh

published 3 models 6 months ago

SpectralPO/DeepSeek-R1-Distill-Qwen-32B-GRPO

Updated Jul 19, 2025

SpectralPO/DeepSeek-R1-Distill-Qwen-32B-SPO

Updated Jul 19, 2025

SpectralPO/DeepSeek-R1-Distill-Qwen-7B-SPO-QwQ-Ablation

8B • Updated Jul 19, 2025 • 9

Team members 3
