Building on HF

Sergio Paniego PRO

sergiopaniego

huggingface

https://sergiopaniego.github.io/

AI & ML interests

None yet

Recent Activity

updated a dataset about 5 hours ago

agents-course/final-certificates

updated a dataset about 5 hours ago

agents-course/course-certificates-of-excellence

Organizations

Posts 68

TRL v0.27.0 is out!! 🥳

It includes GDPO, the latest variant of GRPO for multi-reward RL ✨
GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence. Developed by @sliuau, @SimonX et al.

Explore the paper: GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization (2601.05242)

Explore the full set of changes here:
https://github.com/huggingface/trl/releases/tag/v0.27.0
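
To make "decoupled reward normalization" concrete, here is a minimal sketch of the idea, not TRL's implementation and not necessarily the paper's exact recipe (which may include further steps). It assumes each rollout in a group receives one score per reward function. The helper names `zscore`, `coupled_advantages`, and `decoupled_advantages` are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def zscore(values, eps=1e-8):
    """Standardize a list of scores to zero mean and (roughly) unit variance."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / (s + eps) for v in values]


def coupled_advantages(rewards_per_fn):
    """GRPO-style baseline (assumed): sum all rewards per rollout, then normalize once.

    rewards_per_fn holds one list per reward function, each of length group_size.
    """
    totals = [sum(per_rollout) for per_rollout in zip(*rewards_per_fn)]
    return zscore(totals)


def decoupled_advantages(rewards_per_fn):
    """Decoupled variant (illustrative): normalize each reward on its own, then sum."""
    normalized = [zscore(r) for r in rewards_per_fn]
    return [sum(per_rollout) for per_rollout in zip(*normalized)]


# Toy group of 4 rollouts scored by two rewards on very different scales.
rewards = [
    [0.0, 1.0, 0.0, 1.0],      # e.g. a binary correctness reward
    [10.0, 10.0, 90.0, 90.0],  # e.g. a large-scale auxiliary reward
]

print("coupled:  ", coupled_advantages(rewards))
print("decoupled:", decoupled_advantages(rewards))
```

In this toy group, the coupled version lets the large-scale reward dominate, so rollouts that differ only in the small binary reward receive almost the same advantage. Normalizing each reward separately keeps that difference visible.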

Articles 11

Tokenization in Transformers v5: Simpler, Clearer, and More Modular

Collections 8

Spaces 71

VLM Object Understanding

Explore object detection, visual grounding, keypoint Detecti

Qwen2-VL-7B

Ask questions about charts in images

SmolVLM-trl-dpo-rlaif-v

Generate text from an image and question

SmolVLM-trl-sft-ChartQA

Ask questions about charts in images

Qwen2.5 0.5B Instruct GRPO Rollout

Echo Environment Server

Models 113

sergiopaniego/browsergym-grpo-functiongemma-270m-it-test

Updated 9 days ago

sergiopaniego/wordle-grpo-Qwen3-1.7B-test

Updated 9 days ago

sergiopaniego/sudoku-grpo-qwen3

Text Generation • 2B • Updated 16 days ago • 7

sergiopaniego/test-browsergym-grpo-functiongemma-270m-it

Updated 26 days ago

sergiopaniego/t4-Qwen2-7B-Instruct-GRPO

Updated 30 days ago

sergiopaniego/test-t4-Qwen2-7B-Instruct-GRPO

Updated about 1 month ago

sergiopaniego/gemma-3-4b-it-GRPO

Updated about 1 month ago

sergiopaniego/gemma-3n-E2B-it-GRPO

Updated about 1 month ago

sergiopaniego/Llama-3.2-3B-Instruct-GRPO

Updated about 1 month ago

sergiopaniego/Llama-3.1-8B-Instruct-GRPO

Updated about 1 month ago

Datasets 6

sergiopaniego/browsergym-grpo-functiongemma-270m-it-dataset

Viewer • Updated 3 minutes ago • 105 • 13.8k

sergiopaniego/sample_videos

Viewer • Updated Jun 30, 2025 • 2 • 13

sergiopaniego/difficult_prompts

Viewer • Updated Jun 20, 2025 • 38 • 17

sergiopaniego/ourworldindata_example

Viewer • Updated Dec 2, 2024 • 13 • 35 • 1

sergiopaniego/faiss_embeddings

Updated Oct 3, 2024 • 6

sergiopaniego/CarlaFollowLanePreviousV

Viewer • Updated Sep 6, 2023 • 59.6k • 29