AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning Paper • 2510.06261 • Published Oct 5, 2025 • 6
Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check Paper • 2509.11629 • Published Sep 15, 2025 • 1
Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection Paper • 2406.00806 • Published Jun 2, 2024
Co-rewarding Collection Co-rewarding is a novel self-supervised RL framework that improves training stability by seeking complementary supervision from other views. • 75 items • Updated Dec 21, 2025 • 1