LLM Evaluation Benchmarks - an Alanox Collection

updated Apr 7, 2025

This collection gathers references to the evaluation benchmarks commonly seen in traditional LLM papers.

MMLU-Pro Leaderboard

More advanced and challenging multi-task evaluation

GAIA Leaderboard

Submit model results and view the GAIA benchmark leaderboard