Alanox's Collections
LLM Evaluation Benchmarks

updated Apr 7, 2025

This collection gathers references to the evaluation benchmarks commonly seen in traditional LLM papers.

  • MMLU-Pro Leaderboard 🥇 (241 likes)
    More advanced and challenging multi-task evaluation


  • GAIA Leaderboard 🦾 (584 likes)
    Submit model results and view the GAIA benchmark leaderboard
