Ujjwal-Tyagi's Collections

Coding Datasets

updated 6 days ago

These are the best coding corpora for strengthening an LLM so that it can surpass proprietary models. They can be used in both pre-training and post-training.


  • Ujjwal-Tyagi/gitee

    Viewer • Updated 6 days ago • 819M • 95

  • Ujjwal-Tyagi/gitverse

    Viewer • Updated 6 days ago • 2.8M • 20

  • Ujjwal-Tyagi/jihulab

    Viewer • Updated 6 days ago • 1.85M • 19

  • Ujjwal-Tyagi/moshub

    Updated 6 days ago • 20

  • Ujjwal-Tyagi/gitflic

    Viewer • Updated 6 days ago • 5.98M • 26

  • Ujjwal-Tyagi/notabug

    Viewer • Updated 6 days ago • 12.6M • 34

  • Ujjwal-Tyagi/gitgud

    Viewer • Updated 6 days ago • 16.3M • 34

  • Ujjwal-Tyagi/gitcode

    Viewer • Updated 6 days ago • 48.1M • 45

  • Ujjwal-Tyagi/google-code-archive

    Viewer • Updated 6 days ago • 65.8M • 55

  • Ujjwal-Tyagi/Cpp

    Updated 7 days ago • 11

  • Ujjwal-Tyagi/C

    Updated 7 days ago • 11

  • Ujjwal-Tyagi/Python

    Updated 7 days ago • 12

  • Ujjwal-Tyagi/Java-Code-Large

    Viewer • Updated 6 days ago • 10.9M • 414

  • Ujjwal-Tyagi/JavaScript-Code-Large

    Viewer • Updated 6 days ago • 2.64M • 447

  • Ujjwal-Tyagi/PHP-Code-Large

    Viewer • Updated 6 days ago • 8.07M • 125