NanoChat SFT

This is the SFT (supervised fine-tuning) checkpoint from nanochat, Andrej Karpathy's full-stack project for building an LLM.

Usage

Install Transformers from the nanochat-implementation branch:

pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation

Then, you can run this inference snippet:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nanochat-students/d20-chat-transformers"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The nanochat architecture comes from the branch installed above, so no remote code is needed
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

# Render the chat template and tokenize it; return_dict=True returns input_ids and
# attention_mask, so the result can be unpacked into generate() with **inputs
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))

vLLM Integration

You can also serve the model with vLLM, provided the Transformers branch above is installed:

vllm serve nanochat-students/nanochat-d20 --enforce-eager

And then you can call the model like so:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nanochat-students/nanochat-d20", "prompt": "What is the capital of France?", "max_tokens": 7, "temperature": 0}'
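The same request can also be sent from Python. The following is a minimal sketch that uses only the standard library. It assumes the vLLM server started above is listening on localhost:8000 and exposes its OpenAI-compatible /v1/completions endpoint. The helper names build_completion_request and complete are hypothetical, and the payload mirrors the curl call.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    model: str = "nanochat-students/nanochat-d20",
    url: str = "http://localhost:8000/v1/completions",
    max_tokens: int = 7,
    temperature: float = 0.0,
) -> urllib.request.Request:
    """Hypothetical helper: builds the same POST request as the curl example."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Sends the request and returns the text of the first choice.

    Requires the vLLM server to be running; otherwise urlopen raises URLError.
    """
    req = build_completion_request(prompt)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Once the server is up, call complete("What is the capital of France?"). Because the request is built separately from the network call, the payload can be inspected without a running server.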

Chat SFT Training Metrics

timestamp: 2025-10-14 20:17:42

  • run:
  • source: mid
  • dtype: bfloat16
  • device_batch_size: 4
  • num_epochs: 1
  • max_iterations: -1
  • target_examples_per_step: 32
  • unembedding_lr: 0.0040
  • embedding_lr: 0.2000
  • matrix_lr: 0.0200
  • weight_decay: 0.0000
  • init_lr_frac: 0.0200
  • eval_every: 100
  • eval_steps: 100
  • eval_metrics_every: 200
  • Training rows: 20,843
  • Number of iterations: 651
  • Training loss: 1.1904
  • Validation loss: 1.0664
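The iteration count in this log can be checked by hand. The sketch below makes three assumptions rather than quoting nanochat's code:

  • num_iterations = (training_rows // target_examples_per_step) * num_epochs.
  • Gradient accumulation covers the gap between device_batch_size × GPU count and target_examples_per_step.
  • init_lr_frac scales each base learning rate at the first step.

The GPU count (world_size) is not recorded in the log, so it is a hypothetical parameter.

```python
# Values copied from the Chat SFT log above.
training_rows = 20_843
num_epochs = 1
device_batch_size = 4
target_examples_per_step = 32
init_lr_frac = 0.02
base_lrs = {"unembedding": 0.004, "embedding": 0.2, "matrix": 0.02}


def sft_schedule(world_size: int) -> dict:
    """Sketch of the step arithmetic under assumed formulas (not nanochat's code).

    world_size is a hypothetical GPU count; the log does not record it.
    """
    examples_per_micro_step = device_batch_size * world_size
    if target_examples_per_step % examples_per_micro_step != 0:
        raise ValueError("target_examples_per_step must be a multiple of the micro-batch size")
    # Micro-batches accumulated before each optimizer step.
    grad_accum_steps = target_examples_per_step // examples_per_micro_step
    # Assumed: one optimizer step per target_examples_per_step rows; a partial last step is dropped.
    num_iterations = (training_rows // target_examples_per_step) * num_epochs
    # Assumed: the first step uses init_lr_frac times each base learning rate.
    initial_lrs = {name: lr * init_lr_frac for name, lr in base_lrs.items()}
    return {
        "grad_accum_steps": grad_accum_steps,
        "num_iterations": num_iterations,
        "initial_lrs": initial_lrs,
    }


print(sft_schedule(world_size=8))  # e.g. a single 8-GPU node
print(sft_schedule(world_size=1))  # e.g. a single GPU
```

Under these assumptions, 20,843 // 32 = 651 iterations, which matches the log. The 11 leftover rows (20,843 - 651 × 32) do not fill a complete step. On one GPU the same target would need 8 accumulated micro-batches of 4 examples.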

Chat SFT Evaluation

timestamp: 2025-10-14 20:29:59

  • source: sft
  • task_name: None
  • dtype: bfloat16
  • temperature: 0.0000
  • max_new_tokens: 512
  • num_samples: 1
  • top_k: 50
  • batch_size: 8
  • model_tag: None
  • step: None
  • max_problems: None
  • ARC-Easy: 0.4259
  • ARC-Challenge: 0.2961
  • MMLU: 0.3250
  • GSM8K: 0.0432
  • HumanEval: 0.0549
  • ChatCORE metric: 0.0988
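The ChatCORE value can be reproduced from the per-task scores if it is taken as the mean of baseline-centered accuracies, (acc - baseline) / (1 - baseline). This uses a 0.25 chance baseline for the four-way multiple-choice tasks (ARC-Easy, ARC-Challenge, MMLU) and 0 for the open-ended tasks (GSM8K, HumanEval). That definition is an assumption in this sketch, but applied to the rounded scores above it gives the reported value.

```python
# Per-task accuracies from the evaluation log above.
scores = {
    "ARC-Easy": 0.4259,
    "ARC-Challenge": 0.2961,
    "MMLU": 0.3250,
    "GSM8K": 0.0432,
    "HumanEval": 0.0549,
}

# Assumed random-guess baselines: 4-way multiple choice vs. open-ended generation.
baselines = {
    "ARC-Easy": 0.25,
    "ARC-Challenge": 0.25,
    "MMLU": 0.25,
    "GSM8K": 0.0,
    "HumanEval": 0.0,
}


def centered(acc: float, baseline: float) -> float:
    """Rescale accuracy so chance level maps to 0 and a perfect score maps to 1."""
    return (acc - baseline) / (1.0 - baseline)


def chatcore(scores: dict, baselines: dict) -> float:
    """Unweighted mean of the centered accuracies over all tasks."""
    values = [centered(scores[task], baselines[task]) for task in scores]
    return sum(values) / len(values)


print(f"ChatCORE ~ {chatcore(scores, baselines):.4f}")
```

This prints ChatCORE ~ 0.0988. Centering means that a model guessing at random on ARC or MMLU contributes nothing to the metric, rather than 0.25.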

Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio

Model size: 0.6B params (Safetensors, BF16 tensors)