MiniGPT-124M

A 124-million parameter GPT-style language model trained from scratch on a mixture of TinyStories and Wikitext-103, using pure PyTorch. Inspired by Andrej Karpathy's nanoGPT.

Model Details

| Property | Value |
|---|---|
| Parameters | ~124M |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| Context length | 256 tokens |
| Vocabulary size | 50,257 (GPT-2) |
| Tokenizer | tiktoken (`gpt2` encoding) |
| Activation | GELU |
| Architecture | Decoder-only Transformer |
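The ~124M figure can be sanity-checked from the numbers above. The sketch below assumes a standard GPT-2-style layout (tied input/output embeddings, biased linear layers, two LayerNorms per block plus a final one); the actual `modelling_mini_gpt.py` may differ slightly in the details.

```python
# Rough parameter count for a GPT-2-style decoder, using the table above.
# Assumes tied embeddings and biased linears; the real code may differ.
d, L, V, T = 768, 12, 50257, 256     # embed dim, layers, vocab, context

tok_emb = V * d                      # token embedding (tied with output head)
pos_emb = T * d                      # learned positional embedding
attn    = 4 * d * d + 4 * d          # qkv projection + output projection
mlp     = 8 * d * d + 5 * d          # 4x expansion: fc + down-projection
norms   = 4 * d                      # two LayerNorms per block (weight + bias)
block   = attn + mlp + norms

total = tok_emb + pos_emb + L * block + 2 * d   # + final LayerNorm
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```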

Training

| Property | Value |
|---|---|
| Dataset | TinyStories + Wikitext-103 (mixed) |
| Training steps | 8,000 |
| Batch size | 16 (× 4 gradient accumulation = effective 64) |
| Learning rate | 2e-4 with cosine decay + warmup |
| Optimizer | AdamW (weight decay 0.1) |
| Precision | Mixed precision (AMP) |
| Hardware | 2× NVIDIA T4 GPUs |
| Training time | ~8 hours |
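The schedule row only states "2e-4 with cosine decay + warmup", so here is a minimal sketch of that shape; the warmup length (200 steps) and floor learning rate (2e-5) are illustrative assumptions, not values from this repo.

```python
import math

def lr_at(step, max_lr=2e-4, warmup=200, total=8000, min_lr=2e-5):
    """Linear warmup followed by cosine decay. `warmup` and `min_lr`
    are illustrative assumptions, not taken from this repo."""
    if step < warmup:
        return max_lr * (step + 1) / warmup          # linear ramp up
    t = (step - warmup) / (total - warmup)           # 0 -> 1 over the decay phase
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))

print(lr_at(0), lr_at(200), lr_at(7999))             # ramp start, peak, near floor
```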

Usage

Install dependencies

```bash
pip install torch safetensors tiktoken huggingface_hub
```

Load and generate

```python
import os
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"   # Windows only

import importlib.util

import tiktoken
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

REPO_ID = "sopanm11/Mini-GPT-124M"
device  = "cuda" if torch.cuda.is_available() else "cpu"

# ── 1. Download files ──────────────────────────────────────
model_path    = hf_hub_download(repo_id=REPO_ID, filename="model.safetensors")
modeling_path = hf_hub_download(repo_id=REPO_ID, filename="modelling_mini_gpt.py")

# ── 2. Import the model class from the downloaded file ─────
spec   = importlib.util.spec_from_file_location("modelling_mini_gpt", modeling_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
GPT, GPTConfig = module.GPT, module.GPTConfig

# ── 3. Load weights (re-tie token embedding to output head) ─
state_dict = load_file(model_path)
if "token_emb.weight" not in state_dict:
    state_dict["token_emb.weight"] = state_dict["output_head.weight"]
state_dict.pop("output_head.bias", None)

model = GPT(GPTConfig()).to(device)
model.load_state_dict(state_dict)
model.eval()

# ── 4. Generate text ──────────────────────────────────────
enc = tiktoken.get_encoding("gpt2")

@torch.no_grad()                     # inference only; skip gradient tracking
def generate(prompt, max_tokens=200, temperature=0.8):
    input_ids  = torch.tensor([enc.encode(prompt)], dtype=torch.long, device=device)
    output_ids = model.generate(input_ids, max_new_tokens=max_tokens, temperature=temperature)
    return enc.decode(output_ids[0].tolist())

print(generate("Once upon a time", max_tokens=200, temperature=0.8))
```

Temperature guide

| Temperature | Effect |
|---|---|
| 0.2 – 0.5 | Conservative, repetitive, more coherent |
| 0.7 – 0.9 | Balanced creativity (recommended) |
| 1.0+ | Very creative, less coherent |
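Numerically, temperature divides the logits before the softmax, so low values sharpen the next-token distribution and high values flatten it. A pure-Python sketch, assuming the usual logits-over-temperature convention that nanoGPT-style `generate` methods use:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Dividing logits by T < 1 sharpens the distribution; T > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 0.2))     # sharply peaked on the top logit
print(softmax_with_temperature(logits, 1.0))     # closer to uniform
```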

Files in this repo

| File | Description |
|---|---|
| `model.safetensors` | Trained model weights |
| `modelling_mini_gpt.py` | Full model architecture (pure PyTorch) |
| `config.json` | Hyperparameters used during training |
| `tokenizer_config.json` | Tokenizer metadata (GPT-2) |

Limitations

  • Context window is limited to 256 tokens
  • Trained on a relatively small dataset; not suitable for production use
  • May generate incoherent or repetitive text on complex prompts
  • No instruction tuning or RLHF; base language model only
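Because the context window is 256 tokens, the prompt plus the requested new tokens must fit inside it. A hypothetical left-truncation helper (not part of this repo) shows one way to enforce that before calling `generate`:

```python
def truncate_prompt(token_ids, max_new_tokens, context_length=256):
    """Keep only the most recent tokens so prompt + generation fits the
    256-token window. Hypothetical helper, not part of this repo."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return token_ids[-budget:]

ids = list(range(300))                    # stand-in for an encoded prompt
print(len(truncate_prompt(ids, 50)))      # 206 tokens kept
```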

License

MIT
