---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
metrics:
- perplexity
pipeline_tag: text-generation
tags:
- slm
- transformer
- attention
- optimization
- pytorch
- tinystories
- educational
---
# Model Card for Helium-Nano (45M)

**Helium-Nano** is a 45-million-parameter Small Language Model (SLM) trained on the TinyStories dataset. It demonstrates that a carefully optimized custom Transformer architecture can achieve coherent English storytelling with minimal compute. The model was trained in under 1 hour on a single NVIDIA L4 GPU, reaching a throughput of **409k tokens/second** via `torch.compile` (PyTorch 2.0) and architectural optimizations.

## Model Details

### Model Description

Helium-Nano is a decoder-only Transformer designed to investigate training dynamics and scaling laws in low-resource environments. Despite its small size, it produces grammatically correct and narratively consistent short stories.

The primary goal of this model was engineering efficiency. By combining **BFloat16 mixed precision**, **Flash-Attention-style fused attention**, **`torch.compile` (Inductor)**, and **float32-computed Rotary Position Embeddings (RoPE)**, the training pipeline achieved a 16x speedup over a standard eager-mode baseline.

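The attention and RoPE optimizations can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual implementation: the helper names `rope_cache` and `apply_rope` are hypothetical, and PyTorch picks a fused (Flash-Attention-style) kernel inside `scaled_dot_product_attention` automatically where the hardware supports it.

```python
import torch
import torch.nn.functional as F

def rope_cache(seq_len, head_dim, base=10000.0):
    # RoPE angles are computed in float32 for numerical stability,
    # even when the rest of the model runs in BFloat16.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    freqs = torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)
    return freqs.cos(), freqs.sin()

def apply_rope(x, cos, sin):
    # x: (batch, heads, seq, head_dim); rotate channel pairs in float32,
    # then cast back to the model's compute dtype.
    x1, x2 = x[..., 0::2].float(), x[..., 1::2].float()
    out = torch.empty(x.shape, dtype=torch.float32)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out.to(x.dtype)

# BFloat16 activations + fused, memory-efficient causal attention.
q = k = v = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
cos, sin = rope_cache(seq_len=16, head_dim=64)
q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Computing and applying the rotation angles in float32 before casting back to BFloat16 matters because doing the rotation entirely in low precision degrades positional accuracy at longer ranges.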
- **Developed by:** Debmalya / batmanLovesAI
- **Model type:** Decoder-only Transformer (custom architecture)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** N/A (trained from scratch)

### Model Sources

- **Repository:** [HeliumLM on GitHub](https://github.com/DebmalyaSen34/HeliumLM)
- **Dataset Paper:** [TinyStories: How Small Can Language Models Be and Still Speak Coherent English?](https://arxiv.org/abs/2305.07759)
- **Optimization Techniques:** [Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation](https://arxiv.org/abs/2505.19529)

## Uses

### Direct Use

- **Story Generation:** Generating simple, coherent short stories suitable for early childhood reading levels.
- **Education:** A lightweight baseline for experimenting with interpretability, quantization, or fine-tuning on consumer hardware.
- **Performance Benchmarking:** Testing the inference speed of small Transformers on various hardware.

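For the benchmarking use case, a minimal timing harness might look like the sketch below. The `tokens_per_second` helper is illustrative and not part of the repository; it wraps any generation callable.

```python
import time

def tokens_per_second(generate_fn, n_tokens, n_runs=3):
    """Average throughput of `generate_fn` over n_runs timed calls.

    generate_fn is any callable that produces n_tokens tokens,
    e.g. a thin wrapper around the model's generation loop.
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_fn(n_tokens)
        rates.append(n_tokens / (time.perf_counter() - start))
    return sum(rates) / len(rates)
```

Warm the model up before timing (especially a `torch.compile`d one, whose first call triggers compilation), otherwise the first run will dominate the average.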
### Out-of-Scope Use

- **Factual Queries:** The model is trained on fiction; it has no world knowledge and will hallucinate facts.
- **Reasoning/Math:** The model is not capable of complex logic or arithmetic.
- **Harmful Content:** Although the dataset is heavily filtered, users should not attempt to generate toxic or biased content.

## Bias, Risks, and Limitations

- **Dataset Bias:** The model reflects the vocabulary and concepts of the TinyStories dataset, which focuses on simple, positive narratives using a limited vocabulary (roughly a 3-year-old's level).
- **Repetition:** Like many SLMs, the model may fall into repetitive loops if the temperature is too low or no repetition penalty is applied during inference.
- **Hallucinations:** The model prioritizes grammatical structure over semantic logic.

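The repetition issue above is typically mitigated at decode time with temperature scaling and a repetition penalty. Here is a pure-Python sketch; the `sample_next` helper and its default values are illustrative, not the repository's API.

```python
import math
import random

def sample_next(logits, recent_ids, temperature=0.8, rep_penalty=1.3):
    # Down-weight tokens that already appeared (divide positive logits,
    # multiply negative ones), then apply temperature scaling and
    # sample from the resulting softmax distribution.
    adjusted = list(logits)
    for i in set(recent_ids):
        adjusted[i] = adjusted[i] / rep_penalty if adjusted[i] > 0 else adjusted[i] * rep_penalty
    scaled = [x / temperature for x in adjusted]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # stable softmax numerator
    return random.choices(range(len(weights)), weights=weights, k=1)[0]
```

Lower temperatures sharpen the distribution toward greedy decoding; the penalty keeps high-probability tokens from being re-selected indefinitely.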
## How to Get Started with the Model

Since this model uses a custom architecture, you need to instantiate the model class before loading the weights.

```python
import torch
from tokenizers import Tokenizer

# The TinySLM class ships with the repository; the module path below
# is illustrative -- adjust it to wherever the class lives locally.
from model import TinySLM

# 1. Load Tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# 2. Initialize Model
config = {
    "vocab_size": 32000,
    "d_model": 512,
    "n_head": 8,
    "n_layers": 10,
    "max_seq_len": 512,
}
model = TinySLM(config)

# 3. Load Weights
state_dict = torch.load("helium_nano_45m.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# 4. Generate (minimal greedy-decoding sketch; assumes the model
# returns logits of shape (batch, seq, vocab) for a batch of ids)
prompt = "Once upon a time, there was a little"
ids = torch.tensor([tokenizer.encode(prompt).ids])
with torch.no_grad():
    for _ in range(50):
        logits = model(ids[:, -config["max_seq_len"]:])
        next_id = logits[0, -1].argmax()
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(ids[0].tolist()))
```