---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
metrics:
- perplexity
pipeline_tag: text-generation
tags:
- slm
- transformer
- attention
- optimization
- pytorch
- tinystories
- educational
---
# Model Card for Helium-Nano (45M)

**Helium-Nano** is a 45-million-parameter Small Language Model (SLM) trained on the TinyStories dataset. It demonstrates that a carefully optimized custom Transformer architecture can achieve coherent English storytelling with minimal compute. The model was trained in under 1 hour on a single NVIDIA L4 GPU, reaching a throughput of **409k tokens/second** via `torch.compile` (PyTorch 2.0) and architectural optimizations.

## Model Details

### Model Description

Helium-Nano is a decoder-only Transformer designed to investigate training dynamics and scaling laws in low-resource environments. Despite its small size, it produces grammatically correct and narratively consistent short stories.

The primary goal of this model was engineering efficiency. By combining **BFloat16 mixed precision**, **Flash-Attention-style fused attention**, **`torch.compile` (Inductor)**, and **float32-computed Rotary Position Embeddings (RoPE)**, the training pipeline achieved a 16x speedup over a standard eager-mode baseline.

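The attention and RoPE optimizations can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual implementation: the helper names `rope_cache` and `apply_rope` are hypothetical, and PyTorch picks a fused (Flash-Attention-style) kernel inside `scaled_dot_product_attention` automatically where the hardware supports it.

```python
import torch
import torch.nn.functional as F

def rope_cache(seq_len, head_dim, base=10000.0):
    # RoPE angles are computed in float32 for numerical stability,
    # even when the rest of the model runs in BFloat16.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    freqs = torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)
    return freqs.cos(), freqs.sin()

def apply_rope(x, cos, sin):
    # x: (batch, heads, seq, head_dim); rotate channel pairs in float32,
    # then cast back to the model's compute dtype.
    x1, x2 = x[..., 0::2].float(), x[..., 1::2].float()
    out = torch.empty(x.shape, dtype=torch.float32)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out.to(x.dtype)

# BFloat16 activations + fused, memory-efficient causal attention.
q = k = v = torch.randn(1, 8, 16, 64, dtype=torch.bfloat16)
cos, sin = rope_cache(seq_len=16, head_dim=64)
q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Computing and applying the rotation angles in float32 before casting back to BFloat16 matters because doing the rotation entirely in low precision degrades positional accuracy at longer ranges.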
- **Developed by:** Debmalya / batmanLovesAI
- **Model type:** Decoder-only Transformer (custom architecture)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** N/A (trained from scratch)

### Model Sources

- **Repository:** [HeliumLM on GitHub](https://github.com/DebmalyaSen34/HeliumLM)
- **Dataset Paper:** [TinyStories: How Small Can Language Models Be and Still Speak Coherent English?](https://arxiv.org/abs/2305.07759)
- **Optimization Techniques:** [Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation](https://arxiv.org/abs/2505.19529)

## Uses

### Direct Use

- **Story Generation:** Generating simple, coherent short stories suitable for early childhood reading levels.
- **Education:** A lightweight baseline for experimenting with interpretability, quantization, or fine-tuning on consumer hardware.
- **Performance Benchmarking:** Testing the inference speed of small Transformers on various hardware.

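For the benchmarking use case, a minimal timing harness might look like the sketch below. The `tokens_per_second` helper is illustrative and not part of the repository; it wraps any generation callable.

```python
import time

def tokens_per_second(generate_fn, n_tokens, n_runs=3):
    """Average throughput of `generate_fn` over n_runs timed calls.

    generate_fn is any callable that produces n_tokens tokens,
    e.g. a thin wrapper around the model's generation loop.
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_fn(n_tokens)
        rates.append(n_tokens / (time.perf_counter() - start))
    return sum(rates) / len(rates)
```

Warm the model up before timing (especially a `torch.compile`d one, whose first call triggers compilation), otherwise the first run will dominate the average.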
### Out-of-Scope Use

- **Factual Queries:** The model is trained on fiction; it has no world knowledge and will hallucinate facts.
- **Reasoning/Math:** The model is not capable of complex logic or arithmetic.
- **Harmful Content:** Although the dataset is heavily filtered, users should not attempt to generate toxic or biased content.

## Bias, Risks, and Limitations

- **Dataset Bias:** The model reflects the vocabulary and concepts of the TinyStories dataset, which focuses on simple, positive narratives using a limited vocabulary (roughly a 3-year-old's level).
- **Repetition:** Like many SLMs, the model may fall into repetitive loops if the temperature is too low or no repetition penalty is applied during inference.
- **Hallucinations:** The model prioritizes grammatical structure over semantic logic.

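The repetition issue above is typically mitigated at decode time with temperature scaling and a repetition penalty. Here is a pure-Python sketch; the `sample_next` helper and its default values are illustrative, not the repository's API.

```python
import math
import random

def sample_next(logits, recent_ids, temperature=0.8, rep_penalty=1.3):
    # Down-weight tokens that already appeared (divide positive logits,
    # multiply negative ones), then apply temperature scaling and
    # sample from the resulting softmax distribution.
    adjusted = list(logits)
    for i in set(recent_ids):
        adjusted[i] = adjusted[i] / rep_penalty if adjusted[i] > 0 else adjusted[i] * rep_penalty
    scaled = [x / temperature for x in adjusted]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # stable softmax numerator
    return random.choices(range(len(weights)), weights=weights, k=1)[0]
```

Lower temperatures sharpen the distribution toward greedy decoding; the penalty keeps high-probability tokens from being re-selected indefinitely.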
## How to Get Started with the Model

Since this model uses a custom architecture, you need to instantiate the model class before loading the weights.

```python
import torch
from tokenizers import Tokenizer

# The TinySLM class ships with the repository; the module path below
# is illustrative -- adjust it to wherever the class lives locally.
from model import TinySLM

# 1. Load Tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# 2. Initialize Model
config = {
    "vocab_size": 32000,
    "d_model": 512,
    "n_head": 8,
    "n_layers": 10,
    "max_seq_len": 512,
}
model = TinySLM(config)

# 3. Load Weights
state_dict = torch.load("helium_nano_45m.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# 4. Generate (minimal greedy-decoding sketch; assumes the model
# returns logits of shape (batch, seq, vocab) for a batch of ids)
prompt = "Once upon a time, there was a little"
ids = torch.tensor([tokenizer.encode(prompt).ids])
with torch.no_grad():
    for _ in range(50):
        logits = model(ids[:, -config["max_seq_len"]:])
        next_id = logits[0, -1].argmax()
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(ids[0].tolist()))
```