Libraries

How to use haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M")
model = AutoModelForCausalLM.from_pretrained("haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M")

Local Apps

vLLM

How to use haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
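
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on `localhost:8000`, and the helper names `build_completion_request` and `complete` are illustrative, not part of vLLM. The response shape (`choices[0].text`) follows the OpenAI completions format that the server emulates.

```python
import json
import urllib.request

MODEL_ID = "haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request mirroring the curl call above."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the generated text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Requires a running server; see the serve command above.
    print(complete("Once upon a time,"))
```

For a server on another port, pass a different `base_url`, for example `complete("Once upon a time,", base_url="http://localhost:30000")`.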

Use Docker

docker model run hf.co/haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M

SGLang

How to use haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M with Docker Model Runner:
```
docker model run hf.co/haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M
```

haykgrigorian/v2mini-eval1: Llama-Architecture 318M Model

Model Overview

v2mini-eval1 is a model trained from scratch on 15GB of London texts from 1800-1875, using the modern Llama architecture. It was trained to evaluate the v2 dataset.

| Detail | Value |
| --- | --- |
| Model Architecture | LlamaForCausalLM (Decoder-Only Transformer) |
| Parameter Count | ~318 Million (318M) |
| Training Type | Trained from Scratch (Random Initialization) |
| Tokenizer | Custom BPE, Vocab Size 32,000 |
| Sequence Length | 1024 tokens |
| Attention Type | Grouped Query Attention (GQA) |

Configuration Details

This model uses a custom size and configuration based on Llama:

| Parameter | Value |
| --- | --- |
| Number of Layers | 20 |
| Hidden Size ($d$) | 1024 |
| Intermediate Size ($d_{\text{ff}}$) | 2752 |
| Attention Heads | 16 (Query) / 8 (Key/Value) |
| Activation Function | SiLU (`silu`) |
| Normalization | RMS Norm (`rms_norm_eps`: 1e-05) |
| Position Embeddings | Rotary Positional Embeddings (RoPE) |
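
As a rough cross-check of the parameter count, the values listed above can be plugged into the usual Llama layer shapes. This is a sketch under assumptions that the card does not state: a head dimension of hidden size / query heads (64), no bias terms, a gated three-matrix MLP, and an output head that is either untied from or tied to the input embedding. The function name `llama_param_count` is illustrative.

```python
# Values taken from the overview and configuration listed above.
VOCAB_SIZE = 32_000
HIDDEN = 1024
INTERMEDIATE = 2752
LAYERS = 20
Q_HEADS = 16
KV_HEADS = 8


def llama_param_count(tie_embeddings=False):
    """Estimate the parameter count of a Llama-style decoder (assumptions above)."""
    head_dim = HIDDEN // Q_HEADS              # assumed: 1024 / 16 = 64
    kv_dim = KV_HEADS * head_dim              # GQA: K and V project to 8 * 64 = 512
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q, o + k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE           # gate, up, and down projections
    norms = 2 * HIDDEN                        # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embedding = VOCAB_SIZE * HIDDEN
    lm_head = 0 if tie_embeddings else embedding
    final_norm = HIDDEN
    return embedding + LAYERS * per_layer + final_norm + lm_head


if __name__ == "__main__":
    print(f"untied: {llama_param_count():,}")
    print(f"tied:   {llama_param_count(tie_embeddings=True):,}")
```

Under these assumptions the estimate is 297,575,424 parameters with an untied output head and 264,807,424 with a tied one. The untied figure is closer to the 300M in the repository name than to the ~318M listed above, so at least one assumption probably differs from the actual checkpoint; the model's `config.json` is the authoritative source.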

Model Issues

This is an evaluation model. It was trained from scratch for 10k steps on a 15GB sample of a 90GB dataset. A tokenization issue causes the output to come out with misplaced spaces, like this:

default: "D oes that work more of h ise x cell ent st ir ring , in his pl ays"
fixed: "Does that work more of his excellent stirring, in his plays"

This is only a tokenizer issue: fix the output by hand, or if you're lazy, feed it to an LLM and have it fix the spacing.
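
One offline way to repair the spacing, sketched here as an illustration and not as the author's method, is to drop all whitespace and re-segment each run of letters against a word list with dynamic programming. Because the spaces in the raw output cross real word boundaries ("h ise x cell ent"), they cannot simply be merged selectively. `TOY_VOCAB` below is a hypothetical word list that covers only the example sentence; real use would need a large, period-appropriate vocabulary, and ambiguous runs can still be split incorrectly.

```python
import re

# Hypothetical toy vocabulary covering only the example sentence.
TOY_VOCAB = {"does", "that", "work", "more", "of", "his",
             "excellent", "stirring", "in", "plays"}


def segment(run, vocab, max_word_len=20):
    """Split a run of letters into the cheapest sequence of vocabulary words."""
    lower = run.lower()
    n = len(run)
    best = [0.0] + [float("inf")] * n   # best[i]: lowest cost to split run[:i]
    back = [0] * (n + 1)                # back[i]: start index of the last piece
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            if lower[start:end] in vocab:
                cost = 1.0              # known word
            elif end - start == 1:
                cost = 10.0             # unknown single letter, heavily penalized
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    pieces = []
    end = n
    while end > 0:                      # walk the back-pointers to recover pieces
        pieces.append(run[back[end]:end])
        end = back[end]
    return pieces[::-1]


def repair_spacing(text, vocab):
    """Remove all whitespace, then rebuild word boundaries from the vocabulary."""
    compact = re.sub(r"\s+", "", text)
    words = []
    for token in re.findall(r"[A-Za-z]+|[^A-Za-z]", compact):
        if re.fullmatch(r"[A-Za-z]+", token):
            words.extend(segment(token, vocab))
        elif words:
            words[-1] += token          # attach punctuation to the preceding word
        else:
            words.append(token)
    return " ".join(words)


if __name__ == "__main__":
    raw = "D oes that work more of h ise x cell ent st ir ring , in his pl ays"
    print(repair_spacing(raw, TOY_VOCAB))
```

With the toy vocabulary, the raw example above is restored to the fixed sentence. Letters that match no vocabulary word are emitted one at a time, which makes gaps in the word list easy to spot.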

How to Load and Run the Model

Download all the model files into a local folder and run the test script. You will have to adjust the run script, for example by updating the config/file paths and test prompts.

Test script

A run file for testing and evaluating this model is available on the main project repository:

Test Script Link: test_v2mini_eval1.py on GitHub

Safetensors: model size 0.3B params, tensor type F32

Collection including haykgrigorian/TimeCapsuleLLM-v2mini-eval1-llama-300M

TimeCapsuleLLM 1800-1875 London: a series of language models trained from scratch on historical English texts from London between 1800-1875 (7 items, updated Jan 13)