🇩🇪 nanochat German: v1

This repository hosts the first German nanochat model. It was fine-tuned (mid-training phase) on various German SFT datasets.

💬 A demo space of the model can be found here.

Datasets

The chat model was fine-tuned on the following datasets:

More information can be found in the corresponding German nanochat repository.

Fine-Tuning Stats

run: nanochat-german
device_type:
dtype: bfloat16
num_iterations: -1
max_seq_len: 2048
device_batch_size: 32
unembedding_lr: 0.0040
embedding_lr: 0.2000
matrix_lr: 0.0200
init_lr_frac: 1.0000
weight_decay: 0.0000
eval_every: 150
eval_tokens: 10,485,760
total_batch_size: 524,288
dry_run: 0
Number of iterations: 346
DDP world size: 8
Minimum validation bpb: 0.6001
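
The batch-size settings above can be cross-checked with a little arithmetic. The sketch below copies the values from the stats and assumes (this is not stated in the card) that total_batch_size and eval_tokens are counted in tokens and that every rank processes device_batch_size sequences of max_seq_len tokens per forward/backward pass. The variable names other than the stat names are illustrative.

```python
# Values copied from the fine-tuning stats above.
max_seq_len = 2048
device_batch_size = 32
ddp_world_size = 8
total_batch_size = 524_288
num_iterations = 346
eval_tokens = 10_485_760

# Assumption: total_batch_size is measured in tokens, and each rank sees
# device_batch_size sequences of max_seq_len tokens per forward/backward pass.
tokens_per_rank = device_batch_size * max_seq_len
tokens_per_pass = tokens_per_rank * ddp_world_size

# Gradient accumulation steps needed to reach the total batch size.
grad_accum_steps = total_batch_size // tokens_per_pass

# Upper bound on tokens seen, assuming every iteration consumes a full batch.
total_train_tokens = num_iterations * total_batch_size

# Number of full-size batches that fit into the evaluation token budget.
eval_batches = eval_tokens // tokens_per_pass

print(f"tokens per pass (all ranks): {tokens_per_pass:,}")
print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"training tokens (approx.):   {total_train_tokens:,}")
print(f"evaluation batches:          {eval_batches}")
```

Under these assumptions, 32 x 2048 x 8 = 524,288 tokens per pass, so the total batch size is reached without gradient accumulation, and 346 iterations correspond to roughly 181 million tokens.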

Evaluation Results

We use lm_eval to measure and compare the model's performance against other language models in the same parameter range (note: this list is not exhaustive):

Model	arc_de (acc)	arc_de (acc_norm)	hellaswag_de (acc)	hellaswag_de (acc_norm)	m_mmlu_de (acc)	truthfulqa_de_mc1 (acc)	truthfulqa_de_mc2 (acc)
nanochat German v1	0.2241	0.2626	0.3203	0.3581	0.2285	0.2500	0.4184
LLäMmlein-120M	0.1942	0.2301	0.2945	0.3178	0.2285	0.2310	0.4055
LLäMmlein-1B	0.2515	0.2960	0.3703	0.4490	0.2317	0.2322	0.3617
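
To make the comparison easier to scan, the scores can be loaded into a small Python structure. The sketch below copies the numbers verbatim from the table and reports the top-scoring model per metric; the helper name best_per_metric and the "task/metric" labels are illustrative and not part of lm_eval.

```python
# Scores copied verbatim from the table above; list order matches METRICS.
METRICS = [
    "arc_de/acc",
    "arc_de/acc_norm",
    "hellaswag_de/acc",
    "hellaswag_de/acc_norm",
    "m_mmlu_de/acc",
    "truthfulqa_de_mc1/acc",
    "truthfulqa_de_mc2/acc",
]

SCORES = {
    "nanochat German v1": [0.2241, 0.2626, 0.3203, 0.3581, 0.2285, 0.2500, 0.4184],
    "LLäMmlein-120M": [0.1942, 0.2301, 0.2945, 0.3178, 0.2285, 0.2310, 0.4055],
    "LLäMmlein-1B": [0.2515, 0.2960, 0.3703, 0.4490, 0.2317, 0.2322, 0.3617],
}


def best_per_metric(scores, metrics):
    """Map each metric to the model with the highest score."""
    best = {}
    for i, metric in enumerate(metrics):
        # max() keeps the first model in dict order if two scores tie.
        best[metric] = max(scores, key=lambda name: scores[name][i])
    return best


for metric, model in best_per_metric(SCORES, METRICS).items():
    print(f"{metric:24s} {model}")
```

With the numbers above, LLäMmlein-1B leads on arc_de, hellaswag_de and m_mmlu_de, while nanochat German v1 leads on both truthfulqa_de variants.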

The evaluation results for our model were obtained with the following command:

lm_eval --model hf \
--model_args pretrained="stefan-it/nanochat-german-v1" \
--tasks "arc_de,hellaswag_de,m_mmlu_de,truthfulqa_de_mc1,truthfulqa_de_mc2" \
--device cuda:0 \
--batch_size auto \
--trust_remote_code \
--log_samples \
--output_path ./nanochat-german-v1
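
Once the command has finished, the scores can be collected from the output directory. The following is a minimal sketch, not part of the original workflow. It assumes that lm_eval writes a file named results_<timestamp>.json somewhere below the --output_path directory, that this file has a top-level "results" mapping, and that metric keys look like "acc,none"; these details can differ between lm_eval versions. The function name load_latest_results is hypothetical.

```python
import json
from pathlib import Path


def load_latest_results(output_dir):
    """Return {task: {metric: value}} from the last results_*.json under output_dir."""
    # Assumption: lm_eval stores results_<timestamp>.json below --output_path.
    # Paths are sorted lexicographically, so the last one is taken as the newest.
    candidates = sorted(Path(output_dir).rglob("results_*.json"))
    if not candidates:
        raise FileNotFoundError(f"no results_*.json found under {output_dir}")

    with candidates[-1].open(encoding="utf-8") as fh:
        report = json.load(fh)

    scores = {}
    for task, metrics in report.get("results", {}).items():
        # Keep numeric metrics only, skip stderr entries, and strip the
        # filter suffix (assumed format: "acc,none" -> "acc").
        scores[task] = {
            key.split(",")[0]: value
            for key, value in metrics.items()
            if isinstance(value, (int, float)) and "stderr" not in key
        }
    return scores


# Example call, using the output path from the command above:
# print(load_latest_results("./nanochat-german-v1"))
```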

Demo

To generate some text, please make sure that you are using this specific HF branch.

Then the following code can be used:

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer


model_id = "stefan-it/nanochat-german-v1"
revision = "main"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "Was ist die Hauptstadt von Bayern?"},
]

inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))

License

The model is licensed under the permissive Apache 2.0 license.

Acknowledgements

Many thanks to Andrej Karpathy for the original nanochat repo!
Thanks to the LLäMmlein team for making the pretraining data publicly available.
Thanks to Ben and Joshua for their help and for working on the nanochat HF integration.

Model size: 0.6B params (Safetensors)
Tensor type: BF16

Base model: stefan-it/nanochat-german-base