ASTRAI Pluto Nano 0.5 - BASE

Pre-identity / pre-final-preference checkpoint of Pluto Nano 0.5.

This is the v11 checkpoint before identity SFT, ORPO, and KTO-math. Use this as the starting point if you want to fine-tune your own identity, style or preference on top of Pluto Nano.

For the production-aligned model, use pluto-nano-0.5.

Architecture

  • 1 B total parameters / ~50 M active per token (35 experts, top-1 MoE routing)
  • Grouped-query attention (GQA): 6 query heads / 2 KV heads
  • 16 layers, hidden size 384, expert intermediate size 1536
  • Tokenizer: custom 32 k BPE
  • Languages: EN, PT, ES, ZH, HI
  • Context: 4096 tokens
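
As a rough sanity check on the figures above, the parameter counts can be reproduced with a back-of-the-envelope calculation. This is only a sketch, not the model's actual code. It assumes a gated (SwiGLU-style) expert MLP with three weight matrices, a head dimension of 64 (hidden 384 / 6 query heads), a 32,000-entry vocabulary with tied input/output embeddings, and it ignores norms and biases. None of these details are stated in this card.

```python
# Back-of-the-envelope parameter count for the architecture listed above.
# ASSUMPTIONS (not confirmed by the model card): SwiGLU-style experts with
# three weight matrices, head_dim = hidden / query_heads, vocabulary of
# exactly 32,000 entries, tied embeddings, norms and biases ignored.

HIDDEN = 384
EXPERT_INTERMEDIATE = 1536
LAYERS = 16
EXPERTS = 35
TOP_K = 1
Q_HEADS = 6
KV_HEADS = 2
HEAD_DIM = HIDDEN // Q_HEADS  # 64, assumed
VOCAB = 32_000                # "32 k" read as 32,000, assumed
CONTEXT = 4096


def param_estimate():
    """Return estimated total and per-token active parameter counts."""
    # One expert: gate, up and down projections.
    expert = 3 * HIDDEN * EXPERT_INTERMEDIATE
    # Attention: Q and O are full width, K and V use the smaller KV head count.
    attn = 2 * HIDDEN * (Q_HEADS * HEAD_DIM) + 2 * HIDDEN * (KV_HEADS * HEAD_DIM)
    # Router: one logit per expert.
    router = HIDDEN * EXPERTS
    embed = VOCAB * HIDDEN

    shared = LAYERS * (attn + router) + embed
    total = shared + LAYERS * EXPERTS * expert
    active = shared + LAYERS * TOP_K * expert
    return {"total": total, "active": active}


def kv_cache_bytes(seq_len=CONTEXT, bytes_per_elem=2):
    """KV-cache size for one sequence (keys + values), BF16 by default."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * seq_len * bytes_per_elem


if __name__ == "__main__":
    est = param_estimate()
    print(f"total  ~ {est['total'] / 1e9:.2f} B")
    print(f"active ~ {est['active'] / 1e6:.1f} M")
    print(f"KV cache at full context ~ {kv_cache_bytes() / 2**20:.0f} MiB")
```

Under these assumptions the estimate comes to about 1.01 B total and 47 M active parameters, in line with the figures listed, and a full 4096-token BF16 KV cache comes to 32 MiB per sequence.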

Training

  • Pretraining: 13 B multilingual tokens
  • Distill v1/v2 (frontier models)
  • Recovery CPT + Wikipedia knowledge boost
  • Second Distill (e1 best): reasoning + chat + QA + replay buffer, 30 M tokens
  • Trained entirely on RTX 3060 12 GB
  • Total wall-clock: ~2 weeks
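
To put the training budget in perspective, the numbers above imply a minimum average throughput. The sketch below is illustrative arithmetic only: it assumes all 13 B pretraining tokens were processed within the ~2 weeks (taken as 14 days) and uses the common 6 · N · D approximation for training FLOPs with N set to the ~50 M active parameters. The card itself reports neither throughput nor FLOPs, and the distillation and CPT stages are not counted.

```python
# Illustrative arithmetic only; inputs come from the card, the 14-day figure
# and the 6 * N * D rule of thumb are assumptions made for this sketch.

SECONDS_PER_DAY = 86_400

PRETRAIN_TOKENS = 13e9   # "13 B tokens"
ACTIVE_PARAMS = 50e6     # "~50 M active per token"
WALL_CLOCK_DAYS = 14     # "~2 weeks", assumed to be exactly 14 days


def avg_tokens_per_second(tokens, days):
    """Average tokens processed per second over the given wall-clock time."""
    return tokens / (days * SECONDS_PER_DAY)


def approx_training_flops(active_params, tokens):
    """Rule-of-thumb training cost: about 6 FLOPs per active parameter per token."""
    return 6 * active_params * tokens


tokens_per_s = avg_tokens_per_second(PRETRAIN_TOKENS, WALL_CLOCK_DAYS)
total_flops = approx_training_flops(ACTIVE_PARAMS, PRETRAIN_TOKENS)
sustained_flops_per_s = total_flops / (WALL_CLOCK_DAYS * SECONDS_PER_DAY)

if __name__ == "__main__":
    print(f"average throughput >= {tokens_per_s:,.0f} tokens/s")
    print(f"pretraining compute ~ {total_flops:.2e} FLOPs")
    print(f"sustained rate      ~ {sustained_flops_per_s / 1e12:.1f} TFLOP/s")
```

Because the two weeks also cover the later stages, the throughput figure is a lower bound on the pretraining rate rather than a measurement.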

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ASTRAI-labs/pluto-nano-0.5-base"

# trust_remote_code=True executes the custom model code shipped in the repo.
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # load the weights in BF16
).cuda()  # requires a CUDA-capable GPU
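
Once the model and tokenizer are loaded, text can be generated with the standard `generate` API. The helper below is a minimal sketch: `complete` is a hypothetical name, not part of the repository, and it assumes the custom model class supports `generate` like a regular transformers causal LM. Since this is a base checkpoint, it is prompted with plain text rather than a chat template, and the prompt is truncated so that prompt plus continuation fit in the 4096-token context.

```python
CONTEXT_LEN = 4096  # context length stated in this card


def complete(model, tok, prompt, max_new_tokens=128):
    """Greedy continuation of `prompt` (hypothetical helper, not part of the repo)."""
    if not 0 < max_new_tokens < CONTEXT_LEN:
        raise ValueError("max_new_tokens must be between 1 and CONTEXT_LEN - 1")

    # Truncate the prompt so prompt + continuation fit in the context window.
    inputs = tok(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=CONTEXT_LEN - max_new_tokens,
    ).to(model.device)

    # Greedy decoding; assumes the remote model code supports generate().
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )

    # Drop the prompt tokens and decode only the newly generated part.
    prompt_len = len(inputs["input_ids"][0])
    return tok.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


# Example, using `model` and `tok` loaded as shown above:
# print(complete(model, tok, "The capital of Portugal is", max_new_tokens=32))
```

If the custom tokenizer returns extra fields that `generate` does not accept, pass only `input_ids` and `attention_mask` instead of unpacking the whole encoding.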

License

ASTRAI Closed License. See pluto-nano-0.5 for full terms.
