Instructions to use Abiray/Sutra-Instruct-350M with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Abiray/Sutra-Instruct-350M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Abiray/Sutra-Instruct-350M")

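# Load model directly (AutoModelForCausalLM keeps the language-modeling head needed for generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Abiray/Sutra-Instruct-350M", dtype="auto")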

Local Apps

vLLM

How to use Abiray/Sutra-Instruct-350M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Abiray/Sutra-Instruct-350M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Abiray/Sutra-Instruct-350M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
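
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is reachable at `localhost:8000`, and the helper names `build_payload` and `complete` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumption: a vLLM server started as above is listening on localhost:8000.
VLLM_URL = "http://localhost:8000/v1/completions"


def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Return the same JSON body that the curl example sends."""
    return {
        "model": "Abiray/Sutra-Instruct-350M",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, url=VLLM_URL):
    """POST the prompt to the server and return the first completion's text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry generated text in choices[i].text.
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation as a string.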

Use Docker

docker model run hf.co/Abiray/Sutra-Instruct-350M

SGLang

How to use Abiray/Sutra-Instruct-350M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Abiray/Sutra-Instruct-350M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Abiray/Sutra-Instruct-350M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Abiray/Sutra-Instruct-350M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Abiray/Sutra-Instruct-350M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Abiray/Sutra-Instruct-350M with Docker Model Runner:
```
docker model run hf.co/Abiray/Sutra-Instruct-350M
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Sutra-Instruct-350M

Sutra-Instruct-350M is a custom-built, 350-million-parameter causal language model based on the nanoGPT architecture.

🧠 Model Architecture & Details

Architecture: Custom nanoGPT-based Transformer
Parameter Count: 350M
Format: safetensors
Embeddings: Tied (lm_head and wte share memory)
Creator: Abhiray
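
To illustrate what tied embeddings mean for the parameter budget, here is a small arithmetic sketch. The vocabulary size and model width below are hypothetical (GPT-2-style values chosen for illustration) and are not read from this model's config; the sketch also assumes the output head has no bias term.

```python
def embedding_params(vocab_size, d_model, tied):
    """Count parameters in the token embedding (wte) plus the output head (lm_head)."""
    wte = vocab_size * d_model
    # With tying, lm_head reuses wte's matrix, so it adds no new parameters.
    lm_head = 0 if tied else vocab_size * d_model
    return wte + lm_head


# Hypothetical sizes for illustration only, not read from this model's config.
VOCAB_SIZE = 50_257
D_MODEL = 1_024

saved = (
    embedding_params(VOCAB_SIZE, D_MODEL, tied=False)
    - embedding_params(VOCAB_SIZE, D_MODEL, tied=True)
)
print(f"Parameters saved by tying: {saved:,}")
```

Under these assumed sizes, tying removes one full vocabulary-by-width matrix, which is a noticeable share of the budget for a model of this scale.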

📚 Training Pipeline

This model was not fine-tuned from an existing corporate base model (like Llama or Mistral). Its weights were initialized from scratch and trained through a two-phase pipeline:

Phase 1: Pre-Training (The Foundation). The base model was trained by streaming a curated mix of academic and coding datasets:

HuggingFaceFW/fineweb-edu (High-level English and academic structure)
open-web-math/open-web-math (Mathematical logic and formatting)
bigcode/starcoderdata (Python syntax and code structure)
roneneldan/TinyStories (Basic grammar and narrative flow)

Phase 2: Supervised Fine-Tuning (SFT). Once the model had learned to produce fluent text, it was fine-tuned on the yahma/alpaca-cleaned dataset to teach it the standard Instruction: and Response: conversational format.
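
Because the model was tuned on an Instruction: / Response: layout, prompts will likely work best when wrapped the same way. The sketch below shows one possible wrapper; the exact template (markers, spacing, newlines) is not specified in this card, so the layout here is an assumption, and `build_prompt` and `extract_response` are hypothetical helper names.

```python
# Assumption: the exact template used in training is not given in this card;
# this layout is a guess based on the "Instruction:" / "Response:" labels.
def build_prompt(instruction):
    """Wrap a user instruction so the model is cued to write a response."""
    return f"Instruction: {instruction.strip()}\nResponse:"


def extract_response(generated_text):
    """Return the text after the first 'Response:' label (empty if absent)."""
    _, _, answer = generated_text.partition("Response:")
    return answer.strip()
```

For example, `build_prompt("Summarize this paragraph.")` yields a two-line prompt ending in `Response:`, and `extract_response` strips the echoed prompt from the model's output.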

⚙️ Recommended Generation Settings

Because this is a compact 350M parameter model, standard generation settings may result in looping or wild hallucinations. For the absolute best outputs, use the following configuration:

Temperature: 0.5
Top-K: 50
Repetition Penalty: 1.3
Max Length: 400-500
Alternatively, use the generation_config.json file in the repo.
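
As a rough sketch, the settings above can be passed to a Transformers text-generation pipeline as keyword arguments. Enabling `do_sample` is an assumption (temperature and top-k only take effect when sampling), `max_length` is set to the upper end of the recommended range, and `generate` is an illustrative helper name. Whether the repo loads without extra flags has not been verified here.

```python
# Settings copied from the list above; max_length uses the upper end of the
# recommended 400-500 range.
GENERATION_KWARGS = {
    "do_sample": True,  # assumption: sampling on, so temperature and top_k apply
    "temperature": 0.5,
    "top_k": 50,
    "repetition_penalty": 1.3,
    "max_length": 500,
}


def generate(prompt):
    """Load the model through a Transformers pipeline and generate one completion."""
    # Imported lazily; requires `pip install transformers` and a backend such as PyTorch.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="Abiray/Sutra-Instruct-350M")
    outputs = pipe(prompt, **GENERATION_KWARGS)
    return outputs[0]["generated_text"]
```

Calling `generate(...)` downloads the model on first use, so it is defined here but not invoked.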

⚠️ Limitations & Bias

Hallucinations: As a 350M parameter model, Sutra does not have the physical parameter count to act as a factual encyclopedia. It will confidently hallucinate historical dates, math solutions, and trivia.
Coding: While it understands Python syntax and will output beautifully formatted code blocks (thanks to StarCoder), complex logical scripts may fail.
Best Use Case: Sutra excels at structural formatting, grammar, summarizing provided context, generating short stories, and acting as a lightweight, lightning-fast local testing model.

