Sutra-Instruct-350M
Sutra-Instruct-350M is a custom-built, 350-million-parameter causal language model based on the nanoGPT architecture.
🧠 Model Architecture & Details
- Architecture: Custom nanoGPT-based Transformer
- Parameter Count: 350M
- Format: safetensors
- Embeddings: Tied (lm_head and wte share memory)
- Creator: Abhiray
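The tied-embedding entry above can be illustrated with a toy sketch. It is not the model's actual code: the class `TinyTiedLM`, the helper `embedding_params`, and the vocabulary and hidden sizes used below are all hypothetical, chosen only to show what it means for lm_head and wte to share memory and roughly how many parameters tying can save.

```python
def embedding_params(vocab_size: int, n_embd: int, tied: bool) -> int:
    """Count parameters in the input embedding plus the output projection."""
    matrix = vocab_size * n_embd
    # With tying, the same matrix serves both roles, so it is counted once.
    return matrix if tied else 2 * matrix


class TinyTiedLM:
    """Toy model whose output projection reuses the token embedding matrix."""

    def __init__(self, vocab_size: int, n_embd: int):
        # One row per token; small deterministic values keep the example simple.
        self.wte = [[0.01 * (r + c) for c in range(n_embd)]
                    for r in range(vocab_size)]
        # Tying: lm_head is the very same object, not a copy.
        self.lm_head = self.wte

    def embed(self, token_id: int) -> list:
        """Look up the embedding vector for one token."""
        return self.wte[token_id]

    def logits(self, hidden: list) -> list:
        """Score every vocabulary entry by a dot product with the hidden state."""
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.lm_head]


if __name__ == "__main__":
    # Hypothetical sizes (GPT-2 style vocabulary, 1024-wide hidden state).
    saved = embedding_params(50_257, 1_024, tied=False) - embedding_params(
        50_257, 1_024, tied=True)
    print(f"Parameters saved by tying (assumed sizes): {saved:,}")
```

Because both names point at one matrix, an update to wte during training is immediately visible through lm_head, which is the practical meaning of "share memory".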
📚 Training Pipeline
This model was not fine-tuned from an existing corporate base model (like Llama or Mistral). Its weights were initialized from scratch and trained through a two-phase pipeline:
Phase 1: Pre-Training (The Foundation)
The base model was built by streaming a curated mix of academic and coding datasets:
- HuggingFaceFW/fineweb-edu (High-level English and academic structure)
- open-web-math/open-web-math (Mathematical logic and formatting)
- bigcode/starcoderdata (Python syntax and code structure)
- roneneldan/TinyStories (Basic grammar and narrative flow)
Phase 2: Supervised Fine-Tuning (SFT)
Once the model learned how to speak, it was fine-tuned using the yahma/alpaca-cleaned dataset to teach it the standard Instruction: and Response: conversational format.
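A minimal sketch of wrapping a request in that format is shown below. The exact template (line breaks, spacing, and whether markers such as "###" precede the labels) is not specified here, so the layout produced by the hypothetical helpers `build_prompt` and `extract_response` is an assumption and may need adjusting to match the training data.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user request in an assumed Instruction:/Response: layout."""
    # Assumption: one label per line, with the response left open for the model.
    return f"Instruction: {instruction.strip()}\nResponse:"


def extract_response(generated: str) -> str:
    """Pull the first answer out of text that echoes the prompt."""
    _, marker, tail = generated.partition("Response:")
    if not marker:
        # No marker found; return the text unchanged apart from whitespace.
        return generated.strip()
    # Small models may keep going and invent a new turn; cut it off.
    answer, _, _ = tail.partition("Instruction:")
    return answer.strip()


if __name__ == "__main__":
    prompt = build_prompt("Summarize the water cycle in one sentence.")
    print(prompt)
```

Trimming at the next "Instruction:" marker is a design choice for compact models, which may continue generating additional turns after answering.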
⚙️ Recommended Generation Settings
Because this is a compact 350M-parameter model, default generation settings may produce looping or heavily hallucinated output. For best results, use the following configuration:
- Temperature: 0.5
- Top-K: 50
- Repetition Penalty: 1.3
- Max Length: 400-500
- Alternatively, use the generation_config.json file in the repo.
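To show what these settings do to a single sampling step, here is a self-contained sketch in plain Python. It follows one common convention (divide positive logits of already-seen tokens by the penalty and multiply negative ones, then apply temperature, then keep the top-k candidates); the hypothetical function `sample_next_token` is illustrative and is not taken from this model's code or from any specific library.

```python
import math
import random


def sample_next_token(logits, previous_tokens, temperature=0.5, top_k=50,
                      repetition_penalty=1.3, rng=None):
    """Pick the next token id from raw logits using the recommended settings."""
    if rng is None:
        rng = random.Random()
    adjusted = list(logits)

    # Repetition penalty: make tokens that already appeared less attractive.
    for tok in set(previous_tokens):
        if adjusted[tok] > 0:
            adjusted[tok] /= repetition_penalty
        else:
            adjusted[tok] *= repetition_penalty

    # Temperature below 1.0 sharpens the distribution toward likely tokens.
    adjusted = [x / temperature for x in adjusted]

    # Top-k: keep only the k highest-scoring candidates.
    k = min(top_k, len(adjusted))
    candidates = sorted(range(len(adjusted)),
                        key=lambda i: adjusted[i], reverse=True)[:k]

    # Softmax over the survivors (shifted by the max for numerical stability).
    peak = max(adjusted[i] for i in candidates)
    weights = [math.exp(adjusted[i] - peak) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [0.1, 5.0, 4.0, -1.0, 0.0]
    print(sample_next_token(toy_logits, previous_tokens=[1],
                            rng=random.Random(0)))
```

A lower temperature and a repetition penalty above 1.0 both work against the looping behavior described above: the first concentrates probability on plausible tokens, the second discourages reusing tokens that were just produced.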
⚠️ Limitations & Bias
- Hallucinations: With only 350M parameters, Sutra does not have the capacity to act as a factual encyclopedia. It will confidently hallucinate historical dates, math solutions, and trivia.
- Coding: While it understands Python syntax and will output beautifully formatted code blocks (thanks to StarCoder), complex logical scripts may fail.
- Best Use Case: Sutra excels at structural formatting, grammar, summarizing provided context, generating short stories, and acting as a lightweight, lightning-fast local testing model.