SmolLM3-3B-ita-32768
This model is a 6.36% smaller version of HuggingFaceTB/SmolLM3-3B, optimized for the Italian language by reducing the vocabulary size with the trimming method.
This trimmed model should perform similarly to the original model on Italian text while using only 32,768 tokens and a smaller memory footprint. However, it may not perform well in other languages, because tokens that are not commonly used in Italian were removed from the vocabulary.
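To make the idea concrete, trimming can be pictured as keeping only selected rows of the embedding matrix and renumbering the surviving token ids. The following is a minimal sketch on a toy matrix of plain Python lists; `trim_embeddings` and the example ids are hypothetical and do not reproduce the actual trimming code.

```python
def trim_embeddings(embeddings, kept_ids):
    """Keep only the rows for kept_ids and map old token ids to new ones."""
    kept_ids = sorted(set(kept_ids))
    # New ids are assigned in ascending order of the old ids.
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    trimmed = [embeddings[old] for old in kept_ids]
    return trimmed, old_to_new


# Toy embedding matrix: 6 tokens, width 2 (illustrative values only).
embeddings = [[float(i), float(i) + 0.5] for i in range(6)]

trimmed, old_to_new = trim_embeddings(embeddings, kept_ids=[0, 2, 5])

print(len(trimmed))        # number of rows that remain
print(old_to_new)          # mapping from old ids to new ids
print(old_to_new.get(3))   # a removed token has no new id
```

In this picture, a token whose row was dropped no longer has an id, so text that relied on it has to be represented with whatever tokens remain. That is one plausible reason quality can drop outside the target language.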
Model Statistics
| Metric | Original | Trimmed | Reduction |
|---|---|---|---|
| Vocabulary size | 128,256 tokens | 32,768 tokens | 74.45% |
| Model size | 3,075,098,624 params | 2,879,539,200 params | 6.36% |
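The percentages in the table can be re-derived from the raw counts. The short check below is a sketch that uses only the four numbers from the table. The parameters-per-token figure and the byte estimate are inferences from that arithmetic (the latter assuming 2 bytes per parameter), not values stated in this card.

```python
# Figures copied from the table above.
orig_vocab, trimmed_vocab = 128_256, 32_768
orig_params, trimmed_params = 3_075_098_624, 2_879_539_200


def reduction_pct(before: int, after: int) -> float:
    """Relative reduction, in percent."""
    return 100 * (before - after) / before


vocab_reduction = round(reduction_pct(orig_vocab, trimmed_vocab), 2)
param_reduction = round(reduction_pct(orig_params, trimmed_params), 2)

removed_tokens = orig_vocab - trimmed_vocab
removed_params = orig_params - trimmed_params

# Inference (not stated in the card): parameters removed per dropped token.
# A zero remainder is consistent with one embedding matrix shared by the
# input and output layers.
params_per_token, remainder = divmod(removed_params, removed_tokens)

# Assumption: 2 bytes per parameter, as with bf16 or fp16 weights.
saved_mib = removed_params * 2 / 2**20

print(vocab_reduction, param_reduction)
print(removed_tokens, removed_params, params_per_token, remainder)
print(round(saved_mib), "MiB saved at 2 bytes per parameter")
```

Under these assumptions, all of the parameter savings come from the 95,488 removed vocabulary rows, and the rest of the network is unchanged.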
Mining Dataset Statistics
- Number of texts used for mining: 200,000
- Dataset: lbourdois/fineweb-2-trimming
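This card does not describe the mining procedure in detail. As a rough illustration only, one way to pick a reduced vocabulary is to tokenize a corpus, count how often each token id occurs, and keep the most frequent ids plus any special tokens. The sketch below uses a toy corpus and a hypothetical whitespace tokenizer (`toy_tokenize`, `TOY_VOCAB`, `mine_vocabulary` are all made up for this example); it is not the procedure used to build this model.

```python
from collections import Counter

# Hypothetical toy vocabulary: token string -> original id.
TOY_VOCAB = {
    "<eos>": 0, "il": 1, "gatto": 2, "dorme": 3,
    "the": 4, "cat": 5, "sleeps": 6, "cane": 7,
}
SPECIAL_IDS = {0}  # ids that are always kept, whatever their frequency


def toy_tokenize(text: str) -> list[int]:
    """Stand-in for a real tokenizer: split on whitespace, skip unknown words."""
    return [TOY_VOCAB[w] for w in text.lower().split() if w in TOY_VOCAB]


def mine_vocabulary(texts: list[str], target_size: int) -> list[int]:
    """Return sorted original ids to keep: special ids plus the most frequent ones."""
    counts = Counter(tok for text in texts for tok in toy_tokenize(text))
    kept = set(SPECIAL_IDS)
    for tok, _ in counts.most_common():
        if len(kept) >= target_size:
            break
        kept.add(tok)
    return sorted(kept)


italian_texts = ["il gatto dorme", "il cane dorme", "il gatto"]
kept_ids = mine_vocabulary(italian_texts, target_size=4)
print(kept_ids)
```

In this toy run, the English-only tokens never appear in the corpus, so they are never selected.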
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "alphaedge-ai/SmolLM3-3B-ita-32768"
device = "cuda" # for GPU usage or "cpu" for CPU usage
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
).to(device)
# prepare the model input
prompt = "Your prompt in Italian."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
# Get and decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
To enable/disable thinking mode, use the /think or /no_think flag in the system prompt:
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": prompt}
]
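For convenience, the flag can be set by a small helper. This is a sketch: `build_messages` is a hypothetical name, and it only assembles the message list in the format shown above, ready to pass to `tokenizer.apply_chat_template`.

```python
def build_messages(prompt: str, thinking: bool = True) -> list[dict]:
    """Build a chat message list with /think or /no_think as the system prompt."""
    flag = "/think" if thinking else "/no_think"
    return [
        {"role": "system", "content": flag},
        {"role": "user", "content": prompt},
    ]


messages = build_messages("Qual è la capitale d'Italia?", thinking=False)
print(messages[0]["content"])  # /no_think
```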
Citations
SmolLM3-3B
@misc{bakouch2025smollm3,
title={SmolLM3: smol, multilingual, long-context reasoner},
author={Bakouch, Elie and Ben Allal, Loubna and Lozhkov, Anton and Tazi, Nouamane
and Tunstall, Lewis and Patiño, Carlos Miguel and Beeching, Edward
and Roucher, Aymeric and others},
year={2025},
howpublished={https://huggingface.co/blog/smollm3}
}
Trimming blog post
@misc{hf_blogpost_trimming,
title={Introduction to Trimming},
author={Loïck BOURDOIS and Tom AARSEN and Bram VANROY and Christopher AKIKI and Woojun JUNG and Manuel ROMERO and Prithiv SAKTHI},
year={2026},
url={https://huggingface.co/blog/lbourdois/introduction-to-trimming},
}