SmolLM3-3B-instruct-customerservice
This model is a QLoRA fine-tuned version of HuggingFaceTB/SmolLM3-3B-Instruct on a context-summarized multi-turn customer-service QA dataset for banking domain conversations.
Model Description
The model targets multi-turn customer-service question answering in which earlier turns are replaced by a summary of the conversation history. It was fine-tuned with QLoRA (Quantized Low-Rank Adaptation) on synthetic banking customer-service conversations, so that the summary preserves the essential conversational context while keeping the dialogue continuous.
Base Model: HuggingFaceTB/SmolLM3-3B-Instruct
Parameters: ~3 billion
Fine-tuning Method: QLoRA (4-bit quantization + LoRA)
Domain: Customer Service (Banking)
Task: Context-Summarized Multi-Turn Question Answering
Note: Reasoning mode was disabled during both training and inference (no thinking tags)
Intended Uses & Limitations
Intended Uses
- Multi-turn customer service conversations in banking domain
- Context-aware response generation with dialogue continuity
- Real-time customer support automation
- Efficient deployment on resource-constrained hardware
- Privacy-preserving on-premise deployment
Limitations
- Primarily trained on banking domain data; may require adaptation for other sectors
- Performance figures are based on synthetic data; behavior on real-world conversations may differ
- Requires context summarization for optimal performance
- Maximum sequence length: 512 tokens
- Lower performance compared to other 3B models (LLaMA, Qwen, Phi)
- Struggles with dialogue continuity and contextual alignment
Training Data
Dataset: Synthetic context-summarized multi-turn customer-service QA dataset
Source: Derived from TalkMap Banking Conversation Corpus
Size: 128,335 training instances, 18,333 validation instances
Conversation Turns: 2-53 turns per conversation (avg: 10.06)
Context Strategy: History summarization using GPT-4o-mini
Response Refinement: GPT-4.1-based response quality enhancement
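The context strategy above can be pictured as follows. This is only a minimal sketch: the instance layout, the `build_instances` helper, and the `toy_summarize` stand-in are assumptions made for illustration, not the pipeline used to build the dataset (which relied on GPT-4o-mini for summarization).

```python
from typing import Callable, Dict, List, Tuple

# (speaker, utterance); speaker is "client" or "agent". Hypothetical layout.
Turn = Tuple[str, str]


def build_instances(
    turns: List[Turn],
    summarize: Callable[[List[Turn]], str],
) -> List[Dict[str, str]]:
    """Create one training instance per client -> agent exchange.

    Turns before the exchange are replaced by a summary; the first
    exchange has no earlier turns, so its history is left empty.
    """
    instances = []
    for i in range(1, len(turns)):
        speaker, utterance = turns[i]
        prev_speaker, prev_utterance = turns[i - 1]
        if speaker == "agent" and prev_speaker == "client":
            earlier = turns[: i - 1]
            instances.append(
                {
                    "history": summarize(earlier) if earlier else "",
                    "client_question": prev_utterance,
                    "response": utterance,
                }
            )
    return instances


def toy_summarize(turns: List[Turn]) -> str:
    """Stand-in summarizer (assumption): simply concatenates the turns."""
    return " ".join(f"{speaker}: {text}" for speaker, text in turns)


conversation = [
    ("client", "Hi, I see an unexpected charge."),
    ("agent", "I can help with that."),
    ("client", "Can I dispute it?"),
    ("agent", "Yes, you can dispute any incorrect charge."),
]
instances = build_instances(conversation, toy_summarize)
```

Each resulting dictionary holds the three fields that the prompt template in the usage example expects: a summarized history, the current client question, and the agent response used as the training target.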
Training Procedure
Training Configuration
- Framework: Unsloth + Hugging Face Transformers
- Fine-tuning Method: QLoRA (4-bit quantization)
- Hardware: NVIDIA A100 40GB GPU
- Training Time: 5-14 hours
Training Hyperparameters
- Max Sequence Length: 512 tokens
- Quantization: 4-bit precision
- LoRA Rank (r): 16
- LoRA Alpha: 32
- LoRA Dropout: 0.1
- LoRA Target Modules: All attention and feed-forward projection layers
- Epochs: 3
- Optimizer: AdamW 8-bit
- Learning Rate: 2e-5
- Weight Decay: 0.01
- Warmup Ratio: 0.05
- LR Scheduler: Cosine
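To make the schedule-related hyperparameters concrete, the sketch below computes a cosine learning-rate curve with linear warmup, along with the LoRA scaling factor alpha / r. It is a plain-Python illustration, not the training code: `TOTAL_STEPS` is a hypothetical value (the batch size and step count are not stated here), and the warmup step count is truncated with `int`, which may differ slightly from what a given trainer does.

```python
import math

BASE_LR = 2e-5       # Learning Rate from the list above
WARMUP_RATIO = 0.05  # Warmup Ratio from the list above
LORA_R = 16          # LoRA Rank (r)
LORA_ALPHA = 32      # LoRA Alpha
TOTAL_STEPS = 1000   # hypothetical; the real step count is not given


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# In the standard LoRA formulation the low-rank update is scaled by alpha / r.
lora_scale = LORA_ALPHA / LORA_R  # 2.0

for step in (0, 25, 50, 500, 1000):
    print(step, lr_at(step, TOTAL_STEPS))
```

With these settings the learning rate rises linearly over the first 5% of steps, peaks at 2e-5, and then decays along a half cosine to zero by the final step.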
Inference Parameters
generation_config = {
    "max_new_tokens": 128,
    "temperature": 0.6,
    "do_sample": True,
    "top_p": 0.95,
    "top_k": 50,
}
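For readers unfamiliar with how these sampling parameters interact, here is a simplified sketch of temperature scaling followed by top-k and top-p (nucleus) filtering. It is written in plain Python for illustration and is not the Transformers implementation; the function names and the toy logits are assumptions.

```python
import math
import random
from typing import Dict, List


def filter_distribution(
    logits: List[float],
    temperature: float = 0.6,
    top_k: int = 50,
    top_p: float = 0.95,
) -> Dict[int, float]:
    """Return {token_id: probability} after temperature, top-k and top-p."""
    # 1. Temperature: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]

    # 2. Top-k: keep the k most likely tokens and renormalize.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    ranked = ranked[:top_k]
    total = sum(weights[i] for i in ranked)
    probs = {i: weights[i] / total for i in ranked}

    # 3. Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(dist: Dict[int, float]) -> int:
    """Draw one token id from the filtered distribution."""
    return random.choices(list(dist), weights=list(dist.values()))[0]


# Toy example with four token logits (hypothetical values).
toy_dist = filter_distribution([2.0, 1.0, 0.0, -1.0], top_k=3, top_p=0.9)
next_token = sample(toy_dist)
```

In the toy example the two least likely tokens are filtered out, so sampling only ever picks between the two most likely candidates.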
Usage Example
Installation
pip install unsloth transformers peft torch
Loading the Model
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
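# Load base model (the Hub's auto-generated PEFT snippet for this adapter uses
# "HuggingFaceTB/SmolLM3-3B"; try that id if the one below does not resolve)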
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B-Instruct",
    device_map="auto",
    torch_dtype=torch.float16,
)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B-Instruct")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Lakshan2003/SmolLM3-3B-instruct-customerservice")
# Merge adapter (optional, for deployment)
model = model.merge_and_unload()
model.eval()
Inference
# Prompt template (adjust for SmolLM format)
prompt_template = """<|im_start|>system
{instruction}<|im_end|>
<|im_start|>user
Conversation History:
{history}
Client Question:
{client_question}<|im_end|>
<|im_start|>assistant
"""
# Example conversation
instruction = "You are a professional call-center customer service agent working at Optimal Financial Partners. Review the conversation history and any provided context (if available). Make sure your response is consistent with the conversation history (names, issues, and actions already taken). If no history is given, treat the client’s message as the start of the conversation. Continue the dialogue as the agent by giving a clear, helpful, and professional response. Responses should sound natural and human-like, like a real phone call, and usually be few short sentences. Provide more detail when the client’s request clearly requires it."
history = "Kathrine has contacted Almira from Optimal Financial Partners regarding unexpected charges on her statement and her rights as a consumer. Almira confirmed that as a customer, Kathrine has the right to dispute any unauthorized or incorrect charges. Almira offered to investigate any charges Kathrine believes are incorrect. No specific charges, amounts, or account identifiers have been mentioned, and no verification steps have been completed or are pending at this time. The conversation is currently focused on explaining consumer rights and the process for disputing charges."
client_question = "That's great to know. What if I'm not satisfied with the outcome of the investigation?"
# Format input
input_text = prompt_template.format(
    instruction=instruction,
    history=history,
    client_question=client_question,
)
# Tokenize
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512).to(model.device)
# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.6,
        do_sample=True,
        top_p=0.95,
        top_k=50,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
# Decode response
input_length = inputs.input_ids.shape[1]
response = tokenizer.decode(outputs[0][input_length:], skip_special_tokens=True).strip()
print(response)
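Because the tokenizer call truncates to 512 tokens, and truncation usually happens on the right, an overlong prompt could lose the trailing assistant tag. The sketch below wraps the same prompt template in a helper and adds a length check before generation. The helper names (`build_prompt`, `fits_budget`) and the whitespace token counter are illustrative assumptions.

```python
from typing import Callable

# Same layout as the prompt template shown above.
PROMPT_TEMPLATE = (
    "<|im_start|>system\n"
    "{instruction}<|im_end|>\n"
    "<|im_start|>user\n"
    "Conversation History:\n"
    "{history}\n"
    "Client Question:\n"
    "{client_question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def build_prompt(instruction: str, history: str, client_question: str) -> str:
    """Fill the template; pass an empty history for the first turn."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction.strip(),
        history=history.strip(),
        client_question=client_question.strip(),
    )


def fits_budget(
    prompt: str,
    count_tokens: Callable[[str], int],
    max_length: int = 512,
) -> bool:
    """True if the prompt fits within max_length without truncation."""
    return count_tokens(prompt) <= max_length


# Whitespace counting is a rough stand-in for a real tokenizer (assumption).
demo_prompt = build_prompt("You are a helpful agent.", "", "Hi, I need help.")
demo_ok = fits_budget(demo_prompt, lambda text: len(text.split()))
```

With the tokenizer loaded earlier, a more faithful counter would be `lambda text: len(tokenizer(text)["input_ids"])`. If the check fails, shortening the history summary is likely safer than letting truncation cut off the end of the prompt.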
Framework Versions
- PEFT: 0.14.0
- Transformers: 4.47.0
- PyTorch: 2.5.1+cu121
- Unsloth: Latest (training framework)
Citation
If you use this model, please cite:
@article{cooray2026small,
  title={Can Small Language Models Handle Context-Summarized Multi-Turn Customer-Service QA? A Synthetic Data-Driven Comparative Evaluation},
  author={Cooray, Lakshan and Sumanathilaka, Deshan and Raju, Pattigadapa Venkatesh},
  journal={arXiv preprint arXiv:2602.00665},
  year={2026}
}
Model Card Contact
Author: Lakshan Cooray
Institution: Informatics Institute of Technology, Colombo, Sri Lanka
Email: lakshan.20221470@iit.ac.lk
License
This model inherits the license of the base SmolLM3-3B-Instruct model. Refer to the base model's page on Hugging Face for the license terms.
Ethical Considerations
- Model trained on synthetic banking data to preserve privacy
- Should be used with human oversight in production environments
- May require domain adaptation for non-banking customer service
- Performance may vary on real-world data with different distributions
- Lower performance suggests need for careful evaluation before deployment
- Consider alternative models for production customer-service applications