Instructions to use gufett0/unsloth-llama3B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use gufett0/unsloth-llama3B with llama-cpp-python:
```
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="gufett0/unsloth-llama3B",
    filename="unsloth.llama3b.Q4_K_M.smalljson.proposals.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use gufett0/unsloth-llama3B with llama.cpp:
Install from brew
```
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gufett0/unsloth-llama3B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf gufett0/unsloth-llama3B:Q4_K_M
```
Install from WinGet (Windows)
```
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gufett0/unsloth-llama3B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf gufett0/unsloth-llama3B:Q4_K_M
```
Use pre-built binary
```
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf gufett0/unsloth-llama3B:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf gufett0/unsloth-llama3B:Q4_K_M
```
Build from source code
```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf gufett0/unsloth-llama3B:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf gufett0/unsloth-llama3B:Q4_K_M
```
Use Docker
```
docker model run hf.co/gufett0/unsloth-llama3B:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use gufett0/unsloth-llama3B with vLLM:
Install from pip and serve model
```
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "gufett0/unsloth-llama3B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "gufett0/unsloth-llama3B",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```
Use Docker
```
docker model run hf.co/gufett0/unsloth-llama3B:Q4_K_M
```
- Ollama
How to use gufett0/unsloth-llama3B with Ollama:
```
ollama run hf.co/gufett0/unsloth-llama3B:Q4_K_M
```
- Unsloth Studio
How to use gufett0/unsloth-llama3B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for gufett0/unsloth-llama3B to start chatting
```
Install Unsloth Studio (Windows)
```
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for gufett0/unsloth-llama3B to start chatting
```
Using HuggingFace Spaces for Unsloth
```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for gufett0/unsloth-llama3B to start chatting
```
- Pi
How to use gufett0/unsloth-llama3B with Pi:
Start the llama.cpp server
```
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf gufett0/unsloth-llama3B:Q4_K_M
```
Configure the model in Pi
```
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "gufett0/unsloth-llama3B:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use gufett0/unsloth-llama3B with Hermes Agent:
Start the llama.cpp server
```
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf gufett0/unsloth-llama3B:Q4_K_M
```
Configure Hermes
```
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default gufett0/unsloth-llama3B:Q4_K_M
```
Run Hermes
```
hermes
```
- Docker Model Runner
How to use gufett0/unsloth-llama3B with Docker Model Runner:
```
docker model run hf.co/gufett0/unsloth-llama3B:Q4_K_M
```
- Lemonade
How to use gufett0/unsloth-llama3B with Lemonade:
Pull the model
```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull gufett0/unsloth-llama3B:Q4_K_M
```
Run and chat with the model
```
lemonade run user.unsloth-llama3B-Q4_K_M
```
List all available models
```
lemonade list
```
llama3b-attribute-inference-q4_k_m
Model Summary
A ~3B parameter Llama 3.x instruction model, further fine-tuned with Unsloth using QLoRA (4-bit adapters) to infer personal attributes from first-person text and output a compact JSON report. The model predicts keys like "age", "occupation", "income_level", "city_country", etc., and for each one gives:
- estimate: the inferred value
- confidence: an integer from 1 to 5
If the model cannot infer an attribute with any justification, that attribute is simply omitted from the JSON.
The final checkpoint is merged and exported to GGUF with q4_k_m quantization for CPU-friendly local inference via llama.cpp / node-llama-cpp.
Intended Use
This model is intended for research on privacy and attribute inference: given informal self-descriptive text, estimate likely traits (age, relationship status, education level, etc.) and produce machine-readable output.
This model is not intended for profiling, scoring, surveillance, hiring decisions, or any automated judgment about real people. Predictions are guesses and can be biased or wrong.
Training Data
The model was fine-tuned on a reformatted version of the RobinSta/SynthPAI dataset, which consists of synthetic first-person narratives plus human-reviewed annotations of personal attributes (age, education, relationship status, income band, etc.). The script loads the dataset and performs an 80/20 train/validation split.
Each data point is turned into a chat-style triple:
- system: instructions defining which attributes to infer and the required JSON schema
- user: the narrative text
- assistant: the target JSON (ground truth attributes + confidence)
Only the assistant JSON is used for loss (the trainer masks prompts so the model is optimized to produce just the final JSON answer).
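The conversion described above can be sketched as follows. This is illustrative only: the record field names ("text", "attributes") and the system prompt wording are assumptions, not the training script's literal values.

```python
import json

# Hypothetical system prompt; the real script defines its own schema text.
SYSTEM_PROMPT = (
    "Infer personal attributes from the text. Answer only with a JSON object "
    'mapping attribute names to {"estimate": ..., "confidence": 1-5}. '
    "Omit attributes you cannot justify."
)

def to_chat_example(record: dict) -> list[dict]:
    """Build the system/user/assistant triple for one training example."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": record["text"]},
        # Only this assistant turn contributes to the loss; the trainer
        # masks the system and user tokens.
        {"role": "assistant", "content": json.dumps(record["attributes"])},
    ]

messages = to_chat_example(
    {
        "text": "Just turned 30 and moved to Berlin for a dev job.",
        "attributes": {"age": {"estimate": 30, "confidence": 4}},
    }
)
```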
Training Procedure
Base model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit (loaded in 4-bit). The script also supports an 8B Llama 3.1 variant, but this release uses the ~3B class for a smaller memory footprint.
Method
QLoRA / PEFT via Unsloth:
- LoRA r = 16
- lora_alpha = 16
- lora_dropout = 0
- target modules include the attention and MLP projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- gradient checkpointing = "unsloth"
- load_in_4bit = True
- max_seq_length = 4096 tokens (RoPE scaling handled by Unsloth)
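The hyperparameters above correspond to an Unsloth setup along these lines (a configuration sketch under stated assumptions, not the exact training script):

```python
from unsloth import FastLanguageModel

# Load the 4-bit base model (QLoRA); Unsloth handles RoPE scaling internally.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```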
Trainer config (SFTTrainer)
- effective batch size ≈ 8 via per_device_train_batch_size=2 and gradient_accumulation_steps=4
- max_steps = 200
- learning_rate = 1e-4
- warmup_steps = 5
- optimizer = adamw_8bit
- weight_decay = 0.01
- cosine LR schedule
- eval every 50 steps on the held-out split
- bf16/fp16 selected based on hardware support
- packing disabled (no sequence packing)
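A trainer configuration matching the listed settings might look like this (a sketch; `model`, `tokenizer`, and the `train_ds`/`val_ds` splits come from the earlier steps, and exact argument names vary across trl/transformers versions):

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    packing=False,                        # no sequence packing
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,    # effective batch size ~ 8
        max_steps=200,
        learning_rate=1e-4,
        warmup_steps=5,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        evaluation_strategy="steps",      # eval on the held-out split
        eval_steps=50,
        bf16=is_bfloat16_supported(),     # pick precision from hardware
        fp16=not is_bfloat16_supported(),
        output_dir="outputs",
    ),
)
```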
After training, the LoRA adapters were merged into the base weights and exported as a single GGUF (q4_k_m) checkpoint for llama.cpp-compatible inference.
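The merge-and-export step can be done with Unsloth's GGUF helper (a sketch; the output directory name is illustrative):

```python
# Merge the LoRA adapters into the base weights and write a single
# q4_k_m GGUF file suitable for llama.cpp / node-llama-cpp.
model.save_pretrained_gguf(
    "unsloth-llama3B-gguf",
    tokenizer,
    quantization_method="q4_k_m",
)
```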
Output Format
The model is optimized to answer only in strict JSON. Example:
```
{
  "age": {"estimate": 34, "confidence": 2},
  "occupation": {"estimate": "software engineer", "confidence": 1},
  "city_country": {"estimate": "San Francisco, USA", "confidence": 4}
}
```
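Because attributes the model cannot justify are omitted, downstream code should treat every key as optional. A minimal parsing and sanity-checking sketch (the attribute names are illustrative):

```python
import json

def parse_report(raw: str) -> dict:
    """Parse the model's JSON answer and validate each entry."""
    report = json.loads(raw)
    for name, entry in report.items():
        if "estimate" not in entry:
            raise ValueError(f"{name}: missing estimate")
        if entry.get("confidence") not in range(1, 6):
            raise ValueError(f"{name}: confidence must be an integer 1-5")
    return report

report = parse_report(
    '{"age": {"estimate": 34, "confidence": 2}, '
    '"city_country": {"estimate": "San Francisco, USA", "confidence": 4}}'
)
# "occupation" could not be inferred here, so it is simply absent.
assert "occupation" not in report
```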
Model tree for gufett0/unsloth-llama3B
Base model: meta-llama/Llama-3.2-3B-Instruct