Instructions to use gufett0/unsloth-llama3B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use gufett0/unsloth-llama3B with llama-cpp-python:
```
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="gufett0/unsloth-llama3B",
    filename="unsloth.llama3b.Q4_K_M.smalljson.proposals.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use gufett0/unsloth-llama3B with llama.cpp:
Install from brew
```
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gufett0/unsloth-llama3B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf gufett0/unsloth-llama3B:Q4_K_M
```
Install from WinGet (Windows)
```
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gufett0/unsloth-llama3B:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf gufett0/unsloth-llama3B:Q4_K_M
```
Use pre-built binary
```
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf gufett0/unsloth-llama3B:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf gufett0/unsloth-llama3B:Q4_K_M
```
Build from source code
```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf gufett0/unsloth-llama3B:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf gufett0/unsloth-llama3B:Q4_K_M
```
Use Docker
```
docker model run hf.co/gufett0/unsloth-llama3B:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use gufett0/unsloth-llama3B with vLLM:
Install from pip and serve model
```
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "gufett0/unsloth-llama3B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "gufett0/unsloth-llama3B",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```
Use Docker
```
docker model run hf.co/gufett0/unsloth-llama3B:Q4_K_M
```
- Ollama
How to use gufett0/unsloth-llama3B with Ollama:
```
ollama run hf.co/gufett0/unsloth-llama3B:Q4_K_M
```
- Unsloth Studio
How to use gufett0/unsloth-llama3B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for gufett0/unsloth-llama3B to start chatting
```
Install Unsloth Studio (Windows)
```
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for gufett0/unsloth-llama3B to start chatting
```
Using HuggingFace Spaces for Unsloth
```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for gufett0/unsloth-llama3B to start chatting
```
- Pi
How to use gufett0/unsloth-llama3B with Pi:
Start the llama.cpp server
```
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf gufett0/unsloth-llama3B:Q4_K_M
```
Configure the model in Pi
```
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "gufett0/unsloth-llama3B:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use gufett0/unsloth-llama3B with Hermes Agent:
Start the llama.cpp server
```
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf gufett0/unsloth-llama3B:Q4_K_M
```
Configure Hermes
```
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default gufett0/unsloth-llama3B:Q4_K_M
```
Run Hermes
```
hermes
```
- Docker Model Runner
How to use gufett0/unsloth-llama3B with Docker Model Runner:
```
docker model run hf.co/gufett0/unsloth-llama3B:Q4_K_M
```
- Lemonade
How to use gufett0/unsloth-llama3B with Lemonade:
Pull the model
```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull gufett0/unsloth-llama3B:Q4_K_M
```
Run and chat with the model
```
lemonade run user.unsloth-llama3B-Q4_K_M
```
List all available models
```
lemonade list
```
llama3b-attribute-inference-q4_k_m
Model Summary
A ~3B parameter Llama 3.x instruction model, further fine-tuned with Unsloth using QLoRA (4-bit adapters) to infer personal attributes from first-person text and output a compact JSON report. The model predicts keys like "age", "occupation", "income_level", "city_country", etc., and for each one gives:
- estimate: the inferred value
- confidence: an integer from 1 to 5
If the model cannot infer an attribute with any justification, that attribute is simply omitted from the JSON.
The final checkpoint is merged and exported to GGUF with q4_k_m quantization for CPU-friendly local inference via llama.cpp / node-llama-cpp.
Intended Use
This model is intended for research on privacy and attribute inference: given informal self-descriptive text, estimate likely traits (age, relationship status, education level, etc.) and produce machine-readable output.
This model is not intended for profiling, scoring, surveillance, hiring decisions, or any automated judgment about real people. Predictions are guesses and can be biased or wrong.
Training Data
The model was fine-tuned on a reformatted version of the RobinSta/SynthPAI dataset, which consists of synthetic first-person narratives plus human-reviewed annotations of personal attributes (age, education, relationship status, income band, etc.). The script loads the dataset and performs an 80/20 train/validation split.
Each data point is turned into a chat-style triple:
- system: instructions defining which attributes to infer and the required JSON schema
- user: the narrative text
- assistant: the target JSON (ground truth attributes + confidence)
Only the assistant JSON is used for loss (the trainer masks prompts so the model is optimized to produce just the final JSON answer).
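The conversion described above can be sketched as follows. This is illustrative only: the record field names ("text", "attributes") and the system prompt wording are assumptions, not the training script's literal values.

```python
import json

# Hypothetical system prompt; the real script defines its own schema text.
SYSTEM_PROMPT = (
    "Infer personal attributes from the text. Answer only with a JSON object "
    'mapping attribute names to {"estimate": ..., "confidence": 1-5}. '
    "Omit attributes you cannot justify."
)

def to_chat_example(record: dict) -> list[dict]:
    """Build the system/user/assistant triple for one training example."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": record["text"]},
        # Only this assistant turn contributes to the loss; the trainer
        # masks the system and user tokens.
        {"role": "assistant", "content": json.dumps(record["attributes"])},
    ]

messages = to_chat_example(
    {
        "text": "Just turned 30 and moved to Berlin for a dev job.",
        "attributes": {"age": {"estimate": 30, "confidence": 4}},
    }
)
```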
Training Procedure
Base model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit (loaded in 4-bit). The script also supports an 8B Llama 3.1 variant, but this release uses the ~3B class for a smaller memory footprint.
Method
QLoRA / PEFT via Unsloth:
- LoRA r = 16
- lora_alpha = 16
- lora_dropout = 0
- target modules include the attention and MLP projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- gradient checkpointing = "unsloth"
- load_in_4bit = True
- max_seq_length = 4096 tokens (RoPE scaling handled by Unsloth)
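The hyperparameters above correspond to an Unsloth setup along these lines (a configuration sketch under stated assumptions, not the exact training script):

```python
from unsloth import FastLanguageModel

# Load the 4-bit base model (QLoRA); Unsloth handles RoPE scaling internally.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```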
Trainer config (SFTTrainer)
- effective batch size ≈ 8 via per_device_train_batch_size=2 and gradient_accumulation_steps=4
- max_steps = 200
- learning_rate = 1e-4
- warmup_steps = 5
- optimizer = adamw_8bit
- weight_decay = 0.01
- cosine LR schedule
- eval every 50 steps on the held-out split
- bf16/fp16 selected based on hardware support
- packing disabled (no sequence packing)
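A trainer configuration matching the listed settings might look like this (a sketch; `model`, `tokenizer`, and the `train_ds`/`val_ds` splits come from the earlier steps, and exact argument names vary across trl/transformers versions):

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    packing=False,                        # no sequence packing
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,    # effective batch size ~ 8
        max_steps=200,
        learning_rate=1e-4,
        warmup_steps=5,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        evaluation_strategy="steps",      # eval on the held-out split
        eval_steps=50,
        bf16=is_bfloat16_supported(),     # pick precision from hardware
        fp16=not is_bfloat16_supported(),
        output_dir="outputs",
    ),
)
```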
After training, the LoRA adapters were merged into the base weights and exported as a single GGUF (q4_k_m) checkpoint for llama.cpp-compatible inference.
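The merge-and-export step can be done with Unsloth's GGUF helper (a sketch; the output directory name is illustrative):

```python
# Merge the LoRA adapters into the base weights and write a single
# q4_k_m GGUF file suitable for llama.cpp / node-llama-cpp.
model.save_pretrained_gguf(
    "unsloth-llama3B-gguf",
    tokenizer,
    quantization_method="q4_k_m",
)
```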
Output Format
The model is optimized to answer only in strict JSON. Example:
```
{
  "age": {"estimate": 34, "confidence": 2},
  "occupation": {"estimate": "software engineer", "confidence": 1},
  "city_country": {"estimate": "San Francisco, USA", "confidence": 4}
}
```
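Because attributes the model cannot justify are omitted, downstream code should treat every key as optional. A minimal parsing and sanity-checking sketch (the attribute names are illustrative):

```python
import json

def parse_report(raw: str) -> dict:
    """Parse the model's JSON answer and validate each entry."""
    report = json.loads(raw)
    for name, entry in report.items():
        if "estimate" not in entry:
            raise ValueError(f"{name}: missing estimate")
        if entry.get("confidence") not in range(1, 6):
            raise ValueError(f"{name}: confidence must be an integer 1-5")
    return report

report = parse_report(
    '{"age": {"estimate": 34, "confidence": 2}, '
    '"city_country": {"estimate": "San Francisco, USA", "confidence": 4}}'
)
# "occupation" could not be inferred here, so it is simply absent.
assert "occupation" not in report
```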
Model tree for gufett0/unsloth-llama3B
Base model: meta-llama/Llama-3.2-3B-Instruct