dahara1 committed on Feb 21

Commit fcaa81c (verified) · 1 Parent(s): 725cc85

Update README.md

Files changed (1)

README.md +187 -3

README.md CHANGED

@@ -1,3 +1,187 @@
----
-license: apache-2.0
----

+---
+language:
+- ja
+- en
+license: apache-2.0
+base_model: Qwen/Qwen3-0.6B
+tags:
+- japanese
+- continual-learning
+- sft
+- rl
+- quantized
+- llama.cpp
+- browser
+pipeline_tag: text-generation
+---
+# webbigdata/Qwen3-0.6B_WBD
+A lightweight Japanese-enhanced model built by continual training of Qwen3-0.6B to improve its Japanese language ability, reasoning, and everyday conversation.
+It was developed primarily to run in browsers, on smartphones, and on edge devices.
+---
+## News
+- **Browser demo released**: a demo page that runs entirely in the browser is available → **[webbigdata SLM Demo](https://webbigdata.jp/slm/)**
+- **Smartphone version released**: a 4-bit quantized build for smartphones, based on executorch, is available → [dahara1/Qwen3-0.6B-executorch-jp](https://huggingface.co/dahara1/Qwen3-0.6B-executorch-jp)
+---
+## Model Overview
+| Item | Details |
+|---|---|
+| Base Model | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) |
+| Parameters | Approx. 600 million (0.6B) |
+| License | Apache 2.0 |
+| Languages | Japanese, English |
+| Training | SFT, RL, 8-bit quantization |
+| Developer | dahara1@webbigdata |
+---
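+## Features
+- **Stronger Japanese fundamentals**: continual training on proprietary data strengthens Japanese vocabulary, knowledge, and expressiveness
+- **Stronger reasoning**: RL-based training improves logical reasoning
+- **Stronger everyday Japanese conversation**: trained with the aim of natural Japanese conversation
+  Note: given the nature of a 0.6B model, long multi-turn conversations have been found to hit limits
+- **Runs in browsers and on smartphones**: runs entirely in the browser; a 4-bit quantized version for smartphones is also provided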
+---
+## Benchmark Results
+### Japanese Benchmarks
+| Model | JCommonsenseQA | JNLI | JSTS | JSQuAD | Average |
+|---|---|---|---|---|---|
+| Qwen3-0.6B-Q8_0 (baseline) | 62.40% | 32.20% | 17.20% | 76.00% | 46.95% |
+| **Qwen3-0.6B_WBD (this model)** | 59.60% | **72.60%** | **35.60%** | **82.00%** | **62.45%** |
+Why JCommonsenseQA dropped slightly: with more knowledge and vocabulary, the model more often wavers on subtle nuances.
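+### M-IFEval (Japanese instruction following)
+| Model | prompt-level (strict) | instruction-level (strict) |
+|---|---|---|
+| Qwen3-0.6B-Q8_0 | 0.366 | 0.420 |
+| **Qwen3-0.6B_WBD** | 0.238 | 0.314 |
+About the lower M-IFEval scores: the evaluation set includes tasks outside this model's intended use, such as instructions to translate into languages other than English, which lowers the overall score.
+On Japanese-specific tasks (keyword presence checks, character-count constraints, numbered lists, etc.) the model shows competitive performance.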
+---
+## Demo
+You can try this model running in the browser. No installation is required.
+👉 **[https://webbigdata.jp/slm/](https://webbigdata.jp/slm/)**
+---
+## How to Run
+### Using llama.cpp
+Download the package for your hardware from [llama.cpp](https://github.com/ggml-org/llama.cpp/releases).
+Other tools that support gguf files, such as [Ollama](https://github.com/ollama/ollama) and [LM Studio](https://github.com/lmstudio-ai/lms), can run the model in the same way.
+#### Run from the CLI (Linux/Mac)
+```bash
+./llama-cli -hf webbigdata/Qwen3-0.6B_WBD --ctx-size 4096 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.01 --repeat-penalty 1.05
+```
+#### Start llama-server and access it from a browser
+```bash
+./llama-server -hf webbigdata/Qwen3-0.6B_WBD --host 0.0.0.0 --port 8080 --ctx-size 4096 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.01 --repeat-penalty 1.05
+```
+Open `http://127.0.0.1:8080/` in your browser.
+#### Access from a Python script (OpenAI-compatible API)
+```python
+from openai import OpenAI
+client = OpenAI(
+    base_url="http://localhost:8080/v1",
+    api_key="dummy"
+)
+response = client.chat.completions.create(
+    model="Qwen3-0.6B_WBD",
+    messages=[
+        {"role": "system", "content": "あなたは親切なアシスタントです。"},
+        {"role": "user", "content": "こんにちは！"}
+    ],
+    stream=True
+)
+for chunk in response:
+    if chunk.choices[0].delta.content is not None:
+        print(chunk.choices[0].delta.content, end="", flush=True)
+```
+### Recommended Parameters for Qwen3
+Qwen3 is prone to problems such as repetitive output under greedy decoding (deterministic generation, e.g. Temperature=0), so sampling (Temperature > 0) is strongly recommended.
+| Parameter | Recommended value |
+|---|---|
+| Temperature | 0.7 |
+| Top_P | 0.8 |
+| Top_K | 20 |
+| Min_P | 0.01 |
+| Repetition Penalty | 1.05 |
+---
+## Quantized Variants
+| Variant | Description | Link |
+|---|---|---|
+| executorch 4-bit version | For running on smartphones | [dahara1/Qwen3-0.6B-executorch-jp](https://huggingface.co/dahara1/Qwen3-0.6B-executorch-jp) |
+---
+## Training Data
+The model was trained on private datasets collected and created by webbigdata.
+---
+## Acknowledgments
+- [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B): base model
+- [unsloth/Qwen3-0.6B](https://huggingface.co/unsloth/Qwen3-0.6B): prompt template
+- [llama.cpp](https://github.com/ggml-org/llama.cpp): inference engine
+- [wllama](https://github.com/ngxson/wllama): WebAssembly
+- [Hugging Face](https://huggingface.co/): model hosting
+---
+## Developer
+- **Developed by:** dahara1@webbigdata
+- **Model type:** Text Generation (Causal LM)
+- **Language(s):** Japanese, English
+- **Base Model:** [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
+- **Demo:** [https://webbigdata.jp/slm/](https://webbigdata.jp/slm/)
+```bibtex
+@misc{dahara2025Qwen3-0.6B_WBD,
+  author       = {dahara1@webbigdata},
+  title        = {Qwen3-0.6B_WBD - Japanese-Enhanced Continual Learning Model},
+  year         = {2025},
+  howpublished = {\url{https://huggingface.co/webbigdata/Qwen3-0.6B_WBD}},
+  abstract     = {A lightweight Japanese-enhanced model based on Qwen3-0.6B, designed to run in browsers and on smartphones.},
+}
+```
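
The README in this diff sets the recommended sampling values as `llama-server` command-line flags, so its Python example relies on server-side defaults. As a rough sketch of the alternative, the block below builds a chat-completions request body that carries the same values per request. The names `RECOMMENDED_SAMPLING` and `build_chat_payload` are hypothetical, and it is an assumption (not stated in the README) that the server accepts `top_k`, `min_p`, and `repeat_penalty` as request fields alongside the standard `temperature` and `top_p`.

```python
import json

# Values copied from the README's recommended-parameter table.
RECOMMENDED_SAMPLING = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,             # assumption: accepted as a request field
    "min_p": 0.01,           # assumption: accepted as a request field
    "repeat_penalty": 1.05,  # assumption: accepted as a request field
}


def build_chat_payload(user_text, system_text=None, model="Qwen3-0.6B_WBD"):
    """Hypothetical helper: assemble a chat-completions request body."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    payload = {"model": model, "messages": messages}
    # Attach the sampling settings so they travel with each request.
    payload.update(RECOMMENDED_SAMPLING)
    return payload


if __name__ == "__main__":
    body = build_chat_payload("こんにちは！", "あなたは親切なアシスタントです。")
    # ensure_ascii=False keeps the Japanese text readable in the output.
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dictionary could be sent as the JSON body of a POST to the `/v1/chat/completions` endpoint used in the README. With the `openai` client from the README's example, the fields that are not part of the standard OpenAI parameter set would likely need to go through its `extra_body` argument.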