Libraries

How to use Yooniel/gemma-3-27b-no-ai with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("google/gemma-3-27b-pt")
model = PeftModel.from_pretrained(base_model, "Yooniel/gemma-3-27b-no-ai")

Transformers

How to use Yooniel/gemma-3-27b-no-ai with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Yooniel/gemma-3-27b-no-ai")

# Load model directly (this repo holds only a LoRA adapter, so the peft package must be installed)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Yooniel/gemma-3-27b-no-ai", dtype="auto")

Local Apps

vLLM

How to use Yooniel/gemma-3-27b-no-ai with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Yooniel/gemma-3-27b-no-ai"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Yooniel/gemma-3-27b-no-ai",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/Yooniel/gemma-3-27b-no-ai

SGLang

How to use Yooniel/gemma-3-27b-no-ai with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Yooniel/gemma-3-27b-no-ai" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Yooniel/gemma-3-27b-no-ai",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Yooniel/gemma-3-27b-no-ai" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use Yooniel/gemma-3-27b-no-ai with Docker Model Runner:
```
docker model run hf.co/Yooniel/gemma-3-27b-no-ai
```

gemma-3-27b-pt · SFT LoRA (sft_external)

LoRA adapter fine-tuned on top of google/gemma-3-27b-pt using supervised fine-tuning (SFT) on the allenai/dolci-instruct-sft instruction dataset.

Training details

| Hyperparameter | Value |
|---|---|
| Base model | `google/gemma-3-27b-pt` |
| Training dataset | `allenai/dolci-instruct-sft` (train split) |
| Method | SFT (cross-entropy on assistant tokens only) |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, fc1, fc2, out_proj |
| Learning rate | 1e-5 |
| Optimizer | AdamW (β₁=0.9, β₂=0.999) |
| LR schedule | Linear warmup then linear decay |
| Warmup steps | min(50, total_steps // 10) |
| Epochs | 1 |
| Batch size | 1 |
| Gradient accumulation | 8 (effective batch size 8) |
| Max sequence length | 2048 |
| Precision | bfloat16 |
| Prompt format | Gemma `<start_of_turn>user` / `<start_of_turn>model` |
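
The learning-rate schedule listed above can be sketched in a few lines. This is an illustration only, not the training script: the function names are hypothetical, the warmup is assumed to start from zero, and the decay is assumed to reach zero at the final step (the card states only "linear warmup then linear decay").

```python
def warmup_steps_for(total_steps: int) -> int:
    """Warmup length as given in the table: min(50, total_steps // 10)."""
    return min(50, total_steps // 10)


def lr_at_step(step: int, total_steps: int, peak_lr: float = 1e-5) -> float:
    """Linear warmup to peak_lr, then linear decay.

    Assumptions (not stated in the card): warmup starts at 0 and the
    decay ends at 0 on the last optimizer step.
    """
    warmup = warmup_steps_for(total_steps)
    if step < warmup:
        # Ramp up proportionally to the fraction of warmup completed.
        return peak_lr * (step / warmup)
    # Decay proportionally to the fraction of post-warmup steps remaining.
    decay_span = max(1, total_steps - warmup)
    return peak_lr * max(0.0, (total_steps - step) / decay_span)
```

Here `step` counts optimizer updates, so with gradient accumulation of 8 one step corresponds to 8 training samples.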

Prompts were truncated to 500 tokens and responses to 1500 tokens. Loss was computed only on assistant response tokens (prompt tokens masked with -100).
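
A minimal sketch of the truncation and masking described above, operating on already-tokenized ID lists. The helper name `build_example` is hypothetical, and truncation is assumed to keep the leading tokens, since the card does not say which side is cut. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so masked positions contribute nothing to the loss.

```python
IGNORE_INDEX = -100          # positions with this label are skipped by the loss
MAX_PROMPT_TOKENS = 500      # from the card
MAX_RESPONSE_TOKENS = 1500   # from the card


def build_example(prompt_ids: list[int], response_ids: list[int]) -> dict[str, list[int]]:
    """Truncate both parts, concatenate them, and mask the prompt in the labels."""
    # Assumption: keep the first tokens of each part when truncating.
    prompt_ids = prompt_ids[:MAX_PROMPT_TOKENS]
    response_ids = response_ids[:MAX_RESPONSE_TOKENS]

    input_ids = prompt_ids + response_ids
    # Prompt positions are masked; response positions keep their token IDs.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}
```

With both limits applied, an example is at most 2000 tokens long, which fits within the 2048-token maximum sequence length.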

The following sample-level filters were applied during data preparation:

- Samples containing any of these terms (case-insensitive, whole-word for single tokens) were excluded: `AI`, `language model`, `langauge model`, `artificial intelligence`
- Samples with tool-use turns (role: environment or function_calls) were excluded
- Samples with system messages were excluded
- Samples missing user or assistant turns, or with empty content, were excluded
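
The filters above could be approximated as follows. This is a sketch under assumptions, not the actual preparation code: the message schema (a list of dicts with `role` and `content`, plus an optional `function_calls` field) and the name `keep_sample` are hypothetical, and "whole-word for single tokens" is read as word-boundary matching for one-word terms and plain substring matching for multi-word terms.

```python
import re

# Terms from the card, including the deliberately misspelled variant.
BANNED_TERMS = ["AI", "language model", "langauge model", "artificial intelligence"]


def _term_pattern(term: str) -> str:
    escaped = re.escape(term)
    # One-word terms must match as whole words so "rain" does not trigger "AI".
    return escaped if " " in term else rf"\b{escaped}\b"


BANNED_RE = re.compile("|".join(_term_pattern(t) for t in BANNED_TERMS), re.IGNORECASE)


def keep_sample(messages: list[dict]) -> bool:
    """Return True if a conversation passes all four filters."""
    roles = {m.get("role") for m in messages}
    if "system" in roles:
        return False  # system messages
    if "environment" in roles or any(m.get("function_calls") for m in messages):
        return False  # tool-use turns
    if "user" not in roles or "assistant" not in roles:
        return False  # missing a required turn
    for m in messages:
        content = m.get("content")
        if not isinstance(content, str) or not content.strip():
            return False  # empty content
        if BANNED_RE.search(content):
            return False  # banned term
    return True
```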

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "google/gemma-3-27b-pt"
adapter_repo  = "Yooniel/gemma-3-27b-no-ai"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()

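# BOS is prepended manually here, so automatic special tokens are disabled
# below to avoid a duplicate BOS at the start of the sequence.
bos = tokenizer.bos_token or ""
prompt = bos + "<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"

inputs = tokenizer(prompt.format(question="Explain photosynthesis."),
                   return_tensors="pt", add_special_tokens=False).to(model.device)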
end_of_turn_id = tokenizer.convert_tokens_to_ids("<end_of_turn>")

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        eos_token_id=end_of_turn_id,
        pad_token_id=tokenizer.eos_token_id,
    )

new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

Prompt format

<bos><start_of_turn>user
{your question}<end_of_turn>
<start_of_turn>model
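
For conversations with more than one turn, the template can be rendered by a small helper. This is a sketch only: `build_prompt` is a hypothetical name, and extending the single-turn format to multiple turns by repeating the same markers is an assumption based on the usual Gemma convention, not something the card documents.

```python
BOS = "<bos>"


def build_prompt(turns: list[tuple[str, str]]) -> str:
    """Render (role, text) turns and leave a model turn open for generation.

    Roles must be "user" or "model", matching the Gemma turn markers.
    """
    parts = [BOS]
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # The trailing open model turn is where generation continues.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)
```

Because the string already starts with `<bos>`, it is safest to tokenize it with `add_special_tokens=False`; Gemma tokenizers typically prepend a BOS token by default, which would otherwise produce two.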
