gemma-3-27b-pt · SFT LoRA (sft_external)
LoRA adapter fine-tuned on top of google/gemma-3-27b-pt using supervised
fine-tuning (SFT) on the allenai/dolci-instruct-sft instruction dataset.
Training details
| Hyperparameter | Value |
|---|---|
| Base model | google/gemma-3-27b-pt |
| Training dataset | allenai/dolci-instruct-sft (train split) |
| Method | SFT (cross-entropy on assistant tokens only) |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, fc1, fc2, out_proj |
| Learning rate | 1e-5 |
| Optimizer | AdamW (β₁=0.9, β₂=0.999) |
| LR schedule | Linear warmup then linear decay |
| Warmup steps | min(50, total_steps // 10) |
| Epochs | 1 |
| Batch size | 1 |
| Gradient accumulation | 8 (effective batch size 8) |
| Max sequence length | 2048 |
| Precision | bfloat16 |
| Prompt format | Gemma <start_of_turn>user / <start_of_turn>model |
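The warmup rule and schedule shape in the table can be sketched in plain Python. This is an illustrative reconstruction, not the training code: the function names are hypothetical, and the assumption that the decay reaches zero exactly at `total_steps` is not stated in the table.

```python
def warmup_steps(total_steps: int) -> int:
    """Warmup length from the table: min(50, total_steps // 10)."""
    return min(50, total_steps // 10)


def lr_at(step: int, total_steps: int, peak_lr: float = 1e-5) -> float:
    """Learning rate at an optimizer step under linear warmup then linear decay.

    Assumption: the decay ends at zero on the final step.
    """
    warmup = warmup_steps(total_steps)
    if step < warmup:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / warmup
    # Linear decay from peak_lr down to 0 over the remaining steps.
    remaining = max(total_steps - warmup, 1)
    return peak_lr * max(0.0, (total_steps - step) / remaining)


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for step in (0, 25, 50, 525, 1000):
        print(step, lr_at(step, total))
```

Each optimizer step here corresponds to 8 samples (batch size 1 with 8 gradient-accumulation steps).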
Prompts were truncated to 500 tokens and responses to 1500 tokens. Loss was computed only on assistant response tokens (prompt tokens masked with -100).
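A minimal sketch of the truncation and label masking described above, operating on already-tokenized ID lists. The helper name `build_example` is hypothetical, and keeping the first tokens when truncating is an assumption; the source does not say which end is cut.

```python
IGNORE_INDEX = -100         # label value skipped by the cross-entropy loss
MAX_PROMPT_TOKENS = 500
MAX_RESPONSE_TOKENS = 1500


def build_example(prompt_ids: list[int], response_ids: list[int]) -> dict:
    """Truncate both parts, concatenate them, and mask the prompt in the labels."""
    # Assumption: truncation keeps the leading tokens of each part.
    prompt_ids = prompt_ids[:MAX_PROMPT_TOKENS]
    response_ids = response_ids[:MAX_RESPONSE_TOKENS]
    input_ids = prompt_ids + response_ids
    # Prompt positions get IGNORE_INDEX, so only response tokens contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}


if __name__ == "__main__":
    example = build_example(list(range(600)), list(range(2000)))
    print(len(example["input_ids"]), example["labels"][:3], example["labels"][500:503])
```

With both limits reached, an example holds 500 + 1500 = 2000 tokens, which fits within the 2048-token maximum sequence length.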
The following sample-level filters were applied during data preparation:
- Samples containing any of these terms (case-insensitive, whole-word for single tokens) were excluded: AI, language model, langauge model, artificial intelligence
- Samples with tool-use turns (role: environment or function_calls) were excluded
- Samples with system messages were excluded
- Samples missing user or assistant turns, or with empty content, were excluded
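The filters above could be expressed roughly as follows. This is a sketch, not the actual preparation script: the sample schema (a list of message dicts with `role`, `content`, and an optional `function_calls` key) and the name `keep_sample` are assumptions, and multi-word terms are matched as plain substrings.

```python
import re

# Terms from the list above; the misspelled "langauge model" is intentional.
BLOCKED_TERMS = ["AI", "language model", "langauge model", "artificial intelligence"]


def _term_pattern(term: str) -> str:
    """Single-word terms match as whole words; multi-word terms as substrings."""
    escaped = re.escape(term)
    return escaped if " " in term else r"\b" + escaped + r"\b"


BLOCKED_RE = re.compile("|".join(_term_pattern(t) for t in BLOCKED_TERMS), re.IGNORECASE)


def keep_sample(messages: list[dict]) -> bool:
    """Return True if a conversation passes all four sample-level filters."""
    roles = {m.get("role") for m in messages}
    # Drop samples with system messages or tool-use turns.
    if "system" in roles or "environment" in roles:
        return False
    if any(m.get("function_calls") for m in messages):
        return False
    # Require at least one user turn and one assistant turn.
    if not {"user", "assistant"} <= roles:
        return False
    # Drop samples with empty content.
    if any(not (m.get("content") or "").strip() for m in messages):
        return False
    # Drop samples mentioning any blocked term.
    return not any(BLOCKED_RE.search(m["content"]) for m in messages)
```

The whole-word rule for "AI" keeps ordinary words such as "rain" or "said" from triggering the filter.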
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "google/gemma-3-27b-pt"
adapter_repo = "Yooniel/gemma-3-27b-no-ai"

# Load the base model in bfloat16, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()

# Build the Gemma-style prompt by hand, including the BOS token.
bos = tokenizer.bos_token or ""
prompt = bos + "<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
inputs = tokenizer(
    prompt.format(question="Explain photosynthesis."),
    return_tensors="pt",
    add_special_tokens=False,  # BOS is already in the prompt; avoid adding a second one
).to(model.device)

# Stop generation at the end of the model turn.
end_of_turn_id = tokenizer.convert_tokens_to_ids("<end_of_turn>")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
        eos_token_id=end_of_turn_id,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, not the prompt.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
Prompt format
<bos><start_of_turn>user
{your question}<end_of_turn>
<start_of_turn>model
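For convenience, the template can be wrapped in a small helper. This is an illustrative sketch with a hypothetical name, `build_prompt`; the single-turn output matches the format shown above, while support for earlier turns is an assumption, since the source only shows a single-turn prompt.

```python
def build_prompt(turns: list[tuple[str, str]], bos: str = "<bos>") -> str:
    """Render (role, text) turns in the Gemma turn format, ending with an open model turn.

    Roles must be "user" or "model". Pass bos="" if the tokenizer adds BOS itself.
    """
    parts = [bos]
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave the final model turn open so generation continues from here.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([("user", "Explain photosynthesis.")]))
```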