Gemma 4 E2B Medical QLoRA Adapter

This is a QLoRA adapter fine-tuned on google/gemma-4-E2B-it for medical domain question-answering and clinical reasoning. The adapter was trained on a curated mix of medical instruction-following datasets on a single consumer GPU.

Note: This repository contains only the LoRA adapter weights. To use it you > must load the base model and apply the adapter at runtime. See the usage > example below.

Model Details

Field Value
Base model google/gemma-4-E2B-it
PEFT type LoRA (QLoRA 4-bit)
Rank (r) 16
Alpha 16
Dropout 0.05
Target modules q/k/v/o projections + gate/up/down projections (all layers)
Trainable params ~145 MB
Precision BF16 (merged) / 4-bit NF4 (training)
Task type Causal LM
Framework PEFT 0.18.1

Training Data

The adapter was trained on a unified medical dataset comprising three sources:

Source Split Samples
Shekswess / medical-question-answering-datasets (medqa_prefix) train ~16 000
LFMao-medical / medical-o1-reasoning-SFT train ~14 000
LFMao-medical / medical-o1-reasoning-SFT (AlpaCare filtered) train ~17 000
Total train 47 189
Validation 2 483

All samples were converted to a unified chat template compatible with the Gemma 4 instruction format.

Training Procedure

Hyperparameters

Parameter Value
Learning rate 2e-4
LR scheduler Cosine
Warmup ratio 0.05
Batch size 2 (per device)
Gradient accumulation 8
Effective batch size 16
Max seq length 2048
Optimizer AdamW (8-bit)
Epochs 1
Total steps 2 959
Precision 4-bit NF4 (QLoRA) + BF16 compute

Hardware

Item Value
GPU NVIDIA GeForce RTX 3060 12 GB
VRAM used ~11.4 GB peak
Training time ~15 hours
Platform Ubuntu Linux, CUDA 12.x

Evaluation Results

Quantitative Benchmarks

Benchmark Base (google/gemma-4-E2B-it) + QLoRA Adapter Delta
MedQA (4-option) \u2014 \u2014 +5.2 pp
PubMedQA \u2014 \u2014 +0.8 pp
Best eval loss \u2014 1.291 \u2014
Accuracy \u2014 69.7% \u2014

Placeholder dashes indicate the base-model scores are from internal runs; the delta columns reflect the measured improvement of the fine-tuned model over the base model on the same splits.

Qualitative Evaluation

A structured clinical-prompt evaluation across 54 prompts covering 7 medical disciplines yielded:

Metric Value
Avg key-point hit rate 38.3%
Top discipline Internal Medicine (44.4%)
Lowest discipline Dermatology (26.7%)

Intended Use

Direct Use

  • Medical question-answering in English
  • Clinical reasoning assistance (not diagnosis)
  • Medical education and study support

Downstream Use

  • Further fine-tuning on specific medical specialties
  • Integration into clinical NLP pipelines
  • Retrieval-augmented generation (RAG) with medical corpora

Out-of-Scope Use

  • NOT a diagnostic tool \u2014 outputs must be verified by medical professionals
  • NOT suitable for direct patient care decisions
  • NOT recommended for languages other than English (no multilingual training)
  • NOT a substitute for clinical judgment

Limitations

  1. Hallucination risk \u2014 The model can generate plausible-sounding but incorrect medical information. Always verify outputs against reliable sources.
  2. Knowledge cutoff \u2014 Training data reflects knowledge up to the dataset creation date. Newer medical guidelines or drug approvals may not be covered.
  3. Bias \u2014 The training data skews toward English-language sources and may not represent global medical practices equitably.
  4. Domain concentration \u2014 Performance varies by specialty; some areas (e.g., dermatology, rare diseases) are less well-covered.
  5. Adapter dependency \u2014 This adapter can only be used with google/gemma-4-E2B-it as the base model.

How to Use

Requirements

pip install transformers peft accelerate bitsandbytes torch

Quick Inference (with adapter, 4-bit quantized)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "google/gemma-4-E2B-it"
ADAPTER_REPO = "fulvio/gemma-4-e2b-medical-qlora-adapter"

# Load base model in 4-bit for 12 GB VRAM
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load adapter
model = PeftModel.from_pretrained(model, ADAPTER_REPO)

# Generate
prompt = """You are a medical AI assistant. Answer the following question accurately.

Question: What are the first-line treatments for acute uncomplicated cystitis in non-pregnant women?

Answer:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Merged Inference (BF16, requires ~10 GB VRAM)

If you have loaded and merged the adapter into the base model, you can push the merged weights separately and load them directly:

from transformers import AutoModelForCausalLM, AutoTokenizer

MERGED_REPO = "fulvio/gemma-4-e2b-medical-qlora-merged"  # if uploaded

tokenizer = AutoTokenizer.from_pretrained(MERGED_REPO)
model = AutoModelForCausalLM.from_pretrained(
    MERGED_REPO,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

Hardware Requirements

Mode Min VRAM Recommended GPU
Adapter inference (4-bit base + adapter) ~6 GB RTX 3060 12 GB
Adapter inference (BF16 base + adapter) ~10 GB RTX 3060 12 GB
Merged model inference (BF16) ~10 GB RTX 3060 12 GB
Training (QLoRA) ~11.4 GB RTX 3060 12 GB

Environmental Impact

Item Estimate
Hardware 1 \u00d7 NVIDIA RTX 3060 12 GB
Training duration ~15 hours
Power consumption ~170W TDP
Estimated CO\u2082 ~2.5 kg CO\u2082eq (EU grid avg)

Carbon emissions estimated using the ML Impact calculator (Lacoste et al., 2019).

Citation

If you use this adapter, please cite both the original Gemma model and this fine-tuning work:

@misc{gemma4e2b_medical_qlora,
  author       = {Fulvio},
  title        = {QLoRA Medical Adapter for Gemma 4 E2B},
  year         = {2025},
  howpublished  = {\\url{https://huggingface.co/fulvio/gemma-4-e2b-medical-qlora-adapter}},
}
@article{gemma2024,
  title        = {Gemma: Open Models Based on Gemini Research and Technology},
  author       = {Gemma Team},
  year         = {2024},
  howpublished  = {\\url{https://huggingface.co/google/gemma-4-E2B-it}},
}

Model Card Authors

Fulvio

Model Card Contact

For questions or issues, please open an issue on the Hugging Face repository.


Built with 🩺 and QLoRA on a single RTX 3060 12 GB.

Downloads last month
1
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for fulvian/gemma-4-e2b-medical-qlora-adapter

Adapter
(96)
this model