---
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
pipeline_tag: text-generation
tags:
- text-generation
- sql-generation
- llama
- lora
- peft
- unsloth
- transformers
license: apache-2.0
language:
- en
---

# SQL-Genie (LLaMA-3.1-8B Fine-Tuned)

## 🧠 Model Overview

**SQL-Genie** is a fine-tuned version of **LLaMA-3.1-8B**, specialized for converting **natural language questions into SQL queries**.

The model was trained using **parameter-efficient fine-tuning (LoRA)** on a structured SQL instruction dataset, enabling strong SQL generation performance while remaining lightweight and affordable to train on limited compute (Google Colab).

- **Developed by:** dhashu
- **Base model:** `unsloth/meta-llama-3.1-8b-bnb-4bit`
- **License:** Apache-2.0
- **Training stack:** Unsloth + Hugging Face TRL

---

## ⚙️ Training Methodology

This model was trained using **LoRA (Low-Rank Adaptation)** via the **PEFT** framework.

### Key Details
- Base model loaded in **4-bit quantization** for memory efficiency
- **Base weights frozen**
- **LoRA adapters** applied to:
  - Attention layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`)
  - Feed-forward layers (`gate_proj`, `up_proj`, `down_proj`)
- Fine-tuned using **Supervised Fine-Tuning (SFT)**

This approach allows efficient specialization without full model retraining.
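
For illustration, the setup above might look like the following with the Unsloth + TRL stack named in this card. This is a hedged sketch, not the exact training script: the hyperparameters (rank, alpha, batch size, steps) are assumptions, and `dataset` stands for the instruction-formatted data described in the next section.

```python
# A minimal sketch of the training setup described above, assuming the
# Unsloth + TRL stack. Hyperparameters are illustrative assumptions, not
# the exact values used for this checkpoint.
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the base model in 4-bit; its weights stay frozen during training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach trainable LoRA adapters to the attention and feed-forward projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # assumed LoRA rank
    lora_alpha=16,  # assumed scaling factor
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# `dataset` is the instruction-formatted dataset from the Dataset section
# below; this SFTTrainer signature follows the older TRL releases that the
# Unsloth notebooks pin (newer TRL moves these arguments into SFTConfig).
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```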

---

## 📊 Dataset

The model was trained on a subset of the **`b-mc2/sql-create-context`** dataset, which includes:

- Natural language questions
- Database schema / context
- Corresponding SQL queries

Each sample was formatted as an **instruction-style prompt** to improve reasoning and structured output.
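
As a concrete illustration, a formatting step along these lines would produce one training string per record. This is a sketch, assuming the prompt template shown in the inference example below; the column names `question`, `context`, and `answer` follow the published `b-mc2/sql-create-context` schema.

```python
# Hedged sketch of the instruction-style formatting. The exact subset used
# for this checkpoint is not specified, so the full train split is loaded
# here; in practice an EOS token is also typically appended to each string.
from datasets import load_dataset

PROMPT_TEMPLATE = """Below is an input question, context is given to help. Generate a SQL response.
### Input: {question}
### Context: {context}
### SQL Response:
{answer}"""

def format_example(example):
    # Collapse each record into a single "text" field for SFT training.
    return {
        "text": PROMPT_TEMPLATE.format(
            question=example["question"],
            context=example["context"],
            answer=example["answer"],
        )
    }

dataset = load_dataset("b-mc2/sql-create-context", split="train")
dataset = dataset.map(format_example)
```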

---

## 🚀 Performance & Efficiency

- 🚀 **2× faster fine-tuning** using Unsloth
- 💾 **Low VRAM usage** via 4-bit quantization
- 🧠 Improved SQL syntax and schema understanding
- ⚡ Suitable for real-time inference and lightweight deployments

---

## 🧩 Model Variants

This repository contains a **merged model**:

### 🔹 Merged 4-bit Model
- LoRA adapters merged into base weights (see the sketch below)
- No PEFT required at inference time
- Ready-to-use single checkpoint
- Optimized for easy deployment
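
For reference, a merge like this is typically produced with Unsloth's `save_pretrained_merged` helper. A hedged sketch, assuming `model` and `tokenizer` are the trained objects from the training sketch above; end users of this repo do not need this step:

```python
# Fold the LoRA adapters into the base weights and save one checkpoint.
# save_method="merged_4bit" keeps 4-bit quantized weights; "merged_16bit"
# would produce a full-precision merge instead.
model.save_pretrained_merged(
    "sql-genie-merged",
    tokenizer,
    save_method="merged_4bit",
)
```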

---

## ▶️ How to Use (Inference)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "dhashu/sql-genie-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # Recent transformers releases expect a BitsAndBytesConfig instead of
    # the deprecated load_in_4bit=True keyword argument.
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

prompt = """Below is an input question, context is given to help. Generate a SQL response.
### Input: List all employees hired after 2020
### Context: CREATE TABLE employees(id, name, hire_date)
### SQL Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,   # sampling must be enabled for temperature to apply
    temperature=0.7,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
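
Since `generate` returns the prompt tokens together with the completion, the decoded output repeats the prompt; the generated SQL is the text after the final `### SQL Response:` marker.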