duckdb-nsql-7b-mlx-4bit

This repository contains an MLX-optimized 4-bit quantized variant of motherduckdb/DuckDB-NSQL-7B-v0.1, built for minimal memory use and fast decoding on Apple Silicon (M1/M2/M3/M4).

Model description

DuckDB-NSQL-7B is a 7B-parameter language model fine-tuned to translate natural-language questions into DuckDB SQL. This 4-bit MLX conversion targets minimal memory usage and high throughput, at a larger quality trade-off than FP16 or 8-bit, especially for long schemas and complex queries.

Conversion details

  • Base model: motherduckdb/DuckDB-NSQL-7B-v0.1 (fine-tuned from Llama 2 7B)
  • Format: MLX
  • Precision: 4-bit quantized
  • Typical memory footprint: ~4–5 GB (varies by MLX quantization / runtime)
  • Recommended for: laptops / demos / constrained RAM; when speed matters more than perfect SQL fidelity
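
As a point of reference, a conversion like this one can typically be reproduced with the mlx-lm convert tool. The exact flags used for this repository are an assumption; check the mlx-lm documentation for your installed version:

mlx_lm.convert \
  --hf-path motherduckdb/DuckDB-NSQL-7B-v0.1 \
  -q --q-bits 4 \
  --mlx-path duckdb-nsql-7b-mlx-4bit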

Installation

pip install mlx-lm

Usage

Python

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Downloads the 4-bit MLX weights on first use, then loads model + tokenizer.
model, tokenizer = load("Nuxera/duckdb-nsql-7b-mlx-4bit")

schema = """
CREATE TABLE hospitals (
  hospital_id BIGINT,
  hospital_name VARCHAR,
  region VARCHAR,
  bed_capacity INTEGER
);

CREATE TABLE encounters (
  encounter_id BIGINT,
  hospital_id BIGINT,
  encounter_datetime TIMESTAMP,
  encounter_type VARCHAR
);
"""

question = "For each hospital region, how many encounters happened this month?"

prompt = f"""You are an assistant that writes valid DuckDB SQL queries.

### Schema:
{schema}

### Question:
{question}

### Response (DuckDB SQL only):"""

# Greedy decoding (temperature 0) for deterministic SQL. Recent mlx-lm
# releases take a sampler; older ones accepted temp=0.0 on generate() directly.
out = generate(model, tokenizer, prompt=prompt, max_tokens=256,
               sampler=make_sampler(temp=0.0))
print(out)

Run as a local server

mlx_lm.server --model Nuxera/duckdb-nsql-7b-mlx-4bit --port 8080
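
The server exposes an OpenAI-compatible HTTP API. Below is a minimal sketch of querying its completions endpoint from Python; the payload follows the OpenAI completions schema, and exact field handling may vary across mlx-lm versions:

import json
import urllib.request

# Build the prompt exactly as in the Python example above.
prompt = "..."  # schema + question + instruction

payload = {
    "model": "Nuxera/duckdb-nsql-7b-mlx-4bit",
    "prompt": prompt,
    "max_tokens": 256,
    "temperature": 0,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["choices"][0]["text"])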

Prompt format

This model works best when you provide:

  1. Clear schema (tables + columns)
  2. One question
  3. Explicit instruction to output SQL only

Example:

You are an assistant that writes valid DuckDB SQL queries.

### Schema:
CREATE TABLE ...

### Question:
...

### Response (DuckDB SQL only):
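
If you assemble prompts programmatically, a small helper keeps the format consistent. This build_prompt function is illustrative, not part of the model's API; it simply reproduces the template above:

def build_prompt(schema: str, question: str) -> str:
    # Reproduces the prompt template documented above.
    return (
        "You are an assistant that writes valid DuckDB SQL queries.\n\n"
        f"### Schema:\n{schema}\n\n"
        f"### Question:\n{question}\n\n"
        "### Response (DuckDB SQL only):"
    )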

Quality notes (4-bit)

4-bit quantization degrades output quality more than 8-bit or FP16 when prompts include:

  • very long schemas with many similarly named columns
  • multi-join / nested subqueries
  • ambiguous questions requiring stronger reasoning
  • strict formatting constraints

If you need maximum reliability, prefer FP16 or 8-bit.
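
A cheap mitigation at any precision is to have DuckDB plan the generated SQL before running it, which catches syntax errors and unknown tables or columns. A minimal sketch using the duckdb Python package (the helper name and sample table are illustrative):

import duckdb

def is_valid_sql(sql: str, con: duckdb.DuckDBPyConnection) -> bool:
    # EXPLAIN parses and plans the query without executing it.
    try:
        con.execute("EXPLAIN " + sql)
        return True
    except duckdb.Error:
        return False

con = duckdb.connect()  # in-memory database
con.execute("CREATE TABLE hospitals (hospital_id BIGINT, region VARCHAR)")
print(is_valid_sql("SELECT region, COUNT(*) FROM hospitals GROUP BY region", con))  # True
print(is_valid_sql("SELECT nope FROM hospitals", con))  # False: unknown column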

License

This model inherits the Llama 2 license from the base model.

Citation

@misc{nuxera_duckdb_nsql_mlx_4bit,
  title={DuckDB-NSQL-7B MLX 4-bit Quantized Conversion},
  author={Nuxera AI},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Nuxera/duckdb-nsql-7b-mlx-4bit}}
}

Base model:

@misc{duckdb_nsql,
  title={DuckDB-NSQL-7B: Natural Language to SQL for DuckDB},
  author={MotherDuck},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/motherduckdb/DuckDB-NSQL-7B-v0.1}}
}
