Motif-Video-2B GGUF

GGUF quantized variants of Motif-Video-2B, a 2-billion parameter text-to-video diffusion transformer.

These files are intended for use with the diffusers library. They let you run Motif-Video with reduced VRAM requirements by loading a quantized transformer while keeping the rest of the pipeline in its original precision.

Quality Comparison

Same prompt and seed across all variants (1280x736, 121 frames, 50 steps, NVIDIA H200). BF16 baseline at top, quantized variants paired below (4-bit → 8-bit). Each video is rendered at 1/2 resolution (640x368 per cell) at the original 24 fps.

[Video comparison grid: BF16 baseline; Q4_0 / Q4_1; Q4_K_M / Q5_0; Q5_1 / Q5_K_M; Q6_K / Q8_0]

Available Files

| File | Quantization | Size |
|---|---|---|
| motifv-2b-dev-Q4_0.gguf | Q4_0 | 1.1 GB |
| motifv-2b-dev-Q4_1.gguf | Q4_1 | 1.2 GB |
| motifv-2b-dev-Q4_K_M.gguf | Q4_K_M | 1.1 GB |
| motifv-2b-dev-Q5_0.gguf | Q5_0 | 1.3 GB |
| motifv-2b-dev-Q5_1.gguf | Q5_1 | 1.4 GB |
| motifv-2b-dev-Q5_K_M.gguf | Q5_K_M | 1.3 GB |
| motifv-2b-dev-Q6_K.gguf | Q6_K | 1.6 GB |
| motifv-2b-dev-Q8_0.gguf | Q8_0 | 2.0 GB |
| motifv-2b-dev-BF16.gguf | BF16 | 3.7 GB |

Recommendation: Q5_K_M and Q6_K offer a good balance between quality and file size. Q8_0 is the closest to the original BF16 quality. Q4_K_M is the most memory-efficient option for constrained environments.
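All variants in this repo follow the same naming pattern, so picking a file reduces to a string substitution. A minimal sketch (the size figures are copied from the table above; the helper name is illustrative, not part of any library):

```python
# Quantization variant -> approximate file size in GB (from the table above).
VARIANTS = {
    "Q4_0": 1.1, "Q4_1": 1.2, "Q4_K_M": 1.1,
    "Q5_0": 1.3, "Q5_1": 1.4, "Q5_K_M": 1.3,
    "Q6_K": 1.6, "Q8_0": 2.0, "BF16": 3.7,
}

def gguf_filename(variant: str) -> str:
    """Return the repo filename for a given quantization variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    # All files in this repo are named motifv-2b-dev-{variant}.gguf.
    return f"motifv-2b-dev-{variant}.gguf"

print(gguf_filename("Q5_K_M"))  # motifv-2b-dev-Q5_K_M.gguf
```

The returned filename can be passed directly as the `filename` argument to `hf_hub_download` in the usage example below.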

Installation

Prerequisites: PyTorch with CUDA support must be installed first. See pytorch.org for your CUDA version.

pip install "transformers>=5.5.4" accelerate ftfy einops sentencepiece regex Pillow imageio imageio-ffmpeg gguf
pip install git+https://github.com/huggingface/diffusers

Usage

import torch
from diffusers import (
    GGUFQuantizationConfig,
    MotifVideoPipeline,
    MotifVideoTransformer3DModel,
)
from diffusers.utils import export_to_video
from huggingface_hub import hf_hub_download


variant = "Q4_K_M"  # options: Q4_0, Q4_1, Q4_K_M, Q5_0, Q5_1, Q5_K_M, Q6_K, Q8_0, BF16

ckpt_path = hf_hub_download(
    "Motif-Technologies/Motif-Video-2B-GGUF",
    filename=f"motifv-2b-dev-{variant}.gguf",
)
transformer = MotifVideoTransformer3DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    config="Motif-Technologies/Motif-Video-2B",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

pipe = MotifVideoPipeline.from_pretrained(
    "Motif-Technologies/Motif-Video-2B",
    torch_dtype=torch.bfloat16,
    transformer=transformer,
)
pipe.enable_model_cpu_offload()

prompt = (
    "A woman standing in a sunlit field as flower petals swirl around her in slow motion. "
    "Each petal floats gently through the golden light, casting tiny shadows. "
    "Her hair moves like water, and time seems to stand still."
)
negative_prompt = (
    "text overlay, graphic overlay, watermark, logo, subtitles, timestamp, "
    "broadcast graphics, UI elements, random letters, frozen pose, rigid, static expression, "
    "jerky motion, mechanical motion, discontinuous motion, flat framing, depthless, dull lighting, "
    "monotone, crushed shadows, blown-out highlights, shifting background, fading background, "
    "poor continuity, identity drift, deformation, flickering, ghosting, smearing, duplication, "
    "mutated proportions, inconsistent clothing, flat colors, desaturated, tonally compressed, "
    "poor background separation, exposure shift, uneven brightness, color balance shift"
)

generator = torch.Generator(device="cuda").manual_seed(42)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=736,
    width=1280,
    num_frames=121,
    num_inference_steps=50,
    generator=generator,
)
export_to_video(output.frames[0], "output.mp4", fps=24)
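As a sanity check on the settings above: 121 frames exported at 24 fps is roughly a five-second clip, and at the measured H200 speed of about 23.3 s/it, 50 steps take around 20 minutes. A small back-of-the-envelope helper (the throughput figure is the H200 measurement from the Benchmark section; other GPUs will differ, and the estimate ignores VAE decode and I/O):

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Duration of the exported video in seconds."""
    return num_frames / fps

def eta_seconds(num_steps: int, sec_per_it: float) -> float:
    """Rough denoising-loop runtime estimate."""
    return num_steps * sec_per_it

print(round(clip_seconds(121, 24), 2))  # ~5.04 s of video
print(round(eta_seconds(50, 23.3)))     # ~1165 s (~20 min) on an H200
```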

Benchmark

Measured on NVIDIA H200, 1280x736, 121 frames, 50 steps:

| Variant | Speed (s/it) | Peak alloc (GB) | Peak rsv (GB) | Total (s) | VRAM saved vs BF16 (GB, rsv) |
|---|---|---|---|---|---|
| BF16 | 23.22 | 14.78 | 24.93 | 1176.1 | – |
| Q8_0 | 23.24 | 13.10 | 23.14 | 1177.0 | 1.79 |
| Q6_K | 23.34 | 12.62 | 22.72 | 1181.7 | 2.21 |
| Q5_K_M | 23.37 | 12.39 | 22.45 | 1183.0 | 2.48 |
| Q5_1 | 23.35 | 12.47 | 22.66 | 1182.4 | 2.27 |
| Q5_0 | 23.35 | 12.37 | 22.55 | 1181.9 | 2.38 |
| Q4_K_M | 23.34 | 12.19 | 22.22 | 1181.5 | 2.71 |
| Q4_1 | 23.29 | 12.26 | 22.26 | 1179.2 | 2.67 |
| Q4_0 | 23.31 | 12.14 | 22.18 | 1179.8 | 2.75 |
  • Peak alloc = peak GPU memory occupied by live tensors (model weights + activations), via torch.cuda.max_memory_allocated.
  • Peak rsv = peak GPU memory reserved by PyTorch's caching allocator (alloc + cached free blocks), via torch.cuda.max_memory_reserved. Use this as the effective VRAM footprint when planning headroom.
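The "VRAM saved vs BF16" column is simply the difference in peak reserved memory against the BF16 baseline. Recomputing it from the table's reserved-memory figures:

```python
# Peak reserved VRAM (GB) per variant, from the benchmark table above.
peak_rsv = {
    "BF16": 24.93, "Q8_0": 23.14, "Q6_K": 22.72,
    "Q5_K_M": 22.45, "Q5_1": 22.66, "Q5_0": 22.55,
    "Q4_K_M": 22.22, "Q4_1": 22.26, "Q4_0": 22.18,
}

# Savings = BF16 baseline minus each quantized variant's peak reserved memory.
savings = {
    variant: round(peak_rsv["BF16"] - rsv, 2)
    for variant, rsv in peak_rsv.items()
    if variant != "BF16"
}
print(savings["Q4_K_M"])  # 2.71
print(savings["Q8_0"])    # 1.79
```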

Key findings:

  • Speed is near-identical across all quantizations (~23.3 s/it); dequantization adds no measurable overhead.
  • VRAM savings scale with quant level: Q4 saves ~2.7 GB, Q8 saves ~1.8 GB (reserved).

Notes

  • The non-transformer components (VAE, text encoder, scheduler) are loaded from the base model Motif-Technologies/Motif-Video-2B in BF16.
  • All inference is performed on CUDA. CPU inference is not supported.