Motif-Video-2B GGUF

GGUF quantized variants of Motif-Video-2B, a 2-billion parameter text-to-video diffusion transformer.

These files are intended for use with the diffusers library. They let you run Motif-Video with reduced VRAM requirements by loading a quantized transformer while keeping the rest of the pipeline in its original precision.

Quality Comparison

Same prompt and seed across all variants (1280x736, 121 frames, 50 steps, NVIDIA H200). BF16 baseline at top, quantized variants paired below (4-bit → 8-bit). Each video is rendered at 1/2 resolution (640x368 per cell) at the original 24 fps.

[Video comparison grid: BF16 baseline; Q4_0 / Q4_1; Q4_K_M / Q5_0; Q5_1 / Q5_K_M; Q6_K / Q8_0]

Available Files

| File | Quantization | Size |
|---|---|---|
| motifv-2b-dev-Q4_0.gguf | Q4_0 | 1.1 GB |
| motifv-2b-dev-Q4_1.gguf | Q4_1 | 1.2 GB |
| motifv-2b-dev-Q4_K_M.gguf | Q4_K_M | 1.1 GB |
| motifv-2b-dev-Q5_0.gguf | Q5_0 | 1.3 GB |
| motifv-2b-dev-Q5_1.gguf | Q5_1 | 1.4 GB |
| motifv-2b-dev-Q5_K_M.gguf | Q5_K_M | 1.3 GB |
| motifv-2b-dev-Q6_K.gguf | Q6_K | 1.6 GB |
| motifv-2b-dev-Q8_0.gguf | Q8_0 | 2.0 GB |
| motifv-2b-dev-BF16.gguf | BF16 | 3.7 GB |

Recommendation: Q5_K_M and Q6_K offer a good balance between quality and file size. Q8_0 is the closest to the original BF16 quality. Q4_K_M is the most memory-efficient option for constrained environments.
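All variants in this repo follow the same naming pattern, so picking a file reduces to a string substitution. A minimal sketch (the size figures are copied from the table above; the helper name is illustrative, not part of any library):

```python
# Quantization variant -> approximate file size in GB (from the table above).
VARIANTS = {
    "Q4_0": 1.1, "Q4_1": 1.2, "Q4_K_M": 1.1,
    "Q5_0": 1.3, "Q5_1": 1.4, "Q5_K_M": 1.3,
    "Q6_K": 1.6, "Q8_0": 2.0, "BF16": 3.7,
}

def gguf_filename(variant: str) -> str:
    """Return the repo filename for a given quantization variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    # All files in this repo are named motifv-2b-dev-{variant}.gguf.
    return f"motifv-2b-dev-{variant}.gguf"

print(gguf_filename("Q5_K_M"))  # motifv-2b-dev-Q5_K_M.gguf
```

The returned filename can be passed directly as the `filename` argument to `hf_hub_download` in the usage example below.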

Installation

Prerequisites: PyTorch with CUDA support must be installed first. See pytorch.org for your CUDA version.

pip install "transformers>=5.5.4" accelerate ftfy einops sentencepiece regex Pillow imageio imageio-ffmpeg gguf
pip install git+https://github.com/huggingface/diffusers

Usage

import torch
from diffusers import (
    GGUFQuantizationConfig,
    MotifVideoPipeline,
    MotifVideoTransformer3DModel,
)
from diffusers.utils import export_to_video
from huggingface_hub import hf_hub_download


variant = "Q4_K_M"  # options: Q4_0, Q4_1, Q4_K_M, Q5_0, Q5_1, Q5_K_M, Q6_K, Q8_0, BF16

ckpt_path = hf_hub_download(
    "Motif-Technologies/Motif-Video-2B-GGUF",
    filename=f"motifv-2b-dev-{variant}.gguf",
)
transformer = MotifVideoTransformer3DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    config="Motif-Technologies/Motif-Video-2B",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

pipe = MotifVideoPipeline.from_pretrained(
    "Motif-Technologies/Motif-Video-2B",
    torch_dtype=torch.bfloat16,
    transformer=transformer,
)
pipe.enable_model_cpu_offload()

prompt = (
    "A woman standing in a sunlit field as flower petals swirl around her in slow motion. "
    "Each petal floats gently through the golden light, casting tiny shadows. "
    "Her hair moves like water, and time seems to stand still."
)
negative_prompt = (
    "text overlay, graphic overlay, watermark, logo, subtitles, timestamp, "
    "broadcast graphics, UI elements, random letters, frozen pose, rigid, static expression, "
    "jerky motion, mechanical motion, discontinuous motion, flat framing, depthless, dull lighting, "
    "monotone, crushed shadows, blown-out highlights, shifting background, fading background, "
    "poor continuity, identity drift, deformation, flickering, ghosting, smearing, duplication, "
    "mutated proportions, inconsistent clothing, flat colors, desaturated, tonally compressed, "
    "poor background separation, exposure shift, uneven brightness, color balance shift"
)

generator = torch.Generator(device="cuda").manual_seed(42)
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=736,
    width=1280,
    num_frames=121,
    num_inference_steps=50,
    generator=generator,
)
export_to_video(output.frames[0], "output.mp4", fps=24)
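As a sanity check on the settings above: 121 frames exported at 24 fps is roughly a five-second clip, and at the measured H200 speed of about 23.3 s/it, 50 steps take around 20 minutes. A small back-of-the-envelope helper (the throughput figure is the H200 measurement from the Benchmark section; other GPUs will differ, and the estimate ignores VAE decode and I/O):

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Duration of the exported video in seconds."""
    return num_frames / fps

def eta_seconds(num_steps: int, sec_per_it: float) -> float:
    """Rough denoising-loop runtime estimate."""
    return num_steps * sec_per_it

print(round(clip_seconds(121, 24), 2))  # ~5.04 s of video
print(round(eta_seconds(50, 23.3)))     # ~1165 s (~20 min) on an H200
```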

Benchmark

Measured on NVIDIA H200, 1280x736, 121 frames, 50 steps:

| Variant | Speed (s/it) | Peak alloc (GB) | Peak rsv (GB) | Total (s) | VRAM saved vs BF16 (GB, rsv) |
|---|---|---|---|---|---|
| BF16 | 23.22 | 14.78 | 24.93 | 1176.1 | – |
| Q8_0 | 23.24 | 13.10 | 23.14 | 1177.0 | 1.79 |
| Q6_K | 23.34 | 12.62 | 22.72 | 1181.7 | 2.21 |
| Q5_K_M | 23.37 | 12.39 | 22.45 | 1183.0 | 2.48 |
| Q5_1 | 23.35 | 12.47 | 22.66 | 1182.4 | 2.27 |
| Q5_0 | 23.35 | 12.37 | 22.55 | 1181.9 | 2.38 |
| Q4_K_M | 23.34 | 12.19 | 22.22 | 1181.5 | 2.71 |
| Q4_1 | 23.29 | 12.26 | 22.26 | 1179.2 | 2.67 |
| Q4_0 | 23.31 | 12.14 | 22.18 | 1179.8 | 2.75 |
  • Peak alloc = peak GPU memory occupied by live tensors (model weights + activations), via torch.cuda.max_memory_allocated.
  • Peak rsv = peak GPU memory reserved by PyTorch's caching allocator (alloc + cached free blocks), via torch.cuda.max_memory_reserved. Use this as the effective VRAM footprint when planning headroom.
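The "VRAM saved vs BF16" column is simply the difference in peak reserved memory against the BF16 baseline. Recomputing it from the table's reserved-memory figures:

```python
# Peak reserved VRAM (GB) per variant, from the benchmark table above.
peak_rsv = {
    "BF16": 24.93, "Q8_0": 23.14, "Q6_K": 22.72,
    "Q5_K_M": 22.45, "Q5_1": 22.66, "Q5_0": 22.55,
    "Q4_K_M": 22.22, "Q4_1": 22.26, "Q4_0": 22.18,
}

# Savings = BF16 baseline minus each quantized variant's peak reserved memory.
savings = {
    variant: round(peak_rsv["BF16"] - rsv, 2)
    for variant, rsv in peak_rsv.items()
    if variant != "BF16"
}
print(savings["Q4_K_M"])  # 2.71
print(savings["Q8_0"])    # 1.79
```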

Key findings:

  • Speed is near-identical across all quantizations (~23.3 s/it); dequantization adds no measurable overhead.
  • VRAM savings scale with quant level: Q4 saves ~2.7 GB, Q8 saves ~1.8 GB (reserved).

Notes

  • The non-transformer components (VAE, text encoder, scheduler) are loaded from the base model Motif-Technologies/Motif-Video-2B in BF16.
  • All inference is performed on CUDA. CPU inference is not supported.