---
base_model: tencent/HunyuanVideo-Foley
tags:
- quantized
- fp8
- audio-generation
- video-to-audio
- comfyui
library_name: transformers
---

# HunyuanVideo-Foley FP8 Quantized

This is an FP8-quantized version of [tencent/HunyuanVideo-Foley](https://huggingface.co/tencent/HunyuanVideo-Foley), intended to reduce VRAM usage while keeping audio generation quality comparable to the original.

## Quantization Details

- **Quantization Method**: FP8 weight-only quantization (E5M2 and E4M3FN)
- **Layers Quantized**: Transformer block weights only (attention and FFN layers)
- **Preserved Precision**: Normalization layers, embeddings, and biases remain in their original precision
- **Expected VRAM Savings**: ~30-40% reduction compared to the original BF16 weights
- **Memory Usage**: Runs on GPUs with less than 12 GB of VRAM when combined with other optimizations
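
The two FP8 formats named above split their 8 bits differently: E4M3FN uses 4 exponent and 3 mantissa bits (more precision, smaller range), while E5M2 uses 5 exponent and 2 mantissa bits (more range, less precision). The following is a minimal pure-Python sketch of how a single FP8 byte decodes in each format. The function names are illustrative and not part of this repository or of any library.

```python
import math

def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int, ieee_specials: bool) -> float:
    """Decode one FP8 byte (sign | exponent | mantissa) into a Python float."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    max_exp = (1 << exp_bits) - 1
    max_man = (1 << man_bits) - 1
    if ieee_specials and exp == max_exp:
        # E5M2 follows IEEE conventions: all-ones exponent means inf or NaN.
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == max_exp and man == max_man:
        # E4M3FN has no infinities; only the all-ones pattern is NaN.
        return math.nan
    if exp == 0:
        # Subnormal numbers have no implicit leading 1.
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

def decode_e4m3fn(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_specials=False)

def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_specials=True)

def finite_range(decoder):
    """Return (smallest positive value, largest finite value) of a format."""
    values = [decoder(b) for b in range(256)]
    finite = [v for v in values if math.isfinite(v)]
    return min(v for v in finite if v > 0), max(finite)

if __name__ == "__main__":
    print("E4M3FN:", finite_range(decode_e4m3fn))
    print("E5M2:  ", finite_range(decode_e5m2))
```

Every finite value in either format is exactly representable in FP16 and BF16, so upcasting FP8 weights for computation adds no further rounding error; the precision loss happens once, when the weights are quantized.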

## Usage

### ComfyUI (Recommended)

This model is intended for use with the [ComfyUI-HunyuanVideo-Foley](https://github.com/phazei/ComfyUI-HunyuanVideo-Foley) custom node, which provides:

- **VRAM-friendly loading** with ping-pong memory management
- **Built-in FP8 support** that handles the quantized weights automatically
- **`torch.compile` integration** for ~30% faster generation after the first run
- **Text-to-Audio and Video-to-Audio** modes
- **Batch generation** with audio selection tools

**Installation:**
1. Install the ComfyUI node: [ComfyUI-HunyuanVideo-Foley](https://github.com/phazei/ComfyUI-HunyuanVideo-Foley)
2. Download this quantized model to `ComfyUI/models/foley/`
3. Generate audio. Combined with the node's VRAM-friendly loading, usage can drop below 8 GB.

**Typical VRAM Usage (5s audio, 50 steps):**
- Baseline (BF16): ~10-12 GB
- With FP8 quantization: ~8-10 GB

These figures suit GPUs such as the RTX 3080 or RTX 4070 Ti.
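
As a rough guide to where savings like these come from, the sketch below estimates weight memory when only a fraction of the parameters is stored in FP8 (1 byte) and the rest stays in BF16 (2 bytes). The parameter count and quantized fraction in the usage example are hypothetical placeholders, not measurements of this model.

```python
def weight_memory_gib(total_params: float, quantized_fraction: float,
                      base_bytes: int = 2, quant_bytes: int = 1) -> float:
    """Estimate weight memory in GiB when a fraction of parameters is quantized."""
    quantized = total_params * quantized_fraction
    kept = total_params - quantized
    return (quantized * quant_bytes + kept * base_bytes) / 1024 ** 3

def weight_savings_fraction(quantized_fraction: float,
                            base_bytes: int = 2, quant_bytes: int = 1) -> float:
    """Fraction of weight memory saved relative to storing everything at base_bytes."""
    return quantized_fraction * (1 - quant_bytes / base_bytes)

if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    params = 5e9
    fraction = 0.8
    print(f"BF16 weights: {weight_memory_gib(params, 0.0):.2f} GiB")
    print(f"Mixed FP8/BF16 weights: {weight_memory_gib(params, fraction):.2f} GiB")
    print(f"Weight savings: {weight_savings_fraction(fraction):.0%}")
```

Weight-only quantization does not shrink activations or any other components loaded alongside the model, which is one plausible reason the total VRAM reduction is smaller than the reduction in weight storage alone.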

### Other Frameworks

The FP8 weights work with any framework that upcasts FP8 tensors to FP16 or BF16 for computation. The quantized weights remain compatible with the original model architecture.

## Files

- `hunyuanvideo_foley_fp8_e4m3fn.safetensors` - Main model weights in FP8 (E4M3FN) format

## Performance Notes

- **Quality**: Audio generation quality is comparable to the original model
- **Speed**: The overhead of converting FP8 weights at runtime is minimal; generation speed depends on the precision used for computation
- **Memory**: Reduced VRAM usage makes the model practical on consumer GPUs
- **Compatibility**: Drop-in replacement for the original model weights

## Original Model

This quantization is based on [tencent/HunyuanVideo-Foley](https://huggingface.co/tencent/HunyuanVideo-Foley). Please refer to the original repository for:
- Model architecture details
- Training information
- License terms
- Citation information

## Technical Details

The quantization takes a conservative approach: it converts only the transformer block weights and preserves precision-sensitive components.
- ✅ **Converted**: Attention and FFN layer weights in transformer blocks
- ❌ **Preserved**: Normalization layers, embeddings, projections, bias terms

This selective strategy aims to preserve model quality while still capturing most of the available memory savings.
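
The selection rule described above can be sketched as a predicate over parameter names. This is an illustration under stated assumptions, not the script used to produce this checkpoint: the `blocks.<index>.` naming pattern and the example parameter names are hypothetical, and the real model's tensor names may differ.

```python
import re

# Hypothetical naming convention: transformer block parameters contain "blocks.<index>.".
BLOCK_PATTERN = re.compile(r"blocks\.\d+\.")
# Substrings marking precision-sensitive tensors that stay in original precision.
SKIP_KEYWORDS = ("norm", "embed")

def should_quantize(name: str, ndim: int) -> bool:
    """Decide whether a tensor would be stored in FP8 under the selective rule."""
    if not BLOCK_PATTERN.search(name):
        return False  # outside transformer blocks: embeddings, input/output projections
    if not name.endswith(".weight"):
        return False  # bias terms stay in original precision
    if ndim < 2:
        return False  # 1-D scales, e.g. normalization weights
    if any(keyword in name for keyword in SKIP_KEYWORDS):
        return False
    return True

def plan_dtypes(shapes: dict) -> dict:
    """Map each parameter name to 'fp8' or 'keep', given its shape."""
    return {
        name: "fp8" if should_quantize(name, len(shape)) else "keep"
        for name, shape in shapes.items()
    }

if __name__ == "__main__":
    # Hypothetical parameter names and shapes for illustration only.
    example = {
        "blocks.0.attn.qkv.weight": (3072, 1024),
        "blocks.0.attn.qkv.bias": (3072,),
        "blocks.0.norm1.weight": (1024,),
        "blocks.0.mlp.fc1.weight": (4096, 1024),
        "audio_embedder.proj.weight": (1024, 128),
    }
    for name, decision in plan_dtypes(example).items():
        print(f"{decision:>4}  {name}")
```

In PyTorch, a tensor selected this way could then be stored with `tensor.to(torch.float8_e4m3fn)`, while everything marked `keep` is saved unchanged.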