VibeVoice-ASR AWQ INT4
This repository contains a 4-bit AWQ quantized export of microsoft/VibeVoice-ASR.
Quantization
- Method: AWQ
- Bits: 4
- Group size: 128
- Logical parameter count: 8,674,021,857
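To make the "4 bits, group size 128" settings concrete, here is a minimal NumPy sketch of plain group-wise 4-bit quantization. It is an illustration only: it omits the activation-aware scaling that distinguishes AWQ, it is not the code used to produce this export, and the function names are hypothetical.

```python
import numpy as np


def quantize_groupwise_int4(weights, group_size=128):
    """Quantize weights to 4-bit codes with one scale and offset per group.

    Assumes the total number of weights is divisible by group_size.
    """
    w = np.asarray(weights, dtype=np.float32).reshape(-1, group_size)
    offset = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    # 4 bits give 16 levels (codes 0..15); guard against constant groups.
    scale = np.maximum(w_max - offset, 1e-8) / 15.0
    q = np.clip(np.round((w - offset) / scale), 0, 15).astype(np.uint8)
    return q, scale, offset


def dequantize_groupwise_int4(q, scale, offset):
    """Map 4-bit codes back to approximate float weights."""
    return q.astype(np.float32) * scale + offset


# Round-trip a random matrix and report the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(2, 256)).astype(np.float32)
q, scale, offset = quantize_groupwise_int4(w)
w_hat = dequantize_groupwise_int4(q, scale, offset).reshape(w.shape)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Each group of 128 weights shares one scale and one offset, so the per-weight rounding error is bounded by that group's scale.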
Repository layout
This model is stored in a split VibeVoice layout:
- root directory: VibeVoice audio and non-decoder weights
- decoder-awq/: quantized Qwen2 decoder weights
Keep this layout intact when downloading or mirroring the repository.
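One way to confirm that a download or mirror kept the layout is a small check like the sketch below. It only tests for the two pieces named above (the root config.json and the decoder-awq/ directory). The helper names are hypothetical, and the download step assumes the huggingface_hub package is installed.

```python
from pathlib import Path


def check_split_layout(root):
    """Return the missing pieces of the split layout (empty list if intact)."""
    root = Path(root)
    missing = []
    if not (root / "config.json").is_file():
        missing.append("config.json")
    if not (root / "decoder-awq").is_dir():
        missing.append("decoder-awq/")
    return missing


def download_and_check(repo_id="lemuriandezapada/VibeVoice-ASR-awq-int4"):
    """Download the whole repository snapshot, then verify the layout."""
    # Imported lazily so the layout check works without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(repo_id)
    return local_dir, check_split_layout(local_dir)
```

Downloading the full snapshot, rather than individual files, keeps the root files and decoder-awq/ together.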
Metadata
The root config.json includes:
- vibevoice_metadata
- vibevoice_decoder_model_path
- vibevoice_decoder_quantization
These fields identify the split decoder path and preserve the logical source-model metadata.
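To inspect these fields in a local copy, a short sketch along the following lines should be enough. It makes no assumption about the shape of the values and simply returns whatever the root config.json stores under each key. The helper name is hypothetical.

```python
import json
from pathlib import Path

VIBEVOICE_KEYS = (
    "vibevoice_metadata",
    "vibevoice_decoder_model_path",
    "vibevoice_decoder_quantization",
)


def read_vibevoice_fields(root):
    """Read the VibeVoice-specific fields from the root config.json.

    Keys that are absent from the file come back as None.
    """
    config_path = Path(root) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return {key: config.get(key) for key in VIBEVOICE_KEYS}
```

Calling `read_vibevoice_fields("path/to/local/copy")` returns a dictionary with the three keys. A None value means the field is missing from that copy.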
Validation
This AWQ export was validated against the full upstream VibeVoice-ASR model on short audio samples.
- outputs remained valid JSON transcript arrays
- output similarity to the full model remained high on tested samples
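The two checks above can be approximated with a sketch like this one. The source does not state which similarity metric was used, so the character-level ratio from Python's difflib is a stand-in assumption, and the helper names are hypothetical.

```python
import json
from difflib import SequenceMatcher


def is_json_array(text):
    """Return True if text parses as JSON and the top-level value is an array."""
    try:
        return isinstance(json.loads(text), list)
    except json.JSONDecodeError:
        return False


def similarity(reference, candidate):
    """Character-level similarity in [0, 1] between two output strings."""
    return SequenceMatcher(None, reference, candidate).ratio()


def compare_outputs(full_model_output, quantized_output):
    """Summarize one sample: output validity plus similarity to the full model."""
    return {
        "valid_json_array": is_json_array(quantized_output),
        "similarity": similarity(full_model_output, quantized_output),
    }
```

Running `compare_outputs` over a set of samples, with the full upstream model's output as the reference, gives a per-sample view of how far the quantized export drifts.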
Serving note for vLLM 0.17.x
On current vLLM 0.17.x CUDA builds, this checkpoint is compatible with the faster Marlin-backed AWQ path.
- prefer letting vLLM infer the backend from config.json
- if you must set it explicitly, use awq_marlin rather than plain awq
In local testing on an RTX A6000, forcing plain awq was substantially slower than letting vLLM auto-select the Marlin kernel.
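The recommendation can be captured in a small helper that builds engine arguments, sketched below. The helper name is hypothetical. `quantization` is vLLM's engine argument for selecting a quantization backend; the helper leaves it unset by default so vLLM can infer it. Actually loading this checkpoint may also require the compatibility patches shipped with this repository, which the sketch does not cover.

```python
def vllm_engine_kwargs(model_path, force_backend=None):
    """Build keyword arguments for vLLM, leaving backend selection to vLLM by default."""
    kwargs = {"model": model_path}
    if force_backend is not None:
        if force_backend not in ("awq_marlin", "awq"):
            raise ValueError(f"unexpected AWQ backend: {force_backend!r}")
        # Plain "awq" works but was much slower in the local test described above.
        kwargs["quantization"] = force_backend
    return kwargs


# Usage (not executed here; needs a CUDA build of vLLM 0.17.x):
#   from vllm import LLM
#   llm = LLM(**vllm_engine_kwargs("path/to/local/copy"))
# To force the backend explicitly:
#   llm = LLM(**vllm_engine_kwargs("path/to/local/copy", force_backend="awq_marlin"))
```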
Upstream references
- Code: https://github.com/microsoft/VibeVoice
- Base model: https://huggingface.co/microsoft/VibeVoice-ASR
- Report: https://arxiv.org/pdf/2601.18184
Notes
- This is a quantized derivative export, not the original upstream checkpoint.
- Base model licensing and usage terms follow the upstream VibeVoice-ASR release.
- Pure-VibeVoice compatibility patches for vLLM 0.17.x are included under patches/vllm_0_17/.