Discrepancy between the paper and the model
Hey,
Thank you for the great work. While playing around with the code, I realized that some parts of the method are not implemented as described in the paper. For example, the paper says the vocoder operates on 128 mel-bins, whereas the provided vocoder clearly works on 64 mel-bins. I could not find any version of the model on your HF profile that aligns with the paper. Is such a model going to be released soon?
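For anyone who wants to reproduce the check, here is a minimal sketch of how the mel-bin count could be read from the vocoder config. It assumes the vocoder is a `SpeechT5HifiGan` whose config stores the number of input mel-bins under `model_in_dim`. The helper names `vocoder_mel_bins` and `matches_paper` are made up for illustration, and the value 128 is simply the figure quoted from the paper above.

```python
import json
from pathlib import Path

# Assumption: the vocoder config (SpeechT5HifiGan-style) stores the number
# of input mel-bins under this key.
MEL_BINS_KEY = "model_in_dim"


def load_config(path):
    """Load a vocoder config.json from a local path into a dict."""
    return json.loads(Path(path).read_text())


def vocoder_mel_bins(config):
    """Return the number of mel-bins the vocoder expects as input."""
    if MEL_BINS_KEY not in config:
        raise KeyError(f"'{MEL_BINS_KEY}' not found in vocoder config")
    return int(config[MEL_BINS_KEY])


def matches_paper(config, paper_mel_bins=128):
    """Compare the config against the mel-bin count quoted from the paper."""
    return vocoder_mel_bins(config) == paper_mel_bins


# Illustrative config fragment with the value reported in this thread.
example_config = {"model_in_dim": 64}
print(vocoder_mel_bins(example_config), matches_paper(example_config))
```

With Diffusers installed, the same dict could presumably be obtained from a loaded pipeline via `MusicLDMPipeline.from_pretrained("ucsd-reach/musicldm").vocoder.config.to_dict()` and passed to `vocoder_mel_bins`.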
I have run into the same issue: the vocoder config is set to 64 mel-bins.
By the way, the quality of the waveforms I generated using the official prompt and code is somewhat lower than that of the samples on the official website...
How can I get the same quality as the samples released on the website?