Workflow - V2V Foley - Add sound to any silent video

#41
by RuneXX - opened

Wan (silent original video)

LTX-2 added sound + extended length

**V2V "Foley" - Add sound to any silent video**
A new, updated version that should work better and more smoothly.
Easily add sound to any of your silent video clips, and optionally extend them as well.

Add sound FX, ambient background sound, music, or a voice-over narrator (and even dialogue when extending the video).

This updated version has a few changes:

  • It keeps the aspect ratio of the input video.
  • You can override the input size and set a max size (longest edge); the aspect ratio is still preserved.
  • It has a toggle switch, so you can easily switch between using the workflow in Single-Pass or Two-Pass mode.
  • Since resizing is done automatically, there should hopefully be no more errors from frame dimensions not matching what LTX expects.
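The resize behaviour described in the list (keep the aspect ratio, cap the longest edge, land on dimensions LTX accepts) can be sketched roughly like this. It is a minimal Python sketch, not the workflow's actual node graph: the function name `compute_target_size` is hypothetical, and the default `multiple=32` is an assumption borrowed from LTX-Video's divisible-by-32 requirement, so adjust it if your LTX-2 setup expects a different multiple.

```python
def compute_target_size(width: int, height: int, max_edge: int, multiple: int = 32) -> tuple[int, int]:
    """Hypothetical helper: cap the longest edge at max_edge, keep the aspect
    ratio, and snap both sides down to a multiple of `multiple` (assumed 32)."""
    longest = max(width, height)
    if longest > max_edge:
        # Integer arithmetic avoids float noise such as 575.999 instead of 576.
        width = width * max_edge // longest
        height = height * max_edge // longest

    def snap(value: int) -> int:
        # Round down so we never exceed max_edge, but never go below one block.
        return max(multiple, (value // multiple) * multiple)

    return snap(width), snap(height)


print(compute_target_size(1920, 1080, 1024))  # landscape clip capped at 1024
print(compute_target_size(1080, 1920, 1024))  # portrait clip, same cap
print(compute_target_size(832, 480, 1280))    # already small enough, only snapped
```

Snapping down rather than rounding to the nearest multiple is a deliberate choice here: it guarantees the result never exceeds the max size you set, at the cost of occasionally losing a few pixels of the input.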

Wan (silent original video)

LTX-2 added sound + extended length

How did you get this to work at all? If I give it any video, the output is completely unrelated to the input video... The workflow looks like a copy of the extend workflow; it looks like you uploaded the wrong thing. Also, I'd recommend not using the newer built-in math nodes, as they require a newer version of ComfyUI, which is unacceptable given the current failed state of their frontend.

Edit: it looks like some video files fail to load properly with the VHS video loader. I don't know why, but others work.

> ...recommend not using the newer built-in math nodes... they require a newer version of ComfyUI

The new ComfyUI updates are quite a big step, so I can see why some might delay updating.
I usually use the KJNodes "Simple Calculator" node, so I'll use that one instead (it's the one I prefer anyway).

> Edit: it looks like some video files fail to load properly with the VHS video loader. I don't know why, but others work.

Maybe your input video is encoded in a format that the video loader node cannot decode, so the input ends up blank.
If the videos come from TikTok or similar via a web-download site, they can often be quite "corrupted". Try any online video converter: just search for, e.g., "mov to mp4" and use it to convert MP4 to MP4 (this works even on a MOV-to-MP4 conversion site).

See if that works

https://new.express.adobe.com/home/tools/convert-to-mp4
https://www.freeconvert.com/mov-to-mp4
https://cloudconvert.com/mov-to-mp4
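If you'd rather not upload clips to a website, a local re-encode with ffmpeg should do a similar job. This is a sketch of an alternative to the online converters, not part of the original workflow; it assumes `ffmpeg` is installed and on your PATH, and the helper names `build_reencode_cmd` and `reencode` are hypothetical.

```python
import shutil
import subprocess
import sys


def build_reencode_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that re-encodes `src` into a plain H.264 MP4."""
    return [
        "ffmpeg", "-y",                 # overwrite dst if it already exists
        "-i", src,
        "-c:v", "libx264",              # widely supported video codec
        "-pix_fmt", "yuv420p",          # 8-bit 4:2:0, what most decoders expect
        # yuv420p needs even width/height, so round odd sizes down by one pixel
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
        "-movflags", "+faststart",      # put the index at the start of the file
        dst,
    ]


def reencode(src: str, dst: str) -> None:
    """Run the re-encode, failing loudly if ffmpeg is missing or errors out."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_reencode_cmd(src, dst), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    reencode(sys.argv[1], sys.argv[2])
```

Saved as, say, `reencode.py`, it would be run as `python reencode.py input.mp4 fixed.mp4`, and `fixed.mp4` is then loaded into the workflow instead of the original.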

> Edit: it looks like some video files fail to load properly with the VHS video loader. I don't know why, but others work.

Strangely enough, I ran into the same issue, something I had never encountered in years of using ComfyUI.
It happened when I downloaded a video from Civitai to use as an example for a workflow. This video is encoded in such a way that the video loader node outputs black frames.

I was curious whether there was a fix, and there is: with the Load Video FFmpeg node from the same node pack as the other video loader, everything works fine ;-)
So I will definitely use the Load Video FFmpeg node in future workflows instead, since it is more robust.
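Before loading a clip, it can help to check how it is actually encoded. The sketch below calls `ffprobe` (shipped with ffmpeg) to read the first video stream's codec, pixel format and size. The helper names are hypothetical, and the rule in `looks_risky` (anything other than H.264 with `yuv420p` is worth re-encoding) is only a heuristic assumption, not a documented limitation of either video loader node.

```python
import json
import subprocess


def probe_cmd(path: str) -> list[str]:
    """ffprobe command printing codec, pixel format and size of the first video stream as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,pix_fmt,width,height",
        "-of", "json",
        path,
    ]


def parse_probe(output: str) -> dict:
    """Return the first stream's fields, or {} if ffprobe found no video stream."""
    streams = json.loads(output).get("streams", [])
    return streams[0] if streams else {}


def looks_risky(info: dict) -> bool:
    # Heuristic (assumption): plain 8-bit H.264 4:2:0 is the safest input;
    # anything else (AV1, VP9, 10-bit, 4:4:4, ...) is a candidate for re-encoding.
    return info.get("codec_name") != "h264" or info.get("pix_fmt") != "yuv420p"


def probe(path: str) -> dict:
    """Run ffprobe on `path` (requires ffprobe on PATH) and parse the result."""
    result = subprocess.run(probe_cmd(path), capture_output=True, text=True, check=True)
    return parse_probe(result.stdout)
```

For example, `looks_risky(probe("clip.mp4"))` returning `True` would be a hint to re-encode the clip or switch to the FFmpeg-based loader.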

Question: let's say I have several NSFW WAN 2.2 clips (5 seconds, 81 frames each) with no audio, and I want to add audio to them. Would this workflow be able to do that, or do I need some sort of NSFW prompt enhancer? Sorry, I just haven't messed with LTX-2 or LTX-2.3 much.

Would the quality of the generated audio for an NSFW video be poor, since I assume the model probably wasn't trained on this?

LTX doesn't know much NSFW content, I think; it's probably not part of the training data ;-)
So it depends on what is generated in the Wan video. It will continue as well as it can, but some "concepts" might be out of scope ;-), to put it that way.
For that you might have to add some LoRAs (probably the same as with Wan: some things are within the training data, others not so much).

It all depends on what it is. I don't think general nudity etc. is a problem for LTX to continue, but it probably doesn't know much about more explicit actions, so it might end up a distorted mess without LoRAs.
But give it a try; there are plenty of such LoRAs on Civitai.


Thank you for the quick response. Is the prompt enhancer in your workflow based on a censored text encoder or an abliterated one? Just trying to figure out if I need to find an uncensored one for my use case.


It depends on the Gemma model you have loaded into the main CLIP loader; the prompt enhancer uses that same Gemma model to enhance the prompt.

So I'm a bit confused as to how to utilize this V2V workflow. I want to input an MP4 video that was generated in WAN 2.2 and to have audio added to it (without modifying the video in any way). What do I need to do to accomplish this? It seems this workflow is intended to extend my existing video...maybe I'm just not understanding how to use this workflow?

Just do as you describe: add your Wan 2.2 video and generate. It will add sound based on the visual content of your Wan 2.2 video and your prompt.
The extend part is optional.
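If you do use the extend part, the total length has to land on a frame count LTX accepts, and the workflow may already handle this for you. LTX-Video models expect frame counts of the form 8n + 1 (e.g. 121); the sketch below assumes LTX-2 follows the same rule, that the 81-frame Wan 2.2 clip plays at 16 fps, and that LTX-2 generates at 24 fps. The function name `ltx_frame_count` and the 3-second extension are hypothetical.

```python
import math


def ltx_frame_count(seconds: float, fps: float) -> int:
    """Smallest frame count of the form 8n + 1 that covers `seconds` at `fps`
    (the 8n + 1 rule is assumed to apply to LTX-2 as it does to LTX-Video)."""
    raw = seconds * fps
    # Tiny epsilon so an exact 8n + 1 value isn't bumped up by float noise.
    n = math.ceil((raw - 1) / 8 - 1e-9)
    return 8 * max(n, 0) + 1


wan_seconds = 81 / 16     # 81-frame Wan 2.2 clip, assuming 16 fps (about 5.06 s)
extend_seconds = 3.0      # hypothetical extension length

print(ltx_frame_count(wan_seconds, 24))                   # 129
print(ltx_frame_count(wan_seconds + extend_seconds, 24))  # 201
```

Rounding up rather than to the nearest valid count means the generated clip always covers the full input video; the few extra frames at the end can be trimmed afterwards.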
