Workflow : V2V - Just Talk - Prompt lip-synced voice and sounds to any silent video

#52
by RuneXX - opened

Add voice and sounds to your silent videos with lip-sync.

It has a few setting tweaks to play around with, such as facemask vs. no facemask (how strictly it adheres to the input video), as well as how strong an influence the end of the video should have. These settings determine how much freedom the model has to change things; too strict can look a bit unnatural.

Plus an extra feature: the workflow can also extend your silent video, since most such clips (from Wan etc.) are probably short.

A little bit experimental, so there may be updates to the workflow... but something to play around with for now ;-)


With extended video (optional part of the workflow)

Is it possible to make it so that there are no changes except to the masked part?

Should be, with a bit of masking. The mask in the above workflow is kept fairly weak to ensure the lip-sync takes, but with proper inpaint-like masking it should be doable ;-)
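
For reference, a minimal sketch of what strict inpaint-like compositing could look like as a post-processing step: keep the original pixels everywhere and only let the generated (lip-synced) frames through inside the mask. The frame arrays, mask, and helper name are placeholders of mine, not part of the workflow.

```python
import numpy as np

def composite_masked(original: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend one frame: mask is float in [0, 1]; 1.0 means use the generated pixels."""
    if mask.ndim == 2:
        mask = mask[..., None]  # broadcast the mask over the RGB channels
    return (generated * mask + original * (1.0 - mask)).astype(original.dtype)

# Usage: apply per frame after generation, e.g.
# out_frames = [composite_masked(o, g, mask) for o, g in zip(orig_frames, gen_frames)]
```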

Nice! For the foley / sound generation (V2V), is there a way to simply connect the generated audio to the Video Combine node instead of creating a new video from the input one?

Is it possible to make it so that there are no changes except to the masked part?

A little inpainting test... seems to work. Will try to find a sweet spot for details etc.

Prompt: "blue eyes and glasses" ;-) with mask around the eyes area. Not 100% just the masked area, but close (the timing is a little different in the example above, but thats my fault. One video was 24fps, other 25fps)

Nice! For the foley / sound generation (V2V), is there a way to simply connect the generated audio to the Video Combine node instead of creating a new video from the input one?

That's what it already does (the foley workflow). It does generate a video (since it's a video model), but the video part is disregarded at the end; only the audio is used
(except if you also extend the video, in which case the newly added video parts also come from LTX).
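
Roughly what "only the audio is used" amounts to outside ComfyUI, as a hedged sketch: mux the generated clip's audio track onto the original silent video without re-encoding the picture. Assumes ffmpeg is installed; the file names are placeholders.

```python
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "input_silent.mp4",          # original silent video (picture source)
    "-i", "generated_with_audio.mp4",  # generated output (audio source)
    "-map", "0:v:0", "-map", "1:a:0",  # video from the first input, audio from the second
    "-c:v", "copy", "-c:a", "aac",     # copy the picture as-is, encode the audio to AAC
    "-shortest",
    "muxed_output.mp4",
], check=True)
```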
