Workflow : V2V - Just Talk - Prompt lip-synced voice and sounds to any silent video

#52
by RuneXX - opened

Add voice and sounds to your silent videos with lip-sync.

It has a few setting tweaks to play around with, such as facemask vs. no facemask (how strictly it adheres to the input video), as well as how strong an influence the end of the video should have. These settings determine how much freedom the model has to change things; too strict can look a bit unnatural.

Plus an extra feature: the workflow can also extend your silent video, since most such clips (from Wan etc.) are probably short.

A little bit experimental, so there may be updates to the workflow... but something to play around with for now ;-)


With extended video (optional part of the workflow)

Is it possible to make it so that there are no changes except to the masked part?

Should be, with a bit of masking. The mask in the above workflow is kept fairly weak to ensure the lip-sync takes, but with proper inpaint-like masking it should be doable ;-)
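
For reference, a minimal sketch of what strict inpaint-like compositing could look like as a post-processing step: keep the original pixels everywhere and only let the generated (lip-synced) frames through inside the mask. The frame arrays, mask, and helper name are placeholders of mine, not part of the workflow.

```python
import numpy as np

def composite_masked(original: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend one frame: mask is float in [0, 1]; 1.0 means use the generated pixels."""
    if mask.ndim == 2:
        mask = mask[..., None]  # broadcast the mask over the RGB channels
    return (generated * mask + original * (1.0 - mask)).astype(original.dtype)

# Usage: apply per frame after generation, e.g.
# out_frames = [composite_masked(o, g, mask) for o, g in zip(orig_frames, gen_frames)]
```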

Nice! For the foley / sound generation (V2V), is there a way to simply connect the generated audio to the Video Combine node instead of creating a new video from the input one?

Is it possible to make it so that there are no changes except to the masked part?

A little inpainting test... seems to work. Will try to find a sweet spot for details etc.

Prompt: "blue eyes and glasses" ;-) with mask around the eyes area. Not 100% just the masked area, but close (the timing is a little different in the example above, but thats my fault. One video was 24fps, other 25fps)

Nice! For the foley / sound generation (V2V), is there a way to simply connect the generated audio to the Video Combine node instead of creating a new video from the input one?

That's what it already does (the foley workflow). It does generate a video (since it's a video model), but the video part is disregarded at the end; only the audio is used
(except if you also extend the video, in which case the newly added video parts also come from LTX).
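
Roughly what "only the audio is used" amounts to outside ComfyUI, as a hedged sketch: mux the generated clip's audio track onto the original silent video without re-encoding the picture. Assumes ffmpeg is installed; the file names are placeholders.

```python
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "input_silent.mp4",          # original silent video (picture source)
    "-i", "generated_with_audio.mp4",  # generated output (audio source)
    "-map", "0:v:0", "-map", "1:a:0",  # video from the first input, audio from the second
    "-c:v", "copy", "-c:a", "aac",     # copy the picture as-is, encode the audio to AAC
    "-shortest",
    "muxed_output.mp4",
], check=True)
```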
