Workflow: V2V Dub It - lip-synced dubbing in multiple languages

#78

by RuneXX - opened Apr 3


RuneXX

Owner Apr 3 (edited Apr 3)

[Example videos: German dubbing, Spanish dubbing, edited dialog]

V2V Dub It - lip-synced dubbing in multiple languages

Inspired by the Just Dub It LoRA, which is currently not available for LTX-2.3 (it only works with LTX-2.0).
Since LTX already has built-in dubbing capabilities, the workflow masks the speaker's mouth area, and you simply prompt the new language to be spoken.
LTX supports well over 100 languages.
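To illustrate the masking idea, here is a minimal sketch of a per-frame mouth mask. It assumes a mouth bounding box `(x0, y0, x1, y1)` is already available from some face or landmark detector (not shown), and it uses the convention 1 = "may be regenerated", 0 = "keep from the input video". The function names and the `pad` parameter are hypothetical and are not taken from the workflow itself.

```python
# Hypothetical sketch of mouth-area masking; not the workflow's actual nodes.
# Assumption: the mouth box comes from an external face/landmark detector.
# Convention used here: 1 = region the model may regenerate, 0 = keep input pixels.

def mouth_mask(width, height, box, pad=8):
    """Return a height x width list-of-lists mask with 1s inside the padded mouth box."""
    x0, y0, x1, y1 = box
    # Grow the box by `pad` pixels on every side, clamped to the frame bounds.
    x0, y0 = max(0, x0 - pad), max(0, y0 - pad)
    x1, y1 = min(width, x1 + pad), min(height, y1 + pad)
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


def masked_fraction(mask):
    """Fraction of the frame that will be regenerated."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


if __name__ == "__main__":
    m = mouth_mask(16, 16, (6, 9, 10, 12), pad=1)
    print(sum(map(sum, m)), masked_fraction(m))
```

Keeping the mask small (only the padded mouth region) is what lets the rest of the generated frame stay close to the input video; a larger `pad` gives the model more freedom around the jaw at the cost of fidelity.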

The workflow also runs voice conversion (cloning) on the LTX-dubbed audio to match the tone of the original input, and uses MelBandRoformer to keep some of the original background ambience.
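Conceptually, keeping the background means separating the original audio into a vocal stem and an ambience stem, discarding the original vocals, and mixing the new dubbed voice back over the ambience. The sketch below shows only that final recombination step, assuming both stems are already available as mono float samples in [-1, 1] at the same sample rate; the function name and the `ambience_gain` value are illustrative assumptions, and in the actual workflow this happens inside ComfyUI nodes.

```python
# Hypothetical sketch of recombining a dubbed voice with a separated ambience stem.
# Assumption: stem separation (e.g. by MelBandRoformer) has already happened,
# and both inputs are mono float samples in [-1, 1] at the same sample rate.

def remix(dubbed_voice, ambience, ambience_gain=0.8):
    """Sum the dubbed voice with an attenuated ambience stem, padding the shorter one with silence."""
    n = max(len(dubbed_voice), len(ambience))
    out = []
    for i in range(n):
        v = dubbed_voice[i] if i < len(dubbed_voice) else 0.0
        a = ambience[i] if i < len(ambience) else 0.0
        # Hard-clip to [-1, 1] so the mix cannot exceed full scale on export.
        out.append(max(-1.0, min(1.0, v + ambience_gain * a)))
    return out


if __name__ == "__main__":
    print(remix([0.5, 0.9], [0.5, 0.5, 0.25]))
```

Hard clipping is the simplest safe choice for a sketch; a real mix would more likely normalize or use a limiter to avoid audible distortion.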

It can be used either to dub the same dialog into another language or to edit the dialog into something entirely new.
Highly experimental workflow ;-) Currently only tested with 1 speaker, and due to the way it works (masking), it may introduce changes.
But it stays somewhat faithful to the input video.

An alternative would be to use the other workflow available with either Fish Audio Pro or Qwen TTS, and clone the voice into the new language that way.
(I might make a quick variant that uses this instead, but since LTX seems to have quite a capable "TTS" built in, maybe that's plenty.)

Feel free to try it out ;-) https://huggingface.co/RuneXX/LTX-2.3-Workflows

Perfs

Apr 3

Is this working as a V2V with a ref video to be lip-synced over the actual person talking by providing audio? Or is there already a workflow for that? Seeking the best setup.

RuneXX

Owner Apr 3 (edited Apr 3)


This one is V2V with a ref video to be lip-synced over the actual person talking, by providing a prompt.
Simply something like "he talks in German, and he says "....."" (transcribe the original audio as German text input), for example ;-)
Basically dubbing: LTX does the magic all by itself (and the video input is masked around the mouth area, so that the generated video looks as close as possible to the input video).
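As a small illustration of the prompt pattern suggested above, here is a hypothetical helper (`dub_prompt` is not part of the workflow) that wraps an already-translated transcript in that phrasing. It assumes you have transcribed and translated the original audio yourself; the subject word is a parameter because the speaker may not be "he".

```python
# Hypothetical helper reproducing the prompt phrasing suggested in the thread.
# Assumption: `translated_text` is your own transcription of the original audio,
# already translated into the target language.

def dub_prompt(language, translated_text, subject="he"):
    """Build a dubbing prompt such as: he talks in German, and he says "..."."""
    return f'{subject} talks in {language}, and {subject} says "{translated_text}"'


if __name__ == "__main__":
    print(dub_prompt("German", "Guten Morgen"))
```

Keeping the wording consistent across clips makes it easier to compare results when you only change the language or the transcript.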

If you want to provide your own reference audio as input, there are a few alternatives by now:

The custom audio workflow (the "original default" way, where your audio is encoded as an LTX latent and LTX lip-syncs to it; look for "custom audio" among the workflows)
TTS workflows that clone the voice from your audio while you prompt what to say (look for Fish Audio or Qwen TTS among the workflows)
ID-Lora workflow (a 5-second reference audio file, and you prompt what to say)
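If your reference recording is longer than the roughly 5 seconds the ID-Lora route expects, you can cut it down before loading it. The sketch below uses only Python's standard `wave` module and assumes an uncompressed PCM WAV file; `trim_wav` is a hypothetical helper, and whether the workflow needs exactly 5 seconds or merely about 5 seconds is an assumption here.

```python
# Hypothetical sketch: keep only the first `seconds` of a PCM WAV reference clip.
# Assumption: the input is an uncompressed PCM WAV (what the stdlib `wave` module reads).
import io
import wave


def trim_wav(src_bytes, seconds=5.0):
    """Return WAV bytes containing at most the first `seconds` of audio from src_bytes."""
    with wave.open(io.BytesIO(src_bytes), "rb") as src:
        params = src.getparams()
        n_frames = min(src.getnframes(), int(seconds * src.getframerate()))
        frames = src.readframes(n_frames)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Same channels/sample width/rate; the frame count in the header is patched on close.
        dst.setparams(params)
        dst.writeframes(frames)
    return out.getvalue()


if __name__ == "__main__":
    with open("reference.wav", "rb") as f:  # hypothetical input file name
        trimmed = trim_wav(f.read())
    with open("reference_5s.wav", "wb") as f:
        f.write(trimmed)
```

Clips shorter than the limit are passed through unchanged, so the helper is safe to run on every reference file.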

https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main

Callin-adventure

Apr 4

Which of these workflows would you suggest for V2V with a mask, but providing your own audio? I remember an initial test you did with masking some time ago, of a woman, comparing with and without the mask, but I'm not sure whether that was text-to-audio or used an audio file. I'm assuming this dubbing workflow isn't much different from that initial one.

RuneXX

Owner Apr 4 (edited Apr 4)

Yes, the dubbing and "just talk" workflows are quite similar: they both mask the mouth area, and you prompt what the model should say. They could probably even be used interchangeably as-is; they're pretty much the same.

You mean providing your own audio to a V2V workflow? That would also be much the same; just add a custom audio latent to the input.
(I'll add that as an option to the "just talk" one. You can also try it yourself if you want: copy and paste the "custom audio" part from one of those workflows into the dubbing or "just talk" workflow, connecting the custom audio as the latent input.)

Callin-adventure

Apr 4

Yes, I meant providing my own audio. Thanks for clarifying the difference; I'll give it a go myself, mix a few things with some recordings, and keep an eye out for your update. Thanks Rune!
