
Lip Sync

Written by LX

Lip sync takes a video you already have and re-times its mouth movements to a new audio track, so the person (or character) on screen appears to say the new words. It's part of the Talking Video workstation, and it's the right choice when you have actual video footage rather than a single photo.

What does Lip sync do?

It keeps your reference video as-is and replaces only the lip movements so they match the audio you provide. Use it to dub a clip into a new voice or language, fix a take, or put fresh words in someone's mouth — without re-shooting.

How do I lip-sync a video?

  1. Open the Talking Video workstation and select Lip sync in the mode toggle.

  2. Under Reference video, add the clip you want to re-sync (see below).

  3. Under Select audio, add the audio that should drive the new lip movements.

  4. Check the per-second cost and estimate next to Generate, then tap Generate.

How do I add the reference video?

In the Reference video box you have two buttons:

  • Select content — pick a video from your personal library or the inspiration feed.

  • Upload video — upload a video file from your device.

After it's added, the clip plays on a loop in the box with a Delete control so you can swap it. You need to be signed in to select or upload.

How do I add the audio?

In the Select audio box:

  • Generate audio — open the built-in text-to-speech generator to create a voice clip from text. See Generate audio (text-to-speech).

  • Click to upload audio — upload your own audio file.

Once added, the audio shows as a playable card with a Delete button.

What are the length limits and constraints?

  • The reference video and the audio must each be at least 1 second and no longer than 15 minutes.

  • It's fine if the video is shorter than the audio — the app handles that. For the best result it recommends a video of at least 4 seconds.

If you try to add a video or audio that's too long, the app blocks it and shows the limit (for example, "The video must be less than 15 minutes" or "The audio must be less than 15 minutes").
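As an illustration only, the duration rules above can be written as a small check. This is a sketch, not the app's actual code: the function names, the "too short" wording, and the treatment of exactly 15 minutes as allowed are assumptions. Only the "must be less than 15 minutes" message comes from the app.

```python
MIN_SECONDS = 1          # stated minimum for both video and audio
MAX_SECONDS = 15 * 60    # stated maximum (15 minutes)
RECOMMENDED_VIDEO_SECONDS = 4  # recommended, not required


def check_duration(kind, seconds):
    """Return None if the clip length is accepted, else a reason string.

    `kind` is "video" or "audio". Whether exactly 15 minutes passes
    is an assumption; the article says "no longer than 15 minutes".
    """
    if seconds < MIN_SECONDS:
        # Hypothetical wording; the article does not quote this message.
        return f"{kind} too short"
    if seconds > MAX_SECONDS:
        # Message text as quoted in the article.
        return f"The {kind} must be less than 15 minutes"
    return None


def video_below_recommended(seconds):
    """True if the video is accepted but shorter than the recommended 4 s."""
    return MIN_SECONDS <= seconds < RECOMMENDED_VIDEO_SECONDS
```

For example, a 2-second video passes `check_duration` but is flagged by `video_below_recommended`, which matches the "allowed, but 4 seconds or more is better" guidance.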

What is the Resolution setting?

Lip sync renders at a fixed FHD resolution, shown in the Resolution control in the bottom bar. There's currently a single resolution option for this mode.

How much does Lip sync cost?

It's billed per second of audio. Before you generate, the bottom bar shows the per-second rate, your audio duration, and the total estimate next to Generate. The exact amount depends on your plan.
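To make the arithmetic concrete, here is a minimal sketch of how a per-second estimate works. The rate below is a made-up number, and the sketch assumes the total is simply rate times audio length with no rounding; the app's real rate, units, and rounding may differ, so always rely on the figure shown next to Generate.

```python
def estimate_total(rate_per_second, audio_seconds):
    """Estimated cost = per-second rate x audio duration.

    Both the rate and the absence of rounding are assumptions
    for illustration; the real values depend on your plan.
    """
    return rate_per_second * audio_seconds


# Hypothetical example: a rate of 2 units per second and a 30-second audio clip.
example_total = estimate_total(2, 30)
```

Because billing follows the audio, not the video, trimming the audio is what lowers the estimate.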

Why is Generate greyed out?

Generate stays disabled until both inputs are ready. You'll be blocked if:

  • There's no audio, or the audio is still uploading or failed.

  • There's no reference video, or the video is still uploading or failed.

  • The audio runs past the 15-minute limit.

Once the video and audio are both uploaded and within limits, Generate turns on.
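The conditions above amount to a simple rule: both inputs must be fully uploaded and the audio must be within the limit. A minimal sketch of that rule follows. The state names and the function are hypothetical and not taken from the app.

```python
from enum import Enum

MAX_AUDIO_SECONDS = 15 * 60  # the 15-minute limit from the article


class UploadState(Enum):
    """Hypothetical states for each input box."""
    EMPTY = "empty"          # nothing added yet
    UPLOADING = "uploading"  # upload in progress
    FAILED = "failed"        # upload failed
    READY = "ready"          # uploaded successfully


def can_generate(video_state, audio_state, audio_seconds):
    """Return True only when both inputs are ready and the audio fits the limit."""
    return (
        video_state is UploadState.READY
        and audio_state is UploadState.READY
        and audio_seconds <= MAX_AUDIO_SECONDS
    )
```

Any single input that is empty, still uploading, or failed keeps the result `False`, which is why removing and re-adding a failed upload is enough to unblock Generate.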

What happens if an upload fails?

If the video fails you'll see "Failed to upload the video. Please try again later." under the video box; if the audio fails you'll see "Failed to upload the audio. Please try again later." Remove the failed item and add it again to continue.

I only have one image, not a video — what should I use?

Use Talking avatar instead. Lip sync needs actual video footage; if all you have is a single still photo, the app shows "If you only have 1 image, use Talking avatar instead," which doubles as a one-tap switch to the Talking avatar mode.

On mobile

The inputs stack vertically: Reference video first, then Audio. Selecting/uploading the video and adding audio work the same as on desktop, and the Resolution control lives in the mobile bottom bar. The 15-minute limit and per-second pricing are unchanged.
