Talking avatar makes a single still image speak. You give it one photo of a face and an audio track, and it animates the still so the mouth, expression, and head move in time with the voice. It's part of the Talking Video workstation.
What does Talking avatar do?
It takes one reference image (a portrait, a character, a product mascot: anything with a face) plus an audio track and produces a video of that image talking. There's no separate script field in this mode; the audio you provide supplies the words it speaks. If you'd rather type the words, use Generate audio to turn your text into a voice clip first.
How do I make a talking avatar?
1. Open the Talking Video workstation and make sure Talking Avatar is selected in the mode toggle.
2. Under Reference image, add your face image (see below).
3. Under Select audio, add an audio track: upload a file or generate a voice.
4. Optionally, pick a Camera movement and a Quality.
5. Check the per-second rate and total estimate next to Generate, then tap Generate.
How do I add the reference image?
In the Reference image box you have two buttons:
Select content: pick an image you've already made or saved, from your personal library or the inspiration feed.
Upload image: upload a file from your device. HEIC photos (common on iPhone) are converted automatically.
Once added, the image shows in the box with a delete control so you can swap it. You need to be signed in to select or upload.
For the most natural result, start from a clear, well-lit, front-facing photo with the mouth closed or relaxed in a neutral position, or use a model you created. The cleaner the face, the better the animation tracks it.
How do I add the audio?
In the Select audio box you have two options:
Generate audio: open the text-to-speech generator, type your script, and create a voice clip without leaving the page. See Generate audio (text-to-speech).
Click to upload audio: upload your own audio file.
After it's added, the audio appears as a playable card with a Delete button if you want to replace it.
What are the audio length limits?
For Talking avatar, your audio must be at least 1 second and no longer than 3 minutes. If you upload a file outside that range, the app blocks it and tells you the limit. The hint under the audio box reads "Please use audio shorter than 3 minutes." Because the video follows the audio, its length is effectively the length of your audio track.
What is Camera movement and how do I use it?
Camera movement adds cinematic camera motion to the generated video. You pick one movement from a scrollable list of named options. Your choice shows at the top of the section, and a small cross clears it so no movement is applied; when nothing is selected, that spot shows the placeholder "Selected movement". The available movements are supplied by the app, so the exact options can change over time. While the list is loading you'll see a spinner, and if no options are offered you'll see "No camera movement options are available yet. Please check back soon."
Camera movement is only available on the Ultra quality tier. On Best, the section is masked with the hint "Camera movement is only available in Ultra"; clicking the mask switches your quality to Ultra so you can use it.
What does the Quality setting do?
Quality picks the render tier for your avatar. Talking avatar offers two tiers: Best and Ultra. Ultra is the higher-fidelity tier and is the one that unlocks Camera movement; it also affects the per-second cost. You'll find the Quality control in the bottom bar next to Generate.
How much does a talking avatar cost?
It's billed per second of audio. Before you generate, the bottom bar shows the per-second rate, your audio's duration, and the total estimate next to Generate. Ultra costs more per second than Best; the exact rate depends on your plan and the tier you choose.
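As a rough illustration of how the estimate adds up, here's a minimal Python sketch. It assumes the total is simply the per-second rate multiplied by the audio's duration in seconds, with no rounding; the function name `estimate_cost` and the example rate are hypothetical, and the app may round differently. Always rely on the figure shown next to Generate.

```python
def estimate_cost(rate_per_second: float, audio_seconds: float) -> float:
    """Hypothetical estimate: per-second rate times audio duration.

    Assumes the duration is used as-is (no rounding); the real app may differ.
    """
    if audio_seconds < 1 or audio_seconds > 180:
        # Talking avatar only accepts audio from 1 second up to 3 minutes.
        raise ValueError("audio must be between 1 second and 3 minutes")
    return rate_per_second * audio_seconds


# Example with a made-up rate of 2 credits per second and a 30-second clip.
print(estimate_cost(2, 30))
```

Switching from Best to Ultra changes only the rate you pass in; the duration stays the same because it comes from your audio.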
Why can't I generate yet?
Generate stays disabled until both inputs are ready. It's blocked if there's no audio, the audio is still uploading or failed to upload, there's no reference image, the image is still uploading or failed to upload, or the audio is longer than the 3-minute limit. Once the image and audio are both uploaded and within limits, Generate becomes active.
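The conditions above boil down to a single check. The Python sketch below is purely illustrative: the `Upload` type, its status values ("uploading", "done", "failed"), and `can_generate` are hypothetical names, not the app's actual code. It encodes only the rules listed in this section.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AUDIO_SECONDS = 180  # 3-minute limit for Talking avatar


@dataclass
class Upload:
    # Hypothetical upload states: "uploading", "done", or "failed".
    status: str
    duration_seconds: Optional[float] = None  # only meaningful for audio


def can_generate(image: Optional[Upload], audio: Optional[Upload]) -> bool:
    """Return True only when both inputs are uploaded and within limits."""
    if audio is None or audio.status != "done":
        return False  # no audio, or it's still uploading or failed
    if image is None or image.status != "done":
        return False  # no reference image, or it's still uploading or failed
    if audio.duration_seconds is None or audio.duration_seconds > MAX_AUDIO_SECONDS:
        return False  # duration unknown, or audio is over the 3-minute limit
    return True


print(can_generate(Upload("done"), Upload("done", 45)))
print(can_generate(Upload("uploading"), Upload("done", 45)))
```

The first call describes a ready state (both uploaded, 45-second audio); the second is blocked because the image is still uploading.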
What happens if an upload fails?
If the image upload fails, you'll see "Please try again later" under the image, and Generate stays off until you add a working image. If the audio upload fails, you'll see "Failed to upload the audio. Please try again later." Remove the failed item and add it again.
On mobile
The layout stacks vertically: Reference image first, then Audio, then the Camera movement section at the bottom. The image and audio inputs work the same (Select content / Upload image, and Generate audio / upload audio). The Quality control lives in the mobile bottom bar. The 3-minute audio limit and per-second pricing are unchanged.
Related: Talking Video · Lip sync · Generate audio (text-to-speech) · Credits & billing