A voice model is a reusable voice you can apply when turning text into speech — for talking videos, audio clips, and storyboard narration. You can build one from a real audio sample you upload or from a text description using the AI voice generator, then pick it from any "voice model" selector across the app.
What is a voice model and where do I use it?
A voice model is a saved voice that the app uses whenever it reads a script aloud for you. Once you've created one, it shows up in the Voice model selectors in the Generate Audio dialog, in the audio section of talking and image-to-video generations, and in the storyboard editor for Storyboard to video. You pick a voice model, type or generate a script, and the app speaks it in that voice.
How do I create a voice model?
Open the Create Voice Model dialog, for example with the Create Voice Model button next to a voice selector or the + Create Voice Model button at the bottom of a voice-picker list. At the top, choose one of two starting points:
Link to Portrait Model — ties the voice to one of your existing portrait models, so the voice and that character travel together. The voice model takes the portrait's name automatically.
Create Voice-Only Model — makes a standalone voice with its own name and visibility, not attached to any portrait.
Then you give the voice some source audio (upload a clip or use the AI voice generator), fill in the details (gender, language, age, description), and press Create Voice Model.
What's the difference between "Link to Portrait Model" and "Create Voice-Only Model"?
Link to Portrait Model attaches the new voice to a portrait model you already have. Its description says "Upload your own image as your model's reference image," and once linked the card shows "This voice model is linked to an existing portrait model." You can use Switch Portrait Model to pick a different one or Cancel Portrait Link to back out. There's no separate name field — the voice inherits the linked portrait's name.
Create Voice-Only Model makes a standalone voice "with its own name and visibility." You type a Voice-only model name (up to 32 characters) and set a Public toggle to decide whether others can see it.
Picking either option expands that card; clicking the smaller card on the other side switches you over, and Cancel returns you to the two-card chooser.
How do I give the voice its sound — upload vs AI voice generator?
Under Select voice you have two ways to provide the source voice:
Upload audio — upload a recording of the voice you want to clone. Use a clip between 10 and 90 seconds (the dialog reminds you: "Use audio between 10 and 90 seconds"). On mobile the hint is "Please use audio under 90 seconds."
AI voice generator — describe the voice in words and let the app design one for you, then pick your favorite take as the reference. See "How does the AI voice generator work?" below.
Once a voice is in place you'll see an audio card you can play; the trash/delete icon removes it so you can choose a different one.
How does the AI voice generator work?
The AI voice generator designs a voice from a written description instead of an upload. You set:
Language — pick the language for the generated voice.
Voice model description — describe the voice you want ("Describe the voice you want."). Use Generate from profile to have the app draft a description for you.
Quantity — choose how many takes to generate at once: 1, 2, or 3.
Press Generate voice model to create the previews. Each result appears as Voice 1 / Voice 2 / … with a play button, listed under a "Voice preview" section (with a "Voice preview history" section below). Click a voice to select it, then press Use this voice as reference to bring it back into the Create Voice Model form as the source voice. Generating voice previews uses credits; the amount depends on your plan and how many takes you generate.
Why won't the AI voice generator let me generate?
The Generate voice model button isn't greyed out; it stays clickable. But if your description is missing or too short, clicking it opens a "Description is too short" dialog explaining that the description must be at least 50 characters and suggesting the Generate from profile button to help. Fill in (or expand) the description and try again.
The generator screen keeps it simple — just Language, Voice model description, and Quantity; there's no separate "voice script" field to fill in here.
What details do I set on a voice model, and are they required?
In the voice model form you can set:
Gender, Language, and Age — choose from the provided options. These act as tags so the voice is easier to find and filter later.
Description — up to 512 characters describing the voice. The Expand description button helps flesh out a short description into a fuller one.
For a voice-only model you also set the Voice-only model name (up to 32 characters) and the Public toggle. To create a voice model, you need to pick a starting mode and provide a source voice (uploaded or generated); a voice linked to a portrait also needs a portrait selected. If something is missing when you press Create Voice Model, the app shows a message telling you what to add.
Can I preview a voice before I use it?
Yes. In any voice-picker list, each voice model card has a play button so you can hear a sample before choosing it. While editing a voice model that already has a preview, a small waveform-and-play control next to Description lets you play the current voice. In the AI voice generator, every generated take has its own play button.
How do I pick a voice model when generating?
Open a Voice model selector (in the Generate Audio dialog or the audio section of a generation). You'll see a scrollable list of voices with filters at the top:
Gender, Age, and Language dropdowns narrow the list; Clear all resets them.
Only mine limits the list to voices you created (versus community voices).
No voice model (a circle-slash row at the top) clears your selection — "No voice model will be applied." This clear option appears only where a voice is optional (the audio section of a talking / image-to-video generation). The Generate Audio (text-to-speech) dialog requires a voice, so it has no "No voice model" row — you must pick one to generate.
+ Create Voice Model at the bottom opens the create dialog without leaving your flow.
Tap a voice to select it; the picker closes and your choice shows in the selector with its avatar and name.
What do the spinner and "Error" labels on a voice model mean?
A voice model has to finish training before you can use it:
A spinner with a percentage in place of the avatar means that voice is still being generated. If you tap it, you'll see "This voice model is still being generated. Please wait until it's ready."
An error icon with "Error" means training failed. Tapping it shows "Voice model creation failed. Please try creating it again."
Wait for training to finish, or recreate the voice if it failed.
What audio file can I upload, and how long can it be?
Upload a standard audio file that runs between 10 and 90 seconds. Clips outside that range are rejected, and the uploader shows its progress while the file is processed. For the best result, use a clean recording of the voice you want to capture.
How are voice models used in talking videos and storyboards?
In a talking or image-to-video generation, turning on the Audio (Native audio) toggle reveals a Voice model selector. In the talking-video audio section, the empty selector reads + Voice model (optional); in the image-to-video element column, it reads + Add voice model. Leave it empty to skip it, or pick a voice so the generated video speaks your script in that voice. A voice model works only with that toggle on: if it's off, you'll see the message "Turn on Native audio to add a voice model." and the video is generated without any added voice or native audio.
In the storyboard editor for Storyboard to video, the storyboard has its own voice model control alongside the brief, shots, shared Elements pool, and resolution. The voice you choose there narrates the storyboard's shots. See Generate Audio (text-to-speech) and the storyboard pages for how the script and voice come together.
How do I edit or manage a voice model I already made?
From the voice model's actions, choose Edit to open the Update Voice Model dialog. There you can change its name, tags (gender, age, language), description, or linked portrait, then press Save. The dialog opens on the matching tab (linked-portrait or voice-only), depending on how the voice was created. You can edit or delete only the voices you own.
Does creating a voice model cost credits?
Generating voice previews with the AI voice generator uses credits, and training a voice model consumes resources too. The exact cost depends on your plan and the quality you choose, so there's no single fixed number to list here. Creating voice models is also a higher-tier feature: the Free (Nano) and Micro plans can't create custom voice models, so you need Macro or above (otherwise you'll see an upgrade prompt). See Credits, plans & billing for how credits work and what your plan includes.
On mobile
The create flow is the same but laid out for a phone: the two starting cards (Link to Portrait Model / Create Voice-Only Model) stack vertically, and Select voice opens a bottom sheet with AI voice generator and Upload audio buttons. The voice picker opens as a bottom sheet with the Gender / Age / Language filters and Only mine at the top, the No voice model row, and a Create Voice Model button. The AI voice generator and update dialogs slide in as full-screen panels.