Text to Video turns a written description into a generated video clip, with no source image or video needed. You write a prompt describing the pose, movement, scene, and any elements you want, then click Generate.
What does Text to Video do?
It creates a video from your words alone. You describe what should happen (for example a pose, a movement, or any element you want in the shot) and the model generates a matching clip. The prompt field shows the placeholder "Enter pose, movement, or any element you want in the video." to get you started.
How do I generate a video from text?
Type your description into the prompt editor.
Use the controls on the left (the tag sidebar) to add an art style or a camera movement, and to set up elements.
Pick your quality (Fast, Ultra, or Ultra S), resolution (FHD by default), and duration from the workstation's bottom bar.
Click Generate. The credit cost is shown next to the Generate button.
What is the prompt box and is there a length limit?
The prompt box is a rich editor where you type your description and drop in tags (camera movement, art style, model person indicator, and elements). There is a per-shot character limit. When you paste or type past it, a notice tells you your prompt "exceeded the character limit and was truncated," and a live counter near the bottom-right of the editor shows how many characters you've used out of the maximum. You can still see the extra text, but only the text within the limit is used to generate.
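The limit behaviour above can be sketched in a few lines of Python. This is only an illustration: the name apply_prompt_limit and the max_chars parameter are hypothetical, and the real per-shot maximum is not stated on this page.

```python
from typing import NamedTuple


class PromptCheck(NamedTuple):
    used_text: str   # the part of the prompt that would be used to generate
    truncated: bool  # True when the "exceeded the character limit" notice applies


def apply_prompt_limit(prompt: str, max_chars: int) -> PromptCheck:
    """Keep only the first max_chars characters of a shot's prompt.

    max_chars is a hypothetical parameter; the actual limit is not
    documented here.
    """
    if max_chars < 0:
        raise ValueError("max_chars must not be negative")
    truncated = len(prompt) > max_chars
    return PromptCheck(prompt[:max_chars], truncated)
```

With a made-up limit of 8 characters, the prompt "a dancer spins" would be used as "a dancer" and flagged as truncated, while a shorter prompt passes through unchanged.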
What is "Native audio"?
"Native audio" is a toggle in the header that generates sound together with your video. It's available on every Text to Video quality (Fast, Ultra, and Ultra S), but it behaves differently on the highest tier:
On Ultra S quality, native audio is required and stays on. If you try to switch it off, you'll see "Native audio is required for Ultra S quality."
On Fast and Ultra you can freely turn it on or off.
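If it helps to see the rule written out, here is a minimal sketch of the toggle logic. The function name resolve_native_audio is hypothetical and only mirrors the behaviour described above; it is not the product's actual code.

```python
from typing import Optional, Tuple

QUALITIES = ("Fast", "Ultra", "Ultra S")
REQUIRED_MESSAGE = "Native audio is required for Ultra S quality."


def resolve_native_audio(quality: str, requested: bool) -> Tuple[bool, Optional[str]]:
    """Return the effective toggle state and an optional message to show."""
    if quality not in QUALITIES:
        raise ValueError(f"unknown quality: {quality!r}")
    if quality == "Ultra S" and not requested:
        # Ultra S keeps audio on and explains why the switch did not move.
        return True, REQUIRED_MESSAGE
    # Fast and Ultra honour whatever the user chose.
    return requested, None
```

For example, resolve_native_audio("Ultra S", False) keeps audio on and returns the message, while resolve_native_audio("Fast", False) simply turns it off.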
What are art style and camera movement, and why are they in the sidebar?
The left sidebar holds the shared tag controls. Art style sets the overall look, and camera movement adds motion such as a pan or zoom. A hint explains the rule: "Each shot allows one camera movement selection, while art style is set once globally in the first shot only." So you can choose a different camera movement for each shot, but the art style applies to the whole video and can only be set on the first shot.
Why can't I change the art style on a later shot?
Art style is global and is taken from the first shot only. If you try to set it on a second or later shot, you'll see "Art style can only be added at the first shot." Select the first shot to change the art style for the whole video.
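The rule can be pictured as a small check over a list of shots. This is a sketch under assumptions: the Shot class and both function names are hypothetical, chosen only to illustrate "one camera movement per shot, one art style taken from the first shot."

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shot:
    prompt: str
    camera_movement: Optional[str] = None  # a single field: at most one per shot
    art_style: Optional[str] = None        # only allowed on the first shot


def art_style_errors(shots: List[Shot]) -> List[str]:
    """Report every later shot that tries to carry its own art style."""
    errors = []
    for number, shot in enumerate(shots, start=1):
        if number > 1 and shot.art_style is not None:
            errors.append(f"Shot {number}: Art style can only be added at the first shot.")
    return errors


def global_art_style(shots: List[Shot]) -> Optional[str]:
    """The style for the whole video comes from the first shot only."""
    return shots[0].art_style if shots else None
```

A video whose first shot sets a style and whose second shot sets only a camera movement passes the check; adding a style to the second shot produces the error text quoted above.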
What are Elements and why are they greyed out?
Elements are reusable people, products, or backgrounds you can reference inside your prompt. They are only available on Ultra S, the highest tier. If your quality doesn't support them, the controls are greyed out and you'll see "Only available for Ultra S." Switch to Ultra S to unlock them. See Elements for details.
How do multi-shot videos work?
On qualities that support it, you can build a video out of several shots, each with its own prompt and camera movement. Multi-shot is only offered on the higher tiers ("Multi-shot generation is supported for Ultra and Ultra S quality."). Click Add shot to add another shot. The number of shots you can add depends on your video duration, as the hint explains: "{seconds} second videos support up to {shots} shots. For more shots, increase the video duration."
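As a rough sketch of how the two conditions combine, the snippet below fills in the hint template and checks whether another shot could be added. The function names are hypothetical, and max_shots is passed in as a parameter because this page does not list how many shots each duration allows.

```python
HINT_TEMPLATE = (
    "{seconds} second videos support up to {shots} shots. "
    "For more shots, increase the video duration."
)
MULTI_SHOT_QUALITIES = {"Ultra", "Ultra S"}


def shot_limit_hint(seconds: int, max_shots: int) -> str:
    """Fill in the hint shown when the shot limit is relevant."""
    return HINT_TEMPLATE.format(seconds=seconds, shots=max_shots)


def can_add_shot(quality: str, current_shots: int, max_shots: int) -> bool:
    """Another shot is allowed only on a multi-shot quality and below the limit."""
    return quality in MULTI_SHOT_QUALITIES and current_shots < max_shots
```

The numbers you pass in are placeholders; the real limit for your duration is whatever the hint in the app shows.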
How do I remove a shot?
Each added shot has a small header showing "Shot N." Click that header to remove the shot. The remaining shots renumber automatically.
Why is the sidebar greyed out with "Please select a shot to edit"?
In multi-shot mode the tag sidebar applies to whichever shot is active. If no shot is focused, the sidebar dims and a tooltip says "Please select a shot to edit." Click into a shot's prompt editor to make the sidebar controls apply to it.
How much does it cost?
Video is priced per second of output, so the cost scales with your duration. The exact per-second rate is shown next to the Generate button and depends on your plan and the quality you choose; it also reflects options like resolution and native audio. For example, in one live capture the rate was 160 credits per second, so a 5-second clip cost 800 credits (160 × 5). Your own rate may differ. See Credits & billing.
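The arithmetic is simple enough to check yourself. Here is a minimal sketch, assuming you read the per-second rate off the Generate button; the function name estimate_credits is hypothetical.

```python
def estimate_credits(rate_per_second: int, duration_seconds: int) -> int:
    """Total credits = per-second rate multiplied by clip length in seconds."""
    if rate_per_second < 0:
        raise ValueError("rate_per_second must not be negative")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return rate_per_second * duration_seconds


# The example from the text: 160 credits per second for 5 seconds.
print(estimate_credits(160, 5))  # prints 800
```

The rate of 160 is only the captured example; substitute the rate shown for your plan, quality, and options.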
On mobile
Text to Video works on mobile with the same prompt editor and tags, laid out for a narrow screen. The quality, resolution, and duration controls live in the workstation's bottom bar rather than alongside the prompt.
Related: Quality, resolution & duration · Image to Video · Elements · Credits & billing