Top-tier text-to-video generation with cinematic visuals, fluid motion, and native audio. Supports multi-shot videos with customizable prompts and durations (3-15 seconds).
"Cinematic drone shot through ancient ruins at golden hour"
Fill in the parameters below and click "Generate" to try this model
Text prompt for single-shot video (don't use with multi_prompt)
Multi-shot video generation with custom prompts per shot
Video duration (for single-shot only)
Video aspect ratio
Generate native audio (Chinese and English supported; other languages are auto-translated)
Voice IDs (max 2). Reference as <<<voice_1>>>, <<<voice_2>>>
Multi-shot generation type
Negative prompt
CFG scale (prompt adherence)
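The parameters above can be combined into one request body. The following is a minimal sketch, not the service's actual API: only `multi_prompt` is named on this page, so every other field name, every default value, and the shape of the multi-shot entries are hypothetical. It also assumes the 3-15 second range applies to the single-shot duration. It checks the constraints stated above: use either a single prompt or `multi_prompt`, never both, and pass at most 2 voice IDs.

```python
from typing import Optional


def build_payload(
    prompt: Optional[str] = None,
    multi_prompt: Optional[list] = None,  # the only field name taken from the page
    duration: int = 5,                    # hypothetical name and default
    aspect_ratio: str = "16:9",           # hypothetical name and default
    generate_audio: bool = False,         # hypothetical name
    voice_ids: Optional[list] = None,     # hypothetical name
    negative_prompt: str = "",            # hypothetical name
    cfg_scale: float = 0.5,               # hypothetical name and default
) -> dict:
    """Assemble a request body and enforce the constraints listed above."""
    # A single-shot prompt and multi_prompt are mutually exclusive.
    if (prompt is None) == (multi_prompt is None):
        raise ValueError("provide exactly one of prompt or multi_prompt")

    voices = list(voice_ids or [])
    if len(voices) > 2:
        raise ValueError("at most 2 voice IDs are allowed")

    payload = {
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
        "voice_ids": voices,
        "negative_prompt": negative_prompt,
        "cfg_scale": cfg_scale,
    }

    if prompt is not None:
        # Duration is a single-shot setting; the 3-15 s bound is an assumption.
        if not 3 <= duration <= 15:
            raise ValueError("duration must be between 3 and 15 seconds")
        payload["prompt"] = prompt
        payload["duration"] = duration
    else:
        # Entries are passed through unchanged because their shape is not documented here.
        payload["multi_prompt"] = multi_prompt
    return payload


payload = build_payload(
    prompt="Cinematic drone shot through ancient ruins at golden hour",
    duration=5,
)
```

Validating before submission keeps the two generation modes from being mixed in one request; the real field names should be taken from the provider's API reference.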
Create premium talking avatar videos with higher quality than Standard.
Replace characters in videos while keeping original lighting and scene intact.
Quickly animate images into videos with sound. Now 50% cheaper.
Quickly create videos from text (optimized for speed and cost).
Audio-driven talking avatar generation without a custom image. Creates highly realistic, lip-synchronized videos with natural dynamics from audio input alone.
Create videos with sound from text quickly and affordably.
Combine multiple images into a single 5-second video with creative or precise blending.
Transfer movements from a reference video to any character image. Pro mode delivers higher-quality output, ideal for complex dance moves and gestures.
Vidu's latest Q3 Pro model for image-to-video generation. Creates videos of up to 16 seconds from a single image, with optional audio generation (prompts of up to 2000 characters).