Audio-driven video generation for two people. Creates realistic, lip-synchronized videos with natural dynamics. Dual audio support makes it well suited to conversations and dialogues.
~30-60 seconds
Fill in the parameters below and click "Generate" to try this model
Image containing two speakers
Audio file for person 1 (left side)
Audio file for person 2 (right side)
Text prompt to guide video generation
Negative prompt to avoid unwanted elements
Audio combination mode (parallel=simultaneous, sequential=person 1 then 2)
Bounding box for person 1 (JSON format, defaults to left half)
Bounding box for person 2 (JSON format, defaults to right half)
Video resolution (480p=1 unit/sec, 720p=4 units/sec)
Video segments (1st=~5.8s, additional=5s each)
Number of inference steps
Text guidance scale for classifier-free guidance
Audio guidance scale (higher values exaggerate mouth movement)
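The resolution and segment parameters above imply some simple arithmetic, sketched below in Python. The rates (1 and 4 units per second) and segment lengths (~5.8 s, then 5 s each) come from the parameter descriptions. Everything else is an assumption made for illustration: the function names are hypothetical, units are assumed to be charged per second of output video, and the bounding box layout [x_min, y_min, x_max, y_max] in pixels is a guess, since the page only says the boxes are JSON and default to the left and right halves.

```python
import json
from typing import Tuple

# Rates and segment lengths as stated in the parameter descriptions.
UNITS_PER_SECOND = {"480p": 1, "720p": 4}
FIRST_SEGMENT_SECONDS = 5.8  # approximate ("~5.8s")
EXTRA_SEGMENT_SECONDS = 5.0


def estimate_duration(segments: int) -> float:
    """Approximate video length in seconds for a given segment count."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    return FIRST_SEGMENT_SECONDS + EXTRA_SEGMENT_SECONDS * (segments - 1)


def estimate_units(resolution: str, segments: int) -> float:
    """Approximate units consumed, ASSUMING units scale with output seconds."""
    if resolution not in UNITS_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    return UNITS_PER_SECOND[resolution] * estimate_duration(segments)


def default_bboxes(width: int, height: int) -> Tuple[str, str]:
    """JSON strings for the left and right halves of the image.

    The [x_min, y_min, x_max, y_max] pixel layout is hypothetical; the
    format the model actually expects is not documented on this page.
    """
    mid = width // 2
    left = [0, 0, mid, height]
    right = [mid, 0, width, height]
    return json.dumps(left), json.dumps(right)


if __name__ == "__main__":
    for resolution in ("480p", "720p"):
        for segments in (1, 2, 3):
            seconds = estimate_duration(segments)
            units = estimate_units(resolution, segments)
            print(f"{resolution}, {segments} segment(s): "
                  f"~{seconds:.1f} s, ~{units:.1f} units")
    print(default_bboxes(1280, 720))
```

Under these assumptions, a 720p video costs four times as much as a 480p video of the same length, so previewing at 480p before a final 720p run keeps unit usage down.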
Generate crisp 720p videos from text 10x faster than traditional methods
Create cinematic 720p videos with audio from text, up to 12 seconds long.
Apply 190+ motion templates to your images including dances, transformations, and effects.
Quickly generate videos from text. Perfect for rapid prototyping and content creation.
Create stylized video clips from text with advanced style options.
Generate smooth, cinematic videos from text with precise motion control.
Animate images into stylized videos using text prompts.
Quickly create videos from text (optimized for speed and cost)
Animate images with superior motion quality and ending frame control