Seedance 2.0 Reference to Video

Generate cinematic videos from reference images, videos, and audio. Supports multi-modal input, native audio generation, and clips of up to 15 seconds.

"Beautiful fusion of these two scenes. Mills stand against a rugged coastline, their large wooden wheels turned by the relentless surge of tidal waves combined with a field of wildflowers bathed in soft sunlight transitions into where monarch butterflies take flight."

[Example: reference inputs Image 1 and Image 2, followed by the generated result. Generated in ~30-90 seconds.]


📄 About Seedance 2.0 Reference to Video
Key Features
Multi-modal input accepts up to 9 images, 3 videos, and 3 audio files (12 reference files in total) in a single generation
Native audio generation creates synchronized sound effects, ambient audio, and lip-synced speech matched to the visual content
Durations of up to 15 seconds leave room for complete narrative sequences and detailed scene development
Aspect ratio options from 21:9 ultrawide to 9:16 vertical cover the common platform and device formats
Temporal consistency keeps motion smooth, character appearances stable, and camera movements steady throughout the video
Reference syntax using @Image1, @Video1, @Audio1 notation lets a prompt point at specific uploaded files
Scene fusion blends multiple reference inputs into cohesive transitions and visual effects
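The @Image1-style notation listed above can be checked before a prompt is submitted. The following is a minimal sketch, not the service's own parser: the function names are hypothetical, and it assumes that indices start at 1 and that a token is simply the kind followed by a number.

```python
import re

# Matches tokens such as @Image1, @Video2, @Audio3.
REFERENCE_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def find_references(prompt: str) -> list[tuple[str, int]]:
    """Return (kind, index) pairs for every reference token, in prompt order."""
    return [(kind, int(index)) for kind, index in REFERENCE_PATTERN.findall(prompt)]


def missing_references(prompt: str, uploaded: dict[str, int]) -> list[str]:
    """List tokens that point at a file that was not uploaded.

    `uploaded` maps a kind ("Image", "Video", "Audio") to the number of files
    of that kind. Indexing from 1 is an assumption of this sketch.
    """
    missing = []
    for kind, index in find_references(prompt):
        if index < 1 or index > uploaded.get(kind, 0):
            missing.append(f"@{kind}{index}")
    return missing


prompt = "Blend @Image1 into @Image2, following the camera motion of @Video1."
print(find_references(prompt))                    # [('Image', 1), ('Image', 2), ('Video', 1)]
print(missing_references(prompt, {"Image": 2}))   # ['@Video1']
```

Catching a dangling reference locally avoids spending a generation on a prompt that names a file that was never uploaded.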
💡 Use Cases
Social media content creation for Instagram Reels, TikTok, and YouTube Shorts with platform-optimized aspect ratios
Marketing video production combining product images, lifestyle footage, and branded audio for engaging advertisements
Film pre-visualization and storyboard animation using reference images and audio tracks to test scene concepts
Music video generation synchronizing artist images with audio tracks to create performance-style visual content
Educational content development transforming static diagrams and narration into dynamic explainer videos
E-commerce product demonstrations combining multiple product angles with ambient audio for immersive shopping experiences
Character animation bringing still portraits to life with synchronized dialogue and natural movements
🎯 Best For
Video creators, social media marketers, filmmakers, content agencies, music producers, e-commerce brands, and creative professionals seeking efficient multi-modal video generation
👍 Pros
Accepts multiple input modalities simultaneously for unprecedented creative flexibility
Generates synchronized audio automatically, eliminating separate editing workflows
Produces up to 15 seconds of coherent video with consistent quality throughout
Supports seven aspect ratios for optimal output across all platforms and devices
Advanced temporal consistency creates professional-quality motion and transitions
Intuitive prompt syntax makes complex multi-modal requests accessible to all skill levels
⚠️ Considerations
Maximum 15-second duration may require multiple generations for longer content needs
Combined video reference duration limited to 15 seconds total across all input files
Audio input requires at least one image or video reference to function
Maximum resolution of 720p may not meet requirements for 4K production workflows
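Because a single generation tops out at 15 seconds, longer pieces have to be planned as several clips. The sketch below shows one way to do that arithmetic: it splits a target length into the fewest clips that each fit the 4-15 second range. The function name is hypothetical, and whole-second durations are an assumption.

```python
import math

MIN_CLIP_SECONDS = 4   # shortest selectable duration
MAX_CLIP_SECONDS = 15  # longest selectable duration


def plan_clips(total_seconds: int) -> list[int]:
    """Split a target length into the fewest clips of 4-15 whole seconds each."""
    if total_seconds < MIN_CLIP_SECONDS:
        raise ValueError(f"target must be at least {MIN_CLIP_SECONDS} seconds")
    clip_count = math.ceil(total_seconds / MAX_CLIP_SECONDS)
    base, extra = divmod(total_seconds, clip_count)
    # The first `extra` clips get one additional second so the lengths sum to the target.
    return [base + 1 if i < extra else base for i in range(clip_count)]


print(plan_clips(40))  # [14, 13, 13]
print(plan_clips(15))  # [15]
```

Spreading the length evenly, rather than filling 15-second clips and leaving a short remainder, keeps every clip inside the allowed range.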
📚 How to Use Seedance 2.0 Reference to Video
1. Upload your reference materials: Add up to 9 images, 3 videos (2-15s combined), and 3 audio files (15s combined) to use as creative sources
2. Write your prompt using reference syntax: Describe your desired video using @Image1, @Video1, @Audio1 notation to reference specific uploaded files
3. Configure output settings: Select your preferred aspect ratio (auto, or a fixed ratio from 21:9 to 9:16), resolution (480p or 720p), and duration (4-15 seconds or auto)
4. Enable audio generation: Toggle the generate_audio option to create synchronized sound effects and ambient audio that matches your visual content
5. Generate and refine: Click generate to create your video, then iterate by adjusting prompts or settings to refine your result
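The output settings from the steps above can be collected and validated in one place. This is a sketch under stated assumptions: only `generate_audio` is named in the text, so the keys `aspect_ratio`, `resolution`, and `duration`, and the function itself, are hypothetical and may not match the real request format.

```python
ALLOWED_RESOLUTIONS = {"480p", "720p"}


def build_settings(aspect_ratio="auto", resolution="720p", duration="auto", generate_audio=True):
    """Validate output settings and return them as a plain dict.

    Only `generate_audio` is a name taken from the page; the other keys are
    hypothetical names chosen for this sketch.
    """
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if duration != "auto" and not (isinstance(duration, int) and 4 <= duration <= 15):
        raise ValueError("duration must be 'auto' or a whole number from 4 to 15")
    # The aspect ratio is passed through unchecked because the page does not
    # list every supported value.
    return {
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
        "generate_audio": bool(generate_audio),
    }


settings = build_settings(aspect_ratio="9:16", resolution="480p", duration=8)
print(settings)
```

Validating before submission gives an immediate error for an unsupported resolution or duration, instead of a failed generation.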
Frequently Asked Questions
How many reference files can I use?
Seedance 2.0 supports up to 12 reference files in total across all modalities: at most 9 images (30MB each), 3 videos (50MB total, 2-15s combined duration), and 3 audio files (15MB each, 15s combined duration). Reference these files in your prompt with @Image1, @Video1, @Audio1 syntax to control how they are incorporated into the final video.
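These limits can be checked locally before anything is uploaded. The sketch below encodes the numbers stated above, plus the rule that audio needs at least one image or video reference. The `ReferenceFile` type and `check_limits` function are hypothetical, and treating 1MB as 1024 * 1024 bytes is an assumption.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: the page does not say whether MB is binary or decimal


@dataclass
class ReferenceFile:
    kind: str             # "image", "video", or "audio"
    size_bytes: int
    seconds: float = 0.0  # ignored for images


def check_limits(files: list[ReferenceFile]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all limits are met."""
    images = [f for f in files if f.kind == "image"]
    videos = [f for f in files if f.kind == "video"]
    audio = [f for f in files if f.kind == "audio"]
    problems = []
    if len(files) > 12:
        problems.append("more than 12 reference files in total")
    if len(images) > 9:
        problems.append("more than 9 images")
    if len(videos) > 3:
        problems.append("more than 3 videos")
    if len(audio) > 3:
        problems.append("more than 3 audio files")
    if any(f.size_bytes > 30 * MB for f in images):
        problems.append("an image exceeds 30MB")
    if sum(f.size_bytes for f in videos) > 50 * MB:
        problems.append("videos exceed 50MB in total")
    if any(f.size_bytes > 15 * MB for f in audio):
        problems.append("an audio file exceeds 15MB")
    if videos and not 2 <= sum(f.seconds for f in videos) <= 15:
        problems.append("combined video duration is outside 2-15 seconds")
    if sum(f.seconds for f in audio) > 15:
        problems.append("combined audio duration exceeds 15 seconds")
    if audio and not (images or videos):
        problems.append("audio requires at least one image or video reference")
    return problems


files = [ReferenceFile("image", 5 * MB), ReferenceFile("audio", 1 * MB, seconds=20.0)]
print(check_limits(files))  # ['combined audio duration exceeds 15 seconds']
```

Returning every problem at once, rather than raising on the first, lets a user fix an upload set in one pass.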
How does audio generation work?
When enabled, audio generation creates synchronized sound effects, ambient environmental sounds, and lip-synced speech that match the visual content. This removes the need for a separate audio editing pass.
Can I control the video duration?
Yes. You can specify a duration from 4 to 15 seconds, or use the 'auto' setting to let the model choose a length based on the complexity of your prompt and reference materials.
Which aspect ratio should I choose?
Use 9:16 vertical for Instagram Reels, TikTok, and YouTube Shorts; 16:9 widescreen for YouTube videos and presentations; 1:1 square for Instagram feed posts; and 21:9 ultrawide for cinematic content. The 'auto' setting selects a ratio based on the dimensions of your reference materials and your prompt context.
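The platform recommendations above amount to a small lookup table. The sketch below is illustrative only: the platform keys and the function name are hypothetical, and falling back to 'auto' for unknown targets is a design choice of this example.

```python
# Platform keys are hypothetical labels; the ratios come from the guidance above.
RECOMMENDED_RATIOS = {
    "instagram_reels": "9:16",
    "tiktok": "9:16",
    "youtube_shorts": "9:16",
    "youtube": "16:9",
    "presentation": "16:9",
    "instagram_feed": "1:1",
    "cinematic": "21:9",
}


def recommended_ratio(target: str) -> str:
    """Return the suggested aspect ratio, or "auto" for targets not covered."""
    return RECOMMENDED_RATIOS.get(target.strip().lower(), "auto")


print(recommended_ratio("TikTok"))     # 9:16
print(recommended_ratio("billboard"))  # auto
```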
How are multiple video references handled?
The model analyzes and blends multiple video references, drawing on the motion patterns, visual styles, and scene dynamics of each input. Depending on how you reference them in your prompt, it can create transitions between clips, fuse visual elements, or maintain consistent motion characteristics across the generated output.

More Video Generation Models

Vidu Image to Video
Animate images with precise motion control and customizable intensity
Google Veo 3 Image-to-Video
Turn images into videos with sound.
PixVerse v4.5 Image-to-Video
Turn images into 8s videos in 1080p
LTX-2 19B Image to Video LoRA
Animate images into videos with audio and custom style control.
Wan v2.6 Reference to Video Flash
Create videos with consistent characters using reference images. Multi-shot support, 5-10s clips.
Sora 2 Pro Text-to-Video
Create cinematic 1080p videos with audio from text, superior quality.
Kling Video v2.6 Pro Image to Video
Animate images into cinematic videos with dialogue and sound effects.
Grok Imagine Video Text to Video
Create 15-second videos with synchronized audio from text descriptions.
Runway Gen-4 Turbo
Create 5-10s videos with consistent characters and realistic motion