Seedance 2.0 Image to Video Prompts: Lock the Look
How to write Seedance 2.0 image to video prompts. Lock wardrobe, character, set, and palette with a reference image and let the prompt drive the action.

Why text to video has a consistency ceiling
Seedance 2.0 image to video prompts are the only reliable way to keep a character, wardrobe, and palette identical across an entire ad set. Two text only generations from the same prompt will produce two slightly different versions of the same character, the same set, the same wardrobe. For a single hero clip that is fine. For five variations of the same brand creator, it falls apart. The wardrobe drifts. The face changes. The colors shift. The brand identity breaks.
Image to video flips the contract. You upload a reference image. Seedance 2.0 uses that image as the first frame. Your text prompt drives motion, action beats, dialogue, and timing. The wardrobe is locked because it is in the image. The face is locked. The set is locked. The palette is locked.
This guide walks through the structural difference between text to video and image to video prompts, gives you five image to video patterns for the most common use cases, and shows how the Adidas reference prompt shrinks when you run it as image to video. By the end you will know how to lock the look with the image and drive the action with the prompt.
The principle: image carries the look, prompt carries the motion
Seedance 2.0 image to video prompts work best when the reference image carries the look and the text prompt carries the motion. Upload a still that locks wardrobe, set, character design, and palette as the first frame, then write a 50 to 80 word prompt covering only camera move, action beats, dialogue, and the closing negative cue. Re describing the image in the prompt wastes budget and confuses the render.
This division of labor is the whole point. The image is the visual scaffolding. The prompt is the motion direction. If you confuse the two, both jobs get worse. A long descriptive prompt that repeats everything in the image wastes rendering budget. An image with a vague prompt and no motion cues produces a clip that just sits there.
The right ratio is short prompt, rich image. Where a text to video prompt might be 150 words, an image to video prompt is often 50 to 80 words. The savings come from not having to describe the wardrobe, the set, the lighting, or the character. Those are already in the image.
What your image to video prompt should always include: the camera move (slow dolly in, tracking, handheld), the action beats (steps forward, lifts the bottle, smiles), any dialogue lines in quotes, the timing (over four seconds, in the final beat), the negative cue at the end. That is it.
What you should leave out: wardrobe descriptions, set descriptions, character age, gender, build, hair, lighting source, palette. The image already knows all of those things and re describing them sometimes confuses the render.
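That include and leave-out split can be sketched as a small prompt builder. This is a minimal Python illustration, not part of any Seedance 2.0 or VIDEO AI ME API; the function name and fields are hypothetical.

```python
def build_i2v_prompt(camera, beats, dialogue=None, timing=None,
                     negative="No music, no logo, no text on screen."):
    """Assemble an image to video prompt: motion only, negative cue last.

    Wardrobe, set, character, and lighting source are deliberately
    absent; the reference image already carries those. Hypothetical
    helper for illustration, not a Seedance 2.0 or VIDEO AI ME API.
    """
    parts = [camera, *beats]          # camera move first, then action beats
    if dialogue:
        parts.append(f'then says: "{dialogue}"')
    if timing:
        parts.append(timing)          # e.g. "over four seconds"
    prompt = ", ".join(parts) + ". - " + negative
    if len(prompt.split()) > 80:      # keep to the 50 to 80 word target
        raise ValueError("over 80 words: trim the beats, not the negative cue")
    return prompt

print(build_i2v_prompt(
    camera="Slow dolly in from a wide framing to a medium close up",
    beats=["shallow focus on the mug rim", "steam slowly rising"],
    timing="over four seconds",
))
```

Note what the builder refuses to take: there is no `wardrobe` or `set` argument, because anything the image already locks has no business being in the prompt.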
Pattern 1: Product hero from a still photo
The simplest pattern. Upload a product photo, write a short motion prompt.
Reference image: ceramic coffee mug on a slate stone slab in a minimalist studio.
Slow dolly in from a wide framing to a medium close up over four seconds, shallow focus on the mug rim, steam slowly rising from the rim, soft window light remains constant. Filmed in 720p. - No music, no logo, no text on screen.
Why this works: the image is doing the heavy lifting. The mug, the slab, the studio, and the lighting are all locked. The prompt only describes the camera move (slow dolly in over four seconds), the focus shift (shallow DOF on the rim), one ambient effect (steam slowly rising), and the lighting continuity instruction (soft window light remains constant). The total prompt is 35 words before the negative cue.
This pattern is the fastest way to convert e commerce product photos into motion creatives without paying for studio video time. Paste this into VIDEO AI ME with one of your own product shots and watch the still come alive in under a minute.
Pattern 2: Talking head from a creator photo
Upload a portrait of a creator, write a dialogue prompt that uses their face.
Reference image: woman in her thirties sitting at a kitchen table with a coffee, looking at camera, soft daylight from a window on her left.
She smiles, takes a sip from the coffee, then looks back at camera and says: "Day forty seven and I genuinely look forward to this every morning." Filmed with iPhone, handheld, slight bounce. - No music, no logo, no text on screen.
Why this works: the character, the table, the kitchen, and the lighting are locked by the image. The prompt drives three beats (smiles, takes a sip, looks back) and one line of dialogue. The handheld cue reinforces the iPhone aesthetic on top of the photo.
The key trick is referencing the image's lighting in your continuity instructions. By saying soft daylight remains constant, or matching the implied light direction in your beats, you keep the lighting from drifting between frame one and frame thirty.
Pattern 3: Adidas sneaker reveal from a product photo
The Adidas reference prompt could be run as image to video by uploading a high resolution shot of the sneakers first. Here is how the prompt shrinks.
Reference image: a young man holding a pair of white and neon green sneakers in a concrete skatepark at golden hour.
He lifts the sneakers close to the camera, rotates them slowly saying: "Bro look at these. Feel that material." He drops them on the ground, slides his foot in, stomps twice, then jogs three steps and stops. He turns back to camera: "Insane comfort." Handheld iPhone aesthetic, slight lens flare. - No music, no logo, no text on screen.
Why this works: the character, the sneakers, the skatepark, and the golden hour lighting are all locked by the image. The prompt is the same beat sequence and dialogue as the original text to video version, minus the scene description, which is now redundant. The result holds the brand identity (the exact sneaker design) more reliably across multiple generations.
This is the technique to use when you are running a real product through Seedance 2.0 and you cannot afford the look to drift.
Pattern 4: Lookbook to motion conversion
Fashion brands have entire lookbooks full of static images. Image to video turns each one into a motion clip in seconds.
Reference image: woman in a beige trench coat and dark jeans standing on a tree lined city sidewalk in autumn afternoon light.
She takes four casual steps toward the camera, slows down, places one hand in her pocket, turns her body slightly to her right, glances back over her shoulder with a small smile. Slow tracking shot from a low angle. Filmed in 720p, handheld. - No music, no logo, no text on screen.
Why this works: the wardrobe, the location, the season, and the time of day are all in the image. The prompt focuses entirely on motion (four steps, hand in pocket, body turn, glance back) and one slow tracking shot from a low angle. The beige trench coat is rendered identically across every generation because it is in the reference image, not in the prompt.
This is the cleanest way for fashion brands to ship a motion ad set from an existing photo shoot in under an hour. Open VIDEO AI ME and run the prompt with your lookbook photos to test it on a real campaign.
Pattern 5: Founder selfie to motion testimonial
For founders who already have one strong headshot but want a moving testimonial clip, this is the cleanest pattern.
Reference image: founder in his late thirties wearing a navy quarter zip pullover sitting at a sunlit desk, looking directly at the camera.
He holds the camera gaze for one beat, then says: "I built this because I was tired of paying agencies for something I could do in twenty minutes." He gives a small confident smile, leans back. Filmed with iPhone, locked tripod, slight breathing motion. - No music, no logo, no text on screen.
Why this works: the desk, the wardrobe, and the office lighting are all locked by the headshot. The prompt drives one beat (hold gaze), one dialogue line, and one micro action (lean back). The medium close up framing carries through from the reference image because the prompt does not contradict it.
This pattern also works for any personal brand: coaches, consultants, course creators. The only thing that needs to change per testimonial is the dialogue line.
Common image to video prompt mistakes
- Repeating the image in the prompt. If the wardrobe and the set are in the image, do not describe them again.
- Writing prompts that are too long. Image to video prompts should be 50 to 80 words. Anything longer wastes budget.
- Forgetting the camera move. The image only sets the first frame. Without a camera move cue the clip just sits there.
- Skipping the action beats. The image is static. The prompt needs verbs for motion to happen.
- Ignoring the lighting continuity. Mention soft daylight remains constant or warm key holds throughout to prevent the lighting from drifting.
- Uploading a low resolution image. The first frame is locked to your image. Low resolution input becomes a low resolution output.
- Mixing reference images with conflicting palettes inside one ad set. Pick one frame as the master and remix from there.
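Several of these mistakes are mechanical enough to catch before you spend a generation. Here is a rough lint pass over a draft prompt: purely illustrative Python heuristics (nothing in Seedance 2.0 or VIDEO AI ME runs such a check).

```python
CAMERA_CUES = ("dolly", "tracking", "handheld", "pan", "tilt", "zoom",
               "locked tripod")
NEGATIVE_CUE = "no music, no logo, no text on screen"

def lint_i2v_prompt(prompt: str) -> list[str]:
    """Flag the common image to video prompt mistakes from the checklist above.

    Hypothetical heuristics for illustration only; an empty list means
    the draft passes these checks, not that the render will be perfect.
    """
    warnings = []
    words = len(prompt.split())
    if words > 80:
        warnings.append(f"too long: {words} words (target 50 to 80)")
    low = prompt.lower()
    if not any(cue in low for cue in CAMERA_CUES):
        warnings.append("no camera move cue; the clip may just sit there")
    if NEGATIVE_CUE not in low:
        warnings.append("missing closing negative cue")
    if "remains constant" not in low and "holds throughout" not in low:
        warnings.append("no lighting continuity instruction")
    return warnings
```

Run it on the product hero prompt from Pattern 1 and it comes back clean; run it on a bare dialogue line and it flags the missing camera move, negative cue, and lighting continuity.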
How to apply this on VIDEO AI ME
Image to video runs as a separate mode inside the Seedance 2.0 generator on VIDEO AI ME. Pick image to video, upload your reference image, paste your motion prompt, set the aspect ratio to match the image, and hit generate. For ad sets where you need the same character in five different scenarios, generate the character once with text to video, save the best frame as a reference image, and reuse it across the rest of the campaign. The 300+ AI actor library, voice cloning, and 70+ language support all stack on top of image to video the same way they do on text to video.
Wrapping up
Image to video is the technique to reach for whenever consistency matters more than creative freedom. Upload the image, write a short prompt focused on motion and dialogue, end with the negative cue, ship. Use it for product hero shots, talking heads from creator photos, brand sneaker reveals, lookbook to motion conversions, and founder testimonials. Try Seedance 2.0 free on VIDEO AI ME and lock your first character today.
More Seedance 2.0 prompts to study
The four reference videos used throughout this guide (a multi shot street interview, a skatepark product UGC, an unboxing narrative with a timelapse, and a high energy gamer reaction) live as a full copyable library on Seedance 2.0 Prompt Templates: Copy Paste and Ship. Bookmark it and remix any of the four when you need a starting point.
Related Seedance 2.0 guides on VIDEO AI ME
If you want to go deeper, these guides pair well with this one:
- Seedance 2.0: Complete Guide for AI Video Creators
- Seedance 2.0 vs Seedance 1: What Actually Changed
- Seedance 2.0 Features: Everything the New ByteDance Model Can Do
- How to Use Seedance 2.0: Beginner to Advanced in One Guide
You can also browse the full VIDEO AI ME blog for more AI video tutorials, or jump straight into the product and try Seedance 2.0 free on VIDEO AI ME with no credit card.
Paul Grisel
Paul Grisel is the founder of VIDEOAI.ME, dedicated to empowering creators and entrepreneurs with innovative AI-powered video solutions.
@grsl_fr