Master AI video prompts from zero. Learn the core formula, constraint words, task-type selection, and how to structure engineering-grade prompts for Seedance, Kling, Sora, and Veo.
Before writing a single prompt, decide what kind of AI video task you are doing. Most models support four task types, and each requires a different sentence structure.
Task 1: Multi-modal Reference (generate a new video from images, video clips, or audio). Use phrases like 'Refer to image 1's character, generate…' or 'Refer to video 1's camera movement, generate…'.
Task 2: Edit Video (modify elements in an existing video). Use 'Strictly edit video 1, change the background to…'.
Task 3: Extend Video (continue a clip forward or backward in time). Use 'Extend video 1 forward, generate…'.
Task 4: Combo (combine reference and edit). Use 'Refer to image 1's style, strictly edit video 2…'.
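As a rough illustration, the four sentence structures above can be kept as fill-in templates so that the task type is chosen before any description is written. This is a minimal sketch: the template keys, field names, and the `opening_clause` helper are hypothetical and not part of any model's API.

```python
# Hypothetical templates mirroring the four task-type sentence structures.
TEMPLATES = {
    "reference": "Refer to {source}'s {aspect}, generate {goal}",
    "edit": "Strictly edit {target}, {change}",
    "extend": "Extend {target} {direction}, generate {goal}",
    "combo": "Refer to {source}'s {aspect}, strictly edit {target}, {change}",
}


def opening_clause(task: str, **fields: str) -> str:
    """Return the opening clause for a task type, filled with the given fields."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task type: {task!r}")
    # str.format raises KeyError if a field required by the template is missing.
    return TEMPLATES[task].format(**fields)


if __name__ == "__main__":
    print(opening_clause("extend", target="video 1", direction="forward",
                         goal="the character walking out of the room"))
```

Keeping the opening clause separate from the rest of the prompt makes it harder to mix an edit instruction into what was meant to be a reference task.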
Once you know the task type, structure every prompt around eight core elements. Think of this as an engineering spec, not a creative description.
Element 1: Precise Subject. Use 2-3 stable static features (clothing, hairstyle, appearance) to lock identity. Avoid pronouns; always tag each character clearly.
Element 2: Action Details. Describe movement at the body-part level with amplitude, speed, and force (slowly raise a hand, quickly turn head, press hard against the ground).
Element 3: Scene Environment. Name the location and spatial relationship.
Element 4: Lighting & Color. Specify time of day, light source type, and color temperature: write 'golden hour backlight through a dusty window', not just 'good lighting'.
Element 5: Camera & Movement. Pick one shot size (wide, medium, close-up) and one movement (dolly, pan, orbit). Do not stack multiple camera moves in one shot.
Element 6: Visual Style. Name a concrete reference: '1980s indie film, 16mm pushed' works better than 'cinematic'.
Element 7: Quality. Define sharpness, texture, and fidelity.
Element 8: Constraints (negative prompt). Explicitly exclude what you do not want: watermarks, logos, subtitles, face distortion.
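One way to treat the eight elements as a spec rather than a description is to make each one a required field and refuse to build a prompt while any is empty. The sketch below assumes a hypothetical `ShotSpec` class and a simple sentence-joined output format; neither comes from any model's documentation.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ShotSpec:
    """Hypothetical container for the eight core elements of one shot."""
    subject: str            # Element 1: 2-3 stable static features
    action: str             # Element 2: body-part-level movement
    scene: str              # Element 3: location and spatial relationship
    lighting: str           # Element 4: time of day, light source, color temperature
    camera: str             # Element 5: one shot size plus one movement
    style: str              # Element 6: concrete visual reference
    quality: str            # Element 7: sharpness, texture, fidelity
    constraints: List[str]  # Element 8: what to exclude

    def to_prompt(self) -> str:
        parts = []
        for f in fields(self):
            value = getattr(self, f.name)
            if not value:  # empty string or empty list
                raise ValueError(f"element '{f.name}' is empty")
            if f.name != "constraints":
                parts.append(value.strip().rstrip("."))
        return ". ".join(parts) + ". Constraints: " + ", ".join(self.constraints) + "."


if __name__ == "__main__":
    spec = ShotSpec(
        subject="A woman in a red coat with short black hair",
        action="slowly raises her right hand",
        scene="standing beside the window of an empty cafe",
        lighting="golden hour backlight through a dusty window",
        camera="medium shot, slow dolly in",
        style="1980s indie film, 16mm pushed",
        quality="sharp focus, natural skin texture",
        constraints=["no watermarks", "no logos", "no subtitles", "no face distortion"],
    )
    print(spec.to_prompt())
```

The validation step is the point of the design: a missing element fails loudly before generation instead of surfacing later as an unexplained visual problem.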
Constraint words are not optional. They define the generation boundary and help prevent common failures. Add these three base constraints to every prompt: no watermarks, no logos, and no subtitles.
For video stability, add: character faces remain stable and undistorted, body proportions remain stable, movement is smooth and continuous with no stutter or flicker. For style consistency, add: maintain a consistent style throughout, no style drift, tone remains uniform. For character identity, add: do not generate identical-looking characters or duplicate the same character in frame. For audio, add: audio is natural with no clipping, audio fades out naturally at the end.
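The base constraints and the optional groups above can be kept as reusable lists and combined per prompt. This is a minimal sketch; the group names (`stability`, `style`, `identity`, `audio`) and the `build_constraints` helper are labels chosen here for illustration.

```python
from typing import Dict, List

# Base constraints added to every prompt.
BASE_CONSTRAINTS: List[str] = ["no watermarks", "no logos", "no subtitles"]

# Optional groups, keyed by hypothetical names.
CONSTRAINT_GROUPS: Dict[str, List[str]] = {
    "stability": [
        "character faces remain stable and undistorted",
        "body proportions remain stable",
        "movement is smooth and continuous with no stutter or flicker",
    ],
    "style": [
        "maintain a consistent style throughout",
        "no style drift",
        "tone remains uniform",
    ],
    "identity": [
        "do not generate identical-looking characters or duplicate the same character in frame",
    ],
    "audio": [
        "audio is natural with no clipping",
        "audio fades out naturally at the end",
    ],
}


def build_constraints(*groups: str) -> List[str]:
    """Return the base constraints followed by the requested optional groups."""
    result = list(BASE_CONSTRAINTS)
    for name in groups:
        if name not in CONSTRAINT_GROUPS:
            raise ValueError(f"unknown constraint group: {name!r}")
        result.extend(CONSTRAINT_GROUPS[name])
    return result


if __name__ == "__main__":
    print(", ".join(build_constraints("stability", "audio")))
```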
Some constraint pairs cancel each other out. Writing both '8mm film grain aesthetic' and '4K ultra-sharp' in the same prompt (or any 'film grain' + 'ultra sharp' pairing) forces the model to choose between them, and which one wins is unpredictable. Other broken pairs: 'cinematic film look' + 'handheld documentary' written with equal weight, and 'slow motion' + 'high-speed action'. Naming more than two artist styles simultaneously causes the same problem.
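A simple pre-flight check can catch these pairs before a prompt is submitted. The sketch below uses plain substring matching, which is an assumption made for brevity: it cannot judge whether two styles were written "with equal weight", so a hit on the cinematic/handheld pair is only a prompt to review. The `find_conflicts` name and the `artist_styles` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

# Pairs described above as cancelling each other out (hyphens normalized to spaces).
CONFLICTING_PAIRS: List[Tuple[str, str]] = [
    ("film grain", "ultra sharp"),
    ("cinematic film look", "handheld documentary"),
    ("slow motion", "high speed action"),
]


def find_conflicts(prompt: str, artist_styles: Sequence[str] = ()) -> List[str]:
    """Return human-readable warnings for known conflicting phrases."""
    text = prompt.lower().replace("-", " ")
    issues = [
        f"'{a}' conflicts with '{b}'"
        for a, b in CONFLICTING_PAIRS
        if a in text and b in text
    ]
    # The caller lists artist styles explicitly; they are not parsed from the prompt.
    if len(artist_styles) > 2:
        issues.append(f"{len(artist_styles)} artist styles named; use at most two")
    return issues


if __name__ == "__main__":
    for warning in find_conflicts("8mm film grain aesthetic, 4K ultra-sharp"):
        print(warning)
```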
Complex stories need multi-shot breakdowns. Write one prompt per shot, treat the shot list as a timeline, and carry continuity anchors — wardrobe, props, lighting, location — across every shot. For models that support timestamps (Seedance, Kling), add shot time ranges like 'Shot 1 [0s-5s]: ... Shot 2 [5s-10s]: ...' to give the model explicit temporal structure.
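The timestamped shot list can be generated from per-shot durations so that the time ranges never overlap or leave gaps. This is a minimal sketch assuming whole-second durations and a single anchor sentence repeated on every shot; the `build_shot_list` helper is hypothetical, and the exact timestamp syntax a given model accepts should be checked against its own documentation.

```python
from typing import List, Tuple


def build_shot_list(shots: List[Tuple[int, str]], anchors: str) -> str:
    """Format (duration_seconds, description) pairs as a timestamped shot list.

    `anchors` is the continuity text (wardrobe, props, lighting, location)
    appended to every shot so that each line stands on its own.
    """
    lines = []
    start = 0
    for index, (duration, description) in enumerate(shots, start=1):
        if duration <= 0:
            raise ValueError(f"shot {index} has non-positive duration")
        end = start + duration
        lines.append(f"Shot {index} [{start}s-{end}s]: {description} {anchors}")
        start = end  # the next shot begins where this one ends
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_shot_list(
        [(5, "Wide shot, the courier enters the alley."),
         (5, "Close-up, the courier checks the package.")],
        anchors="Same yellow raincoat, same brown package, wet night street.",
    ))
```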
When a shot's output looks right, capture its last frame and use it as the start frame for the next shot. This start-frame chaining is the most reliable way to build long, consistent sequences.
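The chaining loop itself is small. The sketch below deliberately takes the generation call and the frame-extraction step as parameters, because both depend on the tool in use; `generate` and `last_frame` are hypothetical callables, not functions from any real SDK. The manual review of each shot before moving on is not modeled here.

```python
from typing import Any, Callable, List, Optional


def chain_shots(
    prompts: List[str],
    generate: Callable[[str, Optional[Any]], Any],
    last_frame: Callable[[Any], Any],
) -> List[Any]:
    """Generate shots in order, seeding each one with the previous shot's last frame.

    generate(prompt, start_frame) -> clip   (hypothetical; start_frame is None for shot 1)
    last_frame(clip) -> frame               (hypothetical; e.g. export the final frame as an image)
    """
    clips = []
    start_frame = None
    for prompt in prompts:
        clip = generate(prompt, start_frame)
        clips.append(clip)
        start_frame = last_frame(clip)  # becomes the start frame of the next shot
    return clips


if __name__ == "__main__":
    # Stand-in callables that only record what they were given.
    fake_generate = lambda prompt, frame: {"prompt": prompt, "start": frame}
    fake_last_frame = lambda clip: f"last frame of '{clip['prompt']}'"
    for clip in chain_shots(["Shot 1", "Shot 2", "Shot 3"], fake_generate, fake_last_frame):
        print(clip)
```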