Video Prompt Markup Language
VPromptML — an XML-like markup language for structuring AI video prompts.
Storyboard-readable. Parser-ready. Built for AI video generation.
Overview
VPromptML is an XML-like markup language for structuring AI video prompts: timed lyrics, scene prompts, opening-frame prompts, camera movement, characters, continuity, emotional arcs, and generation instructions.
It is not a video rendering language. It is a pre-production prompt specification language that describes what should be generated before any assets exist — what the AI should create, what the opening frame should look like, how the shot should move, and how one scene continues from another.
Positioning
VPromptML is designed for:
- AI music video generation
- Lyric-synchronized scene prompting
- Cinematic storyboard prompting
- Opening-frame generation
- Video-motion prompt generation
- Shot continuity & character consistency
- Metaphor-driven visual storytelling and emotional arc planning
Traditional video markup formats describe existing media assets — clips, images, audio, subtitles, overlays, layers, timelines. VPromptML instead describes intent before generation.
File extensions
Use .vprompt.xml for public, readable, XML-compatible documents. Use .vpml only as a compact internal extension.
Example file name: gana.vprompt.xml
Root element
Every VPromptML document must have exactly one root element: <vprompt>.
<vprompt version="0.1" type="musicVideo" lang="lt"
aspectRatio="16:9" resolution="1920x1080">
...
</vprompt>
Required attribute: version (the specification version).
Optional attributes: type (musicVideo, shortFilm, ad, trailer, lyricVideo, storyboard), lang, aspectRatio, fps, resolution.
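The root-element rules can be checked with Python's standard-library xml.etree.ElementTree parser. The sketch below is one possible approach, not a reference validator. The function name check_root and its message strings are hypothetical. The "exactly one root" rule comes for free, because ElementTree refuses input with more than one top-level element.

```python
import xml.etree.ElementTree as ET

# Allowed values for the optional "type" attribute, as listed in this section.
DOC_TYPES = {"musicVideo", "shortFilm", "ad", "trailer", "lyricVideo", "storyboard"}


def check_root(xml_text: str) -> list[str]:
    """Return root-level problems; an empty list means the root looks valid."""
    # ET.fromstring raises ParseError for input that is not well-formed XML,
    # including input with more than one top-level element.
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "vprompt":
        problems.append(f"root element is <{root.tag}>, expected <vprompt>")
    if not root.get("version"):
        problems.append("missing required attribute: version")
    doc_type = root.get("type")
    if doc_type is not None and doc_type not in DOC_TYPES:
        problems.append(f"unknown type: {doc_type}")
    return problems
```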
Document structure
The recommended top-level structure is:
<vprompt version="0.1" type="musicVideo" lang="lt" aspectRatio="16:9">
<meta> ... </meta>
<project> ... </project>
<globalPrompt> ... </globalPrompt>
<characters> ... </characters>
<scenes> ... </scenes>
</vprompt>
Required: <globalPrompt> and <scenes>. Recommended: <meta>, <project>, and <characters>.
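The required top-level sections can be verified mechanically. This is a minimal sketch using xml.etree.ElementTree. The helper name check_document_structure is hypothetical. Reporting missing recommended sections (<meta>, <project>, <characters>) is left out here, on the assumption that they would produce warnings rather than errors.

```python
import xml.etree.ElementTree as ET


def check_document_structure(root: ET.Element) -> list[str]:
    """Required top-level sections, plus at least one <scene> inside <scenes>."""
    errors = []
    for section in ("globalPrompt", "scenes"):
        if root.find(section) is None:
            errors.append(f"missing required <{section}>")
    scenes = root.find("scenes")
    if scenes is not None and scenes.find("scene") is None:
        errors.append("<scenes> must contain at least one <scene>")
    return errors
```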
meta
Stores technical and document-level metadata. Allowed children: title, author, created, updated, audio, targetFormat, notes.
<meta>
<title>Gana</title>
<author>Ilja Laurs</author>
<created>2026-06-15</created>
<audio default="Gana v1 20260613.mp3" />
<targetFormat aspectRatio="16:9" resolution="1920x1080" fps="30" />
</meta>
project
Defines the creative project in human-readable form. Allowed children: name, type, summary, theme, mainMetaphor, emotionalArc, targetAudience, creativeNotes.
<project>
<name>Gana</name>
<type>cinematic music video</type>
<summary>
A mature tired man crosses a frozen lake filled with frozen men.
He almost freezes too, but at the word "Gana" becomes an iron human.
</summary>
<theme>Refusal to remain obedient, silent, and emotionally frozen.</theme>
<mainMetaphor>A frozen lake as emotional numbness and obedience.</mainMetaphor>
<emotionalArc>
Numbness -> pressure -> inner cracking -> refusal -> iron self-possession.
</emotionalArc>
</project>
globalPrompt
Defines global creative instructions applied to all scenes unless locally overridden. Recommended children: storyWorld, visualEvolution, style, rules, negativePrompt, renderingNotes.
<globalPrompt>
<storyWorld>
Cinematic dark emotional music video. Main metaphor: a vast frozen
lake at winter dawn, filled with frozen human figures.
</storyWorld>
<visualEvolution>
First numbness and waiting; then ice pressure and inner cracking;
then the decisive transformation.
</visualEvolution>
<style>
16:9 cinematic frame, full-body shots preferred. Cold blue-gray dawn,
black ice, white frost, fog, weak sun on horizon.
</style>
<rules>
<rule>No random scenery.</rule>
<rule>Every scene must advance the story.</rule>
</rules>
<negativePrompt>
No modern cars. No smiling extras. No unrelated fantasy creatures.
</negativePrompt>
</globalPrompt>
characters
Defines recurring characters, symbolic figures, and identity continuity. Each <character> requires a unique id and may declare a role. Recommended children: name, appearance, costume, acting, symbolicMeaning, continuityRules.
<characters>
<character id="singer" role="main">
<name>Mature tired male singer</name>
<appearance>Dark coat, worn face, deep eyes, restrained acting.</appearance>
<symbolicMeaning>A man almost frozen by silence and obedience.</symbolicMeaning>
</character>
<character id="frozenMen" role="symbolicCollective">
<name>Frozen men</name>
<appearance>Male human figures frozen into ice statues.</appearance>
<symbolicMeaning>Monuments of obedience, silence, and emotional death.</symbolicMeaning>
</character>
</characters>
scenes & scene
<scenes> contains one or more <scene> elements in chronological order, each with a unique id.
<scenes>
<scene id="s001" start="00:00" end="00:07" audio="Gana v1 20260613.mp3">
...
</scene>
<scene id="s002" start="00:08" end="00:15" audio="Gana v1 20260613.mp3">
...
</scene>
</scenes>
A <scene> defines one timed AI video generation segment. Required attributes: id, start, end. Optional: audio, duration, type (intro, verse, chorus, bridge, instrumental, climax, outro, transition).
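The scene attribute rules, together with the requirement that each scene id is unique, can be sketched as a single pass over <scenes>. This uses xml.etree.ElementTree. The function name and message wording are illustrative assumptions, not part of the specification.

```python
import xml.etree.ElementTree as ET

# Values for the optional scene "type" attribute, as listed above.
SCENE_TYPES = {"intro", "verse", "chorus", "bridge", "instrumental",
               "climax", "outro", "transition"}
REQUIRED_SCENE_ATTRS = ("id", "start", "end")


def check_scene_attributes(scenes: ET.Element) -> list[str]:
    """Check required attributes, id uniqueness, and type values of each <scene>."""
    problems = []
    seen_ids = set()
    for index, scene in enumerate(scenes.findall("scene"), start=1):
        label = scene.get("id") or f"scene #{index}"
        for attr in REQUIRED_SCENE_ATTRS:
            if not scene.get(attr):
                problems.append(f"{label}: missing required attribute {attr}")
        scene_id = scene.get("id")
        if scene_id:
            if scene_id in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(scene_id)
        scene_type = scene.get("type")
        if scene_type is not None and scene_type not in SCENE_TYPES:
            problems.append(f"{label}: unknown type {scene_type}")
    return problems
```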
<scene id="s001" start="00:00" end="00:07"
audio="Gana v1 20260613.mp3" type="intro">
<lyric type="lyric">Ilgai tylėjo, galvą nuleidęs.</lyric>
<intent>Establish numbness and the frozen world.</intent>
<charactersPresent>
<ref character="singer" />
<ref character="frozenMen" />
</charactersPresent>
<emotion start="numbness" end="recognition">
He begins to understand he is becoming one of the frozen men.
</emotion>
<openingFrame continuity="new" shot="wide" frame="fullBody">
Wide cinematic still frame, winter dawn over a vast frozen lake,
black ice, cold fog, weak blue-gray sunrise. A tired man stands
full-body, head lowered, among countless frozen figures.
</openingFrame>
<camera movement="pushIn" shot="wide">Slow push from behind.</camera>
<motionPrompt>
The camera slowly pushes toward the man from behind. His coat moves
in cold wind. The frozen men remain completely still around him.
</motionPrompt>
<negativePrompt>No modern buildings. No cheerful lighting.</negativePrompt>
</scene>
Scene elements
Required scene children are <lyric>, <openingFrame>, and <motionPrompt>. The rest are recommended or optional.
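The required-children rule reduces to a presence check. In the sketch below, the helper name is hypothetical. An empty element, such as a <lyric type="silence" />, still counts as present. That choice is an assumption, made because silence markers are allowed.

```python
import xml.etree.ElementTree as ET

REQUIRED_CHILDREN = ("lyric", "openingFrame", "motionPrompt")


def missing_scene_children(scene: ET.Element) -> list[str]:
    """Return the names of required child elements that this <scene> lacks."""
    return [name for name in REQUIRED_CHILDREN if scene.find(name) is None]
```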
<lyric>
Holds the scene's lyric, narration, transcript, vocalization, instrumental note, or silence marker. Optional attribute type: lyric | narration | instrumental | vocalization | silence | transcript.
<lyric type="lyric">Gyvenimui leidau spręst už save.</lyric>
<lyric type="instrumental">[No speech detected]</lyric>
<lyric type="vocalization">Ooh-ooh-ooh</lyric>
<intent>
Defines the dramatic purpose of the scene. Useful for human review, prompt generation, and validation — not necessarily sent directly to a generator.
<openingFrame>
Defines the first visual frame of the generated shot (video-native; replaces the older "opening image"). Required attribute continuity: new | extend. Optional attributes: shot (wide, medium, closeup, aerial, lowAngle, highAngle, extremeCloseup), frame (fullBody, portrait, landscape, symbolic, detail), and cameraAngle.
The first scene must use continuity="new". Use continuity="extend" to continue the previous scene's location, setup, or symbolic environment without contradicting it.
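The continuity rules can be checked scene by scene. The sketch below is a minimal example with a hypothetical helper name. It only validates the attribute values and the first-scene rule. Whether an "extend" scene actually contradicts the previous one is a prompt-quality question that no parser can settle.

```python
import xml.etree.ElementTree as ET

CONTINUITY_MODES = {"new", "extend"}


def check_continuity(scenes: ET.Element) -> list[str]:
    """Validate openingFrame continuity values and the first-scene rule."""
    problems = []
    for index, scene in enumerate(scenes.findall("scene")):
        label = scene.get("id", f"scene #{index + 1}")
        frame = scene.find("openingFrame")
        if frame is None:
            continue  # a missing <openingFrame> is a separate required-child error
        mode = frame.get("continuity")
        if mode not in CONTINUITY_MODES:
            problems.append(f"{label}: continuity must be 'new' or 'extend', got {mode!r}")
        elif index == 0 and mode != "new":
            problems.append(f'{label}: first scene must use continuity="new"')
    return problems
```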
<openingFrame continuity="extend" shot="lowAngle" frame="fullBody">
Same frozen lake, same cold dawn, continuous from previous shot.
Low-angle frame near the man's boots as he begins walking.
</openingFrame>
<motionPrompt>
Describes what happens after the opening frame: camera movement, subject movement and acting, environmental motion, symbolic transformation, and the end state of the shot.
<charactersPresent>
Lists characters visible in the scene via <ref character="id" />. Each ref should match a defined character; undefined references should trigger a validation warning.
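The reference rule can be sketched as a set lookup. The helper name is hypothetical. Results are returned as warnings rather than raised as errors, which matches the "should trigger a validation warning" wording above.

```python
import xml.etree.ElementTree as ET


def undefined_character_refs(doc: ET.Element) -> list[str]:
    """Warn about <ref character="..."> values that match no <character id="...">."""
    defined = {c.get("id") for c in doc.iter("character")}
    warnings = []
    for scene in doc.iter("scene"):
        for ref in scene.findall("charactersPresent/ref"):
            name = ref.get("character")
            if name not in defined:
                warnings.append(f"{scene.get('id')}: undefined character {name!r}")
    return warnings
```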
<emotion>
Defines emotional state or transition. Optional start and end attributes capture the emotional arc within a single scene.
<camera>
Technical/cinematic camera direction, separate from the motion prompt. Optional attributes: movement (pushIn, pullBack, track, dolly, crane, aerial, static, handheld) and shot.
<continuity>
An optional, explicit continuity note with a mode of new | extend. It can explain the continuity decision in more detail than the openingFrame continuity attribute.
<negativePrompt>
Defines forbidden content, either globally (inside <globalPrompt>) or per-scene.
<export>
Optionally links scene instructions to generated assets or downstream rendering formats via a target attribute and one or more <asset /> children.
<export target="vsml">
<asset type="openingFrame" src="./generated/s001.png" />
<asset type="video" src="./generated/s001.mp4" />
</export>
Timestamps
Preferred format is MM:SS (e.g. start="01:52"); the extended HH:MM:SS form is also allowed. Rules:
- start must be earlier than end
- scenes should normally be chronological
- overlaps should be flagged unless explicitly allowed
- gaps should be flagged unless intentional
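The timestamp rules can be sketched in Python by converting MM:SS or HH:MM:SS to whole seconds and comparing consecutive scenes. Two assumptions are involved, and both are hypothetical choices rather than part of the specification. First, scenes are passed as (id, start, end) tuples in document order. Second, a gap of up to max_gap seconds (default 1) is tolerated, because whole-second timestamps such as end="00:07" followed by start="00:08" are used in the examples.

```python
import re

# MM:SS (preferred) or HH:MM:SS, as allowed by the rules above.
TIMESTAMP = re.compile(r"^(?:(\d{2}):)?(\d{2}):(\d{2})$")


def to_seconds(stamp: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" to a number of seconds."""
    match = TIMESTAMP.match(stamp)
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def check_timing(scenes: list[tuple[str, str, str]], max_gap: int = 1) -> list[str]:
    """scenes: (id, start, end) tuples in document order.

    max_gap is an assumed tolerance in seconds; larger gaps are flagged.
    """
    problems = []
    previous_end = None
    for scene_id, start, end in scenes:
        s, e = to_seconds(start), to_seconds(end)
        if s >= e:
            problems.append(f"{scene_id}: start must be earlier than end")
        if previous_end is not None:
            if s < previous_end:
                problems.append(f"{scene_id}: overlaps or precedes previous scene")
            elif s - previous_end > max_gap:
                problems.append(f"{scene_id}: {s - previous_end}s gap before scene")
        previous_end = e
    return problems
```

An intentional overlap or gap would need an explicit opt-out, for example a per-scene flag. The specification does not define one yet.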
Naming & text rules
Element and attribute names use camelCase where applicable (aspectRatio, openingFrame, motionPrompt). Avoid snake_case, PascalCase, or kebab-case. Short names (id, type, role, lang, start, end, audio, fps) are allowed as-is.
Text inside elements may be natural language spanning multiple lines. Escape literal <, >, and &, and preserve lyric line breaks where timing or phrasing matters. Self-closing tags are allowed for references and empty technical metadata (e.g. <ref character="singer" />).
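The naming and escaping rules can be sketched as follows. This assumes "camelCase" means lowerCamelCase made of ASCII letters and digits only. The escaping uses the standard-library xml.sax.saxutils.escape, which replaces &, <, and >. The helper names is_camel_case and text_element are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Assumed reading of the naming rule: starts with a lowercase letter and
# uses only ASCII letters and digits (no "_" or "-").
CAMEL_CASE = re.compile(r"[a-z][a-zA-Z0-9]*")


def is_camel_case(name: str) -> bool:
    """True for names such as aspectRatio, openingFrame, or fps."""
    return CAMEL_CASE.fullmatch(name) is not None


def text_element(tag: str, text: str) -> str:
    """Serialize one text element, escaping &, <, and > in its content.

    Line breaks pass through unchanged, so lyric phrasing is preserved.
    """
    if not is_camel_case(tag):
        raise ValueError(f"element name is not camelCase: {tag!r}")
    return f"<{tag}>{escape(text)}</{tag}>"
```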
Strict scene template
Recommended production scene template:
<scene id="" start="" end="" audio="" type="">
<lyric type=""></lyric>
<intent></intent>
<charactersPresent>
<ref character="" />
</charactersPresent>
<emotion start="" end=""></emotion>
<openingFrame continuity="" shot="" frame=""></openingFrame>
<camera movement="" shot=""></camera>
<motionPrompt></motionPrompt>
<negativePrompt></negativePrompt>
<export target="">
<asset type="" src="" />
</export>
</scene>
Canonical example
<vprompt version="0.1" type="musicVideo" lang="lt"
aspectRatio="16:9" resolution="1920x1080">
<meta>
<title>Gana</title>
<author>Ilja Laurs</author>
<audio default="Gana v1 20260613.mp3" />
<targetFormat aspectRatio="16:9" resolution="1920x1080" fps="30" />
</meta>
<project>
<name>Gana</name>
<type>cinematic music video</type>
<emotionalArc>Numbness -> refusal -> iron self-possession.</emotionalArc>
</project>
<globalPrompt>
<storyWorld>
A vast frozen lake at winter dawn, filled with men turned to ice.
</storyWorld>
</globalPrompt>
<characters>
<character id="singer" role="main">
<name>Mature tired male singer</name>
</character>
</characters>
<scenes>
<scene id="s001" start="00:00" end="00:07" audio="Gana v1 20260613.mp3" type="intro">
<lyric type="lyric">Ilgai tylėjo, galvą nuleidęs.</lyric>
<openingFrame continuity="new" shot="wide" frame="fullBody">
Wide still frame, winter dawn over a vast frozen lake.
</openingFrame>
<motionPrompt>
The camera slowly pushes toward the man from behind.
</motionPrompt>
</scene>
</scenes>
</vprompt>
Minimal valid document
<vprompt version="0.1" type="musicVideo" lang="lt" aspectRatio="16:9">
<globalPrompt>
<storyWorld>
Cinematic emotional music video. One man walks across a frozen
lake and slowly transforms from numbness into strength.
</storyWorld>
</globalPrompt>
<scenes>
<scene id="s001" start="00:00" end="00:07" audio="song.mp3">
<lyric>He lowers his head.</lyric>
<openingFrame continuity="new" shot="wide" frame="fullBody">
Wide still frame of a man standing alone on a frozen lake at dawn.
</openingFrame>
<motionPrompt>
The camera slowly pushes toward him as cold fog moves across the ice.
</motionPrompt>
</scene>
</scenes>
</vprompt>
JSON equivalence
Every VPromptML document should be convertible into JSON, so that it stays readable as XML while remaining transformable into structured data for prompt engines, validators, and exporters.
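One possible conversion, matching the scene example in this section, maps attributes to keys, element text to a "text" key, and child elements to nested objects keyed by tag name. The sketch below is written under stated assumptions. Repeated children such as several <ref /> or <asset /> elements are collected into a list. Mixed content (text interleaved with child elements) is not handled. The function name element_to_dict is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element) -> dict:
    """Convert an element to a JSON-ready dict.

    Attributes become keys, trimmed text becomes "text", and child elements
    are nested under their tag name. Repeated children are collected into a list.
    """
    data: dict = dict(elem.attrib)
    # Trim only surrounding whitespace; internal line breaks are kept
    # because lyric line breaks can matter for timing and phrasing.
    text = (elem.text or "").strip()
    if text:
        data["text"] = text
    for child in elem:
        value = element_to_dict(child)
        if child.tag not in data:
            data[child.tag] = value
        elif isinstance(data[child.tag], list):
            data[child.tag].append(value)
        else:
            data[child.tag] = [data[child.tag], value]
    return data


if __name__ == "__main__":
    scene = ET.fromstring(
        '<scene id="s001" start="00:00" end="00:07" audio="song.mp3">'
        "<lyric>He lowers his head.</lyric>"
        '<openingFrame continuity="new">Wide frozen lake.</openingFrame>'
        "<motionPrompt>Camera pushes forward.</motionPrompt>"
        "</scene>"
    )
    # ensure_ascii=False keeps non-ASCII lyrics (e.g. Lithuanian) readable.
    print(json.dumps(element_to_dict(scene), ensure_ascii=False, indent=2))
```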
<scene id="s001" start="00:00" end="00:07" audio="song.mp3">
<lyric>He lowers his head.</lyric>
<openingFrame continuity="new">Wide frozen lake.</openingFrame>
<motionPrompt>Camera pushes forward.</motionPrompt>
</scene>
Equivalent JSON
{
"id": "s001",
"start": "00:00",
"end": "00:07",
"audio": "song.mp3",
"lyric": { "text": "He lowers his head." },
"openingFrame": { "continuity": "new", "text": "Wide frozen lake." },
"motionPrompt": { "text": "Camera pushes forward." }
}
Validation rules
Document: exactly one <vprompt> root with a version; <globalPrompt> and <scenes> must exist; <scenes> must contain at least one scene.
Scenes: every scene needs id, start, and end attributes and must contain <lyric>, <openingFrame>, and <motionPrompt>. Every <openingFrame> needs a continuity attribute; the first scene must be continuity="new"; start must precede end.
References & timing: every <ref character=""> should match a defined character; overlaps and unintentional gaps should trigger warnings.
Prompt quality: opening frames describe a still first frame, motion prompts describe motion from that frame, character identity stays consistent, and the global metaphor remains visible across the sequence.
Rendering pipeline
Recommended workflow from markup to final video:
VPromptML
-> scene prompt extraction
-> opening-frame generation
-> video motion generation
-> generated PNG / MP4 assets
-> video composition timeline
-> final rendered video
Design principles
- Prompt-first, not asset-first — describe what to generate before assets exist.
- One root document — every file begins with <vprompt> and ends with </vprompt>.
- Scene objects — each scene has timing, lyric, opening frame, and motion prompt.
- Image before motion — openingFrame defines the first frame, motionPrompt what happens next.
- Continuity is explicit — each scene declares new setup or extension of the last.
- Story drives visuals — built for metaphor, emotion, and transformation.
- Human-readable, parser-ready — a human can write it, a machine can validate it.