Yuv-Viral-Video
May 24, 2026 · View on GitHub
The signature YUV.AI viral-short pipeline. Two skills working together:
video-use— transcribes the source via ElevenLabs Scribe, picks word-snapped cuts, extracts and concats segments into abase.mp4. This is the cuts + base layout layer.hyperframes— takes the base as a background video and adds the visual composition layer in HTML/CSS/GSAP: liquid-glass cards with realbackdrop-filter, GSAP motion graphics withback.outspringy easing, word-by-word karaoke captions, scene transitions, and renders to MP4 via Chromium.
You MUST use both. Do not invent your own PIL-based cards, do not pre-bake blur into PNGs, do not write your own ffmpeg compositor. The hyperframes skill exists specifically to do the visual layer at Hollywood quality.
Companion skills (read these BEFORE you start)
Both companion skills are installed locally and auto-discovered by Claude Code at session start. You MUST load and apply their guidelines — this skill is the editorial layer on top of them, not a replacement.
| Skill | Path | Owns |
|---|---|---|
video-use | ~/.claude/skills/video-use/SKILL.md (symlink → ~/Developer/video-use) | Transcription (ElevenLabs Scribe), word-snapped cuts, color grade, ffmpeg-correctness (per-segment extract → -c copy concat, 30 ms audio fades, setpts=PTS-STARTPTS+T/TB overlays, output-timeline SRT offsets, transcript caching). Read its "Hard Rules (production correctness — non-negotiable)" section before authoring edl.json. Helpers live at ~/Developer/video-use/helpers/ and are invoked as python helpers/<name>.py. |
hyperframes | ~/.claude/skills/hyperframes/SKILL.md | HTML-as-source-of-truth video compositions, data-* timing attributes, CSS-driven appearance, frame-accurate library-clock animation (GSAP, Lottie, Motion One). Read its companion docs house-style.md, data-in-motion.md, patterns.md, visual-styles.md before writing the composition index.html. |
hyperframes-cli | ~/.claude/skills/hyperframes-cli/SKILL.md | The hyperframes init / lint / preview / render / transcribe / tts commands. CLI is also installed globally (hyperframes on $PATH) so npx is optional. |
gsap | ~/.claude/skills/gsap/SKILL.md | The full GSAP animation vocabulary. The back.out(1.7) springy easing rule in this document is one applied pattern; reach for gsap's full toolkit when the material calls for it. |
website-to-hyperframes | ~/.claude/skills/website-to-hyperframes/SKILL.md | One-shot: scrape a URL → hyperframes composition. Useful when the source has on-screen UI worth re-staging. |
Precedence (when this document conflicts with a companion skill)
- Production correctness (cuts, encoding, ffmpeg flags, ffmpeg filter ordering, transcript handling) → companion skill wins. If
video-usesays "subtitles applied LAST in the filter chain after every overlay" and this document is silent or different, do whatvideo-usesays. Same for hyperframes API conventions,data-*attributes, frame-adapter patterns. - Editorial style (fonts, glass cards, "never cover the face",
_V<N>versioning, archetype layout decisions, MrBeast pacing, no-fabricated-content rule) → this document wins. Companion skills are deliberately style-agnostic; the YUV.AI house style lives here. - Strategy confirmation:
video-useHard Rule 11 says "confirm the plan in plain English before executing." For viral shorts in YUV.AI style this is deliberately suspended — the style is fully captured in this document's Hard Rules, so the agent should execute the first pass and iterate from feedback rather than blocking on creative-direction questions. (For non-viral video-use sessions, follow the companion skill's rule.) - Cut-edge padding:
video-useallows 30–200 ms; this document picks 50 ms head / 80 ms tail as the YUV.AI default within that range. Both are correct; the narrower value is a style choice.
Shared configuration
Both skills read ELEVENLABS_API_KEY from ~/Developer/video-use/.env. Set it once and both skills work. Do not duplicate the key in this skill's directory.
Save location
Default: ~/Documents/yuv-projects/videos/<slug>/ — always save viral video projects here so renders are findable. The <slug> is short, derived from the topic or source filename.
mkdir -p ~/Documents/yuv-projects/videos
cd ~/Documents/yuv-projects/videos
# Initialize the project here.
hyperframes init <slug> --video <source.mp4> --non-interactive
cd <slug>
Final renders land at ~/Documents/yuv-projects/videos/<slug>/renders/final_*_V<N>.mp4 (both 9:16 and 16:9). Tell the user where the videos live at the end of every render.
Style preferences (mandatory defaults — inherited from yuv-design-system)
This skill is the video instantiation of Yuval Avidani's brand system. Whenever a brand or palette is not explicitly specified by the source content, these defaults apply. They override any companion-skill defaults that conflict.
Typography
| Role | English | Hebrew | Rule |
|---|---|---|---|
| Display / cards / karaoke captions | Anton (uppercase, weight 400 + tight tracking -0.02em to -0.04em) | Rubik Black (weight 900) | Always uppercase for English, never title case. Always weight-900 for Hebrew display text. |
| Body / glass-card body copy / chip text | Inter (400 / 500 / 600 / 700) | Assistant (400 / 500 / 600 / 700) | Use only when the card has secondary body copy beneath the display title. Line-height 1.4 inside cards. |
| Numbers / counters / stats | Anton | Rubik Black | Treat numbers as display text, never body. |
| Code / mono (terminal cards, paths) | JetBrains Mono | — | For typewriter-style code reveals. |
Hebrew direction: dir="rtl" on every Hebrew caption container. Use python-bidi's get_display() only in PIL fallback paths; CSS handles RTL natively. Logical CSS properties (margin-inline-*, padding-inline-*) preferred over physical (margin-left).
Banned: serif fonts as defaults, system-ui stack, Anton on Hebrew text (renders as hollow X — Anton is Latin-only).
Color palette
The canonical YUV.AI palette and its video-tuned counterparts:
| Role | Web canonical | Video-rendered (use inside MP4) | Why the deviation |
|---|---|---|---|
| Off-white / paper | #FAFAF7 | #F5F0E1 (warmer) | Reads warmer over dark stages with text-stroke; reduces the "blue-white glare" effect on H264. |
| Yellow / accent | #E5FF00 | #FFE61E | Slightly more saturated; punches through black text-stroke at small sizes. |
| Pink / brand thread | #FF1464 | #FF1464 | Same. Use as glass-card border accents, key-moment strip stroke, brand thread. |
| Black / stage | #0A0A0A | #0c0e16 (warm near-black) | Avoids pure black banding under H264. |
| Charcoal / text on light | #1A1A1A | — | UI overlays only (preview pages, never inside the rendered MP4). |
| Bone / cream | #F5EEE4 | — | Web pages, never inside the MP4. |
Banned in any rendered output:
- Pure
#FFFFFF— always replace with the warmer off-white. - Blue accents (navy, slate, indigo, cool gray) unless the source brand requires them.
- Default Tailwind palette names (
slate-*,zinc-*,gray-*,emerald-*,cyan-*,indigo-*,violet-*,rose-*). - Multicolor icon sets — Phosphor or Lucide single-stroke only.
Layout & motion defaults
- Border radius: glass cards use
40–56px(signature soft-rounded). Pill CTAs999px. Never the 8/12px corporate-rounded middle ground. - Shadows: warm-toned (pink/orange undertone) —
0 20px 60px rgba(255, 20, 100, 0.12)on glass cards, never the default blue-blackrgba(0,0,0,.1). - Whitespace: generous. Cards never fill more than 60% of the frame's free zone.
- Asymmetry over grid-perfect. Offset cards. Overlap. One element breaking the grid.
- Grain: 1–2% noise on any preview / catalog page (not inside the rendered MP4 — would compress to mush).
- Animation library: GSAP (always).
back.out(1.7)for card entrances,power3.outfor SplitText reveals,expo.outfor slam beats,sine.inOutfor idle floats.
Inheritance rule
If you are building any auxiliary surface (a preview page, a catalog, a dashboard, a render-status UI), apply the full yuv-design-system skill defaults — Anton + Inter typography, Fly High purple palette by default, signature components, brand assets. The yuv-design-system is the canonical source of truth for all Yuval-brand visuals.
If you are rendering the MP4 itself, apply the video-tuned values in the table above. Same brand, two delivery surfaces.
The live effects catalog (use this as the menu)
Yuval maintains a public, live, scrollable visual catalog of every effect, transition, marker, card, theme, and SFX this skill can apply to a video:
When commissioning an edit, Yuval (or anyone he sends the catalog to) clicks any effect's chip — the chip copies a stable identifier like captions.karaoke.scale-pop to the clipboard. He pastes those IDs into the prompt:
"edit this with
style.swiss-pulse, captionskaraoke.scale-pop,marker.circleon stats, transitionblur-crossfade, sfxwhooshon every cut, sfximpacton hero word."
You must respect every effect ID the user pastes. The catalog page is the single source of truth — every ID maps to a documented effect with verified behaviour. If the user names an ID you don't recognize, fetch the catalog (https://effects.yuv.ai) and find its definition rather than improvising.
The catalog also contains a prompt library (effects.yuv.ai #prompts) with 8 production-ready full-edit prompts (product launch / explainer / live-code / flow walkthrough / hot-take / tutorial / founder pitch / Hebrew talking-head). When the user references "the prompt for X scenario," route there.
Bilingual rendering — Hebrew + English in the same video
Many edits will mix Hebrew narration with English brand tokens (Claude, GitHub, hyperframes, etc.). The skill must:
-
Per-glyph font routing — Anton handles Latin, Rubik Black handles Hebrew. The single
font-family: 'Anton', 'Rubik', sans-serifdeclaration on every text container does this automatically. Never apply Anton to a Hebrew character (it renders as a hollow X — Anton is Latin-only). -
RTL containers — every Hebrew caption line gets
dir="rtl"on its container. English brand tokens stay LTR within the same line via the browser's bidirectional algorithm — no manual reordering needed. -
Brand-name spelling in Hebrew — Yuval's preferred Hebrew transliterations:
- "Claude" → "קלוד" (not "קלאוד")
- "Hyperframes" → keep English ("Hyperframes")
- "GitHub" → keep English ("GitHub")
- When Scribe transcribes a brand name with the wrong Hebrew spelling, fix it in the post-transcription pass before generating the karaoke layer.
-
Companion catalog has bilingual EN/HE everywhere — the live catalog flips entire UI to Hebrew on toggle (see effects.yuv.ai). The same
data-langtoggle pattern applies to any auxiliary surface this skill produces (preview pages, render-status UIs, etc.).
Lessons baked from production iterations
The following were learned across real sessions and now live in ~/.claude/skills/yuv-design/lessons-learned.md. Read that file before building any auxiliary frontend (preview page, render dashboard, catalog). The video pipeline itself is governed by the Hard Rules below — but any HTML/CSS surface this skill produces must respect:
- Bilingual = toggle, not side-by-side (lesson #1 in
lessons-learned.md) - Mobile = hamburger nav at ≤880px (lesson #2)
- One display headline per section (lesson #3)
document.titleis part of language toggle (lesson #5)- IntersectionObserver for any catalog with >8 demos (lesson #9)
- Build comprehensive on first pass — don't iterate on scope (lesson #10)
Claude-in-Chromefor SPAs,WebFetchonly for static HTML (lessons #7, #15)- Verify content from real source pages, never invent (lesson #8)
When to consult this skill
Any time the user drops a video file and asks for an edit. Even a one-line ask like "take this and edit it:
Three things to internalize before touching anything
1. NEVER invent content. Every word on every card must trace to something the speaker actually said in the transcript. No fake currency placeholders ("$$$"), no fake percentages ("100% understood"), no invented CTAs ("link in bio") if no link was promised. The user will catch fabricated content instantly and trust collapses fast. When in doubt, copy the literal Hebrew quote into the card. Same rule applies to Hebrew transliteration of brand names — for example, when a brand is "Claude", the proper Hebrew transliteration is "קלוד" (not "קלאוד"); use the brand owner's preferred spelling, not whatever Scribe heard.
2. NEVER cover the speaker's face. Position cards in the opposite half of the frame from the face, or use a top/bottom banner that sits above/below the face zone. For a full-frame selfie, the safe zones are y=0–400 (top banner) and y>1500 (bottom). For screen-share footage where the speaker is in a corner PIP, cards live anywhere in the screen-content area.
3. NEVER non-uniformly scale the speaker. Source PIP aspect must equal target PIP aspect. If you scale a 470×460 source crop into a 920×580 box, the speaker comes out horizontally squished and the user will reject the output. Pick target dims that match source aspect, or use force_original_aspect_ratio=decrease + center-pad.
Everything else in this document is a worked example or a tooling detail. The three above are non-negotiable.
The pipeline (high-level)
source.mp4
↓ video-use: transcribe (ElevenLabs Scribe — word-level + diarized + audio events)
transcripts/<source>.json
↓ pick word-snapped beats → edl.json (ranges with start/end, layout per range, motion per beat)
↓ video-use: build base.mp4 per aspect (per-segment extract + grade + 30ms fades + concat)
base_<aspect>_V<N>.mp4 (one per aspect — 9:16 and 16:9)
↓ generate output-timeline transcript.json (word-level, output-timeline timestamps, brand-name spelling fixed)
transcript.json
↓ hyperframes init <project> (creates index.html scaffold + DESIGN.md slot)
↓ hyperframes: build composition.html
: <video> bg + glass cards (CSS backdrop-filter) + karaoke caption layer + GSAP timeline
↓ hyperframes lint (validates the composition)
↓ hyperframes render (Chromium captures every frame, mux to MP4)
final_<aspect>_V<N>.mp4
Hard Rules (the editor's bill of rights)
These are correctness rules. Deviation produces silent failures or angry users.
- Hebrew = Rubik Black. English = Anton uppercase. Never mix. Never substitute. The two fonts are bundled in
assets/fonts/because download URLs change. In hyperframes CSS, declarefont-family: Rubik(weight 900) for Hebrew text andfont-family: Antonfor English; the compiler embeds them automatically. - Real CSS backdrop-filter for glass.
backdrop-filter: blur(40px) saturate(160%)on the card body. Do NOT pre-bake blur into a PNG. Do NOT use semi-transparent fills as a stand-in for blur — they look flat. - 3D-tilt entrance for cards. GSAP entrance:
scale 0.85 → 1.0,rotationY -12deg → 0,y +60 → 0withease: "back.out(1.7)"andtransformPerspective: 900. No flat fade-ins. - Word-by-word karaoke synced to the transcript. Active word:
scale 1.0+ accent color. Past/future words:scale 1.0at 55% opacity. Per-wordgsap.fromTo(scale 1.5 → 1.0, ease "back.out(2)", duration 0.16)on activation. - Subtitles applied LAST. The karaoke layer has the highest z-index. No card should overlap it.
- Per-segment extract →
-c copyconcat. The video-use skill enforces this; mirror it. Single-pass filtergraphs cause double-encode of every segment. - 30 ms audio fades at every segment boundary.
afade=t=in:st=0:d=0.03,afade=t=out:st={dur-0.03}:d=0.03. Without this, audible pops at every cut. - Word-boundary snapping with 50 ms head / 80 ms tail padding. Cut edges come from
word.start/word.endin the Scribe transcript. - Cache transcripts. Never re-transcribe the same source. Scribe costs money.
- All Hebrew text in any rendering pass must be RTL-correct. In CSS, set
dir="rtl"on the container; libass handles RTL natively. In any PIL fallback path, usepython-bidi'sget_display(). - Even pixel dimensions. ffmpeg's
yuv420pchroma subsampling requires even W and H. Odd dims silently round, which then breaksalphamerge("Input frame sizes do not match"). - Aspect preservation on the speaker PIP. Match source crop aspect to target PIP aspect.
- NEVER make up content. Audit every string against the transcript before render.
- Save every render with
_V<N>.mp4suffix. Each iteration writes a NEW versioned file. Never overwrite a previous final — the user keeps backups by version. - All session outputs in
<videos_dir>/edit/. Never write inside the skill directory.
Layout decisions (the most important call you make)
The first decision after transcribing is what the source footage actually is. There are two archetypes:
Archetype 1 — "Screen-share with speaker PIP corner"
Source: 16:9 with a screen recording (browser, app, dashboard) filling most of the frame, and the speaker's webcam in a small corner PIP. The screen content IS the visual story. Hide it and the user gets angry.
| Output | Top zone (screen content) | Bottom / corner zone (speaker PIP) | Card real estate |
|---|---|---|---|
| 9:16 | Top 2/3 — crop center column of source, scale to 1080×1280 | Bottom 1/3 — crop the source PIP region (e.g. 470×460), scale to a target with matching aspect (e.g. 600×588), apply rounded mask 50px+ | Hyperframes glass cards float in top 2/3 over the screen content; key-moment strip floats just above PIP |
| 16:9 | Full frame — scale source 2560×1440 to 1920×1080 | The PIP is naturally in source bottom-right; overlay a rounded-corner mask on that exact region | Cards on left half, key-strip across bottom |
Archetype 2 — "Full-frame selfie"
Source: vertical or horizontal with the speaker filling the frame. There's no "screen" beyond the speaker themselves. The speaker IS the content — keep them prominent.
| Output | Speaker placement (per-beat) | Card real estate |
|---|---|---|
| 9:16 | Layout A (emotional beats — HOOK / PROBLEM / WOW / CTA): full-frame center-vertical crop. Layout B (explanatory beats): speaker shrunk to a top-right rounded PIP (e.g. 480×640), dark gradient BG fills rest. Source crop for the PIP must include the speaker's full head — start y at ~80px in the source, NOT 200px+ (that cuts the forehead). | Layout B: glass cards fill the bottom 2/3 |
| 16:9 | Letterbox: scale source vertically to fit height, blurred (gblur sigma=22) zoomed copy fills the side bars, crisp speaker centered on top | Cards on left half over the blurred BG |
If the source is HYBRID (some beats are full speaker, others screen-share), pick the archetype per beat. Set "layout": "A" or "layout": "B" per range in edl.json.
Apple-style liquid-glass cards (the hyperframes layer)
The card style is non-negotiable. Build it in HTML/CSS, not PIL.
.card {
position: absolute;
border-radius: 56px;
background: rgba(255, 255, 255, 0.08);
backdrop-filter: blur(40px) saturate(160%);
-webkit-backdrop-filter: blur(40px) saturate(160%);
border: 1.5px solid rgba(255, 255, 255, 0.18);
box-shadow:
0 20px 60px rgba(0, 0, 0, 0.5),
inset 0 2px 0 rgba(255, 255, 255, 0.18);
padding: 60px 50px;
display: flex;
flex-direction: column;
justify-content: center;
align-items: center;
gap: 24px;
transform-origin: center;
will-change: transform, opacity;
}
.card.dark {
background: rgba(0, 0, 0, 0.55);
}
// Entrance — 3D tilt-pop with overshoot
gsap.set(".card", { opacity: 0, scale: 0.85, y: 60, rotationY: -12, transformPerspective: 900 });
tl.to(sel, { opacity: 1, scale: 1.0, y: 0, rotationY: 0, duration: 0.55, ease: "back.out(1.7)" }, beat.s + 0.05);
// Subtle idle float — keeps the card alive
tl.to(sel, { y: -8, duration: 0.9, ease: "sine.inOut", yoyo: true, repeat: 1 }, beat.s + 0.6);
// Exit
tl.to(sel, { opacity: 0, scale: 0.95, y: -30, duration: 0.30, ease: "power2.in" }, beat.e - 0.30);
tl.set(sel, { visibility: "hidden" }, beat.e);
Inside the card, stagger the children so the card "fills" itself with content rather than appearing all at once:
tl.from(`${sel} .chip`, { y: 30, opacity: 0, scale: 0.9,
duration: 0.4, ease: "back.out(1.6)", stagger: 0.10 }, beat.s + 0.40);
Karaoke (video-title style)
Group transcript words into chunks of 3 (or break on >0.5s pauses). Each chunk becomes one .cap-group div. Within the chunk, each word is a <span class="cap-word"> (or .cap-word.en for English brand tokens, .cap-word.accent for emphasis Hebrew).
CSS:
.cap-word {
font: 900 108px Rubik;
color: rgba(245, 240, 225, 0.55); /* dim future/past */
text-shadow: 0 8px 18px rgba(0,0,0,.85), 0 0 28px rgba(0,0,0,.7);
-webkit-text-stroke: 4px #0c0e16;
transform-origin: center bottom;
}
.cap-word.en { font: 400 124px Anton; letter-spacing: 0.06em; color: rgba(255, 230, 30, 0.6); }
.cap-word.active { color: #F5F0E1; }
.cap-word.en.active { color: #FFE61E; }
.cap-word.accent.active { color: #FFE61E; text-shadow: 0 0 20px rgba(255,230,30,.6); }
GSAP per-word:
tl.fromTo(wordSel,
{ scale: 1.5, opacity: 0 },
{ scale: 1.0, opacity: 1, duration: 0.16, ease: "back.out(2)" },
word.start);
tl.call(() => document.querySelector(wordSel).classList.add("active"), [], word.start + 0.02);
Group entrance/exit per cap-group:
tl.fromTo(groupSel,
{ opacity: 0, visibility: "visible", scale: 0.85, y: 30 },
{ opacity: 1, scale: 1, y: 0, duration: 0.25, ease: "back.out(1.6)" },
group.start);
tl.to(groupSel, { opacity: 0, scale: 0.95, y: -20, duration: 0.2, ease: "power2.in" }, group.end - 0.20);
tl.set(groupSel, { visibility: "hidden" }, group.end);
English brand tokens stay English. The detection regex ^[A-Za-z][A-Za-z0-9\-']*$ routes English/brand words to the EN style (Anton, uppercase). Hebrew flows through libass+FriBidi for RTL or via dir="rtl" in CSS.
Per-segment dynamic motion
Every segment baked-in motion. Subtle MrBeast-style amounts.
| Kind | Effect | Typical amount |
|---|---|---|
push_in | Slow zoom in | 0.04–0.07 |
pull_out | Slow zoom out (revealing) | 0.05–0.07 |
snap_in | Cubic ease-out punch zoom | 0.05–0.10 |
dolly_l / dolly_r | Held zoom + horizontal drift | 0.04 |
hold | No motion (use sparingly) | – |
Keep amounts subtle. Anything ≥0.10 outside of WOW-class beats reads as "too much zoom".
Bottom key-moment strips
A small glass strip across the bottom summarising the current beat in 1–4 words. Hebrew on the right (RTL), English on the left (Anton uppercase), yellow accent stripe. Appears only on key beats — not every beat.
.km-strip {
position: absolute;
left: 40px; right: 40px;
bottom: 30px;
height: 110px;
border-radius: 30px;
background: rgba(0, 0, 0, 0.55);
backdrop-filter: blur(30px);
border: 1.5px solid rgba(255, 230, 30, 0.55);
display: flex;
align-items: center;
justify-content: space-between;
padding: 0 40px;
z-index: 70;
}
SFX library
Generated via ElevenLabs sound-generation API. Standard kit:
impact.mp3— HOOK punch + WOW slambass_drop.mp3— layered with impact for the boomwhoosh.mp3— every cut transitionriser.mp3— building into a punchlineding.mp3— UI pop on card reveals + per-bar dings on chart entranceglitch.mp3— short transition glitchtyping.mp3— under any terminal/coding card
Mix in the video-use base layer (not in hyperframes) so the audio is committed before the visual composition reads it.
Versioning
Every render writes a NEW file. Find the highest existing final_<aspect>_V<N>.mp4 and increment. The composite scripts split BASE_VERSION (read from) and OUT_VERSION (write to) so the base can be reused across multiple final iterations.
Common failure modes (and the fixes)
- "Input frame sizes do not match (X vs Y)" → odd target dimension. Round to even pixels for both W and H.
- Hebrew letters render reversed → in CSS, set
dir="rtl"on the container. In PIL fallbacks, wrap withbidi.get_display(). - Hebrew avatar letter renders as hollow X → using Anton font for a Hebrew character. Anton is Latin-only. Switch to Rubik for any text containing Hebrew.
Filter 'split' has output 0 unconnected→ declared a split but didn't consume one of its outputs. Check labels match.- Speaker looks horizontally squished → non-uniform scale on the PIP source crop. Match target aspect to source aspect.
- Speaker forehead/hair cut off in PIP → source crop
ystarted too low. For a 2560×1440 source where the speaker fills the frame, starty ≈ 80, not 200+. subtitles=filter fails on Windows → unescaped colon in the path. Escape with\:.-shortestignored →-loop 1on a PNG creates an infinite stream. Add-t <dur>to constrain the looped input.-tdoesn't limit source duration →-tis an input option and must come BEFORE the-iit applies to.- Hyperframes "sparse keyframes" warning → re-encode the base with
-g 30 -keyint_min 30for smooth seek.ffmpeg -i in.mp4 -c:v libx264 -r 30 -g 30 -keyint_min 30 -movflags +faststart -c:a copy out.mp4.
File layout (per project)
<videos_dir>/edit/
├── transcripts/<source>.json (cached Scribe output)
├── edl.json (cut decisions, layout per range, motion per beat)
├── transcript.json (output-timeline word-level, generated for hyperframes)
├── base_<aspect>_V<N>.mp4 (video-use cuts + base layout, audio mixed)
├── hf/ (hyperframes project)
│ ├── DESIGN.md (visual identity — colors, fonts, motion rules)
│ ├── index.html (the composition)
│ ├── base_9x16.mp4 / base_16x9.mp4 (copied from parent edit/)
│ ├── transcript.json (copied from parent edit/)
│ └── renders/ (hyperframes output)
└── final_<aspect>_V<N>.mp4 (copied from hf/renders/, with version bumped)
Setup
See references/setup.md for: ffmpeg, ElevenLabs API key, hyperframes CLI (npx hyperframes), python-bidi, fonts. The video-use skill is a hard prerequisite — install that first.
Scripts (this skill)
These scripts handle the video-use side (cuts + base extraction). The hyperframes side is authored as HTML/CSS/JS per project — no fixed scripts there because every project has its own beats and content.
scripts/apple_glass.py—apple_glass_card(),make_glass_mask(),make_dark_bg(),rtl(),COLORS, layout constants. Used as a PIL fallback only — prefer hyperframes for production cards.scripts/build_screen_share.py— base build for Archetype 1 (screen-share + corner PIP)scripts/build_selfie.py— base build for Archetype 2 (full-frame speaker)scripts/make_karaoke.py— output-timeline ASS karaoke (used as a PIL fallback or for ffmpeg-direct paths)scripts/make_sfx.py— ElevenLabs SFX batch generatorscripts/composite_screen_share.py/scripts/composite_selfie.py— full ffmpeg filter_complex paths (legacy, prefer hyperframes)scripts/make_layout_assets.py— pre-render dark-mode BG + speaker masks
The unbreakable promise
If the user drops a video and asks for an edit, the output:
- Shows the screen content if it's a screen-share (NEVER hide it under dark BG).
- Shows the speaker prominently if it's a selfie (NEVER shrink to tiny corner).
- Has every card text traceable to a real spoken word in the transcript.
- Renders both 9:16 AND 16:9.
- Saves with
_V<N>so the previous version is preserved. - Uses Rubik for Hebrew, Anton uppercase for English. Always.
- Never covers the speaker's face.
- Uses hyperframes for the visual composition (real CSS backdrop-filter, GSAP springy easing, Hollywood-grade animation).
Get those right and you've delivered. Anything else is polish.