Workflows
June 26, 2026
Workflows are high-level pipelines that compose skills into end-to-end processing chains. They live in agent/extensions/workflows/.
Overview
| Workflow | File | Input | Output |
|---|---|---|---|
| Brief | brief.py | VideoAsset | Metadata, frames, ASR, timeline |
| Detailed | detailed.py | VideoAsset | All of brief + OCR, emotions, objects, translation |
| Index | index.py | VideoAsset (needs analysis) | FAISS index + chunk metadata |
| Ask | ask.py | VideoAsset + question (needs index) | Answer with evidence |
| Highlights | highlights.py | VideoAsset (needs analysis) | Clips + optional reel |
| Report | report.py | VideoAsset | Structured analysis report |
| Analyze | analyze.py | VideoAsset + mode | Routes to the requested workflow mode |
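The Analyze workflow's routing can be pictured as a lookup from mode name to workflow. The sketch below is hypothetical: `run_brief`, `run_detailed`, `WORKFLOWS`, and `analyze` are illustrative stand-ins, not the actual API of analyze.py, and only two modes are registered for brevity.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-ins for the real workflow entry points.
def run_brief(video: str, **params: Any) -> Dict[str, Any]:
    return {"workflow": "brief", "video": video, "params": params}

def run_detailed(video: str, **params: Any) -> Dict[str, Any]:
    return {"workflow": "detailed", "video": video, "params": params}

# Registry mapping each mode name to the workflow that handles it.
WORKFLOWS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "brief": run_brief,
    "detailed": run_detailed,
}

def analyze(video: str, mode: str, **params: Any) -> Dict[str, Any]:
    """Dispatch to the workflow registered for `mode`, forwarding its parameters."""
    workflow = WORKFLOWS.get(mode)
    if workflow is None:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(WORKFLOWS)}")
    return workflow(video, **params)
```

A registry keeps the router free of if/else chains: adding a mode means registering one more callable.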
Brief Workflow
Fast ASR-first analysis for quick video insights.
Steps:
- Probe video metadata (duration, resolution, fps)
- Parse embedded subtitles when available
- Extract audio and run Whisper ASR if subtitles are missing
- Assess transcript sufficiency using coverage and word-count thresholds
- Skip visual captioning when transcript coverage is sufficient, unless forced
- Sample and caption frames when visual processing is needed
- Build structured timeline (chapters + events) via LLM
- Optionally enhance with web search
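The ASR-first decision (assess sufficiency, then skip or run visual captioning) can be sketched as follows. The threshold values (0.6 coverage, 50 words), the `Segment` type, and the function names are assumptions for illustration; the actual thresholds used by brief.py are not documented here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str

# Hypothetical thresholds; the real values in brief.py may differ.
MIN_COVERAGE = 0.6  # fraction of the video covered by speech
MIN_WORDS = 50      # total words across the transcript

def transcript_is_sufficient(segments: List[Segment], duration: float) -> bool:
    """True when speech covers enough of the video and contains enough words.

    Assumes segments do not overlap; coverage is capped at 1.0.
    """
    if duration <= 0:
        return False
    spoken = sum(max(0.0, s.end - s.start) for s in segments)
    coverage = min(1.0, spoken / duration)
    words = sum(len(s.text.split()) for s in segments)
    return coverage >= MIN_COVERAGE and words >= MIN_WORDS

def needs_visual_captioning(segments: List[Segment], duration: float,
                            force_visual: bool = False) -> bool:
    """Captioning runs when forced, or when the transcript is insufficient."""
    return force_visual or not transcript_is_sufficient(segments, duration)
```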
Parameters:
- max_frames: default 64
- whisper_model: default "small" (set None to skip ASR)
- force_visual: override ASR-first skipping and always run visual captioning
- include_web_search: default False
- direct_model / model_path: for local model loading
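For the frame-sampling step, one simple strategy is to spread max_frames timestamps evenly across the video. This is an assumed strategy for illustration; brief.py may sample differently (for example, around detected scene changes), and `sample_timestamps` is a hypothetical helper.

```python
from typing import List

def sample_timestamps(duration: float, max_frames: int = 64) -> List[float]:
    """Return max_frames timestamps (seconds), one at the midpoint of each equal-width bin."""
    if duration <= 0 or max_frames <= 0:
        return []
    step = duration / max_frames
    # Midpoints keep the first and last samples away from the very edges of the video.
    return [round((i + 0.5) * step, 3) for i in range(max_frames)]
```

For a 10-second clip with max_frames=4, this yields [1.25, 3.75, 6.25, 8.75].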
Detailed Workflow
Comprehensive analysis with all available skills.
Adds to brief:
- OCR text extraction from key frames (PaddleOCR)
- Object detection (YOLOv8) on frames
- Emotion analysis — audio (Wav2Vec2) + visual (FER)
- ASR translation to target language (default: Chinese)
- Lower scene detection threshold (0.25) and more frames (128)
- Optional long-video segment parallelism and parallel ASR
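Long-video segment parallelism can be sketched by splitting the timeline into fixed-length spans and processing them concurrently. The 300-second segment length, the worker count, and the `process_segment` callback are hypothetical; detailed.py's actual splitting policy is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Span = Tuple[float, float]

def split_segments(duration: float, segment_sec: float = 300.0) -> List[Span]:
    """Split [0, duration) into consecutive (start, end) spans of at most segment_sec."""
    if segment_sec <= 0:
        raise ValueError("segment_sec must be positive")
    spans: List[Span] = []
    start = 0.0
    while start < duration:
        end = min(start + segment_sec, duration)
        spans.append((start, end))
        start = end
    return spans

def process_in_parallel(duration: float,
                        process_segment: Callable[[Span], T],
                        segment_sec: float = 300.0,
                        workers: int = 4) -> List[T]:
    """Run process_segment on every span concurrently; results stay in timeline order."""
    spans = split_segments(duration, segment_sec)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        return list(pool.map(process_segment, spans))
```

Threads suit work that mostly waits on subprocesses or network calls (FFmpeg, an ASR server); CPU-bound pure-Python work would need `ProcessPoolExecutor` instead.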
Parameters:
- max_frames: default 128
- All brief parameters plus advanced skill toggles
Index Workflow
Builds a FAISS semantic index for retrieval-augmented Q&A.
Steps:
- Load existing analysis (or auto-run detailed if missing)
- Chunk video content by time windows (default: 20s)
- Generate dense embeddings via OpenAI-compatible API
- Build FAISS index with L2-normalized vectors
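The time-window chunking step can be sketched as bucketing transcript segments into fixed windows of chunk_sec seconds. The input format (dicts with start, end, and text) and the rule of assigning each segment to the window that contains its start time are assumptions; index.py may also fold in captions or OCR text.

```python
from typing import Dict, List

def chunk_by_window(segments: List[Dict], chunk_sec: float = 20.0) -> List[Dict]:
    """Group segments into [k*chunk_sec, (k+1)*chunk_sec) windows by start time.

    Windows that receive no segments are omitted.
    """
    if chunk_sec <= 0:
        raise ValueError("chunk_sec must be positive")
    buckets: Dict[int, List[str]] = {}
    for seg in segments:
        k = int(seg["start"] // chunk_sec)
        buckets.setdefault(k, []).append(seg["text"])
    return [
        {"start": k * chunk_sec, "end": (k + 1) * chunk_sec, "text": " ".join(texts)}
        for k, texts in sorted(buckets.items())
    ]
```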
Parameters:
- chunk_sec: default 20
- embed_base_url / embed_model: embedding API endpoint
Output: Index files in cache/{vid}/index_faiss/, item count, chunk metadata.
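L2-normalizing the vectors makes the inner product equal to cosine similarity, which is the usual reason to normalize before building a FAISS index. The pure-Python sketch below shows the math an exact inner-product index (such as FAISS's `IndexFlatIP`) performs; whether index.py uses that particular index type is an assumption, and `l2_normalize` and `search` are hypothetical helpers.

```python
import heapq
import math
from typing import List, Sequence, Tuple

def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def search(index: List[List[float]], query: Sequence[float],
           top_k: int = 5) -> List[Tuple[int, float]]:
    """Exact search over unit-length rows: return (row id, cosine score) for the top_k rows."""
    q = l2_normalize(query)
    scores = ((i, sum(a * b for a, b in zip(row, q))) for i, row in enumerate(index))
    return heapq.nlargest(top_k, scores, key=lambda t: t[1])
```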
Ask Workflow
Answers natural-language questions about video content using semantic search.
Steps:
- Embed the question using the same embedding model
- Search FAISS index for top-k relevant chunks
- Synthesize an answer using LLM with retrieved context
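The synthesis step can be sketched as formatting the retrieved chunks into a numbered context block for the LLM and carrying their time ranges through as evidence. The hit fields, the prompt wording, and `build_prompt` are hypothetical; ask.py's actual prompt is not shown in this document.

```python
from typing import Dict, List, Tuple

def build_prompt(question: str, hits: List[Dict]) -> Tuple[str, List[Dict]]:
    """Return (prompt, evidence) built from retrieved chunks, in ranked order."""
    lines: List[str] = []
    evidence: List[Dict] = []
    for n, hit in enumerate(hits, start=1):
        # Label each chunk with a number and time range so the answer can cite it.
        lines.append(f"[{n}] ({hit['start']:.1f}s-{hit['end']:.1f}s) {hit['text']}")
        evidence.append({"start": hit["start"], "end": hit["end"]})
    prompt = (
        "Answer the question using only the video excerpts below.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, evidence
```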
Parameters:
- question: the question to answer
- top_k: default 5
Output:
```
{
  "result": {
    "answer": "...",
    "evidence": [{"start": 10.0, "end": 30.0, "frame_ids": [...], ...}]
  },
  "hits": [...]
}
```
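A caller might render the evidence time ranges from this output as M:SS labels. `fmt_time` and `evidence_labels` are illustrative helpers that rely only on the result.evidence start and end fields shown above.

```python
from typing import Dict, List

def fmt_time(seconds: float) -> str:
    """Format seconds as M:SS (minutes are not wrapped into hours)."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"

def evidence_labels(response: Dict) -> List[str]:
    """Turn result.evidence entries into 'M:SS-M:SS' strings."""
    return [
        f"{fmt_time(e['start'])}-{fmt_time(e['end'])}"
        for e in response["result"]["evidence"]
    ]
```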
Highlights Workflow
Detects high-impact segments and exports video clips.
Steps:
- Load analysis (or auto-run detailed if missing)
- Detect highlights based on information density (LLM scoring)
- Export individual clips via FFmpeg
- Optionally concatenate into a highlight reel
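Once each candidate segment carries an LLM-assigned score, choosing clips can be sketched as a greedy pick of the highest-scoring non-overlapping segments, up to max_clips, returned in timeline order. The candidate format, `select_highlights`, and the greedy policy are assumptions; the document only states that highlights are scored by information density.

```python
from typing import Dict, List

def select_highlights(candidates: List[Dict], max_clips: int = 5) -> List[Dict]:
    """Greedily keep the best-scoring candidates that do not overlap already-kept ones."""
    chosen: List[Dict] = []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        if len(chosen) >= max_clips:
            break
        overlaps = any(cand["start"] < c["end"] and c["start"] < cand["end"] for c in chosen)
        if not overlaps:
            chosen.append(cand)
    # Present clips in the order they appear in the video.
    return sorted(chosen, key=lambda c: c["start"])
```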
Parameters:
- max_clips: default 5
- also_make_reel: default True
Output: Clip paths, reel path, timeline mapping.
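The clip export and reel steps map naturally onto FFmpeg's input seeking and its concat demuxer. The sketch below only builds the argument lists; `clip_cmd` and `reel_cmd` are hypothetical names, and the stream-copy choice (`-c copy`, fast but cutting on keyframes) is an assumption, since highlights.py may re-encode instead.

```python
from typing import List, Tuple

def clip_cmd(src: str, start: float, end: float, out: str) -> List[str]:
    """FFmpeg argv that stream-copies [start, end) of src into out."""
    # -ss before -i seeks the input; -t sets the clip duration.
    # With -c copy there is no re-encode, so cuts snap to keyframes and are approximate.
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", src,
            "-t", f"{end - start:.3f}", "-c", "copy", out]

def reel_cmd(clips: List[str], list_file: str, reel: str) -> Tuple[str, List[str]]:
    """Return (text for list_file, FFmpeg argv) that joins clips with the concat demuxer."""
    # Each line is: file '<path>'; a literal single quote is escaped as '\''.
    listing = "".join("file '{}'\n".format(c.replace("'", "'\\''")) for c in clips)
    # -safe 0 lets the list contain absolute or otherwise "unsafe" paths.
    argv = ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", reel]
    return listing, argv
```

To use it, write the listing to list_file, then pass each argv to `subprocess.run(argv, check=True)`.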
Report Workflow
Generates a comprehensive analysis report combining all sources.
Steps:
- Load existing analysis, or run brief if none exists
- Extract key information: metadata, timeline, top frames, transcript
- Perform web search if enabled
- Generate recommendations via LLM
Output: Structured report JSON with sections for metadata, timeline summary, key frames, transcript highlights, web search insights, and recommendations.
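Assembling the report can be sketched as pulling fields from an existing analysis dict into the sections listed above. Every key name below (`metadata`, `timeline`, `frames`, `score`, `transcript`, and the section names) and the `build_report` helper are hypothetical placeholders; the real analysis.json schema and report.py's section names are not documented here.

```python
from typing import Any, Dict, List, Optional

def build_report(analysis: Dict[str, Any],
                 web_results: Optional[List[str]] = None,
                 recommendations: Optional[List[str]] = None,
                 max_items: int = 5) -> Dict[str, Any]:
    """Combine analysis fields into a report dict with one key per section."""
    # Rank frames by a hypothetical per-frame "score" and keep the best few.
    frames = sorted(analysis.get("frames", []),
                    key=lambda f: f.get("score", 0.0), reverse=True)
    return {
        "metadata": analysis.get("metadata", {}),
        "timeline_summary": [ch.get("title", "") for ch in analysis.get("timeline", [])],
        "key_frames": frames[:max_items],
        "transcript_highlights": analysis.get("transcript", [])[:max_items],
        "web_search_insights": web_results or [],
        # The real workflow generates these with an LLM; this sketch takes them as input.
        "recommendations": recommendations or [],
    }
```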
Dependency Chain
```
brief / detailed   (standalone, no prerequisites)
        │
        ▼
      index        (requires analysis.json)
        │
        ▼
       ask         (requires FAISS index)

highlights         (requires analysis.json)
report             (requires analysis.json or runs brief internally)
```
Missing prerequisites are auto-generated when possible (e.g., index will run detailed if no analysis.json exists).
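Auto-generating missing prerequisites can be sketched as a small recursive planner over the dependency chain above. The `NEEDS` and `PRODUCES` tables mirror the diagram; the artifact names ("analysis", "index"), the `cached` set (standing in for checks on cache files such as analysis.json), and `plan` itself are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# workflow -> (artifact it needs, workflow that builds that artifact when missing)
NEEDS: Dict[str, Optional[Tuple[str, str]]] = {
    "brief": None,
    "detailed": None,
    "index": ("analysis", "detailed"),
    "ask": ("index", "index"),
    "highlights": ("analysis", "detailed"),
    "report": ("analysis", "brief"),
}
# Artifact each workflow leaves in the cache for downstream workflows.
PRODUCES: Dict[str, Optional[str]] = {
    "brief": "analysis", "detailed": "analysis", "index": "index",
    "ask": None, "highlights": None, "report": None,
}

def plan(target: str, cached: Set[str]) -> List[str]:
    """Return the workflows to run, in order, so that `target` has its prerequisites."""
    have = set(cached)
    order: List[str] = []

    def visit(name: str) -> None:
        need = NEEDS[name]
        if need is not None:
            artifact, builder = need
            if artifact not in have:
                visit(builder)  # build the missing prerequisite first
        order.append(name)
        produced = PRODUCES[name]
        if produced is not None:
            have.add(produced)

    visit(target)
    return order
```

Because the check is on artifacts rather than workflow names, an analysis produced by brief satisfies index just as well as one produced by detailed.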