Workflows

June 26, 2026

Workflows are high-level pipelines that compose skills into end-to-end processing chains. They live in agent/extensions/workflows/.

Overview

Workflow	File	Input	Output
Brief	`brief.py`	VideoAsset	Metadata, frames, ASR, timeline
Detailed	`detailed.py`	VideoAsset	All of brief + OCR, emotions, objects, translation
Index	`index.py`	VideoAsset (needs analysis)	FAISS index + chunk metadata
Ask	`ask.py`	VideoAsset + question (needs index)	Answer with evidence
Highlights	`highlights.py`	VideoAsset (needs analysis)	Clips + optional reel
Report	`report.py`	VideoAsset	Structured analysis report
Analyze	`analyze.py`	VideoAsset + mode	Routes to the requested workflow mode

Brief Workflow

Fast ASR-first analysis for quick video insights.

Steps:

Probe video metadata (duration, resolution, fps)
Parse embedded subtitles when available
Extract audio and run Whisper ASR if subtitles are missing
Assess transcript sufficiency using coverage and word-count thresholds
Skip visual captioning when transcript coverage is sufficient, unless forced
Sample and caption frames when visual processing is needed
Build structured timeline (chapters + events) via LLM
Optionally enhance with web search
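
The transcript sufficiency check and the ASR-first skip described above can be sketched as follows. This is a minimal illustration, not the code in brief.py: the `Segment` type, the function names, and the threshold values (0.6 coverage, 50 words) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def transcript_is_sufficient(segments, duration, min_coverage=0.6, min_words=50):
    """Return True when speech covers enough of the video and has enough words.

    The thresholds are illustrative assumptions, not the project's defaults.
    """
    if duration <= 0:
        return False
    spoken = sum(max(0.0, s.end - s.start) for s in segments)
    # Cap at 1.0 so overlapping segments cannot push coverage above 100%.
    coverage = min(1.0, spoken / duration)
    words = sum(len(s.text.split()) for s in segments)
    return coverage >= min_coverage and words >= min_words


def needs_visual_captioning(segments, duration, force_visual=False):
    """Visual captioning runs when forced or when the transcript is too thin."""
    return force_visual or not transcript_is_sufficient(segments, duration)
```

With this shape, `force_visual=True` always wins, and an empty transcript always falls through to frame sampling and captioning.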

Parameters:

max_frames — default 64
whisper_model — default "small" (set to None to skip ASR)
force_visual — override ASR-first skipping and always run visual captioning
include_web_search — default False
direct_model / model_path — for local model loading

Detailed Workflow

Comprehensive analysis with all available skills.

Adds to brief:

OCR text extraction from key frames (PaddleOCR)
Object detection (YOLOv8) on frames
Emotion analysis — audio (Wav2Vec2) + visual (FER)
Translation of the ASR transcript into a target language (default: Chinese)
Lower scene detection threshold (0.25) and more frames (128)
Optional long-video segment parallelism and parallel ASR

Parameters:

max_frames — default 128
All brief parameters plus advanced skill toggles

Index Workflow

Builds a FAISS semantic index for retrieval-augmented Q&A.

Steps:

Load existing analysis (or auto-run detailed if missing)
Chunk video content by time windows (default: 20s)
Generate dense embeddings via OpenAI-compatible API
Build FAISS index with L2-normalized vectors
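
Chunking by time windows, as in the second step above, might look like the sketch below. It assumes transcript segments are available as `(start, end, text)` tuples and assigns each segment to the window that contains its start time. Both the input format and that assignment rule are assumptions for illustration; index.py may chunk differently.

```python
def chunk_by_time(segments, duration, chunk_sec=20.0):
    """Group (start, end, text) segments into fixed-length time windows.

    A segment belongs to the window containing its start time.
    Windows with no text are skipped.
    """
    if chunk_sec <= 0:
        raise ValueError("chunk_sec must be positive")
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_sec, duration)
        texts = [text for (seg_start, _seg_end, text) in segments
                 if start <= seg_start < end]
        if texts:
            chunks.append({"start": start, "end": end, "text": " ".join(texts)})
        start = end
    return chunks
```

Each resulting chunk carries its own `start` and `end`, which is what lets a later search hit be mapped back to a time range in the video.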

Parameters:

chunk_sec — default 20
embed_base_url / embed_model — embedding API endpoint

Output: Index files in cache/{vid}/index_faiss/, item count, chunk metadata.

Ask Workflow

Answers natural-language questions about video content using semantic search.

Steps:

Embed the question using the same embedding model
Search FAISS index for top-k relevant chunks
Synthesize an answer using LLM with retrieved context
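
Because the index stores L2-normalized vectors, the inner product of two vectors equals their cosine similarity. The sketch below shows that retrieval step as a brute-force search in pure Python. It is a stand-in for the FAISS search, not the FAISS API, and the function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query, vectors, k=5):
    """Return [(index, score)] for the k most similar vectors, best first."""
    q = l2_normalize(query)
    scored = []
    for i, vec in enumerate(vectors):
        v = l2_normalize(vec)
        # Inner product of unit vectors is the cosine similarity.
        scored.append((i, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The returned indices would then be used to look up chunk metadata (time range and text), which is passed to the LLM as context.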

Parameters:

question — the question to answer
top_k — default 5

Output:

{
  "result": {
    "answer": "...",
    "evidence": [{"start": 10.0, "end": 30.0, "frame_ids": [...], ...}]
  },
  "hits": [...]
}
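
A caller might render this output as plain text along the following lines. The helper names are hypothetical, and the sketch relies only on the `answer`, `evidence`, `start`, and `end` fields shown above.

```python
def _mmss(seconds):
    """Format a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_evidence(response):
    """Render the answer followed by one line per evidence time span."""
    result = response.get("result", {})
    lines = [result.get("answer", "")]
    for ev in result.get("evidence", []):
        lines.append(f"  [{_mmss(ev['start'])} - {_mmss(ev['end'])}]")
    return "\n".join(lines)
```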

Highlights Workflow

Detects high-impact segments and exports video clips.

Steps:

Load analysis (or auto-run detailed if missing)
Detect highlights based on information density (LLM scoring)
Export individual clips via FFmpeg
Optionally concatenate into a highlight reel
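
The selection and export steps can be sketched as below. The candidate format (dicts with `start`, `end`, and `score`) and both function names are assumptions. The FFmpeg flags used (`-ss`, `-i`, `-t`, `-c copy`, `-y`) are standard, but highlights.py may use different options, for example re-encoding for frame-accurate cuts.

```python
def select_clips(candidates, max_clips=5):
    """Keep the highest-scoring candidates, then restore chronological order."""
    best = sorted(candidates, key=lambda c: c["score"], reverse=True)[:max_clips]
    return sorted(best, key=lambda c: c["start"])


def ffmpeg_clip_command(src, start, end, dst):
    """Build an FFmpeg argument list that cuts [start, end) out of src.

    Stream copy (-c copy) avoids re-encoding, at the cost of cutting
    on keyframes rather than at exact timestamps.
    """
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",        # seek to the clip start
        "-i", src,
        "-t", f"{end - start:.3f}",   # clip duration in seconds
        "-c", "copy",
        dst,
    ]
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`; building it as a list rather than a shell string avoids quoting problems with file paths.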

Parameters:

max_clips — default 5
also_make_reel — default True

Output: Clip paths, reel path, timeline mapping.

Report Workflow

Generates a structured analysis report that combines video metadata, the timeline, key frames, the transcript, and optional web search results.

Steps:

Load existing analysis, or run brief if none exists
Extract key information: metadata, timeline, top frames, transcript
Perform web search if enabled
Generate recommendations via LLM

Output: Structured report JSON with sections for metadata, timeline summary, key frames, transcript highlights, web search insights, and recommendations.

Dependency Chain

brief / detailed   (standalone — no prerequisites)
       │
       ▼
     index         (requires analysis.json)
       │
       ▼
      ask          (requires FAISS index)

highlights         (requires analysis.json)
report             (requires analysis.json or runs brief internally)

Missing prerequisites are auto-generated when possible (e.g., index will run detailed if no analysis.json exists).
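
The resolution logic implied by this chain can be sketched as a small planner. The `plan` function and its flags are hypothetical. The fallbacks for index, highlights, and report follow the text above, while having ask build a missing index automatically is an assumption.

```python
def plan(workflow, has_analysis=False, has_index=False):
    """Return the ordered list of workflows to run for a request."""
    if workflow in ("brief", "detailed"):
        return [workflow]  # standalone, no prerequisites

    steps = []
    if workflow == "ask":
        if not has_index:
            # Assumption: a missing index is built first, which in turn
            # needs an analysis.
            if not has_analysis:
                steps.append("detailed")
            steps.append("index")
    elif workflow in ("index", "highlights"):
        if not has_analysis:
            steps.append("detailed")
    elif workflow == "report":
        if not has_analysis:
            steps.append("brief")
    else:
        raise ValueError(f"unknown workflow: {workflow}")

    steps.append(workflow)
    return steps
```

For example, `plan("ask")` on a fresh video yields `["detailed", "index", "ask"]`, while `plan("ask", has_analysis=True, has_index=True)` yields just `["ask"]`.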