API Reference

April 22, 2026

Complete documentation for all MCP tools exposed by the sanzaru server.

Video Generation Tools

`create_video`

Generate videos using OpenAI's Sora API.

Parameters:

prompt (string, required): Text description of the video to generate
model (string, optional): Model to use - "sora-2" (default) or "sora-2-pro"
seconds (string, optional): Duration as string - "4", "8", or "12" (NOTE: Must be string, not integer)
size (string, optional): Resolution - "720x1280", "1280x720", "1024x1792", or "1792x1024"
input_reference_filename (string, optional): Filename (not path) of reference image from IMAGE_PATH

Returns: Video object with id, status, progress, model, seconds, size

Example:

video = create_video(
    prompt="A serene mountain landscape at sunrise",
    model="sora-2",
    seconds="8",
    size="1280x720"
)
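
The string-only `seconds` value and the fixed size list are easy to get wrong, so a caller may want to check arguments before invoking the tool. The following is a minimal sketch of such a check; `check_create_video_args` is a hypothetical helper, not part of sanzaru, and the allowed values are simply copied from the parameter list above.

```python
# Allowed values copied from the create_video parameter list.
ALLOWED_MODELS = {"sora-2", "sora-2-pro"}
ALLOWED_SECONDS = {"4", "8", "12"}
ALLOWED_SIZES = {"720x1280", "1280x720", "1024x1792", "1792x1024"}


def check_create_video_args(prompt, model="sora-2", seconds=None, size=None):
    """Hypothetical client-side check; returns a list of problems (empty if none)."""
    problems = []
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt must be a non-empty string")
    if model not in ALLOWED_MODELS:
        problems.append(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if seconds is not None:
        if not isinstance(seconds, str):
            # The most common mistake: passing 8 instead of "8".
            problems.append('seconds must be a string such as "8", not an integer')
        elif seconds not in ALLOWED_SECONDS:
            problems.append(f"seconds must be one of {sorted(ALLOWED_SECONDS)}")
    if size is not None and size not in ALLOWED_SIZES:
        problems.append(f"size must be one of {sorted(ALLOWED_SIZES)}")
    return problems
```

Calling `check_create_video_args("A serene mountain landscape", seconds=8)` reports the integer duration, while the same call with `seconds="8"` returns an empty list.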

`get_video_status`

Check the status of a video generation job.

Parameters:

video_id (string, required): ID returned from create_video

Returns: Video object with updated status and progress

Status values:

"queued": Job is queued
"in_progress": Currently generating (check progress field for 0-100%)
"completed": Ready to download
"failed": Generation failed

Example:

status = get_video_status(video.id)
# Poll until status.status == "completed"
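
One way to act on the four status values is to map each to a short user-facing message instead of looping. This is a sketch only: `describe_video_status` is a hypothetical helper, and the message wording is an assumption rather than anything the server returns.

```python
def describe_video_status(status, progress=0):
    """Turn a status/progress pair into a short message for the user."""
    if status == "queued":
        return "Video is queued. Check back in a moment."
    if status == "in_progress":
        return f"Video generating... {progress}% complete. Check back in a moment."
    if status == "completed":
        return "Video is ready to download."
    if status == "failed":
        return "Video generation failed."
    # Anything outside the documented values is surfaced rather than hidden.
    return f"Unrecognized status: {status!r}"
```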

`download_video`

Download a completed video to VIDEO_PATH.

Parameters:

video_id (string, required): ID of completed video
filename (string, optional): Custom filename (defaults to {video_id}.{extension})
variant (string, optional): What to download - "video" (default), "thumbnail", or "spritesheet"

Variant formats:

"video" → MP4 file
"thumbnail" → WEBP image
"spritesheet" → JPG image

Returns: DownloadResult with filename, variant

Example:

result = download_video(video.id, filename="my_video.mp4")
# File saved to: {VIDEO_PATH}/my_video.mp4
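
The default filename rule, `{video_id}.{extension}`, combined with the variant formats above can be sketched as follows. `default_download_filename` is a hypothetical name, and the lowercase extensions are an assumption; the server's actual naming may differ.

```python
# Extension per variant, following the "Variant formats" list.
VARIANT_EXTENSIONS = {"video": "mp4", "thumbnail": "webp", "spritesheet": "jpg"}


def default_download_filename(video_id, variant="video"):
    """Build the {video_id}.{extension} default for a given variant."""
    try:
        extension = VARIANT_EXTENSIONS[variant]
    except KeyError:
        raise ValueError(f"unknown variant: {variant!r}") from None
    return f"{video_id}.{extension}"
```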

`list_videos`

List all video generation jobs with pagination.

Parameters:

limit (integer, optional): Max results to return (default: 20, max: 100)
after (string, optional): Cursor for pagination (pass the last value from the previous response)
order (string, optional): Sort order - "desc" (default, newest first) or "asc"

Returns: Object with data (array of video summaries), has_more (boolean), last (cursor)

Example:

page1 = list_videos(limit=20)
if page1.has_more:
    page2 = list_videos(limit=20, after=page1.last)
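
To walk every page rather than just two, the cursor can be followed in a bounded loop. The sketch below is illustrative: `fake_list_videos` is a stand-in that mimics the documented `data` / `has_more` / `last` shape so the code runs without the server, and treating `last` as the ID of the final item is an assumption made only for that stub.

```python
from types import SimpleNamespace

# Stand-in data so the sketch runs without the server.
_FAKE_JOBS = [SimpleNamespace(id=f"video_{n}") for n in range(5)]


def fake_list_videos(limit=20, after=None):
    """Hypothetical stub mimicking the documented response shape."""
    start = 0
    if after is not None:
        ids = [job.id for job in _FAKE_JOBS]
        start = ids.index(after) + 1
    page = _FAKE_JOBS[start:start + limit]
    has_more = start + limit < len(_FAKE_JOBS)
    last = page[-1].id if page else None
    return SimpleNamespace(data=page, has_more=has_more, last=last)


def collect_all_videos(list_videos, page_size=20, max_pages=50):
    """Follow the cursor until has_more is false or max_pages is reached."""
    videos, cursor = [], None
    for _ in range(max_pages):
        page = list_videos(limit=page_size, after=cursor)
        videos.extend(page.data)
        if not page.has_more:
            break
        cursor = page.last
    return videos
```

The `max_pages` cap is a design choice to keep the loop bounded if a cursor ever fails to advance; with the real tool, pass `list_videos` in place of the stub.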

`delete_video`

Permanently delete a video from OpenAI's storage.

Parameters:

video_id (string, required): ID of video to delete

Returns: Confirmation with deleted video ID

Warning: This is permanent and cannot be undone!

`remix_video`

Create a new video by remixing an existing completed video.

Parameters:

previous_video_id (string, required): ID of completed video to remix
prompt (string, required): New prompt to guide the remix

Returns: NEW Video object with different video_id

Note: This creates a brand new job. Poll the NEW video_id for completion.

`list_local_videos`

List locally downloaded video files in VIDEO_PATH.

Parameters:

pattern (string, optional): Glob pattern to filter filenames (e.g., "*.mp4", "sora*")
file_type (string, optional): Filter by type - "mp4", "webm", "mov", or "all" (default)
sort_by (string, optional): Sort by "name", "size", or "modified" (default)
order (string, optional): "desc" (default) or "asc"
limit (integer, optional): Max results (default: 50)

Returns: Object with data (array of VideoFile objects with filename, size_bytes, modified_timestamp, file_type)

Example:

# List all local videos
videos = list_local_videos()

# Find MP4 files matching a pattern
videos = list_local_videos(pattern="sora*", file_type="mp4")

# Get recently modified
recent = list_local_videos(sort_by="modified", order="desc", limit=10)
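
How the filter and sort parameters combine can be illustrated on plain dictionaries shaped like the returned VideoFile objects. This is a sketch of the documented semantics, not the server's implementation; `filter_video_files` is a hypothetical name, and using `fnmatch` for glob matching is an assumption.

```python
from fnmatch import fnmatch


def filter_video_files(files, pattern=None, file_type="all",
                       sort_by="modified", order="desc", limit=50):
    """Filter and sort VideoFile-like dicts the way the parameters describe."""
    sort_keys = {"name": "filename", "size": "size_bytes",
                 "modified": "modified_timestamp"}
    selected = [
        f for f in files
        if (pattern is None or fnmatch(f["filename"], pattern))
        and (file_type == "all" or f["file_type"] == file_type)
    ]
    # "desc" puts the largest key first (newest, biggest, or last by name).
    selected.sort(key=lambda f: f[sort_keys[sort_by]], reverse=(order == "desc"))
    return selected[:limit]
```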

Image Generation Tools

Two APIs are available for image generation:

Tool	API	Best For
`generate_image`	Images API	New generation with gpt-image-2 (RECOMMENDED)
`edit_image`	Images API	Editing existing images
`create_image`	Responses API	Iterative refinement with `previous_response_id`

Images API (gpt-image-2 default): Synchronous, returns immediately, no polling required, up to 4K output.

Responses API (GPT-5.2): Async polling pattern, supports iterative refinement chains and the action field; gpt-image-2 is available via tool_config.

`generate_image`

Generate images using OpenAI's Images API with gpt-image-2 (default). RECOMMENDED for new image generation.

Key advantages:

Synchronous - returns immediately (no polling)
gpt-image-2 - state-of-the-art quality, ~99% text accuracy, up to 4K output
Token usage tracking for cost monitoring
Accepts thousands of valid resolutions (not just the documented presets)

Parameters:

prompt (string, required): Text description of the image (max 32k chars)
model (string, optional): Model - "gpt-image-2" (default, recommended), "gpt-image-1.5", "gpt-image-1", "gpt-image-1-mini", "dall-e-3", "dall-e-2"
size (string, optional): Dimensions - "auto" (default), "1024x1024", "1536x1024", "1024x1536", plus gpt-image-2 sizes "2048x2048", "2048x1152", "3840x2160", "2160x3840"
quality (string, optional): Quality - "auto" (default), "low", "medium", "high"
background (string, optional): Background - "auto" (default), "transparent" (NOT supported on gpt-image-2 — use gpt-image-1.5), "opaque"
output_format (string, optional): Format - "png" (default), "jpeg", "webp"
moderation (string, optional): Content moderation - "auto" (default), "low"
filename (string, optional): Custom output filename (auto-generated if omitted)

Returns: ImageGenerateResult with filename, size, format, model, usage

Usage tracking: Returns token counts for cost monitoring:

result.usage.input_tokens   # Text tokens
result.usage.output_tokens  # Image tokens
result.usage.total_tokens   # Combined total

Examples:

# Basic generation (recommended path)
result = generate_image(prompt="a sunset over mountains")
# File immediately available at {IMAGE_PATH}/{result.filename}

# High quality portrait
result = generate_image(
    prompt="professional headshot, studio lighting",
    size="1024x1536",
    quality="high"
)

# Transparent background for icons (not supported on gpt-image-2, so select gpt-image-1.5)
result = generate_image(
    prompt="product icon, clean design",
    model="gpt-image-1.5",
    background="transparent",
    output_format="png"
)

# Fast generation with mini model
result = generate_image(
    prompt="quick sketch of a cat",
    model="gpt-image-1-mini"
)
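
Because transparent backgrounds are not supported on gpt-image-2, a caller can pick the model from the requested background before calling the tool. A minimal sketch, assuming a hypothetical helper `build_generate_image_args` that only assembles keyword arguments and does not call the server:

```python
def build_generate_image_args(prompt, background="auto",
                              model="gpt-image-2", **options):
    """Assemble keyword arguments, swapping models when transparency is requested."""
    if background == "transparent" and model == "gpt-image-2":
        # gpt-image-2 does not support transparent backgrounds
        # (see the background parameter), so use gpt-image-1.5 instead.
        model = "gpt-image-1.5"
    return {"prompt": prompt, "model": model, "background": background, **options}
```

The resulting dictionary could then be passed on as `generate_image(**args)`.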

`edit_image`

Edit existing images using OpenAI's Images API with gpt-image-2 (default).

Key features:

Synchronous - returns immediately (no polling)
Supports up to 16 input images for composition
Mask-based inpainting
Multi-image composition and blending

Parameters:

prompt (string, required): Description of desired edits (max 32k chars)
input_images (array, required): List of image filenames from IMAGE_PATH (1-16 images)
model (string, optional): Model - "gpt-image-2" (default), "gpt-image-1.5", "gpt-image-1", "gpt-image-1-mini"
mask_filename (string, optional): PNG mask with alpha channel for inpainting (transparent = edit, opaque = keep)
size (string, optional): Output dimensions - "auto" (default), "1024x1024", "1536x1024", "1024x1536", plus gpt-image-2 sizes "2048x2048", "2048x1152", "3840x2160", "2160x3840"
quality (string, optional): Quality - "auto" (default), "low", "medium", "high"
background (string, optional): Background - "auto" (default), "transparent" (NOT supported on gpt-image-2), "opaque"
output_format (string, optional): Format - "png" (default), "jpeg", "webp"
input_fidelity (string, optional): Fidelity to input - "high" (preserve faces/style) or "low" (more creative freedom). gpt-image-1 / gpt-image-1.5 only — silently ignored for gpt-image-2 (always high).
filename (string, optional): Custom output filename

Returns: ImageGenerateResult with filename, size, format, model, usage

Examples:

# Simple edit
result = edit_image(
    prompt="add a hat to the person",
    input_images=["portrait.png"]
)

# Multi-image composition
result = edit_image(
    prompt="create a gift basket containing all these items",
    input_images=["lotion.png", "soap.png", "candle.png"]
)

# Inpainting with mask
result = edit_image(
    prompt="add a flamingo standing in the water",
    input_images=["pool.png"],
    mask_filename="pool_mask.png"
)

# High-fidelity face preservation on gpt-image-1.5
result = edit_image(
    prompt="change hair color to red",
    input_images=["portrait.jpg"],
    model="gpt-image-1.5",
    input_fidelity="high",
)
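
Two constraints above are easy to miss: the 1-16 image limit and the fact that `input_fidelity` is silently ignored on gpt-image-2. The sketch below surfaces both before a call is made; `check_edit_image_args` is a hypothetical helper, not part of the server.

```python
def check_edit_image_args(input_images, model="gpt-image-2", input_fidelity=None):
    """Return a list of notes about arguments that will fail or be ignored."""
    notes = []
    if not 1 <= len(input_images) <= 16:
        notes.append(f"input_images needs 1-16 filenames, got {len(input_images)}")
    if input_fidelity is not None and input_fidelity not in ("high", "low"):
        notes.append('input_fidelity must be "high" or "low"')
    if input_fidelity is not None and model == "gpt-image-2":
        # Documented behavior: the parameter has no effect on this model.
        notes.append("input_fidelity is ignored on gpt-image-2 (always high)")
    return notes
```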

`create_image`

Generate images using OpenAI's Responses API. Use for iterative refinement with previous_response_id.

Tip: Use tool_config={"type": "image_generation", "model": "gpt-image-2"} for best quality. Use "gpt-image-1.5" when you need transparent backgrounds. You can also pass action: "generate" / "edit" to force a mode when an image is in context (default "auto").

Parameters:

prompt (string, required): Text description of image to generate
model (string, optional): Model to use - "gpt-5.2" (default, OpenAI's latest), "gpt-5.1", "gpt-5", "gpt-4.1"
tool_config (object, optional): Advanced configuration (ImageGeneration type)
previous_response_id (string, optional): Previous response ID for iterative refinement
input_images (array, optional): Array of filenames from IMAGE_PATH for image editing
mask_filename (string, optional): PNG mask file for inpainting

Returns: ImageResponse with id, status, created_at

Example:

# Generate from text
resp = create_image(prompt="sunset over mountains")

# Iterative refinement
resp2 = create_image(
    prompt="add more dramatic clouds",
    previous_response_id=resp.id
)

# Image editing
resp3 = create_image(
    prompt="add a flamingo to the pool",
    input_images=["pool.png"]
)

`get_image_status`

Check the status of an image generation job.

Parameters:

response_id (string, required): ID returned from create_image

Returns: ImageResponse with updated status

`download_image`

Download a completed image to IMAGE_PATH.

Parameters:

response_id (string, required): ID of completed image
filename (string, optional): Custom filename (auto-generated if omitted)

Returns: ImageDownloadResult with filename, size, format

Reference Image Management Tools

`list_reference_images`

Search and list available reference images in IMAGE_PATH.

Parameters:

pattern (string, optional): Glob pattern to filter filenames (e.g., "cat*.png", "*.jpg")
file_type (string, optional): Filter by type - "jpeg", "png", "webp", or "all" (default)
sort_by (string, optional): Sort by "name", "size", or "modified" (default)
order (string, optional): "desc" (default) or "asc"
limit (integer, optional): Max results (default: 50)

Returns: Array of ReferenceImage objects with filename, size_bytes, modified_timestamp, file_type

Example:

# Find all dog images
images = list_reference_images(pattern="dog*", file_type="png")

# Get recently modified
recent = list_reference_images(sort_by="modified", order="desc", limit=10)

`prepare_reference_image`

Resize images to match Sora's required dimensions.

Parameters:

input_filename (string, required): Source image filename in IMAGE_PATH
target_size (string, required): Target size - "720x1280", "1280x720", "1024x1792", or "1792x1024"
output_filename (string, optional): Custom output name (defaults to {original}_{width}x{height}.png)
resize_mode (string, optional): How to handle aspect ratio - "crop" (default), "pad", or "rescale"

Resize modes:

crop: Scale to cover target, center crop excess (no distortion, may lose edges)
pad: Scale to fit inside target, add black bars (no distortion, preserves full image)
rescale: Stretch/squash to exact dimensions (may distort, no cropping/padding)

Returns: PrepareResult with output_filename, original_size, target_size, resize_mode

Example:

result = prepare_reference_image(
    "photo.jpg",
    "1280x720",
    resize_mode="crop"
)
# Creates: photo_1280x720.png
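
The difference between the three resize modes comes down to which scale factor is chosen. The arithmetic can be sketched as below; `plan_resize` is a hypothetical function that only computes geometry, and the rounding and centering details are assumptions rather than the server's exact behavior.

```python
def plan_resize(src_w, src_h, tgt_w, tgt_h, mode="crop"):
    """Return (scaled_w, scaled_h, offset_x, offset_y) for one resize mode.

    Offsets place the scaled image relative to the target's top-left corner:
    negative values mean that much is cropped off, positive values mean bars.
    """
    if mode == "rescale":
        # Stretch each axis independently; the aspect ratio may change.
        return tgt_w, tgt_h, 0, 0
    ratio_w, ratio_h = tgt_w / src_w, tgt_h / src_h
    if mode == "crop":
        scale = max(ratio_w, ratio_h)  # cover the target, overflow is cropped
    elif mode == "pad":
        scale = min(ratio_w, ratio_h)  # fit inside the target, gaps are padded
    else:
        raise ValueError(f"unknown resize_mode: {mode!r}")
    scaled_w, scaled_h = round(src_w * scale), round(src_h * scale)
    return scaled_w, scaled_h, (tgt_w - scaled_w) // 2, (tgt_h - scaled_h) // 2
```

For a 1000x1000 source and a 1280x720 target, crop scales to 1280x1280 and trims 280 pixels from the top and bottom, while pad scales to 720x720 and adds 280-pixel bars on the left and right.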

Audio Tools

For detailed audio tool documentation, see docs/audio/README.md.

Available tools:

list_audio_files - List and filter audio files
get_latest_audio - Get most recent audio file
convert_audio - Convert to mp3/wav
compress_audio - Compress for API limits
transcribe_audio - Whisper transcription
chat_with_audio - GPT-4o audio analysis
transcribe_with_enhancement - Enhanced transcription
create_audio - Text-to-speech generation

Best Practices

Polling for Completion

Don't block - poll status periodically:

# ❌ Don't block
video = create_video(...)
while get_video_status(video.id).status != "completed":
    pass  # blocks the LLM session, and never exits if the job fails

# ✅ Do poll with messaging
video = create_video(...)
status = get_video_status(video.id)
if status.status != "completed":
    return f"Video generating... {status.progress}% complete. Check back in a moment."

File Security

All file operations are sandboxed to configured paths
Reference images must be in IMAGE_PATH (no path traversal)
Symlinks are rejected for security
Downloaded content goes to VIDEO_PATH or IMAGE_PATH
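
The kind of check these rules imply can be sketched with `pathlib`. This is an illustration under stated assumptions, not sanzaru's actual implementation: `resolve_in_sandbox` is a hypothetical name, and the exact order and wording of the checks are guesses.

```python
from pathlib import Path


def resolve_in_sandbox(base_dir, filename):
    """Return the path for `filename` inside `base_dir`, or raise ValueError."""
    base = Path(base_dir).resolve()
    # Reject anything that is not a bare filename (no separators, no "..").
    if Path(filename).name != filename or filename in ("", ".", ".."):
        raise ValueError(f"not a bare filename: {filename!r}")
    candidate = base / filename
    if candidate.is_symlink():
        raise ValueError(f"symlinks are rejected: {filename!r}")
    # Final guard: the resolved file must sit directly in the sandbox directory.
    if candidate.resolve().parent != base:
        raise ValueError(f"path escapes sandbox: {filename!r}")
    return candidate
```

With this sketch, `resolve_in_sandbox(image_dir, "photo.png")` returns a path under `image_dir`, while `"../secret.png"` or `"sub/photo.png"` raises `ValueError`.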

Error Handling

All tools return structured error messages. Common errors:

File not found in reference path
Invalid dimensions for target size
Video not completed yet
API rate limits