Generative Media Skills for AI Agents
June 2, 2026
The Ultimate Multimodal Toolset for Claude Code, Cursor, and Gemini CLI. A high-performance, schema-driven architecture for AI agents to generate, edit, and display professional-grade images, videos, and audio, powered by the muapi-cli.
Get Started | Recipe Pack | Expert Library | Core Primitives | MCP Server | Reference
Related Projects
- Open-Generative-AI: Free self-hosted AI media studio, a GUI alternative to these skills for the same model set
- Awesome-GPT-Image-2-API-Prompts: Curated GPT-Image-2 prompts to use with these skills
- Awesome-Gemini-Omni-API-Prompts: Curated Gemini Omni prompts for video generation
Key Features
- Agent-Native Design: CLI-powered scripts with structured JSON outputs, semantic exit codes, and `--jq` filtering for seamless agentic pipelines.
- Expert Knowledge Layer: Domain-specific skills that bake in professional cinematography, atomic design, and branding logic.
- CLI-Powered Core: All primitives delegate to `muapi-cli`, with no curl, no JSON parsing, and no boilerplate.
- Direct Media Display: Use the `--view` flag to automatically download and open generated media in your system viewer.
- Local File Support: Auto-upload images, videos, faces, and audio from your local machine to the CDN for processing.
- 100+ AI Models: One-click access to Midjourney v7, Flux Kontext, Seedance 2.0, Kling 3.0, Veo3, and more.
- MCP Server: Run `muapi mcp serve` to expose all 19 tools directly to Claude Desktop, Cursor, or any MCP-compatible agent.
Scalable Architecture
This repository uses a Core/Library split to ensure efficiency and high-signal discovery for LLMs:
Core Primitives (/core)
Thin wrappers around muapi-cli for raw API access.
- `core/media/`: File upload
- `core/edit/`: Image editing (prompt-based)
- `core/platform/`: Setup, auth & result polling
Expert Library (/library)
High-value skills that translate creative intent into technical directives.
- Cinema Director (`/library/motion/cinema-director/`): Technical film direction & cinematography.
- Nano-Banana (`/library/visual/nano-banana/`): Reasoning-driven image generation (Gemini 3 Style).
- UI Designer (`/library/visual/ui-design/`): High-fidelity mobile/web mockups (Atomic Design).
- Logo Creator (`/library/visual/logo-creator/`): Minimalist vector branding (Geometric Primitives).
- Seedance 2 (Doubao Video) (`/library/motion/seedance-2/`): Director-level cinematic video generation with text-to-video, image-to-video, and video extension, with native audio-video sync.
- AI Clipping (`/library/edit/ai-clipping/`): Long video → ranked vertical short clips in one managed API call. Server-side transcription, virality ranking, dedupe, and face-tracked auto-crop; no local Whisper or LLM.
- YouTube Shorts (`/library/social/youtube-shorts/`): Platform-aware preset over AI Clipping (Shorts / TikTok / Reels / Feed defaults).
Plus 42 ready-to-run workflow recipes organized by output type; see Recipe Pack below.
Recipe Pack
Forty-two LLM-orchestrated workflow recipes (16 motion, 5 social, 21 visual) that combine multiple muapi-cli calls into named end-to-end pipelines (e.g. photo of person → 3D action figure, product photo → cinematic 10s ad). Each skill is a SKILL.md the agent reads and follows; bring your own consuming agent (Claude Code, Cursor, MCP). These are recipes, not bash wrappers.
Motion / Video (16)
| Skill | Description |
|---|---|
| 3D Logo Animation | Transform a 2D logo into a premium 3D version and animate it with professional cinematic effects |
| AI Fight Scene Generator | High-cut-density action / fight scene: a 16-cell storyboard image drives Seedance 2.0 i2v for shot-by-shot choreography |
| Animal Vlogger Video | Hilarious, ultra-realistic anthropomorphic-animal vlogger acting like a human in a real-world setting |
| Cartoon Dance Animation | Convert a photo into a Pixar-style 3D cartoon, then animate using a reference dance/motion video |
| Character Story Video | Multi-part animated story video: establish a consistent character, then animate sequential scenes |
| Drone-Style Video | Aerial drone-perspective footage: bird's-eye sweeps, orbit shots, and flyover sequences |
| Giant Product Showcase | Dramatic giant-scale product visual (building-sized object next to a person), optionally animated |
| Jewelry Product Video | Luxury jewelry ad with high-end commercial cinematography and detailed macro animation |
| Music Video | Short music video from a song theme: keyframes, animation per beat, matching music track |
| One-Shot Video | Single continuous cinematic shot: no cuts, one seamless flowing scene |
| Cinematic Product Ad | Cinematic 5-10s product ad from a product photo + brand brief |
| Product Showcase Video | Dynamic product showcase with explosive ingredient arrangement + realistic motion animation |
| Product Video Ad Maker | High-end cinematic product video ad starting from a simple product photo |
| Talking Baby Video | Viral-style talking-baby video with custom costumes and scripts |
| UGC Lifestyle Try-On | UGC-style lifestyle photos & video of a person using your product; authentic, social-native |
| UGC Video Factory | Person photo + product photo + script → 10s vertical 9:16 UGC video ad with native dialogue (Nano-Banana Pro Edit → Seedance 2.0 VIP i2v) |
Social (5)
| Skill | Description |
|---|---|
| Instagram Post | Polished on-brand Instagram post: hero image + caption + hashtags |
| Product Campaign Pack | Full multi-channel campaign: hero visuals, social assets, short ad video, platform crops |
| RedNote Cover | Xiaohongshu (小红书) cover image: vibrant lifestyle aesthetic with typography overlay |
| Social Media Pack | Re-render a hero image into Instagram / TikTok / Shorts / X aspect ratios |
| UGC Ads Workflow | UGC video ad pipeline: combine selfie + product image, write script, animate |
Visual / Images & Design (21)
| Skill | Description |
|---|---|
| Action Figure Generator | Convert a photo of a person into a custom 3D action figure with collectible toy packaging |
| Ad Creative Set | High-converting ad set: hero image, copy variations, platform crops for Meta / Google / LinkedIn |
| Amazon Product Listing Pack | Full Amazon listing image set: hero, lifestyle, infographic, comparison/detail closeups |
| Blog Header | Professional 1200×628 blog header image with optional title composition guidance |
| Brand Kit | Cohesive brand visual kit: logo concept, color palette, typography pairings |
| Brochure Designer | Multi-page brochure (cover, inner spread, back) for business, real estate, events, launches |
| Couple Grid Creator | Stylized 6-box grid of a couple in romantic poses, each pose framed inside cardboard packaging |
| Brand Design Guide | Comprehensive design guide: palette, typography, UI components, visual identity rules |
| Fashion Try-On | Virtually try outfits by combining a person's photo + clothing item, optional fashion model video |
| Floor Plan Rendering | Design a 2D floor plan and convert into a realistic 3D architectural rendering |
| Interior Design | Pro interior design visualizations: redesign rooms, generate concepts, visualize furniture styles |
| Interior Design Visualizer | Generate an empty room and fill it with stylish furniture / decor; or redesign an existing room |
| Keyboard Art Maker | Artistic top-down photos of keyboard keycaps arranged to spell custom messages |
| Logo + Branding Package | Logo + full branding package: variations (dark/light/icon), palette, mockups |
| Logo Generator | Quick single-shot polished logo: fast, clean vector aesthetic with accurate brand-name text |
| Multi-Angle Reshoot | Re-render a subject from dramatic camera angles (fish-eye, bird's-eye, low, macro) with identity preserved |
| Multi-Angle Shots | Full multi-angle product shot set: front, side, back, top-down, 45° |
| Selfie with Celebrities | Realistic behind-the-scenes selfie of the user with a celebrity; optional cinematic long-take |
| Storyboard Generator | Generate N keyframes for a short story or scene sequence (image only, no video) |
| URL to Design | Analyze a website URL and generate a redesigned, improved UI with modern aesthetics |
| YouTube Thumbnail | High-CTR YouTube thumbnail: striking imagery, bold text placement, emotional face/subject |
Each recipe declares its inputs and a Steps body. Pass the inputs and let your agent execute the steps via muapi CLI calls (or the raw API for endpoints that don't yet have a CLI alias; see the per-skill "Notes for the Executing Agent" footer).
Quick Start
1. Install the muapi CLI
The core scripts require muapi-cli. Install it once:
# via npm (recommended; no Python required)
npm install -g muapi-cli
# via pip
pip install muapi-cli
# or run without installing
npx muapi-cli --help
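Scripts that depend on the CLI can check for it before doing any work. The following is a minimal sketch, not part of muapi-cli: `require_muapi` is a hypothetical helper name, and the exit code 127 is simply the conventional shell "command not found" status.

```shell
# Hypothetical preflight check (not shipped with muapi-cli):
# fail early with a hint if the `muapi` command is not available.
require_muapi() {
  if ! command -v muapi >/dev/null 2>&1; then
    echo "muapi not found; install it with: npm install -g muapi-cli" >&2
    return 127
  fi
}
```

Call `require_muapi || exit $?` at the top of a script so later steps never run without the CLI.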
2. Configure Your API Key
# Interactive setup
muapi auth configure
# Or pass directly
muapi auth configure --api-key "YOUR_MUAPI_KEY"
# Get your key at https://muapi.ai/dashboard
3. Install the Skills
# Install all skills to your AI agent
npx skills add SamurAIGPT/Generative-Media-Skills --all
# Or install a specific skill
npx skills add SamurAIGPT/Generative-Media-Skills --skill muapi-media-generation
# Install to specific agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor
4. Generate Your First Image
muapi image generate "a cyberpunk city at night" --model flux-dev
# Download the result automatically
muapi image generate "a sunset over mountains" --model hidream-fast --download ./outputs
# Extract just the URL (agent-friendly)
muapi image generate "product on white bg" --model flux-schnell --output-json --jq '.outputs[0]'
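When an agent only needs the URL, the last command can be wrapped in a small function. This is a sketch under two assumptions: that `--jq '.outputs[0]'` prints a quoted JSON string (as the `tr -d '"'` in the pipeline examples suggests), and that a failed generation exits non-zero. `generate_image_url` is a hypothetical name, and `flux-dev` is used as the default model only because it appears above.

```shell
# Hypothetical helper: print the first output URL for a prompt.
# Runs in a subshell so its variables do not leak into the caller.
generate_image_url() (
  prompt=$1
  model=${2:-flux-dev}
  # Capture first so a CLI failure is returned instead of masked by the pipe.
  raw=$(muapi image generate "$prompt" --model "$model" \
    --output-json --jq '.outputs[0]') || exit $?
  # Strip the JSON string quotes (assumed output shape).
  printf '%s\n' "$raw" | tr -d '"'
)
```

Usage: `url=$(generate_image_url "product on white bg" flux-schnell) || echo "generation failed" >&2`.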
5. Run an Expert Skill
# Use Nano-Banana reasoning to generate a 2K masterpiece
bash library/visual/nano-banana/scripts/generate-nano-art.sh \
--file ./my-source-image.jpg \
--subject "a glass hummingbird" \
--style "macro photography" \
--resolution "2k" \
--view
6. Direct a Cinematic Scene
# Create a 10-second epic reveal (run from the repository root, like the commands below)
bash library/motion/cinema-director/scripts/generate-film.sh \
--subject "a cybernetic dragon over Tokyo" \
--intent "epic" \
--model "kling-v3.0-pro" \
--duration 10 \
--view
# Animate a reference image into video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
--mode i2v \
--file ./concept.jpg \
--subject "camera slowly pulls back to reveal the full landscape" \
--intent "reveal" \
--view
# Extend an existing video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
--mode extend \
--request-id "YOUR_REQUEST_ID" \
--subject "camera continues pulling back to reveal the vast city" \
--duration 10
MCP Server
Run muapi as a Model Context Protocol server so Claude Desktop, Cursor, or any MCP-compatible agent can call generation tools directly, with no shell scripts needed.
muapi mcp serve
Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}
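To avoid pasting the key by hand, the same config can be printed from a shell function. This is a sketch only: `print_mcp_config` is a hypothetical helper, not part of muapi-cli, and it assumes the key contains no characters that need JSON escaping.

```shell
# Hypothetical helper: print the Claude Desktop MCP config shown above,
# with the API key taken from the first argument.
print_mcp_config() {
  if [ -z "${1:-}" ]; then
    echo "usage: print_mcp_config API_KEY" >&2
    return 2
  fi
  # Assumes the key needs no JSON escaping (no quotes or backslashes).
  cat <<EOF
{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "$1" }
    }
  }
}
EOF
}
```

Review the output before saving it: writing it straight over an existing `claude_desktop_config.json` would discard any other servers already configured there.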
This exposes 19 structured tools with full JSON Schema input/output definitions:
| Tool | Description |
|---|---|
| `muapi_image_generate` | Text-to-image (14 models) |
| `muapi_image_edit` | Image-to-image editing (11 models) |
| `muapi_video_generate` | Text-to-video (13 models) |
| `muapi_video_from_image` | Image-to-video (16 models) |
| `muapi_audio_create` | Music generation (Suno) |
| `muapi_audio_from_text` | Sound effects (MMAudio) |
| `muapi_enhance_upscale` | AI upscaling |
| `muapi_enhance_bg_remove` | Background removal |
| `muapi_enhance_face_swap` | Face swap image/video |
| `muapi_enhance_ghibli` | Ghibli style transfer |
| `muapi_edit_lipsync` | Lip sync to audio |
| `muapi_edit_clipping` | AI highlight extraction |
| `muapi_predict_result` | Poll prediction status |
| `muapi_upload_file` | Upload local file → URL |
| `muapi_keys_list` | List API keys |
| `muapi_keys_create` | Create API key |
| `muapi_keys_delete` | Delete API key |
| `muapi_account_balance` | Get credit balance |
| `muapi_account_topup` | Add credits (Stripe checkout) |
Agentic Pipeline Examples
# Submit async, capture request_id, poll when ready
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
--model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')
# ... do other work ...
muapi predict wait "$REQUEST_ID" --download ./outputs
# Pipe a prompt from another command
generate_prompt | muapi image generate - --model flux-dev
# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it look like a painting" --image "$URL" \
--model flux-kontext-pro --download ./outputs
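The submit-then-poll pattern extends to a batch: submit every job first, then wait for each one. The sketch below uses only the flags shown above (`--no-wait`, `--output-json`, `--jq`, `predict wait`, `--download`). `submit_and_wait_all` is a hypothetical name, and the code assumes request IDs contain no whitespace.

```shell
# Hypothetical helper: read one prompt per line on stdin, submit each as an
# async video job, then wait for every job and download into a directory.
submit_and_wait_all() (
  outdir=${1:-./outputs}
  model=${2:-kling-master}
  ids=""
  while IFS= read -r prompt || [ -n "$prompt" ]; do
    [ -n "$prompt" ] || continue
    # </dev/null keeps the CLI from consuming the remaining prompt lines.
    id=$(muapi video generate "$prompt" --model "$model" --no-wait \
      --output-json --jq '.request_id' </dev/null | tr -d '"')
    # An empty ID means the submission failed; stop before polling.
    [ -n "$id" ] || exit 1
    ids="$ids $id"
  done
  for id in $ids; do
    muapi predict wait "$id" --download "$outdir" || exit 1
  done
)
```

Usage: `printf '%s\n' "a dog running on a beach" "a cat on a roof" | submit_and_wait_all ./outputs`. Submitting everything before the first wait lets the jobs run concurrently on the server side.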
Schema Reference
This repository includes a streamlined schema_data.json that core scripts use at runtime to:
- Validate Model IDs: Ensures the requested model exists.
- Resolve Endpoints: Automatically maps model names to API endpoints.
- Check Parameters: Validates supported `aspect_ratio`, `resolution`, and `duration` values.
Discover all available models via the CLI:
muapi models list
muapi models list --category video --output-json
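A script can run a similar model-ID check itself before submitting a job. This is a rough sketch, not how the core scripts read schema_data.json: `model_available` is a hypothetical helper, and it assumes the plain `muapi models list` output shows each model ID as a whitespace-separated token.

```shell
# Hypothetical check: succeed only if the given model ID appears as an
# exact whitespace-separated token in the `muapi models list` output.
# The output format is an assumption; adjust the parsing if it differs.
model_available() {
  muapi models list | awk -v id="$1" '
    { for (i = 1; i <= NF; i++) if ($i == id) found = 1 }
    END { exit !found }
  '
}
```

Usage: `model_available flux-dev || { echo "unknown model" >&2; exit 1; }`.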
Compatibility
Optimized for the next generation of AI development environments:
- Claude Code: Direct terminal execution via tools + MCP server mode.
- Gemini CLI / Cursor / Windsurf: Seamless integration as local scripts.
- MCP: Full Model Context Protocol server with typed input/output schemas.
- CI/CD: `--output-json`, `--jq`, semantic exit codes for scripting.
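In CI, exit codes make it easy to wrap any call in a retry loop. The sketch below is generic shell, not a muapi feature. `retry` and `RETRY_DELAY` are hypothetical names, and because the meaning of individual exit codes is not listed here, it retries on any non-zero status.

```shell
# Hypothetical wrapper: run a command up to N times, returning the last
# exit status if every attempt fails. Retries on ANY non-zero status.
retry() (
  attempts=$1
  shift
  n=1
  while :; do
    "$@" && exit 0
    rc=$?
    [ "$n" -ge "$attempts" ] && exit "$rc"
    n=$((n + 1))
    # Fixed delay between attempts, in seconds (default 2).
    sleep "${RETRY_DELAY:-2}"
  done
)
```

Usage: `retry 3 muapi image generate "a cyberpunk city at night" --model flux-dev`. Note that each retry of a generation command may consume credits.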
License
MIT © 2026