🎨 Gemini Vision Art Studio

April 18, 2025

An MCP (Model Context Protocol) server that uses Google's Gemini AI for image generation and transformation. The studio offers two specialized tools, a 3D cartoon generator and an image transformer, both powered by the Gemini 2.0 Flash model.

License: MIT · TypeScript · Gemini AI

✨ Features

1. 3D Cartoon Generator

  • Generate high-quality 3D cartoon images from text descriptions
  • Child-friendly designs with vibrant colors and engaging visuals
  • Perfect for children's books, educational materials, and creative projects

2. Image Transformer

  • Transform existing images using Gemini AI's vision capabilities
  • Apply various artistic styles and modifications
  • Enhance, modify, or completely reimagine your images

Additional Features

  • 🖼️ Automatic preview generation
  • 🌐 Browser-based image viewing
  • 💾 Local storage with organized output
  • 🔄 Real-time processing
  • 📱 Cross-platform support

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/falahgs/gemini-vision-art-studio.git

# Install dependencies
cd gemini-vision-art-studio
npm install

Configuration

  1. Project Configuration: Create a .env file in the root directory:
GEMINI_API_KEY=your_api_key_here
# Set to true if running in a remote environment (no browser preview)
IS_REMOTE=true
  2. Claude Desktop Configuration: Add the server configuration to your Claude Desktop config file at %AppData%\Claude\claude_desktop_config.json:
{
  "mcpServers": {
    "gemini-vision-art-studio": {
      "command": "node",
      "args": [
        "PATH_TO_YOUR_PROJECT\\build\\src\\index.js"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_api_key_here",
        "IS_REMOTE": "true"
      }
    }
  }
}

Replace:

  • PATH_TO_YOUR_PROJECT with your actual project path
  • your_gemini_api_key_here with your Gemini API key

💡 Note: On Windows, the config file is typically located at: C:\Users\YourUsername\AppData\Roaming\Claude\claude_desktop_config.json
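
The two environment variables used in this section (GEMINI_API_KEY and IS_REMOTE) could be read and validated at startup roughly as follows. This is a minimal sketch, not the project's actual code: `StudioConfig` and `loadConfig` are hypothetical names, and treating only the literal string `true` as enabling remote mode is an assumption.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from the project source.
interface StudioConfig {
  apiKey: string;
  isRemote: boolean;
}

// Reads the two documented variables from an environment-like object.
// Assumption: only the string "true" (case-insensitive) turns remote mode on.
function loadConfig(env: Record<string, string | undefined>): StudioConfig {
  const apiKey = (env["GEMINI_API_KEY"] ?? "").trim();
  if (apiKey === "") {
    throw new Error("GEMINI_API_KEY is not set");
  }
  const isRemote = (env["IS_REMOTE"] ?? "").trim().toLowerCase() === "true";
  return { apiKey, isRemote };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`, so a missing key fails fast instead of surfacing later as an API error.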

Remote Usage

When running the server remotely:

  1. Set IS_REMOTE=true in your environment or Claude Desktop configuration

  2. The server will:

    • Create necessary directories automatically:
      • /app/output: For generated images and previews
      • /app/temp: For temporary processing files
    • Skip browser preview attempts
    • Save all files to the /app/output directory
    • Return absolute file paths in the response
  3. Directory Structure in Remote Mode:

    /app/
    ├── output/           # Generated images and previews
    │   ├── image1.png
    │   └── image1_preview.html
    └── temp/             # Temporary processing files
    
  4. Troubleshooting Remote Usage:

    • Ensure the /app directory exists and is writable
    • Check the console output for directory creation messages
    • Look for "Image saved to:" messages in the logs
    • File paths in the response will be absolute paths
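
The path rules described in this section can be sketched as a small pure function. This is an illustration under stated assumptions, not the server's implementation: `resolveOutputPaths` and `OutputPaths` are hypothetical names, the `.png` extension and `_preview.html` suffix are inferred from the example tree, and local mode is assumed to use `output/` and `temp/` under the project root.

```typescript
// Hypothetical result shape for the paths the server reports.
interface OutputPaths {
  outputDir: string;
  tempDir: string;
  imagePath: string;
  previewPath: string;
}

// Remote mode uses the fixed /app base directory from the README.
// Local mode (assumption) uses the project root instead.
function resolveOutputPaths(
  isRemote: boolean,
  projectRoot: string,
  fileName: string
): OutputPaths {
  // Strip trailing slashes or backslashes so joins do not double up.
  const base = isRemote ? "/app" : projectRoot.replace(/[\\\/]+$/, "");
  const outputDir = `${base}/output`;
  const tempDir = `${base}/temp`;
  return {
    outputDir,
    tempDir,
    imagePath: `${outputDir}/${fileName}.png`,
    previewPath: `${outputDir}/${fileName}_preview.html`,
  };
}
```

A real server would also create `outputDir` and `tempDir` before writing to them, for example with a recursive directory-creation call.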

Running the Server

  1. Build the project:
npm run build
  2. The server then becomes available in Claude Desktop automatically:
    • Open Claude Desktop
    • Start a new conversation
    • Confirm that both tools appear in the available tools list

🛠️ Available Tools

1. Generate 3D Cartoon (generate_3d_cartoon)

Creates a 3D-style cartoon image from your text description.

{
  "name": "generate_3d_cartoon",
  "arguments": {
    "prompt": "A friendly dragon teaching math to forest animals",
    "fileName": "dragon_teacher"
  }
}

2. Process Image (process_image)

Transforms existing images according to your instructions.

{
  "name": "process_image",
  "arguments": {
    "imagePath": "input/photo.jpg",
    "prompt": "Transform this into a watercolor painting with autumn colors",
    "outputFileName": "watercolor_autumn"
  }
}
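
For reference, an MCP client delivers a tool call like the two shown in this section as a JSON-RPC 2.0 request with the method `tools/call`. The sketch below types both argument shapes and wraps them in that envelope. The interface names and `buildToolCallRequest` are hypothetical, and treating every argument as a required non-empty string is an assumption, since the README does not state which fields are optional.

```typescript
// Argument shapes copied from the README examples (all fields assumed required).
interface Generate3dCartoonArgs {
  prompt: string;
  fileName: string;
}

interface ProcessImageArgs {
  imagePath: string;
  prompt: string;
  outputFileName: string;
}

type ToolCall =
  | { name: "generate_3d_cartoon"; arguments: Generate3dCartoonArgs }
  | { name: "process_image"; arguments: ProcessImageArgs };

// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCall;
}

// Rejects empty arguments, then wraps the call in the JSON-RPC envelope.
function buildToolCallRequest(id: number, call: ToolCall): ToolCallRequest {
  for (const [key, value] of Object.entries(call.arguments)) {
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`Argument "${key}" must be a non-empty string`);
    }
  }
  return { jsonrpc: "2.0", id, method: "tools/call", params: call };
}
```

Claude Desktop builds these requests itself, so the sketch is mainly useful when testing the server with a custom client or script.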

📂 Directory Structure

gemini-vision-art-studio/
├── src/              # Source code
├── build/            # Compiled code
├── input/            # Input images
├── output/           # Generated images and previews
├── temp/             # Temporary processing files
└── examples/         # Example usage and images

🔧 Technical Details

  • Runtime: Node.js v14+
  • Language: TypeScript 5.8.3
  • AI Model: Gemini 2.0 Flash
  • Framework: Model Context Protocol (MCP) SDK
  • Image Processing: Google Generative AI

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Falah G. Salieh

  • Copyright © 2025
  • GitHub: @falahgs

🙏 Acknowledgments

  • Google Gemini AI team for the powerful image generation model
  • The MCP SDK team for the excellent tooling
  • All contributors and users of this project

Made with ❤️ by Falah G. Salieh