Velesio AI Server
January 22, 2026 · View on GitHub
High-performance, microservice-based AI inference server with Unity integration support.
Features
- Unity Ready: Seamless integration with Unity
- Scalable: Redis queue-based worker architecture
- Easy Deploy: Docker Compose stack covering the inference engine, API wrapper, nginx, and monitoring
- Monitoring: Grafana templates for system, GPU, and application observability
Quick Start
- RunPod Quickstart - Get started quickly on RunPod! New to RunPod? Use my referral link to get a $5 bonus and support the project!
- Self-Hosted Quickstart - Get started on your own infrastructure!
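For the self-hosted path, a minimal bring-up might look like the sketch below. The repository location, env-file name, and Compose service names are assumptions for illustration; follow the Self-Hosted Quickstart for the authoritative steps.

```shell
# Sketch of a self-hosted bring-up (file and service names are assumptions).
git clone <velesio-ai-server-repo-url>   # placeholder; see "View on GitHub"
cd velesio-ai-server
cp .env.example .env        # assumed env template: API tokens, model paths, GPU settings
docker compose up -d        # starts API, Redis, GPU workers, nginx, and monitoring
docker compose logs -f api  # tail the API service logs (assumed service name)
```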
Unity Integration
Built specifically for Unity developers.
Documentation
- Complete Documentation - Full guides, API reference, and examples
- Model Templates - Model stack templates
- Deployment Strategies - Both distributed and standalone
- Components - Individual service configuration
- Discord - For support and discussion
Architecture
Distributed microservice design for maximum flexibility:
```
┌─────────────┐     ┌─────────┐     ┌─────────────┐
│     API     │◄───►│  Redis  │◄───►│ GPU Workers │
│  (FastAPI)  │     │  Queue  │     │ (LLM + SD)  │
└─────────────┘     └─────────┘     └─────────────┘
       │                                   │
       │        ┌────────────────┐         │
       └───────►│   Monitoring   │◄────────┘
                │ (Grafana+Prom) │
                └────────────────┘
```
- API Service: FastAPI with token auth and job queuing
- GPU Workers: Custom llama.cpp + Stable Diffusion inference engines
- Redis Queue: Decoupled job processing for scalability
- Monitoring: Pre-configured Grafana dashboards
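As a sketch of the request flow, a client might enqueue a job through the token-authenticated API like this. The endpoint path, port, and payload fields are assumptions for illustration; see the API reference for the actual schema.

```shell
# Submit a text-completion job to the API service, which enqueues it on
# Redis for a GPU worker to pick up (endpoint and fields are assumptions).
curl -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello from Unity!", "max_tokens": 64}'
```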
Learn more: Architecture Documentation
Open Source References
- AUTOMATIC1111 Stable Diffusion web UI
- llama.cpp
Questions? Check the Documentation or open an issue!