Velesio AI Server
January 22, 2026 · View on GitHub
High-performance, microservice-based AI inference server with Unity integration support.
Features
- Unity Ready: Seamless integration with Unity
- Scalable: Redis queue-based worker architecture
- Easy Deploy: Docker Compose stack covering the inference engine, API wrapper, nginx, and monitoring
- Monitoring: Grafana templates for system, GPU, and application observability
Quick Start
- RunPod Quickstart - Get started quickly on RunPod! New to RunPod? Use my referral link to get a $5 bonus and support the project!
- Self-Hosted Quickstart - Get started on your own infrastructure!
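For the self-hosted path, a minimal bring-up might look like the sketch below. The repository location, env-file name, and Compose service names are assumptions for illustration; follow the Self-Hosted Quickstart for the authoritative steps.

```shell
# Sketch of a self-hosted bring-up (file and service names are assumptions).
git clone <velesio-ai-server-repo-url>   # placeholder; see "View on GitHub"
cd velesio-ai-server
cp .env.example .env        # assumed env template: API tokens, model paths, GPU settings
docker compose up -d        # starts API, Redis, GPU workers, nginx, and monitoring
docker compose logs -f api  # tail the API service logs (assumed service name)
```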
Unity Integration
Built specifically for Unity developers.
Documentation
- Complete Documentation - Full guides, API reference, and examples
- Model Templates - Model stack templates
- Deployment Strategies - Both distributed and standalone
- Components - Individual service configuration
- Discord - For support and discussion
Architecture
Distributed microservice design for maximum flexibility:
```
┌─────────────┐     ┌─────────┐     ┌─────────────┐
│     API     │◄───►│  Redis  │◄───►│ GPU Workers │
│  (FastAPI)  │     │  Queue  │     │ (LLM + SD)  │
└─────────────┘     └─────────┘     └─────────────┘
       │                                   │
       │        ┌────────────────┐         │
       └───────►│   Monitoring   │◄────────┘
                │ (Grafana+Prom) │
                └────────────────┘
```
- API Service: FastAPI with token auth and job queuing
- GPU Workers: Custom llama.cpp + Stable Diffusion inference engines
- Redis Queue: Decoupled job processing for scalability
- Monitoring: Pre-configured Grafana dashboards
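As a sketch of the request flow, a client might enqueue a job through the token-authenticated API like this. The endpoint path, port, and payload fields are assumptions for illustration; see the API reference for the actual schema.

```shell
# Submit a text-completion job to the API service, which enqueues it on
# Redis for a GPU worker to pick up (endpoint and fields are assumptions).
curl -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello from Unity!", "max_tokens": 64}'
```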
Learn more: Architecture Documentation
Open Source References
- AUTOMATIC1111 Stable Diffusion web UI
- llama.cpp
Questions? Check the Documentation or open an issue!