ProjectBEA

February 18, 2026 · View on GitHub

ProjectBEA is a modular, fully autonomous AI VTuber engine. It powers a living AI persona — Bea — that can hold live conversations, monologue to her audience when idle, join Discord voice calls, play Minecraft autonomously, and remember past sessions via a built-in RAG memory system. All of this is orchestrated through a clean plugin-based architecture where every component is swappable.

Built for fun by a 19-year-old CS student learning Python. Open-source, self-hostable, and designed to be easily extended.


Features

FeatureDescription
Swappable LLMsGemini, OpenAI-compatible (GPT-4o, Groq, GLM-4.7) — switch at runtime
Multiple TTS enginesEdgeTTS (free), Kokoro (local ONNX), Orpheus (API)
OBS IntegrationAvatar PNG/video swap, animated text bubble via WebSocket
RAG MemoryChromaDB-powered diary system — Bea remembers past sessions
Discord SkillFull voice call integration — listens, transcribes, responds live
Minecraft SkillAutonomous LLM-driven agent that plays Minecraft via WebSocket
Monologue SkillWhen idle, Bea automatically starts talking to her audience
Web DashboardReact + FastAPI dashboard for chat, config, skill control, brain activity
Hot ReloadChange models, voices, or settings at runtime without restart
Plugin SkillsEvery capability is a BaseSkill plugin — add your own in minutes

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        AIVtuberBrain                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────────┐    │
│  │  LLM     │  │  TTS     │  │  STT     │  │  OBS          │    │
│  │ (pluggable)│ │ (pluggable)│ │ (Groq)   │  │ (WebSocket) │    │
│  └──────────┘  └──────────┘  └──────────┘  └───────────────┘    │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                    SkillManager                         │    │
│  │  ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌─────────┐    │    │
│  │  │ Memory   │ │ Discord  │ │ Minecraft │ │Monologue│    │    │
│  │  │ (RAG)    │ │ (Voice)  │ │ (Agent)   │ │ (Idle)  │    │    │
│  │  └──────────┘ └──────────┘ └───────────┘ └─────────┘    │    │
│  └─────────────────────────────────────────────────────────┘    │
│  ┌──────────────────┐   ┌──────────────────────────────────┐    │
│  │  HistoryManager  │   │  EventManager                    │    │
│  │  (sessions/JSON) │   │  (system, input, output, skill)  │    │
│  └──────────────────┘   └──────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

              ┌───────────────┴───────────────┐
              │       FastAPI + React         │
              │       Web Dashboard           │
              └───────────────────────────────┘

Full Architecture Documentation →


Project Structure

ProjectBEA/
├── main.py                    # Entry point (CLI args + engine bootstrap)
├── config.json                # Persistent runtime configuration
├── requirements.txt
├── data/
│   ├── conversations/         # Saved session JSON files
│   ├── memory_db/             # ChromaDB persistent storage
│   ├── pngs/                  # Avatar images per mood (idle/talking)
│   └── prompts/               # System prompts (persona, monologue, minecraft)
├── docs/                      # Full documentation (you are here)
└── src/
    ├── core/
    │   ├── brain.py           # Central orchestrator
    │   ├── config.py          # BrainConfig dataclass + config.json I/O
    │   ├── events.py          # EventManager (pub/sub, brain activity log)
    │   └── resources.py       # Avatar resource loader
    ├── interfaces/
    │   └── base_interfaces.py # Abstract contracts: LLM, TTS, STT, OBS
    ├── modules/
    │   ├── llm/               # LLM providers (Gemini, OpenAI, Groq, GLM)
    │   ├── tts/               # TTS engines (EdgeTTS, Kokoro, Orpheus)
    │   ├── STT/               # STT (Groq/Whisper)
    │   ├── obs/               # OBS WebSocket controller
    │   └── skills/            # Plugin skill system
    │       ├── base_skill.py
    │       ├── skill_manager.py
    │       ├── memory/        # RAG memory + ChromaDB
    │       ├── discord/       # Discord voice skill + Node.js bot
    │       ├── minecraft/     # Minecraft autonomous agent
    │       └── implementations/ # Monologue + misc skills
    ├── utils/
    │   ├── history_manager.py # Conversation session persistence
    │   ├── llm_utils.py       # JSON response parsing
    │   └── text_utils.py      # Text formatting utilities
    └── web/
        ├── app.py             # FastAPI REST API
        ├── server.py          # Uvicorn launcher
        └── frontend/          # React + Vite + Tailwind dashboard

Quick Start

1. Prerequisites

  • Python 3.10+
  • Node.js 18+ (for the Discord bot)
  • OBS Studio with WebSocket plugin enabled (Tools → WebSocket Server Settings)
  • A virtual audio cable such as VB-Audio Cable (optional but recommended)

2. Install Python dependencies

pip install -r requirements.txt

3. Configure

Copy .env.example to .env (or set environment variables directly):

OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIzaSy...
GROQ_API_KEY=gsk_...
DISCORD_TOKEN=...

Review config.json to set your OBS source names, audio device ID, TTS voice, and which skills are enabled.

Full Configuration Guide →

4. Run

CLI mode (terminal interactive):

python main.py

Web Dashboard mode (FastAPI + React UI):

python main.py --web

Override provider at launch:

python main.py --llm-provider gemini --tts-provider kokoro --web

Setup & Deployment Guide →


Modules

The engine is built around three types of components, each defined by an abstract interface in src/interfaces/base_interfaces.py. Any provider can be swapped without touching the core.

ComponentInterfaceImplementations
LLMLLMInterfaceGemini, OpenAI, Groq, GLM-4.7
TTSTTSInterfaceEdgeTTS, Kokoro (local), Orpheus
STTSTTInterfaceGroq (Whisper large-v3-turbo)
OBSOBSInterfaceOBS WebSocket (obs-websocket-py)

LLM Modules → · TTS Modules → · STT → · OBS →


Skills — Plugin System

Skills are autonomous background capabilities managed by the SkillManager. Each extends BaseSkill and can be enabled/disabled at runtime (including hot-toggle from the web UI).

SkillDescription
MemoryRAG system: converts sessions into diary entries, stores in ChromaDB, injects relevant context into every prompt
DiscordLaunches a Node.js Discord bot; listens in voice channels, transcribes speech, sends audio back live
MinecraftConnects via WebSocket to a Minecraft mod; an LLM agent autonomously performs actions using tool-calling
MonologueWhen the audience is silent, Bea starts unprompted storytelling — episodically, chunk by chunk

Skills Overview →


Web Dashboard

The --web flag starts a FastAPI backend (port 8000) and serves a React + Tailwind frontend.

Pages:

  • Chat — text chat with Bea, session management
  • Brain Activity — real-time event feed (inputs, outputs, skill events, thoughts)
  • Skills — toggle skills on/off at runtime
  • Config — edit every setting live with hot reload

API Reference → · Frontend →


Full Documentation

DocumentContents
ArchitectureSystem design, data flow, event system
Setup & InstallInstallation, OBS setup, audio routing
ConfigurationAll config fields, CLI args, .env vars
LLM ModulesProviders, response format, adding new LLMs
TTS ModulesEdgeTTS, Kokoro, Orpheus
STT ModuleGroq Whisper transcription
OBS ModuleAvatar control, text animation
Skills OverviewBaseSkill API, SkillManager lifecycle
Memory SkillRAG system, ChromaDB, diary generation
Discord SkillDiscord bot setup, voice pipeline
Minecraft SkillMC agent, tool-calling, WebSocket bridge
Monologue SkillIdle storytelling state machine
Web APIAll REST endpoints
FrontendReact component structure

Extending ProjectBEA

The modular design makes adding new capabilities straightforward:

  • New LLM provider → implement LLMInterface, register in main.py
  • New TTS engine → implement TTSInterface, add to CLI choices
  • New Skill → extend BaseSkill, register in SkillManager

See Skills Overview for the full plugin API.


About

Built by Emanuele Faraci, 19-year-old Computer Science student from Italy.

This project started as a way to learn Python properly, specifically async programming, API integrations, and modular system design, while building something actually fun. It grew from a simple TTS + OBS script into a full VTuber engine with skills, memory, and a web dashboard.

just a side project built for fun and learning.

Portfolio: emanuelefaraci.com


License

This project is open-source. See LICENSE for details.