Tiny LLM Council
March 4, 2026
A spectator sport for AI. Watch five ultra-small language models autonomously debate philosophical questions through a dynamic 6-stage pipeline—with alliances, rebuttals, peer reviews, and a synthesizing chairman. Purely for fun and visual theater.
The Debate Pipeline
Five models engage in structured deliberation across six stages:
| Stage 0 | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 |
|---|---|---|---|---|---|
| Topic Proposed | Opinions (5 models) | Peer Review (Anon Scores) | Rebuttals (Bottom 2) | Alliances (Agreements) | Synthesis (Chairman) |
Each stage streams live to the browser with animations, score reveals, alliance connection lines, and a chairman spotlight during final synthesis.
Key Features
6-Stage Debates
- Topic proposed by a rotating model
- All 5 models opine in parallel
- Anonymous peer review (Response A/B/C/D scoring)
- Bottom 2 scorers get a rebuttal chance
- All models declare alliances and disagreements
- Rotating chairman synthesizes with full context
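The peer-review and rebuttal-selection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's orchestrator: the function names `anonymize` and `bottom_two`, the mean-score ranking, the tie-break by model ID, and the sample scores are all assumptions made for the example.

```python
from statistics import mean

MODEL_IDS = [
    "tinyllama-1b",
    "gemma-270m",
    "smollm2-360m",
    "qwen25-05b",
    "h2o-danube3-500m",
]


def anonymize(reviewer: str, model_ids: list[str]) -> dict[str, str]:
    """Map labels A-D to the four models a reviewer scores (everyone but itself)."""
    others = [m for m in model_ids if m != reviewer]
    return {chr(ord("A") + i): m for i, m in enumerate(others)}


def bottom_two(scores: dict[str, list[int]]) -> list[str]:
    """Return the two models with the lowest mean peer score.

    Assumption: ties are broken by model ID so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda m: (mean(scores[m]), m))
    return ranked[:2]


labels = anonymize("gemma-270m", MODEL_IDS)

# Hypothetical data: each model receives four 1-10 ratings from its peers.
scores = {
    "tinyllama-1b": [7, 8, 6, 7],
    "gemma-270m": [4, 5, 3, 4],
    "smollm2-360m": [6, 6, 7, 5],
    "qwen25-05b": [9, 8, 8, 9],
    "h2o-danube3-500m": [5, 4, 6, 5],
}

print(labels)
print(bottom_two(scores))
```

With these sample scores, the two models earning a rebuttal would be `gemma-270m` and `h2o-danube3-500m`.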
Visual Theater
- Speech bubbles animate from avatars as responses arrive
- Score badges flip-reveal with color coding (green 8-10, yellow 5-7, red 1-4)
- Rebuttal spotlight dims non-participants to 40% opacity with golden glow on speakers
- Alliance lines (green solid = agree, red dashed = disagree) overlay the pentagon arena
- Chairman synthesis entrance: avatars slide outward, chairman scales up with golden aura
- Word-by-word typing effect for synthesis text
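The score badge color bands listed above reduce to a small threshold function. The sketch below is written in Python purely to show the bands. The real badge logic presumably lives in the React frontend, and the name `badge_color` is an assumption.

```python
def badge_color(score: int) -> str:
    """Return the badge color for a 1-10 peer review score."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score >= 8:
        return "green"   # 8-10
    if score >= 5:
        return "yellow"  # 5-7
    return "red"         # 1-4


for s in (1, 4, 5, 7, 8, 10):
    print(s, badge_color(s))
```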
Timeline Replay
- After session ends, scrub backward/forward through the debate
- Replay recorded events at 2x speed with play/pause controls
- Perfect for rewatching dramatic moments
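One way to think about 2x replay: take the gaps between recorded event timestamps and divide them by the playback speed. The sketch below assumes events are stored with timestamps in seconds. The function name `replay_delays` and the sample timestamps are hypothetical, and the actual replay component may work differently.

```python
def replay_delays(timestamps: list[float], speed: float = 2.0) -> list[float]:
    """Return the wait (in seconds) before each event after the first.

    Each gap between consecutive recorded events is divided by `speed`.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [(b - a) / speed for a, b in zip(timestamps, timestamps[1:])]


# Hypothetical recording: four events at 0s, 1s, 4s, and 10s.
recorded = [0.0, 1.0, 4.0, 10.0]
print(replay_delays(recorded))  # [0.5, 1.5, 3.0]
```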
Achievement System
- 10 unlockable badges for interesting model behavior:
- Comeback King, Unanimous, Contrarian, Peacemaker, Iron Wall
- Silver Tongue, Workhorse, Dark Horse, Rivalry, Best Friends
- Badges display on avatars and persist in Hall of Fame
Persistent Model Personalities
- Models develop evolving behavioral profiles across sessions
- Highly contrarian models challenge conventional thinking
- Models with strong comeback rates receive personality nudges
- Tendency patterns (who they agree/disagree with) influence future debates
Analytics Dashboard
- Leaderboard ranked by peer review scores
- Per-model performance history with session breakdown
- Head-to-head comparison between any two models
- Hall of Fame tracking most recent achievements
Models (5 Ultra-Tiny Champions)
| Model ID | Name | Company | Size | Port |
|---|---|---|---|---|
| tinyllama-1b | TinyLlama 1.1B | TinyLlama | 1.1B | 8081 |
| gemma-270m | Gemma 3 270M | Google | 270M | 8082 |
| smollm2-360m | SmolLM2 360M | HuggingFace | 360M | 8083 |
| qwen25-05b | Qwen2.5 0.5B | Alibaba | 500M | 8084 |
| h2o-danube3-500m | H2O Danube3 500M | H2O.ai | 500M | 8085 |
Total: ~1.9GB on disk (Q4_K_M quantization). Light enough to run all 5 on CPU simultaneously.
Getting Started
Prerequisites
- llama.cpp: install via Homebrew or build from source:
  brew install llama.cpp
  Or compile from ggerganov/llama.cpp.
- Python 3.13: exact version required (3.14 has a pydantic-core incompatibility):
  python --version # Should show Python 3.13.x
- Node.js 18+: for the React frontend:
  node --version
Step 1: Download Models
bash scripts/download_models.sh
This downloads all 5 GGUF models (~1.9GB) to ./models/. Requires huggingface-cli:
pip install huggingface-hub
Step 2: Start the Backend
cd backend
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn main:app --port 8000
The backend auto-starts all 5 llama.cpp servers on ports 8081-8085. On first run, expect ~30 seconds for models to load into memory.
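For a sense of what the auto-start involves, the sketch below builds one `llama-server` command per model using the ports from the model table. It is an illustration only: the GGUF file naming (`<model-id>.gguf`) and the helper `server_command` are assumptions, and the project's `model_manager.py` may pass different or additional flags.

```python
from pathlib import Path

# Ports are taken from the model table above.
MODEL_PORTS = {
    "tinyllama-1b": 8081,
    "gemma-270m": 8082,
    "smollm2-360m": 8083,
    "qwen25-05b": 8084,
    "h2o-danube3-500m": 8085,
}


def server_command(model_id: str, port: int, models_dir: Path = Path("models")) -> list[str]:
    """Build the argv for one llama.cpp server.

    Assumption: each model is stored as models/<model-id>.gguf.
    """
    model_path = models_dir / f"{model_id}.gguf"
    return ["llama-server", "-m", str(model_path), "--port", str(port)]


commands = [server_command(model_id, port) for model_id, port in MODEL_PORTS.items()]
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to `subprocess.Popen` to launch a server. The sketch only prints the commands so it runs without llama.cpp installed.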
Health check: Open http://localhost:8000/api/health
Step 3: Start the Frontend
In a new terminal:
cd frontend
npm install
npm run dev
The Vite dev server starts on http://localhost:5173
Step 4: Start a Debate
Open http://localhost:5173 in your browser. Click "Convene Council" and watch the debate unfold live. Sessions take ~2-3 minutes across all 6 stages.
Tech Stack
Backend:
- FastAPI (REST + WebSocket for real-time events)
- SQLAlchemy with async SQLite ORM
- llama.cpp subprocess management
- Pydantic for request/response validation
Frontend:
- React 19 + TypeScript
- Vite for bundling
- Framer Motion for animations
- Recharts for analytics dashboards
- WebSocket for live event streaming
Models:
- llama.cpp for inference (5 concurrent servers)
- GGUF quantization (Q4_K_M for speed/size trade-off)
Project Structure
.
├── backend/
│ ├── main.py # FastAPI app, WebSocket endpoints
│ ├── config.py # Model definitions and paths
│ ├── orchestrator.py # 6-stage debate engine
│ ├── model_manager.py # llama.cpp subprocess lifecycle
│ ├── database.py # SQLAlchemy ORM (7 tables)
│ ├── achievements.py # Badge detection system
│ ├── analytics.py # Aggregation queries
│ ├── prompts.py # Debate prompt templates
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── App.tsx # Main component
│ │ ├── App.css # Dark theme styling
│ │ ├── components/
│ │ │ ├── CouncilChamber.tsx # Pentagon arena
│ │ │ ├── TimelineReplay.tsx # Scrubbing controls
│ │ │ └── Analytics.tsx # Dashboard
│ │ └── hooks/
│ │ └── useWebSocket.tsx # Event streaming
│ ├── package.json
│ └── vite.config.ts
├── models/ # Downloaded GGUF files (not in repo)
├── data/
│ └── council.db # SQLite debate history
├── scripts/
│ └── download_models.sh # Auto-fetch all 5 models
└── docs/
└── plans/
└── 2026-03-04-council-v2-design.md # Architecture decisions
Database Schema
Seven tables store debate history, personalities, and achievements:
- sessions — metadata (topic, proposer, chairman, timestamps)
- responses — opinions, rebuttals, and synthesis text
- peer_reviews — anonymous scores (Response A/B/C/D per model)
- alliances — which model agreed/disagreed with whom
- synthesis — chairman's final statement
- model_personalities — evolving behavioral profiles
- model_achievements — unlocked badges with timestamps
Queries power the leaderboard, head-to-head comparisons, and Hall of Fame.
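As a rough idea of what a leaderboard query might look like, the sketch below uses the standard-library `sqlite3` module against an in-memory database. The column names (`session_id`, `reviewer_id`, `target_id`, `score`) and the sample rows are assumptions for illustration. The real schema is defined in `backend/database.py` via SQLAlchemy and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical shape for the peer_reviews table.
conn.execute(
    """
    CREATE TABLE peer_reviews (
        session_id  INTEGER NOT NULL,
        reviewer_id TEXT    NOT NULL,
        target_id   TEXT    NOT NULL,
        score       INTEGER NOT NULL CHECK (score BETWEEN 1 AND 10)
    )
    """
)

# Made-up scores from a single session with three models.
conn.executemany(
    "INSERT INTO peer_reviews VALUES (?, ?, ?, ?)",
    [
        (1, "gemma-270m", "qwen25-05b", 9),
        (1, "smollm2-360m", "qwen25-05b", 8),
        (1, "qwen25-05b", "gemma-270m", 4),
        (1, "smollm2-360m", "gemma-270m", 5),
        (1, "qwen25-05b", "smollm2-360m", 7),
        (1, "gemma-270m", "smollm2-360m", 6),
    ],
)

# Rank models by the average score they received from peers.
leaderboard = conn.execute(
    """
    SELECT target_id, AVG(score) AS avg_score, COUNT(*) AS reviews
    FROM peer_reviews
    GROUP BY target_id
    ORDER BY avg_score DESC
    """
).fetchall()

for model_id, avg_score, reviews in leaderboard:
    print(f"{model_id}: {avg_score:.1f} over {reviews} reviews")
```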
Development
Backend Testing
Run the unit test suite:
cd backend
pytest -v
For integration tests (requires all 5 llama.cpp servers running):
INTEGRATION=1 TEST_DB=1 pytest test_integration.py -v
Key commands are documented in CLAUDE.md.
Frontend Linting
cd frontend
npm run lint
npm run build # TypeScript check + production build
API Overview
WebSocket (port 8000)
Client: {"action": "new_session"}
Server: Streams events in sequence:
roles → stage(0) → topic → stage(1) → opinion ×5 → stage(2) →
review ×5 → stage(3) → rebuttal ×2 → stage(4) → alliance ×5 →
stage(5) → synthesis → achievement ×N → complete
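When writing a client or a test, it can help to spell out the event order above as data. The sketch below builds the expected sequence for a session with a given number of achievements. The string labels simply mirror the diagram, and the function name `expected_events` is hypothetical. The actual messages are presumably JSON objects whose fields are not shown here.

```python
def expected_events(achievements: int = 0) -> list[str]:
    """Return the event labels for one session, in order."""
    if achievements < 0:
        raise ValueError("achievements must be non-negative")
    events = ["roles", "stage(0)", "topic", "stage(1)"]
    events += ["opinion"] * 5      # all five models opine
    events.append("stage(2)")
    events += ["review"] * 5       # each model submits anonymous scores
    events.append("stage(3)")
    events += ["rebuttal"] * 2     # bottom two scorers respond
    events.append("stage(4)")
    events += ["alliance"] * 5     # each model declares alliances
    events += ["stage(5)", "synthesis"]
    events += ["achievement"] * achievements
    events.append("complete")
    return events


sequence = expected_events(achievements=2)
print(len(sequence))
print(" -> ".join(sequence))
```

A session with no achievements produces 27 events. Each unlocked badge adds one more before `complete`.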
REST Endpoints (port 8000)
- GET /api/health: Model health check
- GET /api/models: List all models with status
- POST /api/models/{id}/load: Start a model
- POST /api/models/{id}/unload: Stop a model
- GET /api/sessions: All debate sessions
- GET /api/sessions/{id}: Single session details
- GET /api/analytics/leaderboard: Ranked by peer review score
- GET /api/analytics/models/{id}: Per-model history
- GET /api/analytics/head-to-head: Compare two models
- GET /api/achievements: All unlocked badges
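A small standard-library helper is enough to poke at these endpoints from a script. The sketch below assumes the backend is reachable at `http://localhost:8000` and that responses are JSON (FastAPI's default). The helper names `api_url`, `model_history_url`, and `get_json` are hypothetical, and the response shapes are not documented here, so the example only builds URLs unless you call `get_json` yourself.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"


def api_url(path: str, base: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return f"{base.rstrip('/')}/{path.lstrip('/')}"


def model_history_url(model_id: str) -> str:
    """URL for GET /api/analytics/models/{id}, with the ID percent-encoded."""
    return api_url(f"/api/analytics/models/{quote(model_id, safe='')}")


def get_json(path: str, timeout: float = 5.0):
    """GET an endpoint and decode its JSON body (requires a running backend)."""
    with urlopen(api_url(path), timeout=timeout) as resp:
        return json.load(resp)


print(api_url("/api/health"))
print(model_history_url("qwen25-05b"))
```

With the backend running, `get_json("/api/health")` would return the decoded health payload.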
Why This Project?
Small language models are a frontier for on-device, privacy-preserving AI. Tiny LLM Council demonstrates that useful, entertaining applications don't require multi-billion-parameter models, and that watching them argue is just plain fun. The 6-stage pipeline lets you glimpse how models disagree, change their minds under peer pressure, form alliances, and synthesize consensus.
Think of it as:
- For AI researchers: A playground to study emergent behavior in small models
- For developers: A reference for real-time multi-model orchestration
- For everyone else: Pure entertainment — philosophical debates on demand
Contributing
Contributions welcome! Open an issue or submit a PR for:
- New debate topics/question generation
- Improved CSS themes or animations
- Badge ideas or achievement detection logic
- Backend optimizations for faster model inference
- Additional analytics visualizations
Please ensure code follows the patterns in CLAUDE.md.
License
MIT License. See LICENSE file for details.
Authors
Built with attention to clarity, visual delight, and the joy of watching tiny models debate philosophy.
For questions or feedback, open an issue on GitHub.