Tiny LLM Council

March 4, 2026

A spectator sport for AI. Watch five ultra-small language models autonomously debate philosophical questions through a 6-stage pipeline, complete with alliances, rebuttals, peer reviews, and a synthesizing chairman. It exists purely for fun and visual theater.

The Debate Pipeline

Five models engage in structured deliberation across six stages:

Stage 0       Stage 1       Stage 2         Stage 3       Stage 4         Stage 5
Topic    →   Opinions  →   Peer Review  →  Rebuttals  →  Alliances  →   Synthesis
Proposed      (5 models)    (Anon Scores)   (Bottom 2)     (Agreements)    (Chairman)

Each stage streams live to the browser with animations, score reveals, alliance connection lines, and a chairman spotlight during final synthesis.

Key Features

6-Stage Debates

Topic proposed by a rotating model
All 5 models opine in parallel
Anonymous peer review (each model scores the other four responses, labeled Response A/B/C/D)
The two lowest-scoring models get a chance to rebut
All models declare alliances and disagreements
Rotating chairman synthesizes with full context
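The stage flow can be sketched as a few pure functions. This is a minimal illustration, not the project's orchestrator.py. The round-robin role rotation, the shuffle-based A/B/C/D anonymization, and ranking the "bottom 2" by mean score received are all assumptions about how the documented stages might work.

```python
import random
from statistics import mean

# Model IDs from the model table; the rules below are hypothetical.
MODELS = ["tinyllama-1b", "gemma-270m", "smollm2-360m", "qwen25-05b", "h2o-danube3-500m"]


def assign_roles(session_index: int) -> tuple[str, str]:
    """Assumed round-robin rotation: the chairman is the model after the proposer."""
    proposer = MODELS[session_index % len(MODELS)]
    chairman = MODELS[(session_index + 1) % len(MODELS)]
    return proposer, chairman


def anonymize(reviewer: str, authors: list[str], rng: random.Random) -> dict[str, str]:
    """Map labels A-D to the four *other* models' responses in random order.

    The mapping stays server-side; the reviewer only sees the labeled texts.
    """
    others = [m for m in authors if m != reviewer]
    rng.shuffle(others)
    return dict(zip("ABCD", others))


def bottom_two(received: dict[str, list[int]]) -> list[str]:
    """The two models with the lowest mean peer-review score (ties keep input order)."""
    return sorted(received, key=lambda m: mean(received[m]))[:2]
```

With these assumptions, session 0 has `tinyllama-1b` propose and `gemma-270m` chair, and only the two models returned by `bottom_two` enter Stage 3.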

Visual Theater

Speech bubbles animate from avatars as responses arrive
Score badges flip-reveal with color coding (green 8-10, yellow 5-7, red 1-4)
Rebuttal spotlight dims non-participants to 40% opacity with golden glow on speakers
Alliance lines (green solid = agree, red dashed = disagree) overlay the pentagon arena
Chairman synthesis entrance: avatars slide outward, chairman scales up with golden aura
Word-by-word typing effect for synthesis text
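The score-badge color bands come down to a small lookup. It is shown in Python for consistency with the other sketches here (the real logic presumably lives in the React frontend), and it assumes integer scores from 1 to 10.

```python
def score_color(score: int) -> str:
    """Return the badge color band for a 1-10 peer-review score."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score >= 8:
        return "green"   # 8-10
    if score >= 5:
        return "yellow"  # 5-7
    return "red"         # 1-4
```

For example, `[score_color(s) for s in (9, 6, 2)]` gives `["green", "yellow", "red"]`.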

Timeline Replay

After a session ends, scrub backward and forward through the debate
Replay recorded events at 2x speed with play/pause controls
Perfect for rewatching dramatic moments
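Scrubbing and 2x playback can be modeled as two operations over timestamped events. The first rebuilds the view from everything up to a scrub position. The second schedules the remaining events with delays divided by the speed multiplier. This sketch assumes each recorded event carries a `t` offset in seconds. `RecordedEvent` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecordedEvent:
    t: float   # seconds since the session started (hypothetical field)
    kind: str  # e.g. "opinion", "review", "alliance"


def state_at(events: list[RecordedEvent], position: float) -> list[RecordedEvent]:
    """Events already shown at a scrub position; the UI is rebuilt from these."""
    return [e for e in events if e.t <= position]


def replay_schedule(events: list[RecordedEvent], start_at: float = 0.0,
                    speed: float = 2.0) -> list[tuple[float, RecordedEvent]]:
    """(delay, event) pairs for playing forward from `start_at` at `speed`x."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    upcoming = sorted((e for e in events if e.t > start_at), key=lambda e: e.t)
    return [((e.t - start_at) / speed, e) for e in upcoming]
```

Pausing then amounts to cancelling the pending timers and remembering the current position, so that playback can resume from it.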

Achievement System

10 unlockable badges for interesting model behavior:
- Comeback King, Unanimous, Contrarian, Peacemaker, Iron Wall
- Silver Tongue, Workhorse, Dark Horse, Rivalry, Best Friends
Badges display on avatars and persist in Hall of Fame
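The README names the badges but not their rules. One hypothetical reading of Rivalry and Best Friends is that a detector counts repeated alliance declarations between the same pair of models. The threshold of 3 and the direction-agnostic pair counting are assumptions, not the project's achievements.py.

```python
from collections import Counter


def pair_badges(declarations, threshold: int = 3) -> dict[tuple[str, str], list[str]]:
    """Award hypothetical pair badges from (model, target, stance) declarations.

    stance is "agree" or "disagree". Direction is ignored, so A->B and B->A
    count toward the same pair.
    """
    counts: Counter = Counter()
    for model, target, stance in declarations:
        pair = tuple(sorted((model, target)))
        counts[pair, stance] += 1

    badges: dict[tuple[str, str], list[str]] = {}
    for (pair, stance), n in sorted(counts.items()):
        if n >= threshold:
            badge = "Best Friends" if stance == "agree" else "Rivalry"
            badges.setdefault(pair, []).append(badge)
    return badges
```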

Persistent Model Personalities

Models develop evolving behavioral profiles across sessions
Models with a high contrarian score challenge conventional thinking
Models with strong comeback rates receive personality nudges
Tendency patterns (who they agree/disagree with) influence future debates
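A "personality nudge" could be built by deriving rates from past sessions and appending a sentence or two to the model's system prompt. In this sketch, the field names, cut-off values, and prompt wording are all hypothetical. The README does not specify how personalities feed into prompts.py.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    sessions: int        # debates the model has taken part in
    declarations: int    # alliance declarations it has made
    disagreements: int   # of those, how many were "disagree"
    comebacks: int       # hypothetical: sessions where a rebuttal lifted its standing

    @property
    def contrarian_rate(self) -> float:
        return self.disagreements / self.declarations if self.declarations else 0.0

    @property
    def comeback_rate(self) -> float:
        return self.comebacks / self.sessions if self.sessions else 0.0


def personality_nudge(p: Profile, contrarian_cut: float = 0.6,
                      comeback_cut: float = 0.3) -> str:
    """Extra system-prompt text derived from the profile ("" if nothing applies)."""
    hints = []
    if p.contrarian_rate >= contrarian_cut:
        hints.append("You often break from the consensus; challenge conventional thinking.")
    if p.comeback_rate >= comeback_cut:
        hints.append("You recover well under pressure; defend your position with confidence.")
    return " ".join(hints)
```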

Analytics Dashboard

Leaderboard ranked by peer review scores
Per-model performance history with session breakdown
Head-to-head comparison between any two models
Hall of Fame showing the most recent achievements

Models (5 Ultra-Tiny Champions)

Model ID	Name	Company	Size	Port
tinyllama-1b	TinyLlama 1.1B	TinyLlama	1.1B	8081
gemma-270m	Gemma 3 270M	Google	270M	8082
smollm2-360m	SmolLM2 360M	HuggingFace	360M	8083
qwen25-05b	Qwen2.5 0.5B	Alibaba	500M	8084
h2o-danube3-500m	H2O Danube3 500M	H2O.ai	500M	8085

Total: ~1.9GB on disk (Q4_K_M quantization). Light enough to run all 5 on CPU simultaneously.

Getting Started

Prerequisites

llama.cpp — Install via Homebrew or build from source:
```
brew install llama.cpp
```
Or build it from source: github.com/ggml-org/llama.cpp (formerly ggerganov/llama.cpp).
Python 3.13 — Exact version required (3.14 has pydantic-core incompatibility):
```
python --version  # Should show Python 3.13.x
```
Node.js 18+ — For the React frontend:
```
node --version
```

Step 1: Download Models

bash scripts/download_models.sh

This downloads all 5 GGUF models (~1.9GB) to ./models/. It requires huggingface-cli, which is installed with the huggingface-hub package:

pip install huggingface-hub

Step 2: Start the Backend

cd backend
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

uvicorn main:app --port 8000

The backend auto-starts all 5 llama.cpp servers on ports 8081-8085. On first run, expect ~30 seconds for models to load into memory.
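The auto-start step can be pictured as one `llama-server` process per model, each polled until its `/health` endpoint answers. This is only a sketch. The GGUF filenames and context size are hypothetical, and the real model_manager.py may pass different flags. `-m`, `--host`, `--port`, and `-c` are standard llama-server options, and llama-server's `GET /health` should return 200 once the model has loaded.

```python
import subprocess
import time
import urllib.request
from pathlib import Path

# Ports from the model table; filenames are hypothetical.
MODELS = {
    "tinyllama-1b": ("tinyllama-1b.Q4_K_M.gguf", 8081),
    "gemma-270m": ("gemma-270m.Q4_K_M.gguf", 8082),
    "smollm2-360m": ("smollm2-360m.Q4_K_M.gguf", 8083),
    "qwen25-05b": ("qwen25-05b.Q4_K_M.gguf", 8084),
    "h2o-danube3-500m": ("h2o-danube3-500m.Q4_K_M.gguf", 8085),
}


def server_command(model_id: str, models_dir: Path = Path("models"),
                   ctx_size: int = 2048) -> list[str]:
    """Command line for one llama-server instance bound to localhost."""
    gguf, port = MODELS[model_id]
    return ["llama-server", "-m", str(models_dir / gguf),
            "--host", "127.0.0.1", "--port", str(port), "-c", str(ctx_size)]


def wait_until_healthy(port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll GET /health until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"http://127.0.0.1:{port}/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:  # connection refused, or HTTPError (e.g. 503 while loading)
            pass
        time.sleep(interval)
    return False


def start_all(models_dir: Path = Path("models")) -> dict[str, subprocess.Popen]:
    """Launch every server, then wait for each; the caller must terminate() them later."""
    procs = {mid: subprocess.Popen(server_command(mid, models_dir),
                                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
             for mid in MODELS}
    for mid, (_gguf, port) in MODELS.items():
        if not wait_until_healthy(port):
            for proc in procs.values():
                proc.terminate()
            raise RuntimeError(f"{mid} did not become healthy on port {port}")
    return procs
```

Launching all five processes before waiting lets the models load in parallel, which fits the ~30-second first-run estimate better than starting them one at a time.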

Health check: Open http://localhost:8000/api/health

Step 3: Start the Frontend

In a new terminal:

cd frontend
npm install
npm run dev

The Vite dev server starts on http://localhost:5173

Step 4: Start a Debate

Open http://localhost:5173 in your browser. Click "Convene Council" and watch the debate unfold live. Sessions take ~2-3 minutes across all 6 stages.

Tech Stack

Backend:

FastAPI (REST + WebSocket for real-time events)
SQLAlchemy ORM (async) on SQLite
llama.cpp subprocess management
Pydantic for request/response validation

Frontend:

React 19 + TypeScript
Vite for bundling
Framer Motion for animations
Recharts for analytics dashboards
WebSocket for live event streaming

Models:

llama.cpp for inference (5 concurrent servers)
GGUF model files with Q4_K_M quantization (a speed/size trade-off)

Project Structure

.
├── backend/
│   ├── main.py              # FastAPI app, WebSocket endpoints
│   ├── config.py            # Model definitions and paths
│   ├── orchestrator.py      # 6-stage debate engine
│   ├── model_manager.py     # llama.cpp subprocess lifecycle
│   ├── database.py          # SQLAlchemy ORM (7 tables)
│   ├── achievements.py      # Badge detection system
│   ├── analytics.py         # Aggregation queries
│   ├── prompts.py           # Debate prompt templates
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── App.tsx           # Main component
│   │   ├── App.css           # Dark theme styling
│   │   ├── components/
│   │   │   ├── CouncilChamber.tsx    # Pentagon arena
│   │   │   ├── TimelineReplay.tsx    # Scrubbing controls
│   │   │   └── Analytics.tsx         # Dashboard
│   │   └── hooks/
│   │       └── useWebSocket.tsx      # Event streaming
│   ├── package.json
│   └── vite.config.ts
├── models/                  # Downloaded GGUF files (not in repo)
├── data/
│   └── council.db           # SQLite debate history
├── scripts/
│   └── download_models.sh   # Auto-fetch all 5 models
└── docs/
    └── plans/
        └── 2026-03-04-council-v2-design.md  # Architecture decisions

Database Schema

Seven tables store debate history, personalities, and achievements:

sessions — metadata (topic, proposer, chairman, timestamps)
responses — opinions, rebuttals, and synthesis text
peer_reviews — anonymous scores (Response A/B/C/D per model)
alliances — which model agreed/disagreed with whom
synthesis — chairman's final statement
model_personalities — evolving behavioral profiles
model_achievements — unlocked badges with timestamps

Queries power the leaderboard, head-to-head comparisons, and Hall of Fame.
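To make the schema concrete, here are three of the seven tables as plain SQLite DDL, queried with Python's built-in sqlite3 module rather than the project's async SQLAlchemy models. Column names and constraints are assumptions inferred from the table descriptions. The Hall of Fame query simply lists the newest badges first.

```python
import sqlite3

# Three of the seven tables; column names and constraints are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id          INTEGER PRIMARY KEY,
    topic       TEXT NOT NULL,
    proposer    TEXT NOT NULL,
    chairman    TEXT NOT NULL,
    started_at  TEXT NOT NULL          -- ISO 8601, so text order = time order
);
CREATE TABLE IF NOT EXISTS peer_reviews (
    id              INTEGER PRIMARY KEY,
    session_id      INTEGER NOT NULL REFERENCES sessions(id),
    reviewer        TEXT NOT NULL,
    label           TEXT NOT NULL CHECK (label IN ('A', 'B', 'C', 'D')),
    reviewed_model  TEXT NOT NULL,     -- de-anonymized target, kept server-side
    score           INTEGER NOT NULL CHECK (score BETWEEN 1 AND 10)
);
CREATE TABLE IF NOT EXISTS model_achievements (
    id           INTEGER PRIMARY KEY,
    model_id     TEXT NOT NULL,
    badge        TEXT NOT NULL,
    session_id   INTEGER REFERENCES sessions(id),
    unlocked_at  TEXT NOT NULL
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def hall_of_fame(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, str, str]]:
    """Most recently unlocked badges first."""
    return conn.execute(
        "SELECT model_id, badge, unlocked_at FROM model_achievements "
        "ORDER BY unlocked_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
```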

Development

Backend Testing

Run the unit test suite:

cd backend
pytest -v

For integration tests (requires all 5 llama.cpp servers running):

INTEGRATION=1 TEST_DB=1 pytest test_integration.py -v

Key commands are documented in CLAUDE.md.

Frontend Linting

cd frontend
npm run lint
npm run build   # TypeScript check + production build

API Overview

WebSocket (port 8000)

Client: {"action": "new_session"}
Server: Streams events in sequence:
  roles → stage(0) → topic → stage(1) → opinion ×5 → stage(2) →
  review ×5 → stage(3) → rebuttal ×2 → stage(4) → alliance ×5 →
  stage(5) → synthesis → achievement ×N → complete
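A client or integration test can check that a finished session followed the documented event order. This sketch compares the received events against that sequence. The message shape `{"type": ..., "stage": n}` is a hypothetical stand-in for whatever the server actually sends.

```python
def expected_sequence(n_achievements: int) -> list[str]:
    """Documented event order, with a variable number of achievement events."""
    return [
        "roles", "stage(0)", "topic",
        "stage(1)", *["opinion"] * 5,
        "stage(2)", *["review"] * 5,
        "stage(3)", *["rebuttal"] * 2,
        "stage(4)", *["alliance"] * 5,
        "stage(5)", "synthesis",
        *["achievement"] * n_achievements,
        "complete",
    ]


def label(message: dict) -> str:
    """Collapse a (hypothetically shaped) event message to a sequence label."""
    if message["type"] == "stage":
        return f"stage({message['stage']})"
    return message["type"]


def follows_protocol(messages: list[dict]) -> bool:
    labels = [label(m) for m in messages]
    return labels == expected_sequence(labels.count("achievement"))
```

Only event types are compared, not which model produced each one. Opinions are generated in parallel, so the five `opinion` events may arrive in any model order.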

REST Endpoints (port 8000)

GET /api/health — Model health check
GET /api/models — List all models with status
POST /api/models/{id}/load — Start a model
POST /api/models/{id}/unload — Stop a model
GET /api/sessions — All debate sessions
GET /api/sessions/{id} — Single session details
GET /api/analytics/leaderboard — Ranked by peer review score
GET /api/analytics/models/{id} — Per-model history
GET /api/analytics/head-to-head — Compare two models
GET /api/achievements — All unlocked badges
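For scripting against the REST API, the standard library is enough. This sketch assumes the endpoints return JSON. The head-to-head query parameter names (`model_a`, `model_b`) are hypothetical because the README does not list them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def api_url(path: str, **params: str) -> str:
    """Join BASE_URL and path, appending URL-encoded query parameters if given."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(path: str, **params: str):
    """GET an endpoint and decode its (assumed) JSON body."""
    with urllib.request.urlopen(api_url(path, **params), timeout=10) as resp:
        return json.load(resp)


def load_model(model_id: str):
    """POST /api/models/{id}/load with an empty body; model loading can be slow."""
    path = f"/api/models/{urllib.parse.quote(model_id)}/load"
    req = urllib.request.Request(api_url(path), data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

With the backend running, `get_json("/api/analytics/leaderboard")` fetches the rankings, and `get_json("/api/analytics/head-to-head", model_a="gemma-270m", model_b="qwen25-05b")` shows how query parameters would be passed if the endpoint accepts them under those names.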

Why This Project?

Small language models are a frontier for on-device, privacy-preserving AI. Tiny LLM Council shows that useful, entertaining applications don't require multi-billion-parameter models, and that watching them argue is just plain fun. The 6-stage pipeline offers a glimpse of how models disagree, change their minds under peer pressure, form alliances, and synthesize consensus.

Think of it as:

For AI researchers: A playground to study emergent behavior in small models
For developers: A reference for real-time multi-model orchestration
For everyone else: Pure entertainment — philosophical debates on demand

Contributing

Contributions welcome! Open an issue or submit a PR for:

New debate topics/question generation
Improved CSS themes or animations
Badge ideas or achievement detection logic
Backend optimizations for faster model inference
Additional analytics visualizations

Please ensure code follows the patterns in CLAUDE.md.

License

MIT License. See LICENSE file for details.

Authors

Built with attention to clarity, visual delight, and the joy of watching tiny models debate philosophy.

For questions or feedback, open an issue on GitHub.