Tiny LLM Council

March 4, 2026

A spectator sport for AI. Watch five ultra-small language models autonomously debate philosophical questions through a dynamic 6-stage pipeline—with alliances, rebuttals, peer reviews, and a synthesizing chairman. Purely for fun and visual theater.

MIT License · Python 3.13 · React 19

The Debate Pipeline

Five models engage in structured deliberation across six stages:

Stage 0       Stage 1       Stage 2         Stage 3       Stage 4         Stage 5
Topic    →   Opinions  →   Peer Review  →  Rebuttals  →  Alliances  →   Synthesis
Proposed      (5 models)    (Anon Scores)   (Bottom 2)     (Agreements)    (Chairman)

Each stage streams live to the browser with animations, score reveals, alliance connection lines, and a chairman spotlight during final synthesis.

Key Features

6-Stage Debates

  • Topic proposed by a rotating model
  • All 5 models opine in parallel
  • Anonymous peer review (Response A/B/C/D scoring)
  • The two lowest-scoring models get a chance to rebut
  • All models declare alliances and disagreements
  • Rotating chairman synthesizes with full context
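
The review and rebuttal stages in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's orchestrator code: the helper names `anonymize_for_reviewer` and `bottom_two` are hypothetical, and ranking by mean peer score is an assumed rule (the README only says the bottom 2 get a rebuttal).

```python
import random
from statistics import mean

# Model IDs as listed in the Models table of this README.
MODEL_IDS = [
    "tinyllama-1b",
    "gemma-270m",
    "smollm2-360m",
    "qwen25-05b",
    "h2o-danube3-500m",
]


def anonymize_for_reviewer(reviewer: str, rng: random.Random) -> dict[str, str]:
    """Map labels A-D to the four models other than the reviewer (hypothetical helper)."""
    others = [m for m in MODEL_IDS if m != reviewer]
    rng.shuffle(others)  # hide authorship behind a random label order
    return dict(zip("ABCD", others))


def bottom_two(scores: dict[str, list[int]]) -> list[str]:
    """Return the two models with the lowest mean peer score, lowest first.

    Assumption: ranking uses the mean of the scores each model received.
    """
    ranked = sorted(scores, key=lambda m: mean(scores[m]))
    return ranked[:2]
```

Each reviewer sees only the labels, so a score for "Response B" has to be mapped back through the reviewer's own label table before it is added to a model's total.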

Visual Theater

  • Speech bubbles animate from avatars as responses arrive
  • Score badges flip-reveal with color coding (green 8-10, yellow 5-7, red 1-4)
  • Rebuttal spotlight dims non-participants to 40% opacity with golden glow on speakers
  • Alliance lines (green solid = agree, red dashed = disagree) overlay the pentagon arena
  • Chairman synthesis entrance: avatars slide outward, chairman scales up with golden aura
  • Word-by-word typing effect for synthesis text
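
The score-badge color bands listed above amount to a small lookup. The sketch below is written in Python for consistency with the backend; the function name `score_color` is hypothetical, and the real frontend logic may be organized differently.

```python
def score_color(score: int) -> str:
    """Return the badge color for a 1-10 peer review score.

    Bands follow the README: green 8-10, yellow 5-7, red 1-4.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score >= 8:
        return "green"
    if score >= 5:
        return "yellow"
    return "red"
```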

Timeline Replay

  • After session ends, scrub backward/forward through the debate
  • Replay recorded events at 2x speed with play/pause controls
  • Perfect for rewatching dramatic moments
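
One way to think about 2x replay: keep the gaps between recorded events but divide them by the playback speed. The sketch below assumes each event carries a timestamp in seconds, which is an assumption about the recording format; `replay_delays` is a hypothetical helper.

```python
def replay_delays(timestamps: list[float], speed: float = 2.0) -> list[float]:
    """Return the wait before each event after the first, scaled by playback speed.

    Assumes `timestamps` are in seconds and sorted in recording order.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [(later - earlier) / speed for earlier, later in zip(timestamps, timestamps[1:])]
```

Scrubbing then becomes a matter of picking an index into the recorded event list and replaying from there with the same scaled delays.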

Achievement System

  • 10 unlockable badges for interesting model behavior:
    • Comeback King, Unanimous, Contrarian, Peacemaker, Iron Wall
    • Silver Tongue, Workhorse, Dark Horse, Rivalry, Best Friends
  • Badges display on avatars and persist in Hall of Fame

Persistent Model Personalities

  • Models develop evolving behavioral profiles across sessions
  • High contrarians challenge conventional thinking
  • Models with strong comeback rates receive personality nudges
  • Tendency patterns (who they agree/disagree with) influence future debates


Analytics Dashboard

  • Leaderboard ranked by peer review scores
  • Per-model performance history with session breakdown
  • Head-to-head comparison between any two models
  • Hall of Fame tracking most recent achievements

Models (5 Ultra-Tiny Champions)

Model ID          Name              Company      Size  Port
tinyllama-1b      TinyLlama 1.1B    TinyLlama    1.1B  8081
gemma-270m        Gemma 3 270M      Google       270M  8082
smollm2-360m      SmolLM2 360M      HuggingFace  360M  8083
qwen25-05b        Qwen2.5 0.5B      Alibaba      500M  8084
h2o-danube3-500m  H2O Danube3 500M  H2O.ai       500M  8085

Total: ~1.9GB on disk (Q4_K_M quantization). Light enough to run all 5 on CPU simultaneously.
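
For scripting against the individual model servers, the table above can be mirrored as a small registry. This is a sketch only: the `ModelSpec` dataclass and `base_url` helper are hypothetical and are not taken from the project's config.py; the IDs, names, sizes, and ports come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """One row of the model table (hypothetical structure)."""
    model_id: str
    name: str
    company: str
    size: str
    port: int


MODELS = [
    ModelSpec("tinyllama-1b", "TinyLlama 1.1B", "TinyLlama", "1.1B", 8081),
    ModelSpec("gemma-270m", "Gemma 3 270M", "Google", "270M", 8082),
    ModelSpec("smollm2-360m", "SmolLM2 360M", "HuggingFace", "360M", 8083),
    ModelSpec("qwen25-05b", "Qwen2.5 0.5B", "Alibaba", "500M", 8084),
    ModelSpec("h2o-danube3-500m", "H2O Danube3 500M", "H2O.ai", "500M", 8085),
]


def base_url(spec: ModelSpec, host: str = "localhost") -> str:
    """Build the base URL of the llama.cpp server for one model."""
    return f"http://{host}:{spec.port}"
```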

Getting Started

Prerequisites

  • llama.cpp — Install via Homebrew or build from source:

    brew install llama.cpp
    

    Or compile from ggerganov/llama.cpp.

  • Python 3.13 — Exact version required (3.14 has pydantic-core incompatibility):

    python --version  # Should show Python 3.13.x
    
  • Node.js 18+ — For the React frontend:

    node --version
    

Step 1: Download Models

bash scripts/download_models.sh

This downloads all 5 GGUF models (~1.9GB) to ./models/. The script requires huggingface-cli; if it is missing, install it first:

pip install huggingface-hub

Step 2: Start the Backend

cd backend
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

uvicorn main:app --port 8000

The backend auto-starts all 5 llama.cpp servers on ports 8081-8085. On first run, expect ~30 seconds for models to load into memory.

Health check: Open http://localhost:8000/api/health
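
While the models load, a script can wait for the llama.cpp ports to start accepting connections before kicking off a debate. The sketch below uses only the standard library; `port_open` and `wait_for_ports` are hypothetical helpers, and a successful TCP connect only shows that a server is listening, not that its model has finished loading.

```python
import socket
import time


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(ports, deadline_s: float = 60.0, poll_s: float = 1.0) -> list[int]:
    """Poll until every port accepts connections or the deadline passes.

    Returns the ports that were still closed at the last check (empty on success).
    """
    end = time.monotonic() + deadline_s
    pending = list(ports)
    while True:
        pending = [p for p in pending if not port_open(p)]
        if not pending or time.monotonic() >= end:
            return pending
        time.sleep(poll_s)
```

For the five servers described above, `wait_for_ports(range(8081, 8086))` returns an empty list once all of them are listening.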

Step 3: Start the Frontend

In a new terminal:

cd frontend
npm install
npm run dev

The Vite dev server starts on http://localhost:5173

Step 4: Start a Debate

Open http://localhost:5173 in your browser. Click "Convene Council" and watch the debate unfold live. Sessions take ~2-3 minutes across all 6 stages.

Tech Stack

Backend:

  • FastAPI (REST + WebSocket for real-time events)
  • SQLAlchemy ORM with async SQLite
  • llama.cpp subprocess management
  • Pydantic for request/response validation

Frontend:

  • React 19 + TypeScript
  • Vite for bundling
  • Framer Motion for animations
  • Recharts for analytics dashboards
  • WebSocket for live event streaming

Models:

  • llama.cpp for inference (5 concurrent servers)
  • GGUF model files (Q4_K_M quantization for a speed/size trade-off)

Project Structure

.
├── backend/
│   ├── main.py              # FastAPI app, WebSocket endpoints
│   ├── config.py            # Model definitions and paths
│   ├── orchestrator.py       # 6-stage debate engine
│   ├── model_manager.py      # llama.cpp subprocess lifecycle
│   ├── database.py           # SQLAlchemy ORM (7 tables)
│   ├── achievements.py       # Badge detection system
│   ├── analytics.py          # Aggregation queries
│   ├── prompts.py            # Debate prompt templates
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── App.tsx           # Main component
│   │   ├── App.css           # Dark theme styling
│   │   ├── components/
│   │   │   ├── CouncilChamber.tsx    # Pentagon arena
│   │   │   ├── TimelineReplay.tsx    # Scrubbing controls
│   │   │   └── Analytics.tsx         # Dashboard
│   │   └── hooks/
│   │       └── useWebSocket.tsx      # Event streaming
│   ├── package.json
│   └── vite.config.ts
├── models/                  # Downloaded GGUF files (not in repo)
├── data/
│   └── council.db          # SQLite debate history
├── scripts/
│   └── download_models.sh   # Auto-fetch all 5 models
└── docs/
    └── plans/
        └── 2026-03-04-council-v2-design.md  # Architecture decisions

Database Schema

Seven tables store debate history, personalities, and achievements:

  • sessions — metadata (topic, proposer, chairman, timestamps)
  • responses — opinions, rebuttals, and synthesis text
  • peer_reviews — anonymous scores (Response A/B/C/D per model)
  • alliances — which model agreed/disagreed with whom
  • synthesis — chairman's final statement
  • model_personalities — evolving behavioral profiles
  • model_achievements — unlocked badges with timestamps

Queries power the leaderboard, head-to-head comparisons, and Hall of Fame.
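
As an illustration of the kind of aggregation behind the leaderboard, here is a sketch using the standard-library sqlite3 module against an in-memory database. The column names (`session_id`, `reviewer_id`, `target_id`, `score`) are assumptions made for the example and may not match the real `peer_reviews` table, and the project itself uses SQLAlchemy rather than raw SQL.

```python
import sqlite3

# In-memory stand-in for data/council.db; the schema here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE peer_reviews ("
    "session_id INTEGER, reviewer_id TEXT, target_id TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO peer_reviews VALUES (?, ?, ?, ?)",
    [
        (1, "gemma-270m", "qwen25-05b", 9),
        (1, "tinyllama-1b", "qwen25-05b", 7),
        (1, "qwen25-05b", "gemma-270m", 4),
        (1, "tinyllama-1b", "gemma-270m", 6),
    ],
)


def leaderboard(db: sqlite3.Connection) -> list[tuple[str, float, int]]:
    """Rank models by the average score they received, highest first."""
    return db.execute(
        "SELECT target_id, AVG(score) AS avg_score, COUNT(*) AS reviews "
        "FROM peer_reviews GROUP BY target_id ORDER BY avg_score DESC"
    ).fetchall()
```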

Development

Backend Testing

Run the unit test suite:

cd backend
pytest -v

For integration tests (requires all 5 llama.cpp servers running):

INTEGRATION=1 TEST_DB=1 pytest test_integration.py -v

Key commands are documented in CLAUDE.md.

Frontend Linting

cd frontend
npm run lint
npm run build   # TypeScript check + production build

API Overview

WebSocket (port 8000)

Client: {"action": "new_session"}
Server: Streams events in sequence:
  roles → stage(0) → topic → stage(1) → opinion ×5 → stage(2) →
  review ×5 → stage(3) → rebuttal ×2 → stage(4) → alliance ×5 →
  stage(5) → synthesis → achievement ×N → complete
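
A client or test can check that a received stream follows the order above. The sketch below works on a plain list of event type names taken from that sequence; how the type is encoded in each WebSocket message (for example a "type" field) is not specified here, so extracting it is left to the caller. `is_valid_sequence` is a hypothetical helper.

```python
# Fixed part of the stream, in the order described above.
EXPECTED_PREFIX = (
    ["roles", "stage", "topic", "stage"]
    + ["opinion"] * 5
    + ["stage"]
    + ["review"] * 5
    + ["stage"]
    + ["rebuttal"] * 2
    + ["stage"]
    + ["alliance"] * 5
    + ["stage", "synthesis"]
)


def is_valid_sequence(types: list[str]) -> bool:
    """Check the fixed prefix, then zero or more achievements, then 'complete'."""
    n = len(EXPECTED_PREFIX)
    if types[:n] != EXPECTED_PREFIX:
        return False
    tail = types[n:]
    if not tail or tail[-1] != "complete":
        return False
    return all(t == "achievement" for t in tail[:-1])
```

The check is deliberately strict about order; a more tolerant version might allow the five parallel opinions to interleave with other event types if the server ever emits them that way.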

REST Endpoints (port 8000)

  • GET /api/health — Model health check
  • GET /api/models — List all models with status
  • POST /api/models/{id}/load — Start a model
  • POST /api/models/{id}/unload — Stop a model
  • GET /api/sessions — All debate sessions
  • GET /api/sessions/{id} — Single session details
  • GET /api/analytics/leaderboard — Ranked by peer review score
  • GET /api/analytics/models/{id} — Per-model history
  • GET /api/analytics/head-to-head — Compare two models
  • GET /api/achievements — All unlocked badges
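
The endpoints above can be called from a short script using only the standard library. This is a sketch: `api_url` and `call` are hypothetical helpers, it assumes the backend is running on localhost:8000 and returns JSON bodies, and it makes no assumption about the shape of any response.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # backend address from the setup steps


def api_url(path: str) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return BASE_URL.rstrip("/") + "/" + path.lstrip("/")


def call(method: str, path: str, timeout: float = 10.0):
    """Send a request and decode the body as JSON (assumes JSON responses)."""
    req = Request(api_url(path), method=method)
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `call("GET", "/api/health")` fetches the health check and `call("POST", "/api/models/gemma-270m/load")` asks the backend to start one model.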

Why This Project?

Small language models are a frontier for on-device, privacy-preserving AI. Tiny LLM Council demonstrates that useful, entertaining applications don't require multi-billion-parameter models, and that watching them argue is just plain fun. The 6-stage pipeline lets you glimpse how models disagree, change their minds under peer pressure, form alliances, and synthesize consensus.

Think of it as:

  • For AI researchers: A playground to study emergent behavior in small models
  • For developers: A reference for real-time multi-model orchestration
  • For everyone else: Pure entertainment — philosophical debates on demand

Contributing

Contributions welcome! Open an issue or submit a PR for:

  • New debate topics/question generation
  • Improved CSS themes or animations
  • Badge ideas or achievement detection logic
  • Backend optimizations for faster model inference
  • Additional analytics visualizations

Please ensure code follows the patterns in CLAUDE.md.

License

MIT License. See LICENSE file for details.

Authors

Built with attention to clarity, visual delight, and the joy of watching tiny models debate philosophy.

For questions or feedback, open an issue on GitHub.