CX_DB8
April 9, 2026
Unsupervised, contextual, extractive summarizer built for competitive debate evidence — and useful for any document.
CX_DB8 uses modern sentence embeddings to find the most relevant words, sentences, or paragraphs in a document relative to a query. It highlights and underlines text by semantic similarity, producing beautiful terminal output, Word documents, HTML, and SVG exports.

Features
- Four granularity levels — phrase, word, sentence, or paragraph extraction
- Any sentence-transformer model — swap models with a single flag
- Beautiful Rich TUI — styled terminal output with panels, tables, and color-coded highlights
- Multiple exports — Word (.docx), HTML, and SVG output formats
- Interactive mode — process multiple cards in sequence, save all to one document
- 3D visualization — explore the embedding space with interactive matplotlib + UMAP plots
- Fast — default model runs on CPU in seconds, no GPU required
Quick Start
Install with UV (recommended)
uv tool install git+https://github.com/Hellisotherpeople/CX_DB8.git
Or clone and install locally:
git clone https://github.com/Hellisotherpeople/CX_DB8.git
cd CX_DB8
uv sync
Install with pip
pip install git+https://github.com/Hellisotherpeople/CX_DB8.git
Run the demo
cx-db8 demo
Usage
Basic summarization
# From a file
cx-db8 run --file evidence.txt --query "nuclear war causes extinction"
# Pipe text in
cat evidence.txt | cx-db8 run --query "economic collapse"
# Interactive prompt (paste text, Ctrl-D to finish)
cx-db8 run
Granularity levels
# Sentence level (default) — best for most use cases
cx-db8 run -f card.txt -q "hegemony decline" -g sentence
# Phrase level — word-level scoring with grammatical bridging
cx-db8 run -f card.txt -q "hegemony decline" -g phrase
# Word level — raw token-level extraction with context windows
cx-db8 run -f card.txt -q "hegemony decline" -g word
# Paragraph level — coarse-grained extraction
cx-db8 run -f card.txt -q "hegemony decline" -g paragraph
Phrase mode is the sweet spot between word and sentence: it scores each word individually (with contextual n-gram windows), then bridges small gaps between important words so that the underlined/highlighted portions read as grammatical phrases instead of isolated tokens. Use --bridge-gap N to control how many filler words get absorbed (default 3).
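The bridging step can be sketched in a few lines. This is a simplified illustration of the idea, not the project's actual implementation; the `bridge_gaps` function and its signature are hypothetical:

```python
def bridge_gaps(kept, max_gap=3):
    """Given sorted indices of high-scoring words, absorb any run of skipped
    words no longer than max_gap so the kept spans read as grammatical phrases."""
    if not kept:
        return []
    out = [kept[0]]
    for idx in kept[1:]:
        gap = idx - out[-1] - 1
        if 0 < gap <= max_gap:
            # Promote the filler words (articles, prepositions, etc.) in between.
            out.extend(range(out[-1] + 1, idx))
        out.append(idx)
    return out

# Words at positions 2, 4, 5, and 9 scored highly; the gap between 5 and 9 is 3,
# so with max_gap=3 the whole run is merged into one phrase.
print(bridge_gaps([2, 4, 5, 9], max_gap=3))  # → [2, 3, 4, 5, 6, 7, 8, 9]
```

With a larger gap (say positions 1 and 6, a gap of 4), the two words stay as separate islands rather than being joined.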
Control thresholds
# Underline top 30%, highlight top 15%
cx-db8 run -f card.txt -q "warming" -u 70 -H 85
# Aggressive: only keep top 10%
cx-db8 run -f card.txt -q "warming" -u 90 -H 95
Export formats
# Word document
cx-db8 run -f card.txt -q "deterrence" --docx summary.docx
# HTML
cx-db8 run -f card.txt -q "deterrence" --html summary.html
# SVG screenshot
cx-db8 run -f card.txt -q "deterrence" --svg summary.svg
# All at once
cx-db8 run -f card.txt -q "deterrence" --docx out.docx --html out.html --svg out.svg
Choose a model
# List recommended models
cx-db8 models
# Use a specific model
cx-db8 run -f card.txt -q "query" --model all-mpnet-base-v2
Interactive mode
Process multiple cards in a session and save all summaries to a Word document:
cx-db8 run --interactive
3D Visualization
# Install visualization dependencies
uv pip install "cx-db8[viz]"
# Run with visualization
cx-db8 run -f card.txt -q "query" --viz

How It Works
CX_DB8 is an unsupervised extractive summarizer that works by computing semantic similarity between a query and each unit of text:
- Encode the query into a dense vector using a sentence-transformer model
- Segment the text into spans (words with context windows, sentences, or paragraphs)
- Encode each span into the same embedding space
- Score each span by cosine similarity to the query vector
- Threshold using percentile-based cutoffs to determine what gets highlighted, underlined, or removed
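The five steps above can be sketched as a toy pipeline. Here a bag-of-words count vector stands in for the sentence-transformer embedding, and every function name is illustrative rather than part of the project's API:

```python
import math
from collections import Counter

def embed(text, vocab):
    # Toy stand-in for a sentence-transformer: a bag-of-words count vector.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def score_spans(query, spans):
    vocab = sorted({w for s in spans + [query] for w in s.lower().split()})
    qv = embed(query, vocab)                 # 1. encode the query
    vecs = [embed(s, vocab) for s in spans]  # 2-3. segment done by caller, encode spans
    return [cosine(qv, v) for v in vecs]     # 4. cosine similarity to the query

def percentile_cutoff(scores, pct):
    # 5. percentile threshold: keep spans at or above the pct-th percentile.
    ranked = sorted(scores)
    return ranked[min(len(ranked) - 1, int(len(ranked) * pct / 100))]

spans = ["nuclear war risks extinction", "the economy grew last year"]
scores = score_spans("nuclear war causes extinction", spans)
keep = percentile_cutoff(scores, 50)
print([s for s, sc in zip(spans, scores) if sc >= keep])
# → ['nuclear war risks extinction']
```

The real tool replaces the toy `embed` with a sentence-transformer model, so spans are matched by meaning rather than by shared words.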
For word and phrase-level summarization, each word is embedded along with its surrounding context window (default ±10 words), preserving contextual meaning rather than treating each word in isolation. Phrase mode additionally bridges small gaps (default ≤3 words) between kept words, promoting function words like articles and prepositions so the underlined text reads grammatically.
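The context-window construction can be sketched as follows; `word_windows` is a hypothetical helper, not the project's actual function:

```python
def word_windows(words, n=10):
    """Pair each word with its surrounding ±n-word context; the whole window
    is embedded as one span, so each word is scored in context."""
    return [" ".join(words[max(0, i - n): i + n + 1]) for i in range(len(words))]

words = "the treaty reduces the risk of escalation".split()
# The span embedded when scoring the word "reduces" (index 2), with n=2:
print(word_windows(words, n=2)[2])  # → "the treaty reduces the risk"
```

Windows are clipped at the document boundaries, so words near the start or end simply get smaller contexts.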
Sentence-Level Summary
Phrase-Level Summary
Configuration
All settings are available as CLI flags. Run cx-db8 run --help for full documentation:
| Flag | Default | Description |
|---|---|---|
| -f, --file | stdin | Input text file |
| -q, --query | interactive | Card tag / query |
| -g, --granularity | sentence | phrase, word, sentence, or paragraph |
| -u, --underline | 70 | Underline percentile (1-99) |
| -H, --highlight | 85 | Highlight percentile (1-99) |
| -m, --model | all-MiniLM-L6-v2 | Sentence-transformer model |
| -w, --word-window | 10 | Context window for word/phrase level |
| -b, --bridge-gap | 3 | Max gap to bridge in phrase mode |
| --docx | — | Export as Word document |
| --html | — | Export as HTML |
| --svg | — | Export as SVG screenshot |
| --viz | false | Show 3D embedding plot |
| -i, --interactive | false | Interactive loop mode |
Development
git clone https://github.com/Hellisotherpeople/CX_DB8.git
cd CX_DB8
uv sync --extra dev
uv run pytest
Record demo GIFs
Requires VHS:
vhs demo.tape
vhs demo_help.tape
Background
In American competitive cross-examination debate (Policy Debate), debaters summarize evidence by underlining and highlighting the most important parts of source documents. This manual process is what CX_DB8 automates.
The original version (2018-2019) used TensorFlow Hub's Universal Sentence Encoder and Flair embeddings. This v2.0 rewrite modernizes the stack with sentence-transformers, Rich TUI, and UV packaging while preserving the core algorithm.
A webapp version implementing similar functionality is available at Hugging Face Spaces.
License
MIT