README.md
April 8, 2026
CASSIA (Collaborative Agent System for Single-cell Interpretable Annotation) is a tool that enhances cell type annotation using multi-agent Large Language Models (LLMs).
🌐 CASSIA Web UI (cassia.bio) - Try CASSIA's core features online. For a comprehensive experience with all advanced features, use our R or Python package.
📚 Complete Documentation/Vignette (docs.cassia.bio)
🤖 LLMs Annotation Benchmark (sc-llm-benchmark.com)
📰 News
2026-04-08 🐛 Bug fix released — please update to the latest version (v1.3.7). A recent change to CASSIA's networking layer introduced an issue that could cause some users to see errors when running annotations. This is now fixed.
- R users: update to v1.3.7 with remotes::install_github("ElliotXie/CASSIA/CASSIA_R", force = TRUE, upgrade = "never")
- Python users: update to v1.3.7 with pip install --upgrade cassia
- Web users: open cassia.bio in an incognito window (or clear your browser cache) to pick up the new build.
- If you still see an error after updating, please open an issue.
📜 Previous Updates
2025-11-29 🎇 Major update with new features and improvements!
- Python Documentation: Complete Python docs and vignettes now available
- Annotation Boost Improvements: Sidebar navigation, better reports, bug fixes
- Better Scanpy Support: Fixed marker processing, improved R/Python sync
- Symphony Compare Update: Improved comparison module
- Batch Output & Ranking: Updated HTML output for runCASSIA_batch with new ranking method option
- Fuzzy Model Aliases: Easier model selection without remembering exact names
2025-05-05 📊 CASSIA annotation benchmark is now online! The latest update introduces a new benchmarking platform that evaluates how different LLMs perform on single-cell annotation tasks, including accuracy and cost. LLaMA4 Maverick, Gemini 2.5 Flash, and DeepSeekV3 are the top three most balanced options—nearly free! 🔧 A new auto-merge function unifies CASSIA output across different levels, making subclustering much easier. 🐛 Fixed a bug in the annotation boost agent to improve downstream refinement.
2025-04-19 🔄 CASSIA adds a retry mechanism and optimized report storage! The latest update introduces an automatic retry mechanism for failed tasks and optimizes how reports are stored for easier access and management. 🎨 The CASSIA logo has been drawn and added to the project!
2025-04-17 🚀 CASSIA now supports automatic single-cell annotation benchmarking! The latest update introduces a new function that enables fully automated benchmarking of single-cell annotation. Results are evaluated automatically using LLMs, achieving performance on par with human experts. A dedicated benchmark website is coming soon—stay tuned!
🏗️ Installation
# Install dependencies
install.packages("devtools")
install.packages("reticulate")
# Install CASSIA
devtools::install_github("ElliotXie/CASSIA/CASSIA_R")
If you have network issues installing from GitHub, you can install from source:
# Install from downloaded source package
install.packages("path/to/CASSIA_1.3.2.tar.gz", repos = NULL, type = "source")
Download source package: CASSIA_1.3.2.tar.gz
Note: If the environment is not set up correctly the first time, restart R and run the code below:
library(CASSIA)
setup_cassia_env()
🔑 Set Up API Key
It should take about 3 minutes to get your API key.
You only need one API key to use CASSIA. We recommend OpenRouter since it provides access to most models (OpenAI, Anthropic, Google, etc.) through a single API key — no need to sign up for multiple providers.
# For OpenRouter
setLLMApiKey("your_openrouter_api_key", provider = "openrouter", persist = TRUE)
# For OpenAI
setLLMApiKey("your_openai_api_key", provider = "openai", persist = TRUE)
# For Anthropic
setLLMApiKey("your_anthropic_api_key", provider = "anthropic", persist = TRUE)
# For custom OpenAI-compatible APIs (e.g., DeepSeek)
setLLMApiKey("your_deepseek_api_key", provider = "https://api.deepseek.com", persist = TRUE)
# For local LLMs - no API key needed (e.g., Ollama)
setLLMApiKey(provider = "http://localhost:11434/v1", persist = TRUE)
Custom APIs: CASSIA supports any OpenAI-compatible API endpoint. Simply use the base URL as the provider parameter.
Local LLMs: For data privacy and zero API costs, use local LLMs like Ollama or LM Studio. No API key required for localhost URLs.
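To keep a key out of scripts, one option is to read it from an environment variable before registering it. The sketch below is a minimal illustration: the helper `read_api_key()` and the variable name `OPENROUTER_API_KEY` are assumptions made for this example and are not part of CASSIA. Only `setLLMApiKey()` and its `provider` and `persist` arguments come from the examples above.

```r
# Hypothetical helper (not part of CASSIA): read an API key from an
# environment variable and fail early with a clear message if it is unset.
read_api_key <- function(env_var = "OPENROUTER_API_KEY") {
  key <- Sys.getenv(env_var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", env_var, " is not set.", call. = FALSE)
  }
  key
}

# Register the key only if CASSIA is installed and the variable is set.
# The variable name OPENROUTER_API_KEY is an assumption for this sketch.
if (requireNamespace("CASSIA", quietly = TRUE) &&
    nzchar(Sys.getenv("OPENROUTER_API_KEY"))) {
  CASSIA::setLLMApiKey(read_api_key(), provider = "openrouter", persist = TRUE)
}
```

The variable can be set for the current session with `Sys.setenv(OPENROUTER_API_KEY = "...")` or placed in `~/.Renviron` so that it is available in every new R session.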
🧬 Example Data
CASSIA includes example marker data in two formats:
# Load example data
markers_unprocessed <- loadExampleMarkers(processed = FALSE) # Direct Seurat output
markers_processed <- loadExampleMarkers(processed = TRUE) # Processed format
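For orientation, the unprocessed format is the table that Seurat's `FindAllMarkers()` returns. The sketch below builds a small toy table with those column names (all values are invented for illustration) and lists the top genes per cluster by `avg_log2FC`. The helper `top_markers()` is hypothetical and is not how CASSIA processes markers internally. The layout of the processed format is not covered here.

```r
# Toy marker table using the column names of Seurat's FindAllMarkers() output.
# All values are invented for illustration.
toy_markers <- data.frame(
  p_val      = c(1e-50, 1e-40, 1e-30, 1e-45, 1e-35, 1e-25),
  avg_log2FC = c(3.2, 2.8, 1.5, 4.1, 2.2, 3.0),
  pct.1      = c(0.95, 0.90, 0.70, 0.98, 0.80, 0.85),
  pct.2      = c(0.05, 0.10, 0.30, 0.02, 0.20, 0.10),
  p_val_adj  = c(1e-46, 1e-36, 1e-26, 1e-41, 1e-31, 1e-21),
  cluster    = c("0", "0", "0", "1", "1", "1"),
  gene       = c("CD3D", "CD3E", "IL7R", "MS4A1", "CD79B", "CD79A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n genes with the highest avg_log2FC
# in each cluster, as a named list of character vectors.
top_markers <- function(markers, n = 2) {
  by_cluster <- split(markers, markers$cluster)
  lapply(by_cluster, function(df) {
    ordered <- df[order(df$avg_log2FC, decreasing = TRUE), ]
    head(ordered$gene, n)
  })
}

top_markers(toy_markers, n = 2)
```

Running the same helper on `markers_unprocessed` is a quick way to check that the clusters and genes look as expected before starting an annotation run.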
⚙️ Quick Start
# Core annotation
runCASSIA_batch(
marker = markers_unprocessed, # Marker data from FindAllMarkers
output_name = "cassia_results", # Output file name
tissue = "Large Intestine", # Tissue type
species = "Human", # Species
model = "anthropic/claude-sonnet-4.6", # Model to use
provider = "openrouter", # API provider
max_workers = 4 # Number of parallel workers
)
Want even better results? Use runCASSIA_pipeline(), which adds automatic quality scoring and the AnnotationBoost agent for difficult clusters. See complete documentation for details.
🤖 Supported Models
You can choose any model for annotation and scoring. CASSIA also supports custom providers (e.g., DeepSeek) and local open-source models (e.g., gpt-oss:20b via Ollama).
Some commonly used models are listed below. OpenRouter supports most popular models, so feel free to experiment.
OpenAI
- gpt-5.4: Balanced option (Recommended)
- gpt-4o: Used in the benchmark
OpenRouter
- openai/gpt-5.4: Best-performing model via OpenRouter (no identity verification needed, unlike direct OpenAI API) (Recommended)
- anthropic/claude-sonnet-4.6: Best-performing model via OpenRouter (Recommended)
- google/gemini-3-flash-preview: One of the best-performing low-cost models
- x-ai/grok-4.20-beta: One of the best-performing low-cost models
Anthropic
- claude-sonnet-4-6: The latest best-performing model (Most recommended)
Other Providers
These models can be used via their own APIs. See Custom API Providers for setup.
- deepseek-chat (DeepSeek v3.2): High performance, very affordable. Provider: https://api.deepseek.com
- glm-5 (GLM 5): Fast and cost-effective. Provider: https://api.z.ai/api/paas/v4/
- kimi-k2.5 (Kimi K2.5): Strong reasoning capabilities. Provider: https://api.moonshot.ai/v1
Local LLMs
- gpt-oss:20b: Can run locally via Ollama. Good for large bulk analysis with acceptable accuracy. See Local LLMs for setup.
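Combining the local-LLM setup with the Quick Start call, a local run might look like the sketch below. It assumes an Ollama server is listening at `http://localhost:11434/v1` with `gpt-oss:20b` already pulled, and that `runCASSIA_batch()` accepts a base URL as `provider` in the same way `setLLMApiKey()` does. Check the complete documentation before relying on that. The helper `local_batch_args()`, the output name, and the reduced `max_workers` value are illustrative choices, not CASSIA defaults.

```r
# Hypothetical helper: collect the arguments for a local run in one list
# so they can be inspected before any model is called.
local_batch_args <- function(marker, output_name = "cassia_results_local") {
  list(
    marker      = marker,
    output_name = output_name,
    tissue      = "Large Intestine",
    species     = "Human",
    model       = "gpt-oss:20b",               # local model served by Ollama
    provider    = "http://localhost:11434/v1", # assumed Ollama endpoint
    max_workers = 2                            # illustrative value for one machine
  )
}

# This starts a real annotation run, so it only executes when CASSIA is
# installed. It also needs the local server described above to be running.
if (requireNamespace("CASSIA", quietly = TRUE)) {
  library(CASSIA)
  setLLMApiKey(provider = "http://localhost:11434/v1", persist = TRUE)
  markers <- loadExampleMarkers(processed = FALSE)
  do.call(runCASSIA_batch, local_batch_args(markers))
}
```

Keeping the arguments in a list makes it easy to switch between a hosted and a local model by changing only `model` and `provider`.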
📖 Citation
📖 Read our paper in Nature Communications
Xie, E., Cheng, L., Shireman, J. et al. CASSIA: a multi-agent large language model for automated and interpretable cell annotation. Nat Commun (2025). https://doi.org/10.1038/s41467-025-67084-x
📬 Contact
If you have any questions or need help, feel free to email us at xie227@wisc.edu. We are always happy to help.
If you find this project helpful, please share it with your friends and give this repo a star ⭐ Many thanks!