CellAgent: LLM-Driven Multi-Agent Framework for Automated scRNA-Seq Data Analysis
June 11, 2026
An intelligent multi-agent framework powered by Large Language Models (LLMs) to automate single-cell RNA sequencing (scRNA-seq) data analysis tasks. The framework decomposes complex analysis workflows into manageable steps through three primary agent roles: Planner, Executor, and Evaluator.
Overview
CellAgent leverages the capabilities of LLMs to understand biological analysis requirements and automatically orchestrate the execution of appropriate data processing and analysis steps. It provides an interactive, iterative approach to scRNA-seq analysis with built-in quality evaluation and self-optimization mechanisms.
Architecture
Core Components
The framework consists of three main agent roles:
1. Planner (src/planner.py)
- Decomposes user-provided analysis tasks into structured, executable steps
- Understands the biological context and data characteristics
- Generates detailed task plans in JSON format
- Maps high-level biological requirements to specific analytical procedures
2. Executor (src/executor.py)
- Executes each planned step with precision
- Comprises two sub-components:
  - Tool Selector (src/tool_selector.py): Intelligently selects appropriate analysis tools based on task requirements
  - Code Programmer (src/code_programmer.py): Generates executable Python code for bioinformatics tasks
- Manages iterative optimization with automatic retry mechanisms
3. Evaluator (src/evaluator.py)
- Assesses the quality and correctness of generated code execution results
- Provides expert-level evaluation based on biological principles
- Generates improvement suggestions for failed or suboptimal analyses
- Determines whether results meet user requirements
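To make the Planner's JSON output more concrete, here is a minimal sketch of how a plan could be parsed and validated before it is handed to the Executor. The schema (`steps`, `name`, `description`) and the `parse_plan` helper are assumptions made for illustration; the actual format produced by src/planner.py may differ.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PlanStep:
    """One step of a hypothetical analysis plan (field names are assumed)."""
    name: str
    description: str


def parse_plan(raw: str) -> List[PlanStep]:
    """Parse an LLM response into plan steps, failing loudly on bad input."""
    data = json.loads(raw)
    steps = data.get("steps") if isinstance(data, dict) else None
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'steps' list")
    parsed = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict) or "name" not in step:
            raise ValueError(f"step {i} is missing a 'name' field")
        # A missing description is tolerated and stored as an empty string.
        parsed.append(PlanStep(step["name"], step.get("description", "")))
    return parsed


example = (
    '{"steps": ['
    '{"name": "Quality control", "description": "Filter low-quality cells"}, '
    '{"name": "Normalization"}]}'
)
plan = parse_plan(example)
print([s.name for s in plan])
```

Validating the plan up front means a malformed LLM response is rejected before any code is generated for it.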
Supporting Components
- Global Memory (src/memory.py): Maintains analysis context and code history across all steps
- Code Sandbox (src/code_sandbox.py): Executes generated code in a Jupyter Notebook environment with safety isolation
- Tool Registry (src/tools/tool_registry.py): Manages available bioinformatics tools and their documentation
Features
✨ Key Capabilities
- Automated Task Planning: Decomposes complex scRNA-seq analysis into logical steps
- Intelligent Tool Selection: Automatically chooses appropriate tools for each analytical task
- Automatic Code Generation: Generates Python code tailored to specific analysis requirements
- Iterative Self-Optimization: Automatically improves code execution through multiple attempts (default: 2, up to 3 for batch effect correction)
- Quality Evaluation: Expert-level assessment of results with improvement recommendations
- Jupyter Integration: All analysis code and results are organized in Jupyter Notebooks
- Context Memory: Maintains global analysis context to ensure coherent multi-step workflows
- Error Handling: Graceful error management with automatic recovery mechanisms
Installation
Prerequisites
- Python 3.8+
- Ollama (for local LLM) or OpenAI API key (for GPT-4)
- Jupyter Notebook
- Bioinformatics analysis libraries (scanpy, etc.)
Dependencies
pip install langchain langchain-community langchain-openai
pip install scanpy pandas numpy scipy scikit-learn
pip install jupyter notebook nbconvert
Setup
- Clone the repository:
git clone https://github.com/liu-shiqiang/CellAgent.git
cd CellAgent
- Install required packages:
pip install -r requirements.txt
- Set up LLM configuration:
Option A: Local LLM (Ollama)
# Install Ollama from https://ollama.ai
# Pull the required model
ollama pull llama3.1
# Start Ollama server
ollama serve
Option B: OpenAI API
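Update the LLM initialization in main.py to use ChatOpenAI and supply your API key (for example through the OPENAI_API_KEY environment variable, which langchain-openai reads by default).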
Usage
Quick Start
python main.py
The program will:
- Prompt you to enter your scRNA-seq analysis task
- Request the path to your scRNA-seq data file (H5AD format)
- Automatically:
- Load and analyze your data
- Generate an analysis plan
- Execute each step with automatic optimization
- Save results to a Jupyter Notebook
Example Workflow
# Step 1: User Input
# Task: "Perform quality control and cell type annotation on scRNA-seq data"
# Data path: "/path/to/data.h5ad"
# Step 2: System automatically:
# - Plans: [QC step, Normalization, Dimensionality reduction, Clustering, Annotation, ...]
# - Executes each step with evaluation and optimization
# - Saves all code and visualizations to analysis.ipynb
Data Format
Supported Input Format
- H5AD files (.h5ad): AnnData objects compatible with scanpy
- Data should contain gene expression matrix and relevant metadata
Output
- Jupyter Notebook (examples/notebooks/analysis.ipynb): Complete analysis workflow with code, visualizations, and results
Configuration
LLM Settings
In main.py, modify LLM configuration:
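from langchain_community.llms import Ollama
# from langchain_openai import ChatOpenAI  # needed only for the OpenAI option

# Local LLM (default)
llm = Ollama(model='llama3.1', base_url='http://localhost:11434')
# Or use OpenAI
# llm = ChatOpenAI(model_name='gpt-4', temperature=0)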
Code Sandbox Configuration
Update the notebook path in main.py:
code_sandbox = CodeSandbox(notebook_path='/your/path/to/analysis.ipynb')
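For readers curious about what saving code to a notebook involves, the sketch below appends code cells to an .ipynb file using only the standard library. It is not CellAgent's `CodeSandbox` implementation (which this README does not show) and it does not execute anything; the `append_code_cell` helper is a hypothetical name used for illustration.

```python
import json
import tempfile
from pathlib import Path


def append_code_cell(notebook_path: str, source: str) -> int:
    """Append a code cell to an .ipynb file, creating the file if needed.

    Returns the number of cells in the notebook after the append.
    """
    path = Path(notebook_path)
    if path.exists():
        notebook = json.loads(path.read_text(encoding="utf-8"))
    else:
        # Minimal nbformat 4.4 document (4.5 and later also expect cell ids).
        notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
    notebook["cells"].append({
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    })
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return len(notebook["cells"])


# Demo in a temporary directory so no real notebook is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = str(Path(tmp) / "notebooks" / "analysis.ipynb")
    append_code_cell(demo_path, "import scanpy as sc")
    print(append_code_cell(demo_path, "adata = sc.read_h5ad('/path/to/data.h5ad')"))
```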
Retry Attempts
Configure maximum retry attempts per step:
max_attempts = 2 # Default
# For batch effect correction: automatically set to 3
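The per-step retry budget described here could be expressed as a small helper. This is a sketch under assumptions: the README does not say how CellAgent detects a batch effect correction step, so the keyword match and the `attempts_for_step` name below are purely illustrative.

```python
DEFAULT_MAX_ATTEMPTS = 2
BATCH_CORRECTION_MAX_ATTEMPTS = 3


def attempts_for_step(step_name: str) -> int:
    """Return the retry budget for a step, based on a simple keyword match."""
    name = step_name.lower()
    # Assumed heuristic: treat any step mentioning both words as batch correction.
    if "batch" in name and "correction" in name:
        return BATCH_CORRECTION_MAX_ATTEMPTS
    return DEFAULT_MAX_ATTEMPTS


for step in ["Quality control", "Batch effect correction"]:
    print(step, attempts_for_step(step))
```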
Project Structure
CellAgent/
├── main.py                     # Entry point
├── src/
│   ├── __init__.py
│   ├── planner.py              # Task decomposition
│   ├── executor.py             # Step execution orchestration
│   ├── evaluator.py            # Result quality assessment
│   ├── tool_selector.py        # Tool selection logic
│   ├── code_programmer.py      # Code generation
│   ├── code_sandbox.py         # Jupyter execution environment
│   ├── memory.py               # Context management
│   ├── tools/
│   │   └── tool_registry.py    # Available tools registry
│   └── utils/
│       └── json_utils.py       # JSON parsing utilities
└── examples/
    └── notebooks/
        └── analysis.ipynb      # Generated analysis results
Workflow Diagram
User Input (Task + Data)
    ↓
Planner (Decompose into steps)
    ↓
For Each Step:
    ├─ Tool Selector (Choose tools)
    ├─ Code Programmer (Generate code)
    ├─ Code Sandbox (Execute code)
    ├─ Evaluator (Assess quality)
    └─ Self-Optimize if needed ↻
    ↓
Output: Jupyter Notebook with Complete Analysis
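The per-step loop in the diagram can be sketched as a generate, execute, evaluate cycle that feeds the Evaluator's feedback into the next attempt. The function and parameter names below are hypothetical, the agents are replaced by plain stub functions, and tool selection is folded into `generate_code` for brevity; CellAgent's actual Executor may be organized differently.

```python
from typing import Callable, List, Tuple


def run_step(
    step: str,
    generate_code: Callable[[str, str], str],
    execute: Callable[[str], str],
    evaluate: Callable[[str, str], Tuple[bool, str]],
    max_attempts: int = 2,
) -> Tuple[bool, List[str]]:
    """Generate, execute, and evaluate code for one step, retrying on failure.

    Returns (success, list of code versions tried).
    """
    feedback = ""
    tried: List[str] = []
    for _ in range(max_attempts):
        code = generate_code(step, feedback)
        tried.append(code)
        result = execute(code)
        passed, feedback = evaluate(step, result)
        if passed:
            return True, tried
    return False, tried


# Stub agents standing in for the LLM-backed components.
def fake_generate(step: str, feedback: str) -> str:
    return f"# code for {step}" + (" (revised)" if feedback else "")


def fake_execute(code: str) -> str:
    return "ok" if "revised" in code else "error"


def fake_evaluate(step: str, result: str) -> Tuple[bool, str]:
    return (result == "ok", "" if result == "ok" else "fix the error")


ok, attempts = run_step("Quality control", fake_generate, fake_execute, fake_evaluate)
print(ok, len(attempts))
```

In this demo the first attempt fails, the stub evaluator returns feedback, and the revised second attempt passes.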
Performance Considerations
- First attempt success rate: Depends on LLM quality and task complexity
- Typical execution time: 5-30 minutes per analysis (depends on data size and steps)
- Memory requirements: 8GB+ RAM recommended for large datasets
- GPU acceleration: Optional but recommended for faster execution
Supported Analysis Tasks
- Quality control and filtering
- Normalization and batch effect correction
- Dimensionality reduction (PCA, UMAP, t-SNE)
- Clustering and cell type annotation
- Differential expression analysis
- Gene ontology enrichment
- Trajectory inference
- Cell-cell interaction analysis
Troubleshooting
Issue: Data file not found
Solution: Verify the exact file path. Use absolute paths or ensure the relative path is correct.
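A quick way to rule out path problems before starting an analysis is to resolve and check the path yourself. The helper below is a hypothetical sketch, not part of CellAgent; it only confirms that the file exists and has the .h5ad extension, not that its contents are a valid AnnData object.

```python
from pathlib import Path


def resolve_h5ad_path(user_input: str) -> Path:
    """Expand and validate a user-supplied path to an .h5ad file."""
    # Strip whitespace and stray quotes that often come from copy-pasting.
    cleaned = user_input.strip().strip("'\"")
    path = Path(cleaned).expanduser().resolve()
    if path.suffix.lower() != ".h5ad":
        raise ValueError(f"expected an .h5ad file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    return path


try:
    resolve_h5ad_path("/path/to/data.h5ad")
except FileNotFoundError as err:
    print(err)
```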
Issue: LLM connection timeout
Solution:
- For Ollama: Ensure the service is running (ollama serve)
- Check the base URL is correct
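To check connectivity independently of CellAgent, a small probe like the one below can help. It is a sketch with a hypothetical helper name: it only tests whether an HTTP server answers at the base URL (a running Ollama server normally responds to a plain GET there) and does not verify that the llama3.1 model has been pulled.

```python
import urllib.error
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at base_url within the timeout."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx HTTP status.
        return False


print(ollama_is_reachable())
```

If this prints False while `ollama serve` is running, the base URL or port in main.py is the most likely culprit.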
Issue: Code execution fails in sandbox
Solution:
- Check the generated code in the Notebook
- Verify data format and compatibility
- Increase max_attempts for problematic steps
Advanced Usage
Custom Tool Registry
Add custom analysis tools to src/tools/tool_registry.py:
class ToolRegistry:
    def get_available_tools(self):
        return {
            "custom_tool": {
                "name": "Custom Analysis Tool",
                "documentation": "Detailed documentation..."
            }
        }
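As an illustration of how a registry of this shape might be consumed, the sketch below renders the tool entries as text that could be placed in a Tool Selector prompt. The `describe_tools` helper and the `qc_filter` entry are hypothetical and exist only for this example; they are not taken from the CellAgent code base.

```python
class ToolRegistry:
    """Mirrors the registry shape shown above, with one extra illustrative entry."""

    def get_available_tools(self):
        return {
            "custom_tool": {
                "name": "Custom Analysis Tool",
                "documentation": "Detailed documentation...",
            },
            "qc_filter": {  # hypothetical entry, for illustration only
                "name": "QC Filter",
                "documentation": "Filters cells by gene count and mitochondrial fraction.",
            },
        }


def describe_tools(registry: ToolRegistry) -> str:
    """Render the registry as a bulleted list suitable for an LLM prompt."""
    lines = []
    # Sort by key so the prompt text is stable between runs.
    for key, tool in sorted(registry.get_available_tools().items()):
        lines.append(f"- {key} ({tool['name']}): {tool['documentation']}")
    return "\n".join(lines)


print(describe_tools(ToolRegistry()))
```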
Memory Management
Access global memory during execution:
global_memory.add_code(code)
previous_codes = global_memory.get_all_codes()
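For orientation, a memory object exposing these two methods could be as simple as the sketch below. Only the method names `add_code` and `get_all_codes` come from this README; the class body is an assumption, and the real src/memory.py likely tracks more context than a list of code strings.

```python
from typing import List


class GlobalMemory:
    """Minimal in-memory store for code produced across analysis steps."""

    def __init__(self) -> None:
        self._codes: List[str] = []

    def add_code(self, code: str) -> None:
        """Record one code snippet in execution order."""
        self._codes.append(code)

    def get_all_codes(self) -> List[str]:
        """Return a copy so callers cannot mutate the stored history."""
        return list(self._codes)


global_memory = GlobalMemory()
global_memory.add_code("import scanpy as sc")
global_memory.add_code("adata = sc.read_h5ad('/path/to/data.h5ad')")
previous_codes = global_memory.get_all_codes()
print(len(previous_codes))
```

Passing `previous_codes` to the code generator is one way to keep later steps consistent with variables defined in earlier ones.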
Contributing
Contributions are welcome! Areas for improvement:
- Additional bioinformatics tools integration
- Enhanced evaluation metrics
- Performance optimization
- Better error handling and recovery
Citation
If you use CellAgent in your research, please cite:
@software{cellagent2024,
author = {Liu, Shiqiang},
title = {CellAgent: LLM-Driven Multi-Agent Framework for Automated scRNA-Seq Data Analysis},
year = {2024},
url = {https://github.com/liu-shiqiang/CellAgent}
}
License
This project is open-source and available under the MIT License.
Contact & Support
- GitHub Issues: https://github.com/liu-shiqiang/CellAgent/issues
- Author: Liu Shiqiang
- Email: (contact information)
Acknowledgments
Built with:
- LangChain - LLM orchestration
- Scanpy - scRNA-seq analysis
- Ollama - Local LLM execution
- Jupyter - Interactive notebooks
Roadmap
- Web UI for easier task input
- Support for additional data formats (Zarr, Parquet)
- Cloud deployment templates
- Enhanced visualization library
- Multi-dataset analysis support
- Real-time progress tracking
- Result export to multiple formats (HTML, PDF, etc.)
Last Updated: October 2024
Version: 1.0.0