CellAgent: LLM-Driven Multi-Agent Framework for Automated scRNA-Seq Data Analysis

June 11, 2026

CellAgent is a multi-agent framework powered by Large Language Models (LLMs) that automates single-cell RNA sequencing (scRNA-seq) data analysis. It decomposes complex analysis workflows into manageable steps through three primary agent roles: Planner, Executor, and Evaluator.

Overview

CellAgent leverages the capabilities of LLMs to understand biological analysis requirements and automatically orchestrate the execution of appropriate data processing and analysis steps. It provides an interactive, iterative approach to scRNA-seq analysis with built-in quality evaluation and self-optimization mechanisms.

1. Planner (`src/planner.py`)

Decomposes user-provided analysis tasks into structured, executable steps
Understands the biological context and data characteristics
Generates detailed task plans in JSON format
Maps high-level biological requirements to specific analytical procedures
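
A JSON task plan of the kind described above might be parsed and checked as in the following sketch. The field names (`steps`, `id`, `name`, `description`) and the `parse_plan` helper are assumptions made for illustration; the schema that `src/planner.py` actually emits may differ.

```python
import json
from typing import Dict, List

# Hypothetical plan, shaped the way an LLM planner might return it.
# The field names are illustrative, not CellAgent's actual schema.
RAW_PLAN = """
{
  "steps": [
    {"id": 1, "name": "Quality control", "description": "Filter low-quality cells and genes"},
    {"id": 2, "name": "Normalization", "description": "Normalize and log-transform counts"},
    {"id": 3, "name": "Clustering", "description": "Reduce dimensions and cluster cells"}
  ]
}
"""

REQUIRED_KEYS = {"id", "name", "description"}


def parse_plan(raw: str) -> List[Dict]:
    """Parse a JSON plan and check that every step has the required keys."""
    plan = json.loads(raw)
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    steps = plan.get("steps")
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'steps' list")
    for step in steps:
        if not isinstance(step, dict):
            raise ValueError("every step must be a JSON object")
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            raise ValueError(f"step {step.get('id')} is missing keys: {sorted(missing)}")
    return steps


steps = parse_plan(RAW_PLAN)
for step in steps:
    print(f"{step['id']}. {step['name']}: {step['description']}")
```

Validating the plan before execution means a malformed LLM response fails early, rather than partway through an analysis.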

2. Executor (`src/executor.py`)

Executes each planned step with precision
Comprises two sub-components:
- Tool Selector (src/tool_selector.py): Intelligently selects appropriate analysis tools based on task requirements
- Code Programmer (src/code_programmer.py): Generates executable Python code for bioinformatics tasks
Manages iterative optimization with automatic retry mechanisms

3. Evaluator (`src/evaluator.py`)

Assesses the quality and correctness of generated code execution results
Provides expert-level evaluation based on biological principles
Generates improvement suggestions for failed or suboptimal analyses
Determines whether results meet user requirements

Supporting Components

Global Memory (src/memory.py): Maintains analysis context and code history across all steps
Code Sandbox (src/code_sandbox.py): Executes generated code in an isolated Jupyter Notebook environment
Tool Registry (src/tools/tool_registry.py): Manages available bioinformatics tools and their documentation

Features

✨ Key Capabilities

Automated Task Planning: Decomposes complex scRNA-seq analysis into logical steps
Intelligent Tool Selection: Automatically chooses appropriate tools for each analytical task
Automatic Code Generation: Generates Python code tailored to specific analysis requirements
Iterative Self-Optimization: Retries a failed or suboptimal step with revised code (default: 2 attempts per step, 3 for batch effect correction)
Quality Evaluation: Expert-level assessment of results with improvement recommendations
Jupyter Integration: All analysis code and results are organized in Jupyter Notebooks
Context Memory: Maintains global analysis context to ensure coherent multi-step workflows
Error Handling: Graceful error management with automatic recovery mechanisms

Installation

Prerequisites

Python 3.8+
Ollama (for local LLM) or OpenAI API key (for GPT-4)
Jupyter Notebook
Bioinformatics analysis libraries (scanpy, etc.)

Dependencies

pip install langchain langchain-community langchain-openai
pip install scanpy pandas numpy scipy scikit-learn
pip install jupyter notebook nbconvert

Setup

Clone the repository:

git clone https://github.com/liu-shiqiang/CellAgent.git
cd CellAgent

Install required packages:

pip install -r requirements.txt

Set up LLM configuration:

Option A: Local LLM (Ollama)

# Install Ollama from https://ollama.ai
# Pull the required model
ollama pull llama3.1
# Start Ollama server
ollama serve

Option B: OpenAI API

Update the LLM initialization in main.py with your API key.

Usage

Quick Start

python main.py

The program will:

Prompt you to enter your scRNA-seq analysis task
Request the path to your scRNA-seq data file (H5AD format)
Automatically:
- Load and analyze your data
- Generate an analysis plan
- Execute each step with automatic optimization
- Save results to a Jupyter Notebook

Example Workflow

# Step 1: User Input
# Task: "Perform quality control and cell type annotation on scRNA-seq data"
# Data path: "/path/to/data.h5ad"

# Step 2: System automatically:
# - Plans: [QC step, Normalization, Dimensionality reduction, Clustering, Annotation, ...]
# - Executes each step with evaluation and optimization
# - Saves all code and visualizations to analysis.ipynb

Data Format

Supported Input Format

H5AD files (.h5ad): AnnData objects compatible with scanpy
Data should contain gene expression matrix and relevant metadata
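
Before handing a file to CellAgent, it can help to confirm that it loads and carries the expected metadata. The sketch below is one way to do that with scanpy's `read_h5ad`; the helper names `describe_adata` and `load_and_describe` are hypothetical and not part of CellAgent.

```python
from typing import Any, Dict


def describe_adata(adata: Any) -> Dict[str, Any]:
    """Summarize an AnnData-like object: cell count, gene count, metadata columns."""
    return {
        "n_cells": adata.n_obs,
        "n_genes": adata.n_vars,
        "obs_columns": list(adata.obs.columns),
    }


def load_and_describe(path: str) -> Dict[str, Any]:
    """Read an .h5ad file with scanpy and summarize it."""
    import scanpy as sc  # imported lazily so describe_adata works without scanpy

    adata = sc.read_h5ad(path)
    return describe_adata(adata)
```

Calling `load_and_describe("/path/to/data.h5ad")` returns the matrix dimensions and the per-cell metadata columns, which is enough to spot an empty file or a missing batch or sample annotation.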

Output

Jupyter Notebook (examples/notebooks/analysis.ipynb): Complete analysis workflow with code, visualizations, and results

Configuration

LLM Settings

In main.py, modify LLM configuration:

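# Imports assume the langchain-community and langchain-openai packages from Dependencies
from langchain_community.llms import Ollama
# from langchain_openai import ChatOpenAI

# Local LLM (default)
llm = Ollama(model='llama3.1', base_url='http://localhost:11434')

# Or use OpenAI (reads the API key from the OPENAI_API_KEY environment variable)
# llm = ChatOpenAI(model_name='gpt-4', temperature=0)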

Code Sandbox Configuration

Update the notebook path in main.py:

code_sandbox = CodeSandbox(notebook_path='/your/path/to/analysis.ipynb')
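
To illustrate what ends up at `notebook_path`: a notebook is a JSON document, so generated code can be stored as code cells with the standard library alone. The `append_code_cell` helper below is hypothetical and only writes cells. It does not run them through a Jupyter kernel, which the real `CodeSandbox` presumably does.

```python
import json
import tempfile
from pathlib import Path


def append_code_cell(notebook_path: str, source: str) -> int:
    """Append a code cell to a notebook file, creating the file if needed.

    Returns the number of cells in the notebook after the append.
    """
    path = Path(notebook_path)
    if path.exists():
        notebook = json.loads(path.read_text(encoding="utf-8"))
    else:
        # Minimal empty notebook in the nbformat 4 JSON layout.
        notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

    notebook["cells"].append(
        {
            "cell_type": "code",
            "execution_count": None,  # not executed yet
            "metadata": {},
            "outputs": [],
            "source": source,
        }
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return len(notebook["cells"])


# Usage: write two cells into a throwaway notebook.
with tempfile.TemporaryDirectory() as tmp:
    nb_path = str(Path(tmp) / "analysis.ipynb")
    append_code_cell(nb_path, "import scanpy as sc")
    cell_count = append_code_cell(nb_path, "adata = sc.read_h5ad('/path/to/data.h5ad')")
    print(cell_count)
```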

Retry Attempts

Configure maximum retry attempts per step:

max_attempts = 2  # Default
# For batch effect correction: automatically set to 3
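
The retry policy above can be sketched as a loop that feeds the Evaluator's feedback into the next code-generation attempt. The `generate`, `execute`, and `evaluate` callables are stand-ins for the Code Programmer, Code Sandbox, and Evaluator, and detecting batch effect correction by a keyword in the step name is an assumption; CellAgent's own logic may differ.

```python
from typing import Callable, Optional, Tuple

DEFAULT_MAX_ATTEMPTS = 2
BATCH_CORRECTION_MAX_ATTEMPTS = 3


def attempts_for(step_name: str) -> int:
    """Return the attempt budget for a step (the keyword match is an assumption)."""
    if "batch" in step_name.lower():
        return BATCH_CORRECTION_MAX_ATTEMPTS
    return DEFAULT_MAX_ATTEMPTS


def run_step(
    step_name: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], str],
    evaluate: Callable[[str], Tuple[bool, str]],
) -> Tuple[bool, int]:
    """Generate, execute, and evaluate a step until it passes or the budget runs out.

    Returns (passed, attempts_used).
    """
    feedback: Optional[str] = None
    budget = attempts_for(step_name)
    for attempt in range(1, budget + 1):
        code = generate(step_name, feedback)  # feedback is None on the first attempt
        result = execute(code)
        passed, feedback = evaluate(result)
        if passed:
            return True, attempt
    return False, budget
```

Passing the feedback back to the generator is what distinguishes self-optimization from a blind retry: each new attempt can address the specific problem the Evaluator reported.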

Project Structure

CellAgent/
├── main.py                      # Entry point
├── src/
│   ├── __init__.py
│   ├── planner.py               # Task decomposition
│   ├── executor.py              # Step execution orchestration
│   ├── evaluator.py             # Result quality assessment
│   ├── tool_selector.py         # Tool selection logic
│   ├── code_programmer.py       # Code generation
│   ├── code_sandbox.py          # Jupyter execution environment
│   ├── memory.py                # Context management
│   ├── tools/
│   │   └── tool_registry.py     # Available tools registry
│   └── utils/
│       └── json_utils.py        # JSON parsing utilities
└── examples/
    └── notebooks/
        └── analysis.ipynb       # Generated analysis results

Workflow Diagram

User Input (Task + Data)
    ↓
Planner (Decompose into steps)
    ↓
For Each Step:
    ├─ Tool Selector (Choose tools)
    ├─ Code Programmer (Generate code)
    ├─ Code Sandbox (Execute code)
    ├─ Evaluator (Assess quality)
    └─ Self-Optimize if needed ↻
    ↓
Output: Jupyter Notebook with Complete Analysis

Performance Considerations

First attempt success rate: Depends on LLM quality and task complexity
Typical execution time: 5-30 minutes per analysis (depends on data size and steps)
Memory requirements: 8GB+ RAM recommended for large datasets
GPU acceleration: Optional but recommended for faster execution

Supported Analysis Tasks

Quality control and filtering
Normalization and batch effect correction
Dimensionality reduction (PCA, UMAP, t-SNE)
Clustering and cell type annotation
Differential expression analysis
Gene ontology enrichment
Trajectory inference
Cell-cell interaction analysis

Troubleshooting

Issue: Cannot connect to the LLM

Solution:

For Ollama: Ensure the service is running (ollama serve)
Check that the base URL is correct (default: http://localhost:11434)
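
A quick connectivity check can rule out the server before debugging anything else. The sketch below assumes that a running Ollama server answers a plain GET on its base URL with HTTP 200; the `ollama_is_reachable` helper is hypothetical and not part of CellAgent.

```python
import http.client
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and timeouts subclass OSError; ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_reachable())
```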

Issue: Code execution fails in sandbox

Solution:

Check the generated code in the Notebook
Verify data format and compatibility
Increase max_attempts for problematic steps

Advanced Usage

Custom Tool Registry

Add custom analysis tools to src/tools/tool_registry.py:

class ToolRegistry:
    def get_available_tools(self):
        return {
            "custom_tool": {
                "name": "Custom Analysis Tool",
                "documentation": "Detailed documentation..."
            }
        }

Memory Management

Access global memory during execution:

global_memory.add_code(code)
previous_codes = global_memory.get_all_codes()
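
For orientation, a memory object exposing these two methods could be as small as the following sketch. Only the method names `add_code` and `get_all_codes` come from the calls above; the class body is an assumption, and the actual `src/memory.py` may store more context.

```python
from typing import List


class GlobalMemory:
    """Keeps the code from every completed step, in execution order."""

    def __init__(self) -> None:
        self._codes: List[str] = []

    def add_code(self, code: str) -> None:
        """Record the code that was run for a step."""
        self._codes.append(code)

    def get_all_codes(self) -> List[str]:
        """Return a copy of the history so callers cannot mutate it."""
        return list(self._codes)


global_memory = GlobalMemory()
global_memory.add_code("adata = sc.read_h5ad('/path/to/data.h5ad')")
global_memory.add_code("sc.pp.filter_cells(adata, min_genes=200)")

# Earlier code can be passed to the code generator as context for the next step.
context = "\n\n".join(global_memory.get_all_codes())
print(context)
```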

Contributing

Contributions are welcome! Areas for improvement:

Additional bioinformatics tools integration
Enhanced evaluation metrics
Performance optimization
Better error handling and recovery

Citation

If you use CellAgent in your research, please cite:

@software{cellagent2024,
  author = {Liu, Shiqiang},
  title = {CellAgent: LLM-Driven Multi-Agent Framework for Automated scRNA-Seq Data Analysis},
  year = {2024},
  url = {https://github.com/liu-shiqiang/CellAgent}
}

License

This project is open-source and available under the MIT License.

Contact & Support

GitHub Issues: https://github.com/liu-shiqiang/CellAgent/issues
Author: Liu Shiqiang
Email: (contact information)

Acknowledgments

Built with:

LangChain - LLM orchestration
Scanpy - scRNA-seq analysis
Ollama - Local LLM execution
Jupyter - Interactive notebooks

Roadmap

Web UI for easier task input
Support for additional data formats (Zarr, Parquet)
Cloud deployment templates
Enhanced visualization library
Multi-dataset analysis support
Real-time progress tracking
Result export to multiple formats (HTML, PDF, etc.)

Last Updated: October 2024
Version: 1.0.0