SPIDER: A Multi-Organ Supervised Pathology Dataset and Baseline Models

April 7, 2025

Overview

SPIDER (Supervised Pathology Image-DEscription Repository) is a large, high-quality, and diverse patch-level dataset designed to advance AI-driven computational pathology. It provides multi-organ coverage, expert-annotated labels, and strong baseline models to support research and development in digital pathology.

This repository serves as a central hub for accessing the SPIDER datasets, pre-trained models, and related resources.

📄 Paper

For a detailed description of SPIDER, its methodology, and the benchmark results, refer to our paper:

📄 SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models
arXiv:2503.02876 (https://arxiv.org/abs/2503.02876)

Resources

📂 Datasets

SPIDER consists of four organ-specific datasets (skin, colorectal, thorax, and breast), available for download from the Hugging Face Hub 🤗.

Each dataset contains:

224×224 central patches with expert-verified class labels
24 surrounding context patches that, together with the central patch, form a 1120×1120 composite region (a 5×5 grid of 224×224 patches)
20× magnification for high-detail analysis
Predefined train/test splits for consistent benchmarking
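For illustration, the composite geometry can be expressed in a few lines of NumPy. This is a minimal sketch, not the dataset's loading code: it assumes the central patch and its 24 context patches form a 5×5 grid with the labeled patch in the middle, and it uses a hypothetical `{(row, col): array}` mapping instead of SPIDER's actual file layout, which is documented on the dataset pages.

```python
import numpy as np

PATCH = 224               # patch side in pixels, from the dataset description
GRID = 5                  # assumed 5x5 layout: 1 central + 24 context patches
COMPOSITE = PATCH * GRID  # 1120


def assemble_composite(patches):
    """Stitch a 5x5 grid of RGB patches into one 1120x1120 composite.

    `patches` is a hypothetical dict {(row, col): uint8 array of shape (224, 224, 3)}
    with row and col in 0..4; (2, 2) is the labeled central patch.
    """
    if len(patches) != GRID * GRID:
        raise ValueError(f"expected {GRID * GRID} patches, got {len(patches)}")
    composite = np.zeros((COMPOSITE, COMPOSITE, 3), dtype=np.uint8)
    for (row, col), patch in patches.items():
        if not (0 <= row < GRID and 0 <= col < GRID):
            raise ValueError(f"grid position {(row, col)} is out of range")
        if patch.shape != (PATCH, PATCH, 3):
            raise ValueError(f"patch {(row, col)} has shape {patch.shape}")
        top, left = row * PATCH, col * PATCH
        composite[top:top + PATCH, left:left + PATCH] = patch
    return composite


def central_patch(composite):
    """Crop the labeled central 224x224 patch back out of a composite."""
    start = (GRID // 2) * PATCH  # 2 * 224 = 448
    return composite[start:start + PATCH, start:start + PATCH]


# Usage with random stand-in patches
rng = np.random.default_rng(0)
tiles = {
    (r, c): rng.integers(0, 256, size=(PATCH, PATCH, 3), dtype=np.uint8)
    for r in range(GRID)
    for c in range(GRID)
}
composite = assemble_composite(tiles)
print(composite.shape)  # (1120, 1120, 3)
assert np.array_equal(central_patch(composite), tiles[(2, 2)])
```

Keeping the composite and the central crop as plain array slices makes it easy to feed either the labeled patch alone or the full context region to a model.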

📌 See individual dataset pages for more details.

🤖 Pretrained Models

Baseline models trained on the SPIDER datasets, built on the Hibou-L foundation model with an attention-based classification head, are available for download from the Hugging Face Hub 🤗.

Each model supports:

Patch-level multi-class classification
Use of surrounding context patches to improve accuracy
Loading through the Hugging Face transformers API for straightforward deployment
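The exact architecture of the classification head is described in the paper. As a rough, non-authoritative sketch of how attention can combine a central patch with its context, the NumPy code below applies attention pooling of the kind used in attention-based multiple-instance learning to a set of patch embeddings, followed by a linear classifier. It assumes an encoder such as Hibou-L has already produced one embedding per patch; all weights are random placeholders, and the sizes (`d = 1024`, 5 classes) are illustrative assumptions, not SPIDER's actual configuration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def attention_pool_classify(embeddings, W, v, W_cls, b_cls):
    """Attention pooling over patch embeddings followed by a linear classifier.

    embeddings: (n_patches, d), e.g. 25 rows (1 central + 24 context patches)
    W: (d, h) and v: (h,) are attention parameters (hypothetical)
    W_cls: (d, n_classes) and b_cls: (n_classes,) are classifier parameters (hypothetical)
    Returns (class_probabilities, attention_weights).
    """
    scores = np.tanh(embeddings @ W) @ v  # one relevance score per patch
    attn = softmax(scores)                # non-negative weights summing to 1
    pooled = attn @ embeddings            # (d,) attention-weighted mean embedding
    logits = pooled @ W_cls + b_cls
    return softmax(logits), attn


# Illustrative sizes only: d = 1024 and 5 classes are assumptions, weights are random.
rng = np.random.default_rng(0)
d, h, n_classes = 1024, 128, 5
features = rng.standard_normal((25, d))  # stand-in for encoder outputs
probs, attn = attention_pool_classify(
    features,
    W=rng.standard_normal((d, h)) * 0.01,
    v=rng.standard_normal(h),
    W_cls=rng.standard_normal((d, n_classes)) * 0.01,
    b_cls=np.zeros(n_classes),
)
print("predicted class:", probs.argmax(), "| most attended patch:", attn.argmax())
```

The attention weights also show which patches in the region drove a given prediction, which is one reason this style of pooling is popular for context-aware models.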

📌 See individual model pages for inference instructions.

🔧 Getting Started

🛠 Using the Dataset

Download a SPIDER dataset with huggingface_hub (the colorectal dataset is shown here):

from huggingface_hub import snapshot_download
snapshot_download(repo_id="histai/SPIDER-colorectal", repo_type="dataset", local_dir="./spider_colorectal")

Or clone it directly with Git (Git LFS is required to fetch the large files):

git lfs install
git clone https://huggingface.co/datasets/histai/SPIDER-colorectal

In the directory containing the archive parts, concatenate the split archive and extract it:

cat spider-colorectal.tar.* | tar -xvf -
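If `cat` is not available (for example on Windows), the same concatenate-and-extract step can be done with Python's standard library. This is a minimal sketch assuming the parts sort correctly by name (as the shell glob `spider-colorectal.tar.*` also assumes) and together form an uncompressed tar stream; `ConcatReader` and `extract_split_tar` are hypothetical helper names, not part of any SPIDER tooling.

```python
import glob
import os
import tarfile


class ConcatReader:
    """Minimal read-only file-like object that reads several files back to back, like `cat`."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._index = 0  # which part is currently being read
        self._fh = None  # open handle for the current part

    def read(self, size=-1):
        chunks = []
        remaining = size
        while self._index < len(self._paths) and (size < 0 or remaining > 0):
            if self._fh is None:
                self._fh = open(self._paths[self._index], "rb")
            data = self._fh.read(remaining if size >= 0 else -1)
            if not data:  # current part exhausted: move on to the next one
                self._fh.close()
                self._fh = None
                self._index += 1
                continue
            chunks.append(data)
            if size >= 0:
                remaining -= len(data)
        return b"".join(chunks)

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None


def extract_split_tar(pattern, dest):
    """Stream the parts matching `pattern` (sorted by name) into tarfile and extract to `dest`."""
    parts = sorted(glob.glob(pattern))  # same lexicographic order as the shell glob
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    os.makedirs(dest, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version supports it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    reader = ConcatReader(parts)
    try:
        # "r|" reads an uncompressed tar as a forward-only stream, so no combined file is written.
        with tarfile.open(fileobj=reader, mode="r|") as tar:
            tar.extractall(dest, **extract_kwargs)
    finally:
        reader.close()
    return parts
```

For example, `extract_split_tar("spider-colorectal.tar.*", "./spider_colorectal")` extracts the colorectal archive without first writing a merged copy to disk.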

🤖 Using the Model

Load a pretrained model for inference:

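import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "histai/SPIDER-colorectal-model"
# trust_remote_code=True is needed because the model ships custom code on the Hub
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Convert to RGB so PNGs with an alpha channel are handled consistently
image = Image.open("path_to_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
print(outputs.predicted_class_names)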

📈 Benchmark Results

Organ	Accuracy	Precision	F1 Score
Skin	0.940	0.935	0.937
Colorectal	0.914	0.917	0.915
Thorax	0.962	0.958	0.960
Breast	0.902	0.896	0.897
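To compute comparable metrics on your own predictions, a NumPy sketch follows. The table does not state how precision and F1 are averaged across classes; this example assumes macro averaging (an unweighted mean over classes), so its numbers may not match the reported ones exactly if the paper uses a different scheme.

```python
import numpy as np


def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision and F1 from integer class labels.

    Macro averaging is an assumption: each class counts equally regardless of size.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # confusion[i, j] counts samples of true class i predicted as class j
    confusion = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(confusion, (y_true, y_pred), 1)

    tp = np.diag(confusion).astype(float)
    predicted = confusion.sum(axis=0)  # predictions per class (column sums)
    actual = confusion.sum(axis=1)     # true samples per class (row sums)
    zeros = np.zeros(n_classes)

    # Classes that are never predicted (or never present) score 0 instead of NaN.
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, actual, out=zeros.copy(), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)

    return {
        "accuracy": tp.sum() / len(y_true),
        "precision": precision.mean(),
        "f1": f1.mean(),
    }


metrics = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print({k: round(float(v), 3) for k, v in metrics.items()})
# {'accuracy': 0.667, 'precision': 0.722, 'f1': 0.656}
```

Macro F1 here is the mean of per-class F1 scores, not the F1 of the averaged precision and recall; the two definitions can differ noticeably on imbalanced data.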

📝 Citation

If you use SPIDER in your work, please cite:

@misc{nechaev2025spidercomprehensivemultiorgansupervised,
      title={SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models}, 
      author={Dmitry Nechaev and Alexey Pchelnikov and Ekaterina Ivanova},
      year={2025},
      eprint={2503.02876},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2503.02876}, 
}