SPIDER: A Multi-Organ Supervised Pathology Dataset and Baseline Models

April 7, 2025

Overview

SPIDER (Supervised Pathology Image-DEscription Repository) is a large, high-quality, and diverse patch-level dataset designed to advance AI-driven computational pathology. It provides multi-organ coverage, expert-annotated labels, and strong baseline models to support research and development in digital pathology.

This repository serves as a central hub for accessing the SPIDER datasets, pre-trained models, and related resources.


📄 Paper

For a detailed description of SPIDER, methodology, and benchmark results, refer to our research paper:

📄 SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models
View on arXiv: https://arxiv.org/abs/2503.02876


Resources

📂 Datasets

SPIDER consists of four organ-specific datasets (skin, colorectal, thorax, and breast), all available for download from the Hugging Face Hub 🤗.

Each dataset contains:

  • 224×224 central patches with expert-verified class labels
  • 24 surrounding context patches forming a 1120×1120 composite region
  • 20X magnification for high-detail analysis
  • Train-test splits ensuring robust benchmarking
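
The patch geometry above works out to a 5×5 grid: one central patch plus 24 context patches, and 5 × 224 = 1120 pixels per side. The following is a minimal sketch of that tiling with NumPy on synthetic data. The row-major patch order (central patch at index 12) and the function names are assumptions for illustration; the actual file layout is described on the individual dataset pages.

```python
import numpy as np

PATCH = 224  # side length of one patch in pixels (from the dataset description)
GRID = 5     # 5 x 5 = 25 patches: 1 central + 24 context


def assemble_composite(patches: np.ndarray) -> np.ndarray:
    """Tile 25 patches of shape (224, 224, 3) into one (1120, 1120, 3) image.

    Assumes row-major patch order, so the central patch sits at index 12.
    """
    expected = (GRID * GRID, PATCH, PATCH, 3)
    if patches.shape != expected:
        raise ValueError(f"expected shape {expected}, got {patches.shape}")
    grid = patches.reshape(GRID, GRID, PATCH, PATCH, 3)
    # Reorder to (grid_row, pixel_row, grid_col, pixel_col, channel), then merge axes.
    return grid.transpose(0, 2, 1, 3, 4).reshape(GRID * PATCH, GRID * PATCH, 3)


def central_patch(composite: np.ndarray) -> np.ndarray:
    """Crop the central 224 x 224 patch back out of the composite."""
    start = (GRID // 2) * PATCH  # 2 * 224 = 448
    return composite[start:start + PATCH, start:start + PATCH]


# Synthetic stand-in data: patch i is filled with the constant value i.
patches = np.stack(
    [np.full((PATCH, PATCH, 3), i, dtype=np.uint8) for i in range(GRID * GRID)]
)
composite = assemble_composite(patches)
print(composite.shape)
```

With real data, each entry of `patches` would be a decoded image array instead of a constant block.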

📌 See individual dataset pages for more details.

🤖 Pretrained Models

The baseline models are trained on the SPIDER datasets using the Hibou-L foundation model with an attention-based classification head. They are available for download from the Hugging Face Hub 🤗.

Each model supports:

  • Patch-level classification with multi-class labels
  • Improved accuracy using surrounding context patches
  • Easy deployment for pathology AI applications

📌 See individual model pages for inference instructions.


🔧 Getting Started

🛠 Using the Dataset

Download any SPIDER dataset using huggingface_hub:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="histai/SPIDER-colorectal", repo_type="dataset", local_dir="./spider_colorectal")

Or clone directly using Git:

git lfs install
git clone https://huggingface.co/datasets/histai/SPIDER-colorectal

Extract dataset files:

cat spider-colorectal.tar.* | tar -xvf -
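
Where `cat` and `tar` are not available (for example on Windows), the same concatenate-and-extract step can be sketched in Python with the standard `tarfile` module. `ChainedReader` and `extract_split_tar` are hypothetical helper names, not part of any SPIDER tooling, and the sketch assumes the parts sort correctly by file name, as they do with the shell glob above.

```python
import glob
import os
import tarfile


class ChainedReader:
    """Read several files back to back, like `cat part1 part2 ...`."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._index = 0
        self._handle = None

    def read(self, size=-1):
        if size == 0:
            return b""
        # Serve data from the current part; advance when it is exhausted.
        while self._index < len(self._paths):
            if self._handle is None:
                self._handle = open(self._paths[self._index], "rb")
            chunk = self._handle.read(size)
            if chunk:
                return chunk
            self._handle.close()
            self._handle = None
            self._index += 1
        return b""

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


def extract_split_tar(parts_pattern, dest_dir):
    """Extract a tar archive that was split into parts; return member names."""
    parts = sorted(glob.glob(parts_pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {parts_pattern!r}")
    os.makedirs(dest_dir, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version has it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    reader = ChainedReader(parts)
    names = []
    try:
        # "r|*" reads the archive as a forward-only stream.
        with tarfile.open(fileobj=reader, mode="r|*") as archive:
            for member in archive:
                archive.extract(member, dest_dir, **extra)
                names.append(member.name)
    finally:
        reader.close()
    return names
```

A call such as `extract_split_tar("spider-colorectal.tar.*", "./spider_colorectal")` would mirror the shell command. Streaming avoids writing a second, joined copy of the archive to disk.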

🤖 Using the Model

Load a pretrained model for inference:

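from transformers import AutoModel, AutoProcessor
from PIL import Image

# trust_remote_code=True lets transformers run the custom model code shipped in the repository.
model = AutoModel.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("histai/SPIDER-colorectal-model", trust_remote_code=True)

# Load an image and convert it into model-ready PyTorch tensors.
image = Image.open("path_to_image.png")
inputs = processor(images=image, return_tensors="pt")

# Run the forward pass and print the predicted class names.
outputs = model(**inputs)
print(outputs.predicted_class_names)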

📈 Benchmark Results

| Organ | Accuracy | Precision | F1 Score |
|---|---|---|---|
| Skin | 0.940 | 0.935 | 0.937 |
| Colorectal | 0.914 | 0.917 | 0.915 |
| Thorax | 0.962 | 0.958 | 0.960 |
| Breast | 0.902 | 0.896 | 0.897 |
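
For readers who want to compute the same three metrics on their own predictions, here is a minimal pure-Python sketch. This document does not state how precision and F1 are averaged across classes; macro averaging (an unweighted mean over classes) is an assumption here, and the class labels in the example are made up.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy, macro precision, and macro F1 for multi-class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    precisions, f1_scores = [], []
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        f1_scores.append(f1)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / len(classes),  # macro average (assumed)
        "f1": sum(f1_scores) / len(classes),          # macro average (assumed)
    }


# Hypothetical labels, for illustration only.
metrics = classification_metrics(
    ["tumor", "tumor", "stroma", "stroma"],
    ["tumor", "stroma", "stroma", "stroma"],
)
print(metrics)
```

If the reported numbers use a different averaging scheme (for example weighted by class frequency), only the two averaging lines at the end would change.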



📜 License

This project is licensed under CC BY-NC 4.0. The dataset and models are available for research use only.


📧 Contact

Authors: Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova
📩 Emails: dmitry@hist.ai, alex@hist.ai, kate@hist.ai


📖 Citation

If you use SPIDER in your research, please cite:

@misc{nechaev2025spidercomprehensivemultiorgansupervised,
      title={SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models}, 
      author={Dmitry Nechaev and Alexey Pchelnikov and Ekaterina Ivanova},
      year={2025},
      eprint={2503.02876},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2503.02876}, 
}