PrePATH: A Toolkit for Preprocessing Whole Slide Images

March 26, 2026




Tip

🚀 Contribute Your Foundation Model! We welcome submissions of new pathology foundation models to our benchmark. 👉 Submit Your Model Here — Help advance the field by adding your model to PrePATH!


PrePATH is a comprehensive preprocessing toolkit for whole slide images (WSI), built upon CLAM and ASlide.

TODO

  • H0-mini
  • OpenMidnight
  • GenBio-PathFM
  • TITAN (Slide level)

Installation

Prerequisites

  • Anaconda or Miniconda
  • openslide-tools (system dependency)

Setup Instructions

The following instructions demonstrate installation for the GPFM model. For other foundation models, please refer to their respective repositories for environment-specific requirements.

git clone https://github.com/birkhoffkiki/PrePATH.git
cd PrePATH
conda create --name gpfm python=3.10
conda activate gpfm
pip install -r requirements/gpfm.txt
cd models/ckpts/
wget https://github.com/birkhoffkiki/GPFM/releases/download/ckpt/GPFM.pth

Notes:

  • ASlide is installed as a Python package from GitHub; it is already listed in requirements/gpfm.txt, so no separate installation step is needed.
  • For other foundation models, take the environment configuration from their respective repositories.
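
Before running feature extraction, it can help to confirm that the checkpoint download completed. The following is a minimal sketch and not part of PrePATH: the helper name checkpoint_ready and its size threshold are hypothetical, and the path simply follows the setup commands above.

```python
from pathlib import Path


def checkpoint_ready(path, min_bytes=1):
    """Return True if `path` is an existing file of at least `min_bytes` bytes.

    Hypothetical helper (not part of PrePATH). It only catches missing or
    empty downloads; it does not validate the weights themselves.
    """
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes


if __name__ == "__main__":
    # Path assumed from the setup commands, relative to the repository root.
    ckpt = Path("models/ckpts/GPFM.pth")
    status = "found" if checkpoint_ready(ckpt) else "missing or empty"
    print(f"{ckpt}: {status}")
```

Run it from the repository root; a "missing or empty" result usually means the wget step was interrupted or executed in a different directory.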

Usage

⚡Using PrePATH to extract Patch-Level Features

Step 1: Coordinate Extraction

Extract coordinates of foreground patches from whole slide images:

# Configure variables in the script before execution
bash scripts/get_coors/example.sh

Step 2: Feature Extraction

Extract patch-level features using the selected foundation model:

# Refer to the script for detailed configuration options
bash scripts/extract_feature/one_gpu_example.sh

If you have multiple GPUs, you can use the exe.sh script for parallel processing:

bash scripts/extract_feature/exe.sh
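
How exe.sh distributes work is defined in the script itself. As a generic illustration of splitting slides across GPUs (an assumption, not a description of exe.sh), the sketch below deals a slide list into one round-robin shard per GPU. The function shard_round_robin and the slide names are hypothetical.

```python
def shard_round_robin(items, num_shards):
    """Split `items` into `num_shards` lists, dealing items out in turn."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards = [[] for _ in range(num_shards)]
    for index, item in enumerate(items):
        shards[index % num_shards].append(item)
    return shards


if __name__ == "__main__":
    # Hypothetical slide names; each shard would go to one GPU worker.
    slides = ["slide_a.svs", "slide_b.svs", "slide_c.kfb", "slide_d.sdpc", "slide_e.svs"]
    for gpu_id, shard in enumerate(shard_round_robin(slides, num_shards=2)):
        print(f"GPU {gpu_id}: {shard}")
```

Round-robin keeps shard sizes within one slide of each other, but it does not account for slides that differ greatly in patch count.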

Step 3: (Optional) Extract patches and pack them into HDF5 files

This step is useful for pretraining, or if you encounter the "Corrupt JPEG data" error during feature extraction.
The error can occur with KFB or SDPC images because these formats have limited support for multiprocessing.

# Refer to the script for detailed configuration options
bash scripts/crop_image/example_packed2h5.sh

Step 4: (Optional) Extract features from HDF5 packed patches

If you have packed patches into HDF5 files in Step 3, you can extract features from them directly:

# Refer to the script for detailed configuration options
bash scripts/extract_feature/one_gpu_from_h5_example.sh

⚡Extract patches directly without feature extraction (e.g., for pretraining)

Step 1: Coordinate Extraction

Extract coordinates of foreground patches from whole slide images:

# Configure variables in the script before execution
bash scripts/get_coors/example.sh

Step 2: Patch Extraction

Extract patches based on the coordinates. We strongly recommend packing all patches into HDF5 files for efficient storage and retrieval:

# Refer to the script for detailed configuration options
bash scripts/crop_image/example_packed2h5.sh

Supported Foundation Models

Note: Each foundation model requires its corresponding Python environment to be properly configured.

| Model | Identifier | Reference |
|---|---|---|
| ResNet50 | resnet50 | Standard ImageNet pretrained model |
| GPFM | gpfm | GitHub |
| CTransPath | ctranspath | GitHub |
| PLIP | plip | GitHub |
| CONCH | conch | HuggingFace |
| CONCH-1.5 | conch15 | HuggingFace |
| UNI | uni | HuggingFace |
| UNI-2 | uni2 | HuggingFace |
| mSTAR | mstar | GitHub |
| Phikon | phikon | HuggingFace |
| Phikon2 | phikon2 | HuggingFace |
| Virchow-2 | virchow2 | HuggingFace |
| Prov-GigaPath | gigapath | HuggingFace |
| CHIEF | chief | GitHub |
| H-Optimus-0 | h-optimus-0 | HuggingFace |
| H0-mini | h0-mini | HuggingFace |
| H-Optimus-1 | h-optimus-1 | HuggingFace |
| OpenMidnight | openmidnight | HuggingFace |
| GenBio-PathFM | genbio-pathfm | HuggingFace |
| Lunit | lunit | GitHub |
| Hibou-L | hibou-l | GitHub |
| MUSK | musk | HuggingFace |
| OmiCLIP | omiclip | GitHub |
| PathoCLIP | pathoclip | GitHub |
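
A mistyped identifier is an easy error to make when configuring the extraction scripts, so a small lookup can catch it early. The sketch below copies the identifiers listed above. The helper resolve_model is hypothetical and not part of PrePATH; how the scripts actually consume the identifier should be checked in the scripts themselves.

```python
import difflib

# Identifiers copied from the list of supported foundation models above.
SUPPORTED_MODELS = (
    "resnet50", "gpfm", "ctranspath", "plip", "conch", "conch15",
    "uni", "uni2", "mstar", "phikon", "phikon2", "virchow2",
    "gigapath", "chief", "h-optimus-0", "h0-mini", "h-optimus-1",
    "openmidnight", "genbio-pathfm", "lunit", "hibou-l", "musk",
    "omiclip", "pathoclip",
)


def resolve_model(name):
    """Return `name` (whitespace-stripped) if it is a listed identifier.

    Hypothetical helper: raises ValueError, with close matches when any
    exist, for anything not in SUPPORTED_MODELS.
    """
    key = name.strip()
    if key in SUPPORTED_MODELS:
        return key
    close = difflib.get_close_matches(key.lower(), SUPPORTED_MODELS, n=3)
    hint = f" Did you mean: {', '.join(close)}?" if close else ""
    raise ValueError(f"Unknown model identifier {name!r}.{hint}")


if __name__ == "__main__":
    print(resolve_model("gpfm"))
```

Matching is deliberately exact and case-sensitive, so display names such as "CONCH-1.5" are rejected rather than silently mapped to an identifier.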

Supported WSI Formats

PrePATH supports the following whole slide image formats:

  • KFB (.kfb)
  • SDPC (.sdpc)
  • TRON (.tron)
  • All formats supported by OpenSlide (including .svs, .tiff, .ndpi, .vms, .vmu, .scn, .mrxs, .tif, .bif, and others)
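
To collect slides for the coordinate-extraction step, one option is to filter a directory by the extensions listed above. This is a minimal sketch: find_slides is a hypothetical helper, not part of PrePATH, and the extension set covers only the formats named in this README, so other OpenSlide-supported formats would need to be added by hand.

```python
from pathlib import Path

# Extensions named in this README; the OpenSlide list here is not exhaustive.
WSI_EXTENSIONS = frozenset({
    ".kfb", ".sdpc", ".tron",
    ".svs", ".tiff", ".ndpi", ".vms", ".vmu", ".scn", ".mrxs", ".tif", ".bif",
})


def find_slides(root):
    """Recursively list files under `root` with a listed WSI extension.

    The suffix comparison is case-insensitive; results are sorted by path.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in WSI_EXTENSIONS
    )


if __name__ == "__main__":
    # "." is a placeholder; point this at your own slide directory.
    for slide in find_slides("."):
        print(slide)
```

Matching on the file suffix alone is a heuristic: a generic .tif file is not necessarily a whole slide image, so the resulting list may need manual review.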