PrePATH: A Toolkit for Preprocessing Whole Slide Images

March 26, 2026


Tip

🚀 Contribute Your Foundation Model! We welcome submissions of new pathology foundation models to our benchmark. 👉 Submit Your Model Here — Help advance the field by adding your model to PrePATH!

PrePATH is a comprehensive preprocessing toolkit for whole slide images (WSI), built upon CLAM and ASlide.

TODO

H0-mini
OpenMidnight
GenBio-PathFM
TITAN (Slide level)

Installation

Prerequisites

Anaconda or Miniconda
openslide-tools (system dependency)

Setup Instructions

The following instructions demonstrate installation for the GPFM model. For other foundation models, please refer to their respective repositories for environment-specific requirements.

git clone https://github.com/birkhoffkiki/PrePATH.git
cd PrePATH
conda create --name gpfm python=3.10
conda activate gpfm
pip install -r requirements/gpfm.txt
cd models/ckpts/
wget https://github.com/birkhoffkiki/GPFM/releases/download/ckpt/GPFM.pth

Notes:

ASlide is installed as a Python package directly from GitHub and is already listed in requirements/gpfm.txt.
For other foundation models, take the environment configuration from their respective repositories.
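
Before moving on, it can help to confirm that the prerequisites and the downloaded checkpoint are in place. The following is a minimal sketch, not part of PrePATH: the helper names `have_cmd` and `check_file` are hypothetical, and the checkpoint path simply follows the setup commands above.

```shell
# Hypothetical helpers for illustration; they are not part of PrePATH.

# Succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if the file exists and is non-empty; prints a one-line status.
check_file() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}

# Example usage from the repository root (paths taken from the setup steps):
#   have_cmd conda || echo "conda not found on PATH"
#   check_file models/ckpts/GPFM.pth
```

Note that the setup commands end inside models/ckpts/, so return to the repository root before running the scripts in the Usage section.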

Usage

⚡ Using PrePATH to Extract Patch-Level Features

Step 1: Coordinate Extraction

Extract coordinates of foreground patches from whole slide images:

# Configure variables in the script before execution
bash scripts/get_coors/example.sh

Step 2: Feature Extraction

Extract patch-level features using the selected foundation model:

# Refer to the script for detailed configuration options
bash scripts/extract_feature/one_gpu_example.sh

If you have multiple GPUs, you can use the exe.sh script for parallel processing:

bash scripts/extract_feature/exe.sh
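
The internals of exe.sh are not reproduced here. As a generic illustration of how slides can be spread across GPUs, the sketch below assigns slides to shards in round-robin order. The function `shard_slides` is hypothetical and is not taken from PrePATH; the actual script may divide work differently.

```shell
# Hypothetical illustration only; this is NOT the content of
# scripts/extract_feature/exe.sh.
# Print the slides that belong to one shard under round-robin assignment.
# Usage: shard_slides <shard_index> <num_shards> <slide> [<slide> ...]
shard_slides() {
    local shard="$1" total="$2"
    shift 2
    local i=0 slide
    for slide in "$@"; do
        if [ $((i % total)) -eq "$shard" ]; then
            echo "$slide"
        fi
        i=$((i + 1))
    done
}

# Example: with 2 GPUs, shard 0 takes positions 0, 2, 4, ... of the list.
#   shard_slides 0 2 a.svs b.svs c.svs
```

Each shard could then be processed by a separate job pinned to one device through the standard CUDA_VISIBLE_DEVICES environment variable.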

Step 3: (Optional) Extract patches and pack them into HDF5 files

This is useful for pretraining, or if you encounter the "Corrupt JPEG data" error during feature extraction.
The error can occur with KFB or SDPC images because multiprocessing support for these formats is limited.

# Refer to the script for detailed configuration options
bash scripts/crop_image/example_packed2h5.sh
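
To decide whether the HDF5 route is needed, one option is to capture the output of a feature extraction run and count how often the warning appears. This is a minimal sketch under assumptions: the helper `count_corrupt_jpeg` and the log file name extract.log are hypothetical, and PrePATH does not necessarily write such a log on its own.

```shell
# Hypothetical helper: count the lines in a captured log that mention
# the "Corrupt JPEG data" message. Prints 0 when there are none.
count_corrupt_jpeg() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c 'Corrupt JPEG data' "$1" || true
}

# Example usage (the log file name is an assumption):
#   bash scripts/extract_feature/one_gpu_example.sh 2>&1 | tee extract.log
#   count_corrupt_jpeg extract.log
```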

Step 4: (Optional) Extract features from HDF5 packed patches

If you packed patches into HDF5 files in Step 3, you can extract features from them directly:

# Refer to the script for detailed configuration options
bash scripts/extract_feature/one_gpu_from_h5_example.sh
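
Because each step depends on the output of the previous one, it may be convenient to chain the scripts and stop at the first failure. The wrapper below is a sketch, not part of PrePATH; `run_in_order` is a hypothetical name, and each script still needs its variables configured beforehand.

```shell
# Hypothetical wrapper: run each given script with bash, in order,
# and stop at the first one that exits with a non-zero status.
run_in_order() {
    local script
    for script in "$@"; do
        echo "running: $script"
        if ! bash "$script"; then
            echo "failed: $script" >&2
            return 1
        fi
    done
}

# Example usage from the repository root (script paths as listed above):
#   run_in_order scripts/get_coors/example.sh \
#                scripts/extract_feature/one_gpu_example.sh
```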

⚡ Extracting Patches Directly Without Feature Extraction (e.g., for Pretraining)

Step 1: Coordinate Extraction

Extract coordinates of foreground patches from whole slide images:

# Configure variables in the script before execution
bash scripts/get_coors/example.sh

Step 2: Patch Extraction

Extract patches based on the coordinates. We strongly recommend packing all patches into HDF5 files for efficient storage and retrieval:

# Refer to the script for detailed configuration options
bash scripts/crop_image/example_packed2h5.sh

Supported Foundation Models

Note: Each foundation model requires its corresponding Python environment to be properly configured.

Model	Identifier	Reference
ResNet50	`resnet50`	Standard ImageNet pretrained model
GPFM	`gpfm`	GitHub
CTransPath	`ctranspath`	GitHub
PLIP	`plip`	GitHub
CONCH	`conch`	HuggingFace
CONCH-1.5	`conch15`	HuggingFace
UNI	`uni`	HuggingFace
UNI-2	`uni2`	HuggingFace
mSTAR	`mstar`	GitHub
Phikon	`phikon`	HuggingFace
Phikon2	`phikon2`	HuggingFace
Virchow-2	`virchow2`	HuggingFace
Prov-GigaPath	`gigapath`	HuggingFace
CHIEF	`chief`	GitHub
H-Optimus-0	`h-optimus-0`	HuggingFace
H0-mini	`h0-mini`	HuggingFace
H-Optimus-1	`h-optimus-1`	HuggingFace
OpenMidnight	`openmidnight`	HuggingFace
GenBio-PathFM	`genbio-pathfm`	HuggingFace
Lunit	`lunit`	GitHub
Hibou-L	`hibou-l`	GitHub
MUSK	`musk`	HuggingFace
OmiCLIP	`omiclip`	GitHub
PathoCLIP	`pathoclip`	GitHub

Supported WSI Formats

PrePATH supports the following whole slide image formats:

KFB (.kfb)
SDPC (.sdpc)
TRON (.tron)
All formats supported by OpenSlide (including .svs, .tiff, .ndpi, .vms, .vmu, .scn, .mrxs, .tif, .bif, and others)
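
To see which files in a dataset directory fall under these formats, a quick inventory by file extension can be useful. The sketch below is not part of PrePATH: `list_wsi` is a hypothetical helper, it covers only the extensions named explicitly above, and it matches on file names rather than inspecting file contents.

```shell
# Hypothetical helper: list files under a directory whose extension matches
# one of the formats named above (case-insensitive), sorted by path.
list_wsi() {
    find "$1" -type f \( \
        -iname '*.kfb' -o -iname '*.sdpc' -o -iname '*.tron' \
        -o -iname '*.svs' -o -iname '*.tiff' -o -iname '*.tif' \
        -o -iname '*.ndpi' -o -iname '*.vms' -o -iname '*.vmu' \
        -o -iname '*.scn' -o -iname '*.mrxs' -o -iname '*.bif' \
    \) | sort
}

# Example usage (the directory name is an assumption):
#   list_wsi /path/to/slides
```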