# Trident
May 5, 2026 · View on GitHub
arXiv | Blog | Cite | Documentation | License
Trident is a toolkit for large-scale whole-slide image processing. This project was developed by the Mahmood Lab at Harvard Medical School and Brigham and Women's Hospital. This work was funded by NIH NIGMS R35GM138216.
> [!NOTE]
> Contributions are welcome! Please report any issues. You may also contribute by opening a pull request.
**Key Features:**
- **End-to-end pipeline**: tissue segmentation → patch coordinates → patch / slide embeddings, in one command (`--task all`) or stage-by-stage.
- **22+ patch encoders**: UNI, CONCHv1.5, Virchow, Prov-GigaPath, H-Optimus-0, etc.
- **Slide encoders**: Titan, GigaPath, PRISM, CHIEF, Madeleine, Feather.
- **Tissue segmentation**: HEST, GrandQC, or Otsu for CPU-only runs. Optional `--remove_artifacts` / `--remove_penmarks` clean-up pass.
- **Multiple WSI readers**: OpenSlide, CuCIM, plain images (`.png`, `.jpeg`), SDPC, OME-Zarr (`.zarr`), Zeiss CZI (`.czi`). Or convert to pyramidal TIFF with `trident convert`.
- **Multi-GPU**: `--gpus 0 1 2 3` distributes pending slides across GPUs.
- **Smart resume**: outputs are tracked per-slide; re-running on the same `--job_dir` skips already-completed work. `.lock` files protect in-flight tasks; stale ones are cleaned safely with `--clear_dead_locks`.
- **WSI cache pipeline** for slow / network storage: `--wsi_cache /local/ssd --cache_batch_size 32` stages slides locally via a producer/consumer pipeline.
- **Run reports**: every run writes `summary.md` (human-readable), `runs/<id>.json` (manifest), and `wsi_states/<slide>.json` (per-slide tasks, attempts, errors, resume info). See the sketch after this list for inspecting these state files.
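A minimal sketch for triaging a run from its per-slide state files. The JSON schema is not spelled out in this README, so the `"tasks"`/`"status"` fields below are assumptions; open one `wsi_states/<slide>.json` from your own run and adapt the keys.

```python
# Scan per-slide state files and report slides with unfinished tasks.
# NOTE: the "tasks"/"status" layout below is assumed, not documented here.
import json
from pathlib import Path

state_dir = Path("./trident_processed/wsi_states")
for state_file in sorted(state_dir.glob("*.json")):
    state = json.loads(state_file.read_text())
    # Hypothetical layout: {"tasks": {"seg": {"status": "done"}, ...}}
    unfinished = [task for task, info in state.get("tasks", {}).items()
                  if info.get("status") != "done"]
    if unfinished:
        print(f"{state_file.stem}: pending tasks {unfinished}")
```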
## 1. Installation
- Create an environment (Python 3.10 or 3.11): `conda create -n "trident" python=3.10`, and activate it: `conda activate trident`.
- Cloning: `git clone https://github.com/mahmoodlab/trident.git && cd trident`.
- Local installation: `pip install -e .`. This installs the shared model stack (`transformers`, `timm`, `safetensors`, etc.).
Optional install profiles:
pip install -e ".[patch-encoders]"for patch embedding-related extras (e.g. CONCH, MUSK, CTransPath / CHIEF).pip install -e ".[slide-encoders]"for slide embedding-related extras (e.g. PRISM, GigaPath, Madeleine).pip install -e ".[omezarr]"for OME Zarr WSI reader support (OME-NGFF / OME-Zarr).pip install -e ".[czi]"for Zeiss CZI WSI reader support (pylibCZIrw).pip install -e ".[convert]"for slide conversion to tiff.pip install -e ".[full]"to install all pip-installable optional dependencies.
Run checks before launching jobs:
```bash
trident-doctor --profile base
trident-doctor --profile patch-encoders --check-gated
trident-doctor --profile slide-encoders
trident-doctor --profile convert
trident-doctor --profile full --check-gated
```
> [!NOTE]
> Some models still require manual setup (e.g., a local CHIEF repository path in `trident/slide_encoder_models/local_ckpts.json`) or HuggingFace gated access approvals.
## 2. Running Trident
Already familiar with WSI processing? Perform segmentation, patching, and UNI feature extraction from a directory of WSIs with:
```bash
python run_batch_of_slides.py --task all --wsi_dir ./wsis --job_dir ./trident_processed --patch_encoder uni_v1 --mag 20 --patch_size 256
```
Feeling cautious?
Run this command to perform all processing steps for a single slide:
```bash
python run_single_slide.py --slide_path ./wsis/xxxx.svs --job_dir ./trident_processed --patch_encoder uni_v1 --mag 20 --patch_size 256
```
Convert images/WSIs to pyramidal TIFF:
```bash
trident convert --input_dir ./wsis --mpp_csv ./wsis/to_process.csv --job_dir ./pyramidal_tiff --downscale_by 1 --num_workers 1
```
`--mpp_csv` is required and must contain `wsi,mpp` columns. Only files listed in the CSV are converted.
If embedded MPP metadata is detected in a slide, Trident compares it to the CSV value and logs mismatches.
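The expected CSV is simple enough to generate programmatically. A minimal sketch (the filenames and MPP values are placeholders for illustration):

```python
# Build the --mpp_csv file expected by `trident convert`: one row per slide,
# with a `wsi` column (filename) and an `mpp` column (microns per pixel).
import pandas as pd

pd.DataFrame([
    {"wsi": "slide_001.png", "mpp": 0.25},   # placeholder filename/MPP
    {"wsi": "slide_002.jpeg", "mpp": 0.50},  # placeholder filename/MPP
]).to_csv("./wsis/to_process.csv", index=False)
```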
Or follow step-by-step instructions:
**Step 1: Tissue Segmentation**: segments tissue vs. background from a directory of WSIs.
- Command:
  ```bash
  python run_batch_of_slides.py --task seg --wsi_dir ./wsis --job_dir ./trident_processed --gpus 0 --segmenter hest
  ```
  - `--task seg`: Specifies that you want to do tissue segmentation.
  - `--wsi_dir ./wsis`: Path to the directory with your WSIs.
  - `--job_dir ./trident_processed`: Output directory for processed results.
  - `--gpus 0`: Use GPU index 0. Pass multiple IDs (e.g. `--gpus 0 1`) to shard across GPUs, or `-1` to force CPU.
  - `--segmenter`: Segmentation model. Defaults to `hest`. Use `grandqc` (citation necessary, non-commercial use, original repository) for fast H&E segmentation, or `otsu` for a classical image-processing-only fallback. Add `--remove_artifacts` for additional artifact clean-up.
- Outputs:
  - WSI thumbnails in `./trident_processed/thumbnails`.
  - WSI thumbnails with tissue contours in `./trident_processed/contours`.
  - GeoJSON files containing tissue contours in `./trident_processed/contours_geojson`. These can be opened in QuPath for editing/quality control, if necessary.
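Since the contours are standard GeoJSON FeatureCollections, they are easy to inspect from Python before patching. A minimal sketch (the filename is a placeholder):

```python
# Inspect the tissue contours exported by the segmentation step.
import json

with open("./trident_processed/contours_geojson/slide_001.geojson") as f:  # placeholder name
    geojson = json.load(f)

for feature in geojson.get("features", []):
    geometry = feature["geometry"]
    # Polygon: list of rings; MultiPolygon: list of polygons.
    print(geometry["type"], "with", len(geometry["coordinates"]), "part(s)")
```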
**Step 2: Tissue Patching**: extracts patches from segmented tissue regions at a specific magnification.
- Command:
  ```bash
  python run_batch_of_slides.py --task coords --wsi_dir ./wsis --job_dir ./trident_processed --mag 20 --patch_size 256 --overlap 0
  ```
  - `--task coords`: Specifies that you want to do patching.
  - `--wsi_dir ./wsis`: Path to the directory with your WSIs.
  - `--job_dir ./trident_processed`: Output directory for processed results.
  - `--mag 20`: Extracts patches at 20x magnification.
  - `--patch_size 256`: Each patch is 256x256 pixels.
  - `--overlap 0`: Patch overlap, always an absolute number of pixels (e.g. `--overlap 128` gives 50% overlap for 256x256 patches).
- Outputs:
  - Patch coordinates as h5 files in `./trident_processed/20x_256px_0px_overlap/patches`.
  - WSI thumbnails annotated with patch borders in `./trident_processed/20x_256px_0px_overlap/visualization`.
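The coordinate files can be read with `h5py`. A minimal sketch; the dataset key `"coords"` and the `(x, y)` layout are assumptions borrowed from CLAM-style pipelines, so list the file's keys to confirm:

```python
# Load patch coordinates from the h5 output of --task coords.
import h5py

path = "./trident_processed/20x_256px_0px_overlap/patches/slide_001.h5"  # placeholder
with h5py.File(path, "r") as f:
    print("available keys:", list(f.keys()))
    coords = f["coords"][:]  # assumed: (n_patches, 2) top-left (x, y) coordinates
print(f"{coords.shape[0]} patches; first at {tuple(coords[0])}")
```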
**Step 3a: Patch Feature Extraction**: extracts features from tissue patches using a specified encoder.
- Command:
  ```bash
  python run_batch_of_slides.py --task feat --wsi_dir ./wsis --job_dir ./trident_processed --patch_encoder uni_v1 --mag 20 --patch_size 256
  ```
  - `--task feat`: Specifies that you want to do feature extraction.
  - `--wsi_dir ./wsis`: Path to the directory with your WSIs.
  - `--job_dir ./trident_processed`: Output directory for processed results.
  - `--patch_encoder uni_v1`: Uses the `UNI` patch encoder. See below for the list of supported models.
  - `--mag 20`: Features are extracted from patches at 20x magnification.
  - `--patch_size 256`: Patches are 256x256 pixels in size.
- Outputs:
  - Features are saved as h5 files in `./trident_processed/20x_256px_0px_overlap/features_uni_v1` (shape: `(n_patches, feature_dim)`).
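Loading these embeddings mirrors the coordinates case. A minimal sketch; the dataset key `"features"` is an assumption, so check `f.keys()` if it differs:

```python
# Load patch embeddings produced by --task feat with a patch encoder.
import h5py

path = "./trident_processed/20x_256px_0px_overlap/features_uni_v1/slide_001.h5"  # placeholder
with h5py.File(path, "r") as f:
    features = f["features"][:]  # assumed key
print(features.shape)  # (n_patches, 1024) for uni_v1
```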
Trident supports the following patch encoders, loaded via a patch-level `encoder_factory`. Models requiring specific installations will return error messages with additional instructions. Gated models on HuggingFace require access requests.
| Patch Encoder | Embedding Dim | Args | Link |
|---|---|---|---|
| UNI | 1024 | --patch_encoder uni_v1 --patch_size 256 --mag 20 | MahmoodLab/UNI |
| UNI2-h | 1536 | --patch_encoder uni_v2 --patch_size 256 --mag 20 | MahmoodLab/UNI2-h |
| CONCH | 512 | --patch_encoder conch_v1 --patch_size 512 --mag 20 | MahmoodLab/CONCH |
| CONCHv1.5 | 768 | --patch_encoder conch_v15 --patch_size 512 --mag 20 | MahmoodLab/conchv1_5 |
| Virchow | 2560 | --patch_encoder virchow --patch_size 224 --mag 20 | paige-ai/Virchow |
| Virchow2 | 2560 | --patch_encoder virchow2 --patch_size 224 --mag 20 | paige-ai/Virchow2 |
| Phikon | 768 | --patch_encoder phikon --patch_size 224 --mag 20 | owkin/phikon |
| Phikon-v2 | 1024 | --patch_encoder phikon_v2 --patch_size 224 --mag 20 | owkin/phikon-v2 |
| KEEP | 768 | --patch_encoder keep --patch_size 256 --mag 20 | Astaxanthin/KEEP |
| Prov-Gigapath | 1536 | --patch_encoder gigapath --patch_size 256 --mag 20 | prov-gigapath |
| H-Optimus-0 | 1536 | --patch_encoder hoptimus0 --patch_size 224 --mag 20 | bioptimus/H-optimus-0 |
| H-Optimus-1 | 1536 | --patch_encoder hoptimus1 --patch_size 224 --mag 20 | bioptimus/H-optimus-1 |
| H0-mini | 768/1536 | --patch_encoder h0-mini --patch_size 224 --mag 20 | bioptimus/H0-mini |
| MUSK | 1024 | --patch_encoder musk --patch_size 384 --mag 20 | xiangjx/musk |
| Midnight-12k | 3072 | --patch_encoder midnight12k --patch_size 224 --mag 20 | kaiko-ai/midnight |
| OpenMidnight | 1536 | --patch_encoder openmidnight --patch_size 224 --mag 20 | SophontAI/OpenMidnight |
| GPFM | 1024 | --patch_encoder gpfm --patch_size 224 --mag 20 | majiabo/GPFM |
| GenBio-PathFM | 4608 | --patch_encoder genbio-pathfm --patch_size 224 --mag 20 | genbio-ai/genbio-pathfm |
| Kaiko | 384/768/1024 | --patch_encoder {kaiko-vits8, kaiko-vits16, kaiko-vitb8, kaiko-vitb16, kaiko-vitl14} --patch_size 256 --mag 20 | 1aurent/kaikoai-models-66636c99d8e1e34bc6dcf795 |
| Lunit | 384 | --patch_encoder lunit-vits8 --patch_size 224 --mag 20 | 1aurent/vit_small_patch8_224.lunit_dino |
| Hibou | 1024 | --patch_encoder hibou_l --patch_size 224 --mag 20 | histai/hibou-L |
| CTransPath-CHIEF | 768 | --patch_encoder ctranspath --patch_size 256 --mag 10 | β |
| ResNet50 | 1024 | --patch_encoder resnet50 --patch_size 256 --mag 20 | β |
**Step 3b: Slide Feature Extraction**: extracts slide embeddings using a slide encoder. The required patch embeddings are extracted automatically.
- Command:
  ```bash
  python run_batch_of_slides.py --task feat --wsi_dir ./wsis --job_dir ./trident_processed --slide_encoder titan --mag 20 --patch_size 512
  ```
  - `--task feat`: Specifies that you want to do feature extraction.
  - `--wsi_dir ./wsis`: Path to the directory containing WSIs.
  - `--job_dir ./trident_processed`: Output directory for processed results.
  - `--slide_encoder titan`: Uses the `Titan` slide encoder. See below for supported models.
  - `--mag 20`: Features are extracted from patches at 20x magnification.
  - `--patch_size 512`: Patches are 512x512 pixels in size.
- Outputs:
  - Features are saved as h5 files in `./trident_processed/20x_512px_0px_overlap/slide_features_titan` (shape: `(feature_dim)`).
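One common downstream use of slide embeddings is similarity search. A minimal sketch comparing two slides by cosine similarity; the dataset key `"features"` and the file names are assumptions:

```python
# Compare two slide-level embeddings with cosine similarity.
import h5py
import numpy as np

def load_slide_embedding(path):
    with h5py.File(path, "r") as f:
        return f["features"][:].squeeze()  # assumed key; shape (feature_dim,)

a = load_slide_embedding("./trident_processed/20x_512px_0px_overlap/slide_features_titan/slide_a.h5")
b = load_slide_embedding("./trident_processed/20x_512px_0px_overlap/slide_features_titan/slide_b.h5")
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```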
Trident supports the following slide encoders, loaded via a slide-level `encoder_factory`. Models requiring specific installations will return error messages with additional instructions. Gated models on HuggingFace require access requests.
| Slide Encoder | Patch Encoder | Args | Link |
|---|---|---|---|
| Threads | conch_v15 | --slide_encoder threads --patch_size 512 --mag 20 | (Coming Soon!) |
| Titan | conch_v15 | --slide_encoder titan --patch_size 512 --mag 20 | MahmoodLab/TITAN |
| PRISM | virchow | --slide_encoder prism --patch_size 224 --mag 20 | paige-ai/Prism |
| CHIEF | ctranspath | --slide_encoder chief --patch_size 256 --mag 10 | CHIEF |
| GigaPath | gigapath | --slide_encoder gigapath --patch_size 256 --mag 20 | prov-gigapath |
| Madeleine | conch_v1 | --slide_encoder madeleine --patch_size 256 --mag 10 | MahmoodLab/madeleine |
| Feather | conch_v15 | --slide_encoder feather --patch_size 512 --mag 20 | MahmoodLab/FEATHER |
> [!NOTE]
> If your task includes multiple slides per patient, you can generate patient-level embeddings by: (1) processing each slide independently and taking the average slide embedding (late fusion) or (2) pooling all patches together and processing them as a single "pseudo-slide" (early fusion). For an implementation of both fusion strategies, please check out our sister repository Patho-Bench.
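A toy numpy sketch of the two fusion strategies, with random arrays standing in for real Trident outputs (mean-pooling stands in for a slide encoder in the early-fusion branch; see Patho-Bench for the real implementations):

```python
import numpy as np

# Late fusion: average the per-slide embeddings of one patient.
slide_embeddings = [np.random.rand(768) for _ in range(3)]  # toy: 3 slides per patient
patient_late = np.mean(slide_embeddings, axis=0)            # (768,)

# Early fusion: pool all patch embeddings into one "pseudo-slide",
# then run the slide encoder on the pooled bag (mean stands in here).
patch_bags = [np.random.rand(n, 768) for n in (500, 320, 410)]  # toy patch counts
pseudo_slide = np.concatenate(patch_bags, axis=0)               # (1230, 768)
patient_early = pseudo_slide.mean(axis=0)                       # stand-in for a slide encoder
```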
Please see our tutorials for more support as well as a detailed readme for additional features.
## FAQ
- **Q: How do I extract patch embeddings from legacy patch coordinates extracted with CLAM?**
  - A:
    ```bash
    python run_batch_of_slides.py --task feat --wsi_dir ./wsis --job_dir legacy_dir --patch_encoder uni_v1 --mag 20 --patch_size 256 --coords_dir extracted_mag20x_patch256_fp/
    ```
- **Q: How do I keep patches corresponding to holes in the tissue?**
  - A: In `run_batch_of_slides`, this behavior is the default. Set `--remove_holes` to exclude patches on top of holes.
- **Q: I see weird messages when building models using timm. What is happening?**
  - A: Make sure `timm==0.9.16` is installed. `timm==1.X.X` creates issues with most models.
- **Q: What's the recommended way to run Trident from another project?**
  - A: Use the CLI (recommended for reproducibility). Install Trident, then call:
    ```bash
    trident single -- --slide_path ./wsis/example.svs --job_dir ./job --patch_encoder uni_v1 --mag 20 --patch_size 256
    trident batch -- --task all --wsi_dir ./wsis --job_dir ./job --patch_encoder uni_v1 --mag 20 --patch_size 256
    ```
  - If you need to call Trident from Python, use the public API (`Processor`, `load_wsi`); see the sketch below.
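    A heavily hedged Python sketch: this README only names `Processor` and `load_wsi`, so the import path and every argument below are assumptions; consult the Trident documentation for the real signatures.
    ```python
    # ASSUMED import path and signatures; verify against the Trident docs.
    from trident import Processor, load_wsi

    wsi = load_wsi("./wsis/example.svs")    # assumed: slide path in, WSI object out
    processor = Processor(job_dir="./job")  # assumed constructor argument
    # ...then drive segmentation / patching / feature extraction per the docs.
    ```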
- **Q: I am not satisfied with the tissue vs. background segmentation. What can I do?**
  - A: Trident uses GeoJSON to store and load segmentations. This format is natively supported by QuPath. You can load the Trident segmentation into QuPath, modify it using QuPath's annotation tools, and save the updated segmentation back to GeoJSON.
  - A: You can also try another segmentation model by specifying `--segmenter grandqc` (citation necessary, non-commercial use, original repository) or `--segmenter otsu`.
- **Q: I want to process a custom list of WSIs. Can I do it? Also, most of my WSIs don't have the microns per pixel (mpp) stored. Can I pass it?**
  - A: Yes, using the `--custom_list_of_wsis` argument. Provide a list of WSI names (with slide extension) in a CSV under a `wsi` column. Optionally, provide the mpp under an `mpp` column. See the sketch below.
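    A minimal sketch of such a CSV, built with the standard library (file names and MPP values are placeholders):
    ```python
    # Build the CSV for --custom_list_of_wsis: a `wsi` column with slide
    # extensions, plus an optional `mpp` column.
    import csv

    with open("custom_wsis.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["wsi", "mpp"])
        writer.writeheader()
        writer.writerow({"wsi": "case_01.svs", "mpp": 0.25})  # placeholder
        writer.writerow({"wsi": "case_02.svs", "mpp": ""})    # mpp left blank (optional)
    ```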
- **Q: Do I need to install any additional packages to use Trident?**
  - A: `pip install -e .` installs core dependencies. Some optional components still require extra installs. Use the install profiles (`.[patch-encoders]`, `.[slide-encoders]`, `.[convert]`, `.[omezarr]`, or `.[full]`) and run `trident-doctor` for preflight checks.
## License and Terms of Use
© Mahmood Lab. This repository is released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of this repository is prohibited and requires prior approval. By downloading any pretrained encoder, you agree to follow the model's respective license.
## Acknowledgements
The project was built on top of amazing repositories such as Timm, HuggingFace, and open-source contributions from the community. We thank the authors and developers for their contributions.
## Issues
- The preferred mode of communication is via GitHub issues.
- If GitHub issues are inappropriate, email guillaume.jaume@unil.ch and andrewzh@mit.edu.
- Immediate response to minor issues may not be available.
## Funding
This work was funded by NIH NIGMS R35GM138216.
## How to cite
If you find our work useful in your research or if you use parts of this code, please consider citing our papers:
```bibtex
@article{zhang2025standardizing,
  title={Accelerating Data Processing and Benchmarking of AI Models for Pathology},
  author={Zhang, Andrew and Jaume, Guillaume and Vaidya, Anurag and Ding, Tong and Mahmood, Faisal},
  journal={arXiv preprint arXiv:2502.06750},
  year={2025}
}

@article{vaidya2025molecular,
  title={Molecular-driven Foundation Model for Oncologic Pathology},
  author={Vaidya, Anurag and Zhang, Andrew and Jaume, Guillaume and Song, Andrew H and Ding, Tong and Wagner, Sophia J and Lu, Ming Y and Doucet, Paul and Robertson, Harry and Almagro-Perez, Cristina and others},
  journal={arXiv preprint arXiv:2501.16652},
  year={2025}
}
```