Title: Causal Unsupervised Semantic Segmentation

August 5, 2025

Pattern Recognition, Journal

This is the PyTorch implementation of CAusal Unsupervised Semantic sEgmentation (CAUSE), which improves the performance of unsupervised semantic segmentation. The code builds on two baseline codebases: HP (Leveraging Hidden Positives for Unsupervised Semantic Segmentation, CVPR 2023) and STEGO (Unsupervised Semantic Segmentation by Distilling Feature Correspondences, ICLR 2022).


You can find the following bundle of images in the Appendix. There, we also explain the concrete implementation beyond what is described in the main paper.

Figure 1. Visual comparison of unsupervised semantic segmentation (USS) on COCO-Stuff. Note that, compared with the ground-truth labels, baseline frameworks fail to reach the targeted level of granularity, while CAUSE successfully clusters person, sports, vehicle, etc.
Figure 2. Qualitative comparison of unsupervised semantic segmentation on Cityscapes.
Figure 3. Log-scale mIoU results for each category in COCO-Stuff (black: thing / gray: stuff).

🚀 Download Visual Quality, Seg Head Parameter, and Concept ClusterBook of CAUSE

You can download checkpoint files containing CAUSE-trained parameters built on the self-supervised vision transformer frameworks DINO, DINOv2, iBOT, MSN, and MAE. If you want to download the pretrained DINO models in the various structures that CAUSE uses, you can get them from the following links:


| Dataset | Method | Baseline | mIoU (%) | pAcc (%) | Visual Quality | Seg Head Parameter | Concept ClusterBook |
|---|---|---|---|---|---|---|---|
| COCO-Stuff | DINO+CAUSE-MLP | ViT-S/8 | 27.9 | 66.8 | [link] | [link] | [link] |
| COCO-Stuff | DINO+CAUSE-TR | ViT-S/8 | 32.4 | 69.6 | [link] | [link] | [link] |
| COCO-Stuff | DINO+CAUSE-MLP | ViT-S/16 | 25.9 | 66.3 | [link] | [link] | [link] |
| COCO-Stuff | DINO+CAUSE-TR | ViT-S/16 | 33.1 | 70.4 | [link] | [link] | [link] |
| COCO-Stuff | DINO+CAUSE-MLP | ViT-B/8 | 34.3 | 72.8 | [link] | [link] | [link] |
| COCO-Stuff | DINO+CAUSE-TR | ViT-B/8 | 41.9 | 74.9 | [link] | [link] | [link] |
| COCO-Stuff | DINOv2+CAUSE-TR | ViT-B/14 | 45.3 | 78.0 | [link] | [link] | [link] |
| COCO-Stuff | iBOT+CAUSE-TR | ViT-B/16 | 39.5 | 73.8 | [link] | [link] | [link] |
| COCO-Stuff | MSN+CAUSE-TR | ViT-S/16 | 34.1 | 72.1 | [link] | [link] | [link] |
| COCO-Stuff | MAE+CAUSE-TR | ViT-B/16 | 21.5 | 59.1 | [link] | [link] | [link] |

| Dataset | Method | Baseline | mIoU (%) | pAcc (%) | Visual Quality | Seg Head Parameter | Concept ClusterBook |
|---|---|---|---|---|---|---|---|
| Cityscapes | DINO+CAUSE-MLP | ViT-S/8 | 21.7 | 87.7 | [link] | [link] | [link] |
| Cityscapes | DINO+CAUSE-TR | ViT-S/8 | 24.6 | 89.4 | [link] | [link] | [link] |
| Cityscapes | DINO+CAUSE-MLP | ViT-B/8 | 25.7 | 90.3 | [link] | [link] | [link] |
| Cityscapes | DINO+CAUSE-TR | ViT-B/8 | 28.0 | 90.8 | [link] | [link] | [link] |
| Cityscapes | DINOv2+CAUSE-TR | ViT-B/14 | 29.9 | 89.8 | [link] | [link] | [link] |
| Cityscapes | iBOT+CAUSE-TR | ViT-B/16 | 23.0 | 89.1 | [link] | [link] | [link] |
| Cityscapes | MSN+CAUSE-TR | ViT-S/16 | 21.2 | 89.1 | [link] | [link] | [link] |
| Cityscapes | MAE+CAUSE-TR | ViT-B/16 | 12.5 | 82.0 | [link] | [link] | [link] |

| Dataset | Method | Baseline | mIoU (%) | pAcc (%) | Visual Quality | Seg Head Parameter | Concept ClusterBook |
|---|---|---|---|---|---|---|---|
| Pascal VOC | DINO+CAUSE-MLP | ViT-S/8 | 46.0 | - | [link] | [link] | [link] |
| Pascal VOC | DINO+CAUSE-TR | ViT-S/8 | 50.0 | - | [link] | [link] | [link] |
| Pascal VOC | DINO+CAUSE-MLP | ViT-B/8 | 47.9 | - | [link] | [link] | [link] |
| Pascal VOC | DINO+CAUSE-TR | ViT-B/8 | 53.3 | - | [link] | [link] | [link] |
| Pascal VOC | DINOv2+CAUSE-TR | ViT-B/14 | 53.2 | 91.5 | [link] | [link] | [link] |
| Pascal VOC | iBOT+CAUSE-TR | ViT-B/16 | 53.4 | 89.6 | [link] | [link] | [link] |
| Pascal VOC | MSN+CAUSE-TR | ViT-S/16 | 30.2 | 84.2 | [link] | [link] | [link] |
| Pascal VOC | MAE+CAUSE-TR | ViT-B/16 | 25.8 | 83.7 | [link] | [link] | [link] |

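| Dataset | Method | Baseline | mIoU (%) | pAcc (%) | Visual Quality | Seg Head Parameter | Concept ClusterBook |
|---|---|---|---|---|---|---|---|
| COCO-81 | DINO+CAUSE-MLP | ViT-S/8 | 19.1 | 78.8 | [link] | [link] | [link] |
| COCO-81 | DINO+CAUSE-TR | ViT-S/8 | 21.2 | 75.2 | [link] | [link] | [link] |
| COCO-171 | DINO+CAUSE-MLP | ViT-S/8 | 10.6 | 44.9 | [link] | [link] | [link] |
| COCO-171 | DINO+CAUSE-TR | ViT-S/8 | 15.2 | 46.6 | [link] | [link] | [link] |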
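In unsupervised semantic segmentation, mIoU and pAcc are typically reported only after the predicted clusters have been matched one-to-one to the ground-truth classes with the Hungarian algorithm. The following is a minimal NumPy/SciPy sketch of that generic protocol, not the repository's evaluation code. It assumes as many clusters as classes and uses 255 as an assumed ignore label, and test_mlp.py and test_tr.py may differ (for example, in their post-processing). The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def confusion(pred, target, n_classes, ignore_index=255):
    """Count pixels per (cluster, class) pair; rows are clusters, columns are classes.

    Assumes cluster ids and class ids both lie in [0, n_classes).
    """
    valid = target != ignore_index  # ignore_index is an assumed convention
    pred = pred[valid].astype(np.int64)
    target = target[valid].astype(np.int64)
    counts = np.bincount(pred * n_classes + target, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)


def matched_miou_pacc(pred, target, n_classes, ignore_index=255):
    """Hungarian-match clusters to classes, then return (mIoU, pixel accuracy)."""
    cm = confusion(pred, target, n_classes, ignore_index)
    rows, cols = linear_sum_assignment(cm, maximize=True)  # maximize matched pixels
    cluster_of_class = np.empty(n_classes, dtype=np.int64)
    cluster_of_class[cols] = rows
    cm = cm[cluster_of_class].astype(np.float64)  # row c = cluster matched to class c
    tp = np.diag(cm)
    fp = cm.sum(axis=1) - tp  # pixels in class c's cluster but labelled otherwise
    fn = cm.sum(axis=0) - tp  # pixels labelled c but assigned to other clusters
    denom = tp + fp + fn
    present = denom > 0  # skip classes absent from both prediction and labels
    miou = float(np.mean(tp[present] / denom[present]))
    pacc = float(tp.sum() / cm.sum())
    return miou, pacc
```

Calling linear_sum_assignment with maximize=True picks the cluster-to-class permutation that maximizes the number of matched pixels. To obtain per-category numbers instead of a mean, keep the per-class IoU array rather than averaging it.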

🤖 CAUSE Framework (Top-Level File Directory Layout)

.
├── loader
│   ├── netloader.py                # Self-Supervised Pretrained Model Loader & Segmentation Head Loader
│   └── dataloader.py               # Dataloader, Thanks to STEGO [ICLR 2022]
│
├── models                          # Self-Supervised Pretrained Model Designs: [DINO/DINOv2/iBOT/MAE/MSN]
│   ├── dinomaevit.py               # ViT Structure of DINO and MAE
│   ├── dinov2vit.py                # ViT Structure of DINOv2
│   ├── ibotvit.py                  # ViT Structure of iBOT
│   └── msnvit.py                   # ViT Structure of MSN
│
├── modules                         # Segmentation Head and Its Necessary Functions
│   ├── segment_module.py           # Tools for Generating the Concept Book and Contrastive Learning
│   └── segment.py                  # [MLP & TR] Tools for Generating the Concept Book and Contrastive Learning
│
├── utils
│   └── utils.py                    # Utilities for Auxiliary Tools
│
├── test_mlp.py                     # [MLP] Evaluating Unsupervised Semantic Segmentation Performance (Post-Processing)
├── test_tr.py                      # [TR] Evaluating Unsupervised Semantic Segmentation Performance (Post-Processing)
│
├── requirements.txt
└── README.md

📊 How to Run CAUSE?

bash run

This shell script contains the following:

#!/bin/bash
######################################
# [OPTION] DATASET

# cocostuff27
dataset="cocostuff27"

# cityscapes
# dataset="cityscapes"

# pascalvoc
# dataset="pascalvoc"

# coco-81
# dataset="coco81"

# coco-171
# dataset="coco171"
######################################

######################################
# [OPTION] STRUCTURE
# structure="MLP"
structure="TR"
######################################

######################################
# [OPTION] Self-Supervised Method

# DINO
# ckpt="checkpoint/dino_vit_small_8.pth"
# ckpt="checkpoint/dino_vit_small_16.pth"
ckpt="checkpoint/dino_vit_base_8.pth"
# ckpt="checkpoint/dino_vit_base_16.pth"

# DINOv2
# ckpt="checkpoint/dinov2_vit_base_14.pth"

# iBOT
# ckpt="checkpoint/ibot_vit_base_16.pth"

# MSN
# ckpt="checkpoint/msn_vit_small_16.pth"

# MAE
# ckpt="checkpoint/mae_vit_base_16.pth"
######################################

######################################
# GPU and PORT
test_gpu="0"
port=$(($RANDOM%800+1200))
######################################

######################################
# TEST
if [ "$structure" = "MLP" ]
then 
    python test_mlp.py --dataset $dataset --ckpt $ckpt --gpu $test_gpu
elif [ "$structure" = "TR" ]
then 
    python test_tr.py --dataset $dataset --ckpt $ckpt --gpu $test_gpu
fi
######################################
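To launch runs from Python (for example, to sweep several options), you can mirror the option handling of this shell script with a small helper. The sketch below is hypothetical, and none of these names exist in the repository. It validates the dataset and structure options and builds the same test_mlp.py/test_tr.py command line. It also maps checkpoint file names such as dino_vit_base_8.pth to the ViT-B/8 notation used in the Baseline column, and it reproduces the port range of $(($RANDOM%800+1200)), i.e. 1200 to 1999. Note that, in the script as shown, port is computed but not passed to either test command.

```python
import os
import random
import re

# Options mirrored from the run script above.
DATASETS = {"cocostuff27", "cityscapes", "pascalvoc", "coco81", "coco171"}
SCRIPTS = {"MLP": "test_mlp.py", "TR": "test_tr.py"}
SIZES = {"small": "S", "base": "B"}
# Hypothetical pattern for the checkpoint names listed in the script,
# e.g. "dino_vit_small_8.pth" or "dinov2_vit_base_14.pth".
CKPT_PATTERN = re.compile(r"(dinov2|dino|ibot|msn|mae)_vit_(small|base)_(\d+)\.pth")


def baseline_label(ckpt):
    """Return (method, 'ViT-<size>/<patch>') for a checkpoint path."""
    match = CKPT_PATTERN.fullmatch(os.path.basename(ckpt))
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {ckpt}")
    method, size, patch = match.groups()
    return method, f"ViT-{SIZES[size]}/{patch}"


def build_test_command(dataset, structure, ckpt, gpu="0"):
    """Mirror the if/elif in the script: choose test_mlp.py or test_tr.py."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if structure not in SCRIPTS:
        raise ValueError(f"structure must be 'MLP' or 'TR', got {structure!r}")
    return ["python", SCRIPTS[structure],
            "--dataset", dataset, "--ckpt", ckpt, "--gpu", str(gpu)]


def random_port(rng=random):
    """Like $(($RANDOM%800+1200)): an integer in [1200, 1999]."""
    return rng.randrange(800) + 1200
```

For example, subprocess.run(build_test_command("cityscapes", "TR", "checkpoint/dino_vit_base_8.pth"), check=True) would launch the same evaluation as the script with those options, assuming it is executed from the repository root.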

Testing CAUSE

python test_mlp.py # CAUSE-MLP

# or

python test_tr.py # CAUSE-TR

💡 Environment Settings

  • Creating a Virtual Environment with Anaconda

conda create -y -n neurips python=3.9

  • Installing the PyTorch Package in the Virtual Environment

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

  • Installing Pip Packages

pip install -r requirements.txt

  • [Optional] Removing the Conda and pip caches if Conda or pip has become locked for unknown reasons

conda clean -a && pip cache purge
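After these steps (and after activating the environment, e.g. with conda activate neurips), a quick sanity check can confirm that the interpreter matches the Python 3.9 requested above and that PyTorch sees a GPU. The helper below is a hypothetical sketch, not part of the repository. The cu118 index URL used above selects PyTorch wheels built against CUDA 11.8, which torch.version.cuda reports.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 9)):
    """Report Python/PyTorch/CUDA status for the current interpreter (hypothetical helper)."""
    report = {
        "python": tuple(sys.version_info[:2]),
        "python_matches": tuple(sys.version_info[:2]) == expected_python,
        "torch": None,
        "cuda_build": None,
        "cuda": False,
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this interpreter
    import torch

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # e.g. "11.8" for a cu118 wheel, None for CPU-only builds
    report["cuda"] = torch.cuda.is_available()  # False if no usable GPU/driver is found
    return report
```

Running print(check_environment()) inside the environment shows at a glance whether the wheel and the driver line up before you launch test_mlp.py or test_tr.py.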


🏅 Download Datasets

Available Datasets

Note: You do not need to download Pascal VOC manually, because the dataloader automatically downloads it to your dataset path.

Try the following scripts

If the above does not work, download azcopy and follow the scripts below.

Unzip Datasets

unzip cocostuff.zip && unzip cityscapes.zip
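On systems without the unzip command, Python's standard-library zipfile module can perform the same extraction. The helper below is a hypothetical sketch. It checks each archive's CRCs with ZipFile.testzip before extracting into the current directory, matching where unzip would put the files. As the Python documentation advises, only extract archives from trusted sources.

```python
import zipfile
from pathlib import Path


def unzip_all(archives, dest="."):
    """Extract each .zip archive into dest after an integrity check (hypothetical helper)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            corrupt = zf.testzip()  # name of the first member with a bad CRC, or None
            if corrupt is not None:
                raise zipfile.BadZipFile(f"{archive}: corrupt member {corrupt}")
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return extracted
```

For example, unzip_all(["cocostuff.zip", "cityscapes.zip"]) corresponds to the command above when run from the same directory.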