Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs

December 31, 2024 · View on GitHub

by Jeongkee Lim, and Yusung Kim

[ECCV24 Paper]

:bell: We are happy to announce that CSI was accepted at ECCV24. :bell:

Overview

The challenge of semantic segmentation in Unsupervised Domain Adaptation (UDA) emerges not only from domain shifts between source and target images but also from discrepancies in class taxonomies across domains. Traditional UDA research assumes consistent taxonomy between the source and target domains, thereby limiting their ability to recognize and adapt to the taxonomy of the target domain.

We introduces a novel approach, Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using Vision Language Models (CSI), which effectively performs domain-adaptive semantic segmentation even in situations of source-target class mismatches. CSI leverages the semantic generalization potential of Visual Language Models (VLMs) to create synergy with previous UDA methods. It leverages segment reasoning obtained through traditional UDA methods, combined with the rich semantic knowledge embedded in VLMs, to relabel new classes in the target domain.

CSI Overview

This approach allows for effective adaptation to extended taxonomies without requiring any ground truth label for the target domain. Our method has shown to be effective across various benchmarks in situations of inconsistent taxonomy settings (coarse-to-fine taxonomy and open taxonomy) and demonstrates consistent synergy effects when integrated with previous state-of-the-art UDA methods. If you find CSI useful in your research, please consider citing.

@inproceedings{lim2025cross,
  title={Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs},
  author={Lim, Jeongkee and Kim, Yusung},
  booktitle={European Conference on Computer Vision},
  pages={18--35},
  year={2025}
}

Environment Setup

Install Libraries

LibraryVersion
Python3.8.x
CUDA12.1
MMCV2.1.0
MMSegmentation1.2.2

For this project, we used python 3.8.x and CUDA 12.1. We recommend setting up a new virtual environment.

conda create --name csi python=3.8 setuptools=58.2 -y
conda activate csi
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y

Install the requirements.

pip install mmengine
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
pip install -v -e .

Download Pretrained Weights

Download the MiT-B5 weight.

MiT-B5 pretrained on ImageNet-1K provided by the official SegFormer repository.

Put the pretrained weights in a folder pretrained/ within this project.

Download Datasets

Synthia: Please, download SYNTHIA-RAND-CITYSCAPES from here and extract it to data/synthia.

Cityscapes: Please, download leftImg8bit_trainvaltest.zip and gt_trainvaltest.zip from here and extract them to data/cityscapes.

GTA (Optional): Please, download all image and label packages from here and extract them to data/gta and data/gta16.

ACDC (Optional): Please, download rgb_anon_trainvaltest.zip and gt_trainval.zip from here and extract them to data/acdc. Further, please restructure the folders from condition/split/sequence/ to split/ using the following commands:

rsync -a data/acdc/rgb_anon/*/train/*/* data/acdc/rgb_anon/train/
rsync -a data/acdc/rgb_anon/*/val/*/* data/acdc/rgb_anon/val/
rsync -a data/acdc/gt/*/train/*/*_labelTrainIds.png data/acdc/gt/train/
rsync -a data/acdc/gt/*/val/*/*_labelTrainIds.png data/acdc/gt/val/

The final folder structure should look like this:

CSI
├── ...
├── data
│   ├── acdc (optional)
│   │   ├── gt
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── rgb_anon
│   │   │   ├── train
│   │   │   ├── val
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── gta
│   │   ├── images
│   │   ├── labels
│   ├── gta16
│   │   ├── images
│   │   ├── labels
│   ├── synthia
│   │   ├── RGB
│   │   ├── GT
│   │   │   ├── LABELS
├── ...

Data Preprocessing

Please run the following scripts to convert the label IDs to the train IDs and to generate the class index for RCS.

python tools/convert_datasets/synthia.py data/synthia/ --nproc 8
python tools/convert_datasets/cityscapes.py data/cityscapes/ --nproc 8
python tools/convert_datasets/gta.py data/gta --nproc 8
python tools/convert_datasets/gta16.py data/gta16 --nproc 8

Training

For the experiments in our paper, we use a script to automatically generate and train the configs.

python run_experiments.py --exp <ID>

The logs and checkpoints are stored in work_dirs/.

Experiment IDDescription
90Table 1
91Table 2
92Table 3
93Table 4
94Table 5

Evaluation

A trained model can be evaluated using script.

python tools/test.py config_path checkpoint_path --cfg-options default_hooks.visualization.interval=1 default_hooks.visualization.mode_prefix=False uda.mask_mode=None

The predictions are saved for inspection to work_dirs/run_name/preds and the mIoU of the model is printed to the console.

Framework Structure

This project is based on mmsegmentation version 1.2.2. For more information about the framework structure and the config system, please refer to the mmsegmentation documentation and the mmcv documentation.

The most relevant files for CSI are:

Acknowledgements

This project is based on the following open-source projects. We thank their authors for making the source code publically available.