DoViT-code

October 18, 2023

PyTorch implementation of our paper "Dynamic Token-Pass Transformers for Semantic Segmentation" to appear in WACV'24.

Installation

Requirements:

python >= 3.6
mmsegmentation >= 0.25.0

Please see the documentation of mmsegmentation for details.
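
The requirements above can be checked before training with a small script. This is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and it assumes mmsegmentation was installed from the `mmsegmentation` pip distribution. It relies on `importlib.metadata`, so the script itself needs Python 3.8 or newer.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '0.25.0' or '1.0.0rc1' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first component with no leading number (e.g. 'dev').
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '0.25' compares equal to '0.25.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    try:
        # Assumption: the package was installed under the name 'mmsegmentation'.
        installed = metadata.version("mmsegmentation")
    except metadata.PackageNotFoundError:
        problems.append("mmsegmentation is not installed")
    else:
        if not meets_minimum(installed, "0.25.0"):
            problems.append("mmsegmentation >= 0.25.0 is required, found " + installed)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The script prints nothing when both requirements are met. It only compares the numeric parts of the version, so pre-release suffixes such as `rc1` are ignored.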

Get Started

  1. Download the mmseg pretrained backbones (links are in the config files) and convert them to the DoViT architecture:
python tools/model_converters/mmseg2dovit.py </path/to/pretrained_backbone> </path/to/output_backbone>

Note: We modified the EncoderDecoder class in mmseg to support DoViT backbones.

  2. Train on 8 GPUs:
bash tools/dist_train.sh </path/config_file.py> 8 --work-dir <output_dir>

Citation

@inproceedings{liu2024dovit,
    author    = {Liu, Yuang and Zhou, Qiang and Wang, Jin and Wang, Zhibin and Wang, Fan and Wang, Jun and Zhang, Wei},
    title     = {Dynamic Token-Pass Transformers for Semantic Segmentation},
    booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024}
}