Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024)

June 16, 2024

Project Page | Paper

Method

Additional qualitative examples

Additional qualitative results

Additional examples in-the-wild

in-the-wild examples

Setup

Our setup is based on PyTorch 1.13.1, mmcv 1.6.2, and mmsegmentation 0.27.0. To recreate the environment used for our experiments, first create and activate a virtual environment:

python3 -m venv ./freeda
source ./freeda/bin/activate
pip install -U pip setuptools wheel

Install PyTorch:

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Install other dependencies:

pip install -r requirements.txt
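After installation, it can help to confirm that the environment matches the versions stated above. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the distribution names are assumptions (in particular, mmcv may be installed as either `mmcv` or `mmcv-full`, depending on what requirements.txt pins).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions stated in the setup section of this README.
EXPECTED = {
    "torch": "1.13.1",
    "mmcv": "1.6.2",
    "mmsegmentation": "0.27.0",
}

# Assumption: mmcv may be published under either distribution name.
ALIASES = {"mmcv": ("mmcv", "mmcv-full")}


def installed_version(name):
    """Return the installed version string for a package, or None if absent."""
    for dist in ALIASES.get(name, (name,)):
        try:
            return version(dist)
        except PackageNotFoundError:
            continue
    return None


def matches(found, expected):
    """Compare versions, ignoring a local build suffix such as '+cu117'."""
    return found is not None and found.split("+")[0] == expected


def check_environment(expected=EXPECTED):
    """Map each package name to (installed version, whether it matches)."""
    report = {}
    for name, want in expected.items():
        found = installed_version(name)
        report[name] = (found, matches(found, want))
    return report


for name, (found, ok) in check_environment().items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name}: expected {EXPECTED[name]}, found {found} [{status}]")
```

A mismatch does not necessarily mean the code will fail, but the reported results were obtained with the versions listed above.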

Download both the prototype embeddings and the faiss index, and decompress them into ./data:

cd ./data
mkdir -p prototype_embeddings
tar -xvf prototype_embeddings.tar -C ./prototype_embeddings
unzip faiss_index.zip

Datasets

This section is adapted from the TCL and GroupViT READMEs.

The overall file structure is as follows:

src
├── data
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── VOCdevkit
│   │   ├── VOC2012
│   │   │   ├── JPEGImages
│   │   │   ├── SegmentationClass
│   │   │   ├── ImageSets
│   │   │   │   ├── Segmentation
│   │   ├── VOC2010
│   │   │   ├── JPEGImages
│   │   │   ├── SegmentationClassContext
│   │   │   ├── ImageSets
│   │   │   │   ├── SegmentationContext
│   │   │   │   │   ├── train.txt
│   │   │   │   │   ├── val.txt
│   │   │   ├── trainval_merged.json
│   │   ├── VOCaug
│   │   │   ├── dataset
│   │   │   │   ├── cls
│   ├── ade
│   │   ├── ADEChallengeData2016
│   │   │   ├── annotations
│   │   │   │   ├── training
│   │   │   │   ├── validation
│   │   │   ├── images
│   │   │   │   ├── training
│   │   │   │   ├── validation
│   ├── coco_stuff164k
│   │   ├── images
│   │   │   ├── train2017
│   │   │   ├── val2017
│   │   ├── annotations
│   │   │   ├── train2017
│   │   │   ├── val2017

Please download and set up the PASCAL VOC, PASCAL Context, COCO-Stuff164k, Cityscapes, and ADE20K datasets following the MMSegmentation data preparation document.
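Before running evaluation, the layout shown above can be checked programmatically. This is a minimal sketch under stated assumptions: `missing_paths` is a hypothetical helper, not part of the repository, and the data root passed to it (`./data` here) should be adjusted to wherever the datasets actually live. The relative paths are copied from the file structure above.

```python
from pathlib import Path

# Relative paths taken from the file structure listed in this README.
EXPECTED_PATHS = [
    "cityscapes/leftImg8bit/train",
    "cityscapes/leftImg8bit/val",
    "cityscapes/gtFine/train",
    "cityscapes/gtFine/val",
    "VOCdevkit/VOC2012/JPEGImages",
    "VOCdevkit/VOC2012/SegmentationClass",
    "VOCdevkit/VOC2012/ImageSets/Segmentation",
    "VOCdevkit/VOC2010/JPEGImages",
    "VOCdevkit/VOC2010/SegmentationClassContext",
    "VOCdevkit/VOC2010/ImageSets/SegmentationContext/train.txt",
    "VOCdevkit/VOC2010/ImageSets/SegmentationContext/val.txt",
    "VOCdevkit/VOC2010/trainval_merged.json",
    "VOCdevkit/VOCaug/dataset/cls",
    "ade/ADEChallengeData2016/annotations/training",
    "ade/ADEChallengeData2016/annotations/validation",
    "ade/ADEChallengeData2016/images/training",
    "ade/ADEChallengeData2016/images/validation",
    "coco_stuff164k/images/train2017",
    "coco_stuff164k/images/val2017",
    "coco_stuff164k/annotations/train2017",
    "coco_stuff164k/annotations/val2017",
]


def missing_paths(data_root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]


# Assumption: the datasets are placed under ./data relative to the working directory.
for rel in missing_paths("./data"):
    print(f"missing: {rel}")
```

The check only tests that each path exists. It does not validate the contents of the datasets, and only the datasets you intend to evaluate on need to be present.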

Evaluation

Pascal VOC:

python -m torch.distributed.run main.py --eval --eval_cfg configs/pascal20/freeda_pascal20.yml --eval_base_cfg configs/pascal20/eval_pascal20.yml

Pascal Context:

python -m torch.distributed.run main.py --eval --eval_cfg configs/pascal59/freeda_pascal59.yml --eval_base_cfg configs/pascal59/eval_pascal59.yml

COCO-Stuff:

python -m torch.distributed.run main.py --eval --eval_cfg configs/cocostuff/freeda_cocostuff.yml --eval_base_cfg configs/cocostuff/eval_cocostuff.yml

Cityscapes:

python -m torch.distributed.run main.py --eval --eval_cfg configs/cityscapes/freeda_cityscapes.yml --eval_base_cfg configs/cityscapes/eval_cityscapes.yml

ADE20K:

python -m torch.distributed.run main.py --eval --eval_cfg configs/ade/freeda_ade.yml --eval_base_cfg configs/ade/eval_ade.yml
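Since the five commands above differ only in the config directory name, they can be generated in a loop. The sketch below is a hypothetical convenience script, not part of the repository. It assumes it is run from the repository root and uses the current interpreter (`sys.executable`) in place of `python`. By default it only prints the commands; pass `dry_run=False` to execute them one after another.

```python
import subprocess
import sys

# Benchmark label -> config directory name, as used in the commands above.
BENCHMARKS = {
    "Pascal VOC": "pascal20",
    "Pascal Context": "pascal59",
    "COCO-Stuff": "cocostuff",
    "Cityscapes": "cityscapes",
    "ADE20K": "ade",
}


def build_command(key):
    """Build the evaluation command for one benchmark config directory."""
    return [
        sys.executable, "-m", "torch.distributed.run", "main.py", "--eval",
        "--eval_cfg", f"configs/{key}/freeda_{key}.yml",
        "--eval_base_cfg", f"configs/{key}/eval_{key}.yml",
    ]


def run_all(dry_run=True):
    """Print every evaluation command and, unless dry_run, execute it."""
    commands = []
    for label, key in BENCHMARKS.items():
        cmd = build_command(key)
        commands.append(cmd)
        print(f"[{label}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops at the first failing evaluation.
            subprocess.run(cmd, check=True)
    return commands


run_all(dry_run=True)
```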

If you find FreeDA useful for your work, please cite:

@inproceedings{barsellotti2024training,
  title={Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation},
  author={Barsellotti, Luca and Amoroso, Roberto and Cornia, Marcella and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}