Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024)

June 16, 2024

Project Page | Paper

[Figure: Qualitative results]

Method

[Figure: FreeDA method]

Additional qualitative examples

[Figure: Additional qualitative results]

Additional examples in-the-wild

[Figure: In-the-wild examples]

Setup

Our setup is based on PyTorch 1.13.1, MMCV 1.6.2, and MMSegmentation 0.27.0. To recreate the environment we used for our experiments, first create and activate a virtual environment:

python3 -m venv ./freeda
source ./freeda/bin/activate
pip install -U pip setuptools wheel

Install PyTorch:

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Install other dependencies:

pip install -r requirements.txt
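After installation, it can help to confirm that the installed versions match the ones stated above. The following is a minimal sketch, not part of the FreeDA codebase: the function names are hypothetical, and the distribution names passed to `importlib.metadata` are assumptions (MMCV, for example, is distributed both as `mmcv` and as `mmcv-full`).

```python
from importlib import metadata

# Versions stated in this README. The distribution names are assumptions:
# adjust them if requirements.txt installs the packages under other names.
EXPECTED = {
    "PyTorch": (("torch",), "1.13.1"),
    "MMCV": (("mmcv-full", "mmcv"), "1.6.2"),
    "MMSegmentation": (("mmsegmentation",), "0.27.0"),
}


def installed_version(names, lookup=metadata.version):
    """Return the version of the first installed distribution in `names`, or None."""
    for name in names:
        try:
            return lookup(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def check_environment(lookup=metadata.version):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    for label, (names, wanted) in EXPECTED.items():
        found = installed_version(names, lookup)
        if found is None:
            problems.append(f"{label}: not installed (expected {wanted})")
        # Drop local build tags such as "+cu117" before comparing.
        elif found.split("+")[0] != wanted:
            problems.append(f"{label}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the README versions."]:
        print(line)
```

Run it inside the activated virtual environment; any line it prints other than the final confirmation points to a package worth reinstalling.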

Download the prototype embeddings and the Faiss index archives into ./data, then extract them there:

cd ./data
mkdir -p prototype_embeddings
tar -xvf prototype_embeddings.tar -C ./prototype_embeddings
unzip faiss_index.zip
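On systems without `tar` or `unzip`, the same extraction can be done from Python. This is a sketch under the assumption that both archives have already been downloaded into `./data` under the file names used in the commands above; the function name `extract_freeda_data` is hypothetical.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_freeda_data(data_dir="./data"):
    """Extract the two downloaded archives inside `data_dir`.

    Mirrors the shell commands: the tar archive goes into
    `prototype_embeddings/`, the zip archive is unpacked in place.
    """
    data_dir = Path(data_dir)
    proto_dir = data_dir / "prototype_embeddings"
    proto_dir.mkdir(parents=True, exist_ok=True)

    # Mode "r:*" lets tarfile detect whether the archive is gzip-compressed
    # or a plain tar, so it works regardless of the ".tar" file extension.
    with tarfile.open(data_dir / "prototype_embeddings.tar", "r:*") as archive:
        archive.extractall(proto_dir)

    with zipfile.ZipFile(data_dir / "faiss_index.zip") as archive:
        archive.extractall(data_dir)

    return proto_dir


if __name__ == "__main__":
    # Only attempt the extraction when both archives are present.
    root = Path("./data")
    if (root / "prototype_embeddings.tar").exists() and (root / "faiss_index.zip").exists():
        print("Extracted prototypes to", extract_freeda_data(root))
```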

Datasets

This section is adapted from the TCL and GroupViT READMEs.

The overall file structure is as follows:

src
├── data
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── VOCdevkit
│   │   ├── VOC2012
│   │   │   ├── JPEGImages
│   │   │   ├── SegmentationClass
│   │   │   ├── ImageSets
│   │   │   │   ├── Segmentation
│   │   ├── VOC2010
│   │   │   ├── JPEGImages
│   │   │   ├── SegmentationClassContext
│   │   │   ├── ImageSets
│   │   │   │   ├── SegmentationContext
│   │   │   │   │   ├── train.txt
│   │   │   │   │   ├── val.txt
│   │   │   ├── trainval_merged.json
│   │   ├── VOCaug
│   │   │   ├── dataset
│   │   │   │   ├── cls
│   ├── ade
│   │   ├── ADEChallengeData2016
│   │   │   ├── annotations
│   │   │   │   ├── training
│   │   │   │   ├── validation
│   │   │   ├── images
│   │   │   │   ├── training
│   │   │   │   ├── validation
│   ├── coco_stuff164k
│   │   ├── images
│   │   │   ├── train2017
│   │   │   ├── val2017
│   │   ├── annotations
│   │   │   ├── train2017
│   │   │   ├── val2017

Please download and set up the PASCAL VOC, PASCAL Context, COCO-Stuff164k, Cityscapes, and ADE20K datasets following the MMSegmentation data preparation document.
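Before launching an evaluation, a quick check of the dataset folders can catch path mistakes early. The sketch below assumes the entries listed above are nested as in the standard MMSegmentation layout and that the dataset root is `./data`; both the root and the helper name `missing_entries` are assumptions for illustration.

```python
from pathlib import Path

# Leaf entries of the expected layout, relative to the dataset root.
# The nesting follows the standard MMSegmentation layout (an assumption).
EXPECTED_ENTRIES = [
    "cityscapes/leftImg8bit/train",
    "cityscapes/leftImg8bit/val",
    "cityscapes/gtFine/train",
    "cityscapes/gtFine/val",
    "VOCdevkit/VOC2012/JPEGImages",
    "VOCdevkit/VOC2012/SegmentationClass",
    "VOCdevkit/VOC2012/ImageSets/Segmentation",
    "VOCdevkit/VOC2010/JPEGImages",
    "VOCdevkit/VOC2010/SegmentationClassContext",
    "VOCdevkit/VOC2010/ImageSets/SegmentationContext/train.txt",
    "VOCdevkit/VOC2010/ImageSets/SegmentationContext/val.txt",
    "VOCdevkit/VOC2010/trainval_merged.json",
    "VOCdevkit/VOCaug/dataset/cls",
    "ade/ADEChallengeData2016/annotations/training",
    "ade/ADEChallengeData2016/annotations/validation",
    "ade/ADEChallengeData2016/images/training",
    "ade/ADEChallengeData2016/images/validation",
    "coco_stuff164k/images/train2017",
    "coco_stuff164k/images/val2017",
    "coco_stuff164k/annotations/train2017",
    "coco_stuff164k/annotations/val2017",
]


def missing_entries(root="./data"):
    """Return the expected files or folders that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing dataset entries:")
        for entry in missing:
            print("  " + entry)
    else:
        print("All expected dataset entries were found.")
```

If only some benchmarks are needed, the entries for the other datasets can simply be removed from the list.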

Evaluation

Pascal VOC:

python -m torch.distributed.run main.py --eval --eval_cfg configs/pascal20/freeda_pascal20.yml --eval_base_cfg configs/pascal20/eval_pascal20.yml

Pascal Context:

python -m torch.distributed.run main.py --eval --eval_cfg configs/pascal59/freeda_pascal59.yml --eval_base_cfg configs/pascal59/eval_pascal59.yml

COCO-Stuff:

python -m torch.distributed.run main.py --eval --eval_cfg configs/cocostuff/freeda_cocostuff.yml --eval_base_cfg configs/cocostuff/eval_cocostuff.yml

Cityscapes:

python -m torch.distributed.run main.py --eval --eval_cfg configs/cityscapes/freeda_cityscapes.yml --eval_base_cfg configs/cityscapes/eval_cityscapes.yml

ADE20K:

python -m torch.distributed.run main.py --eval --eval_cfg configs/ade/freeda_ade.yml --eval_base_cfg configs/ade/eval_ade.yml
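The five commands above differ only in the configuration folder name, so they can be generated in a loop. This is a convenience sketch rather than a script shipped with the repository: `build_command` and `run_all` are hypothetical helpers, and the script is assumed to be run from the directory that contains `main.py`. By default it only prints the commands (dry run).

```python
import subprocess
import sys

# Benchmark title -> configuration folder name, as used in the commands above.
BENCHMARKS = {
    "Pascal VOC": "pascal20",
    "Pascal Context": "pascal59",
    "COCO-Stuff": "cocostuff",
    "Cityscapes": "cityscapes",
    "ADE20K": "ade",
}


def build_command(config_name):
    """Build the evaluation command for one benchmark as an argument list."""
    return [
        sys.executable, "-m", "torch.distributed.run", "main.py", "--eval",
        "--eval_cfg", f"configs/{config_name}/freeda_{config_name}.yml",
        "--eval_base_cfg", f"configs/{config_name}/eval_{config_name}.yml",
    ]


def run_all(dry_run=True):
    """Print every evaluation command; execute them too when dry_run is False."""
    for title, config_name in BENCHMARKS.items():
        command = build_command(config_name)
        print(f"[{title}] {' '.join(command)}")
        if not dry_run:
            # check=True stops the loop at the first failing evaluation.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` runs the benchmarks one after another with the interpreter of the active environment.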

If you find FreeDA useful for your work, please cite:

@inproceedings{barsellotti2024training,
  title={Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation},
  author={Barsellotti, Luca and Amoroso, Roberto and Cornia, Marcella and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}