Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP (NeurIPS 2023)

December 2, 2023


This repo contains the code for our paper "Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP".


FC-CLIP is a universal model for open-vocabulary image segmentation, consisting of a class-agnostic segmenter, an in-vocabulary classifier, and an out-of-vocabulary classifier. With everything built upon a single shared frozen convolutional CLIP model, FC-CLIP not only achieves state-of-the-art performance on various open-vocabulary segmentation benchmarks, but also has much lower training (3.2 days on 8 V100 GPUs) and testing costs than prior art.
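To make the idea of classifying a class-agnostic mask against an open vocabulary more concrete, here is a minimal toy sketch in pure Python. It is not the code in this repo: it assumes that a mask is classified by averaging backbone features inside the mask and comparing the result to per-class text embeddings with cosine similarity. The function names (`mask_pool`, `cosine`, `classify_mask`), the `temperature` value, and the tiny 2x2 feature map are all hypothetical and chosen only for illustration.

```python
import math


def mask_pool(features, mask):
    """Average the feature vectors at pixels where the binary mask is 1.

    features: H x W x C nested lists (toy stand-in for frozen backbone features).
    mask:     H x W nested lists of 0/1 (one class-agnostic mask proposal).
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, keep in zip(feat_row, mask_row):
            if keep:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify_mask(features, mask, text_embeddings, temperature=0.01):
    """Softmax over cosine similarities between the pooled mask feature
    and each class's text embedding (temperature is an assumed value)."""
    pooled = mask_pool(features, mask)
    logits = [cosine(pooled, emb) / temperature for emb in text_embeddings.values()]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    norm = sum(exps)
    return {name: e / norm for name, e in zip(text_embeddings, exps)}


# Toy 2x2 feature map with 2 channels; the mask covers the top row.
features = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
mask = [[1, 1], [0, 0]]
# Hypothetical text embeddings; in practice these would come from a text encoder.
text_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

probs = classify_mask(features, mask, text_embeddings)
best = max(probs, key=probs.get)
print(best, probs)
```

Because the class set is just a dictionary of text embeddings, swapping in a different vocabulary at test time only changes `text_embeddings`; the features and masks are left untouched.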

Installation

See installation instructions.

Getting Started

See Preparing Datasets for FC-CLIP.

See Getting Started with FC-CLIP.

We also support FC-CLIP with a HuggingFace 🤗 Demo.

Model Zoo

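Dataset abbreviations: A-150 = ADE20K, A-847 = ADE20K-Full, PC-59 = Pascal Context 59, PC-459 = Pascal Context 459, PAS-21 = Pascal VOC 21, PAS-20 = Pascal VOC 20. COCO is the training dataset.

| Model | A-150 PQ | A-150 mAP | A-150 mIoU | Cityscapes PQ | Cityscapes mAP | Cityscapes mIoU | Mapillary Vistas PQ | Mapillary Vistas mIoU | A-847 mIoU | PC-59 mIoU | PC-459 mIoU | PAS-21 mIoU | PAS-20 mIoU | COCO PQ | COCO mAP | COCO mIoU | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FC-CLIP (ResNet50) | 17.9 | 9.5 | 23.3 | 40.3 | 21.6 | 53.2 | 15.9 | 24.4 | 7.1 | 50.5 | 12.9 | 75.9 | 89.5 | 50.7 | 40.7 | 58.8 | checkpoint |
| FC-CLIP (ResNet101) | 19.1 | 10.2 | 24.0 | 40.9 | 24.1 | 53.9 | 16.7 | 23.2 | 7.7 | 48.9 | 12.3 | 77.6 | 91.3 | 51.4 | 41.6 | 58.9 | checkpoint |
| FC-CLIP (ResNet50x4) | 21.8 | 11.7 | 26.8 | 42.2 | 23.8 | 54.6 | 17.4 | 24.6 | 8.7 | 54.0 | 13.1 | 79.0 | 92.9 | 52.1 | 42.8 | 60.4 | checkpoint |
| FC-CLIP (ResNet50x16) | 22.5 | 13.6 | 29.4 | 42.0 | 25.6 | 56.0 | 17.8 | 26.1 | 10.3 | 56.4 | 15.7 | 80.7 | 94.5 | 54.4 | 45.0 | 63.3 | checkpoint |
| FC-CLIP (ResNet50x64) | 22.8 | 13.6 | 28.4 | 42.7 | 27.4 | 55.1 | 18.2 | 27.3 | 10.8 | 55.7 | 16.2 | 80.3 | 95.1 | 55.6 | 46.4 | 65.3 | checkpoint |
| FC-CLIP (ConvNeXt-Large) | 26.8 | 16.8 | 34.1 | 44.0 | 26.8 | 56.2 | 18.3 | 27.8 | 14.8 | 58.4 | 18.2 | 81.8 | 95.4 | 54.4 | 44.6 | 63.7 | checkpoint |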

Citing FC-CLIP

If you use FC-CLIP in your research, please use the following BibTeX entry.

@inproceedings{yu2023fcclip,
  title={Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP},
  author={Qihang Yu and Ju He and Xueqing Deng and Xiaohui Shen and Liang-Chieh Chen},
  booktitle={NeurIPS},
  year={2023}
}

Acknowledgement

Mask2Former

ODISE