CaGNetv2: From Pixel to Patch (accepted by TNNLS)

January 19, 2022

Code for "From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation".

Created by Zhangxuan Gu, Siyuan Zhou, Li Niu*, Zihan Zhao, Liqing Zhang*.

Paper Link: [arXiv]

Note

This work is an extension of our previous CaGNet [arXiv, github].

Visualization on Pascal-VOC

Introduction

Zero-shot learning has been actively studied for the image classification task to relieve the burden of annotating image labels. Semantic segmentation requires even more labor-intensive pixel-wise annotation, yet zero-shot semantic segmentation has attracted only limited research interest. We therefore focus on zero-shot semantic segmentation, which aims to segment unseen objects given only category-level semantic representations for the unseen categories. In this paper, we propose a novel Context-aware feature Generation Network (CaGNetv2), which synthesizes context-aware pixel-wise visual features for unseen categories from category-level semantic representations and pixel-wise contextual information. The synthesized features are used to finetune the classifier so that it can segment unseen objects. Furthermore, we extend pixel-wise feature generation and finetuning to patch-wise feature generation and finetuning, which additionally consider inter-pixel relationships. Experimental results on Pascal-VOC, Pascal-Context, and COCO-Stuff show that our method significantly outperforms existing zero-shot semantic segmentation methods.

Overview of Our CaGNet

Experimental Results

We compare our CaGNetv2 with SPNet [github, paper] and ZS3Net [github, paper].

“ST” in the following tables stands for the self-training strategy described in ZS3Net.
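The column headers in the tables are not expanded in this README. As a rough guide, the sketch below assumes the usual segmentation conventions: PA is overall pixel accuracy, MA is mean per-class accuracy, and mIoU is mean intersection-over-union, all computed from a confusion matrix. The function name, the choice to skip classes absent from the ground truth, and the toy counts are illustrative assumptions, not taken from the authors' evaluation code.

```python
from typing import Dict, List


def segmentation_metrics(conf: List[List[int]]) -> Dict[str, float]:
    """Compute PA, MA and mIoU from a confusion matrix.

    conf[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j. Assumes at least one labeled pixel.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))

    accs, ious = [], []
    for i in range(n):
        gt = sum(conf[i])                         # pixels whose true class is i
        pred = sum(conf[r][i] for r in range(n))  # pixels predicted as class i
        if gt == 0:
            continue  # assumption: classes absent from the ground truth are skipped
        accs.append(conf[i][i] / gt)
        ious.append(conf[i][i] / (gt + pred - conf[i][i]))

    return {
        "PA": correct / total,
        "MA": sum(accs) / len(accs),
        "mIoU": sum(ious) / len(ious),
    }


# Toy 2-class example with made-up counts (not data from the paper).
toy = [[8, 2],
       [1, 9]]
metrics = segmentation_metrics(toy)
print(metrics)
```

Under the same assumption, the S- and U- prefixed columns would be these quantities restricted to seen and unseen categories respectively.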

Our Results on Pascal-VOC dataset

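| Method | hIoU | mIoU | PA | MA | S-mIoU | U-mIoU | U-PA | U-MA |
|---|---|---|---|---|---|---|---|---|
| SPNet | 0.0002 | 0.5687 | 0.7685 | 0.7093 | 0.7583 | 0.0001 | 0.0007 | 0.0001 |
| SPNet-c | 0.2610 | 0.6315 | 0.7755 | 0.7188 | 0.7800 | 0.1563 | 0.2955 | 0.2387 |
| ZS3Net | 0.2874 | 0.6164 | 0.7941 | 0.7349 | 0.7730 | 0.1765 | 0.2147 | 0.1580 |
| CaGNet(pi) | 0.3972 | 0.6545 | 0.8068 | 0.7636 | 0.7840 | 0.2659 | 0.4297 | 0.3940 |
| CaGNet(pa) | 0.4326 | 0.6623 | 0.8068 | 0.7643 | 0.7814 | 0.2990 | 0.5176 | 0.4710 |
| ZS3Net+ST | 0.3328 | 0.6302 | 0.8095 | 0.7382 | 0.7802 | 0.2115 | 0.3407 | 0.2637 |
| CaGNet(pi)+ST | 0.4366 | 0.6577 | 0.8164 | 0.7560 | 0.7859 | 0.3031 | 0.5855 | 0.5071 |
| CaGNet(pa)+ST | 0.4528 | 0.6657 | 0.8036 | 0.7650 | 0.7813 | 0.3188 | 0.5939 | 0.5417 |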
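The hIoU column is not defined in this README. The sketch below assumes it is the harmonic mean of S-mIoU (seen categories) and U-mIoU (unseen categories), a common convention in zero-shot segmentation, and checks that assumption against two rows of the Pascal-VOC table above. The function name `harmonic_iou` is hypothetical.

```python
def harmonic_iou(seen_miou: float, unseen_miou: float) -> float:
    """Harmonic mean of seen and unseen mIoU (assumed definition of hIoU)."""
    total = seen_miou + unseen_miou
    if total == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * seen_miou * unseen_miou / total


# (S-mIoU, U-mIoU, reported hIoU) copied from the Pascal-VOC table.
rows = {
    "CaGNet(pi)": (0.7840, 0.2659, 0.3972),
    "CaGNet(pa)+ST": (0.7813, 0.3188, 0.4528),
}
for name, (s, u, reported) in rows.items():
    print(f"{name}: computed {harmonic_iou(s, u):.4f}, reported {reported:.4f}")
```

Because the table entries are rounded to four decimals, the recomputed value can differ from the reported one in the last digit. The harmonic mean is dominated by the smaller of the two scores, which is why a method with near-zero U-mIoU gets a near-zero hIoU regardless of its seen-category performance.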

Our Results on COCO-Stuff dataset

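| Method | hIoU | mIoU | PA | MA | S-mIoU | U-mIoU | U-PA | U-MA |
|---|---|---|---|---|---|---|---|---|
| SPNet | 0.0140 | 0.3164 | 0.5132 | 0.4593 | 0.3461 | 0.0070 | 0.0171 | 0.0007 |
| SPNet-c | 0.1398 | 0.3278 | 0.5341 | 0.4363 | 0.3518 | 0.0873 | 0.2450 | 0.1614 |
| ZS3Net | 0.1495 | 0.3328 | 0.5467 | 0.4837 | 0.3466 | 0.0953 | 0.2275 | 0.2701 |
| CaGNet(pi) | 0.1819 | 0.3345 | 0.5658 | 0.4845 | 0.3549 | 0.1223 | 0.2545 | 0.2701 |
| CaGNet(pa) | 0.1984 | 0.3327 | 0.5632 | 0.4909 | 0.3468 | 0.1389 | 0.2962 | 0.3132 |
| ZS3Net+ST | 0.1620 | 0.3372 | 0.5631 | 0.4862 | 0.3489 | 0.1055 | 0.2488 | 0.2718 |
| CaGNet(pi)+ST | 0.1946 | 0.3372 | 0.5676 | 0.4854 | 0.3555 | 0.1340 | 0.2670 | 0.2728 |
| CaGNet(pa)+ST | 0.2269 | 0.3456 | 0.5711 | 0.4629 | 0.3617 | 0.1654 | 0.3702 | 0.2567 |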

Our Results on Pascal-Context dataset

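| Method | hIoU | mIoU | PA | MA | S-mIoU | U-mIoU | U-PA | U-MA |
|---|---|---|---|---|---|---|---|---|
| SPNet | 0 | 0.2938 | 0.5793 | 0.4486 | 0.3357 | 0 | 0 | 0 |
| SPNet-c | 0.0718 | 0.3079 | 0.5790 | 0.4488 | 0.3514 | 0.0400 | 0.1673 | 0.1361 |
| ZS3Net | 0.1246 | 0.3010 | 0.5710 | 0.4442 | 0.3304 | 0.0768 | 0.1922 | 0.1532 |
| CaGNet(pi) | 0.2061 | 0.3347 | 0.5975 | 0.4900 | 0.3610 | 0.1442 | 0.3976 | 0.3248 |
| CaGNet(pa) | 0.2135 | 0.3243 | 0.5816 | 0.5082 | 0.3718 | 0.1498 | 0.3981 | 0.3412 |
| ZS3Net+ST | 0.1488 | 0.3102 | 0.5842 | 0.4532 | 0.3398 | 0.0953 | 0.3030 | 0.1721 |
| CaGNet(pi)+ST | 0.2252 | 0.3352 | 0.5951 | 0.4962 | 0.3644 | 0.1630 | 0.4038 | 0.4214 |
| CaGNet(pa)+ST | 0.2478 | 0.3364 | 0.5832 | 0.4964 | 0.3482 | 0.1923 | 0.4075 | 0.4023 |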

Please note that our reproduced results for SPNet on the Pascal-VOC dataset were obtained using their released model and code with careful tuning, but are still lower than their reported results.

Code

Coming soon!