GCNet for Object Detection

February 16, 2021


By Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu.

This repo is an official implementation of "GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond" for COCO object detection, based on open-mmlab's mmdetection. The core operator, the GC block, can be found here. Many thanks to mmdetection for their simple and clean framework.

Update on 2020/12/07

The extended version of GCNet was accepted by TPAMI (PDF).

Update on 2019/10/28

GCNet won the Best Paper Award at ICCV 2019 Neural Architects Workshop!

Update on 2019/07/01

The code has been refactored. More results are provided, and all configs can be found in configs/gcnet.

Notes: Both the official PyTorch SyncBN and the Apex SyncBN have some stability issues. During training, mAP may drop to zero and then return to normal during the last few epochs.

Update on 2019/06/03

GCNet is supported by the official mmdetection repo here. Thanks again to open-mmlab for their work on open-source projects.

Introduction

GCNet was initially described in an arXiv paper. By absorbing the advantages of Non-local Networks (NLNet) and Squeeze-and-Excitation Networks (SENet), GCNet provides a simple, fast, and effective approach to global context modeling, which generally outperforms both NLNet and SENet on major benchmarks for various recognition tasks.
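
As a rough sketch of the idea (not the repo's exact implementation; the class and layer names below are illustrative), the GC block pools a single global context vector via attention, squeezes it through a channel bottleneck, and adds it back to every spatial position:

```python
import torch
import torch.nn as nn

class GCBlock(nn.Module):
    """Minimal sketch of a Global Context (GC) block:
    attention-pooled context + bottleneck transform + broadcast add."""
    def __init__(self, channels, ratio=16):
        super().__init__()
        planes = channels // ratio  # bottleneck width, e.g. r4 or r16
        # 1x1 conv that produces a single spatial attention map
        self.conv_mask = nn.Conv2d(channels, 1, kernel_size=1)
        self.softmax = nn.Softmax(dim=2)
        # channel bottleneck transform, as in SENet but with LayerNorm
        self.transform = nn.Sequential(
            nn.Conv2d(channels, planes, kernel_size=1),
            nn.LayerNorm([planes, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(planes, channels, kernel_size=1),
        )

    def forward(self, x):
        n, c, h, w = x.shape
        # context modeling: one attention distribution shared by all positions
        mask = self.softmax(self.conv_mask(x).view(n, 1, h * w))   # (N, 1, HW)
        context = torch.bmm(x.view(n, c, h * w),
                            mask.transpose(1, 2)).view(n, c, 1, 1)  # (N, C, 1, 1)
        # fusion: transformed context is broadcast-added to every position
        return x + self.transform(context)
```

Unlike NLNet, which computes a separate attention map per query position, the GC block computes one query-independent context vector, which is what makes it nearly as cheap as an SE block.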

Citing GCNet

@article{cao2019GCNet,
  title={GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond},
  author={Cao, Yue and Xu, Jiarui and Lin, Stephen and Wei, Fangyun and Hu, Han},
  journal={arXiv preprint arXiv:1904.11492},
  year={2019}
}

Main Results

Results on R50-FPN with fixBN backbone

| Backbone | Model | Backbone Norm | Heads | Context | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R50-FPN | Mask | fixBN | 2fc (w/o BN) | - | 1x | 3.9 | 0.453 | 10.6 | 37.3 | 34.2 | model |
| R50-FPN | Mask | fixBN | 2fc (w/o BN) | GC(c3-c5, r16) | 1x | 4.5 | 0.533 | 10.1 | 38.5 | 35.1 | model |
| R50-FPN | Mask | fixBN | 2fc (w/o BN) | GC(c3-c5, r4) | 1x | 4.6 | 0.533 | 9.9 | 38.9 | 35.5 | model |
| R50-FPN | Mask | fixBN | 2fc (w/o BN) | - | 2x | - | - | - | 38.2 | 34.9 | model |
| R50-FPN | Mask | fixBN | 2fc (w/o BN) | GC(c3-c5, r16) | 2x | - | - | - | 39.7 | 36.1 | model |
| R50-FPN | Mask | fixBN | 2fc (w/o BN) | GC(c3-c5, r4) | 2x | - | - | - | 40.0 | 36.2 | model |

Results on R50-FPN with SyncBN backbone

| Backbone | Model | Backbone Norm | Heads | Context | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R50-FPN | Mask | SyncBN | 2fc (w/o BN) | - | 1x | 3.9 | 0.543 | 10.2 | 37.2 | 33.8 | model |
| R50-FPN | Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r16) | 1x | 4.5 | 0.547 | 9.9 | 39.4 | 35.7 | model |
| R50-FPN | Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r4) | 1x | 4.6 | 0.603 | 9.4 | 39.9 | 36.2 | model |
| R50-FPN | Mask | SyncBN | 2fc (w/o BN) | - | 2x | 3.9 | 0.543 | 10.2 | 37.7 | 34.3 | model |
| R50-FPN | Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r16) | 2x | 4.5 | 0.547 | 9.9 | 39.7 | 36.0 | model |
| R50-FPN | Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r4) | 2x | 4.6 | 0.603 | 9.4 | 40.2 | 36.3 | model |
| R50-FPN | Mask | SyncBN | 4conv1fc (SyncBN) | - | 1x | - | - | - | 38.8 | 34.6 | model |
| R50-FPN | Mask | SyncBN | 4conv1fc (SyncBN) | GC(c3-c5, r16) | 1x | - | - | - | 41.0 | 36.5 | model |
| R50-FPN | Mask | SyncBN | 4conv1fc (SyncBN) | GC(c3-c5, r4) | 1x | - | - | - | 41.4 | 37.0 | model |

Results on stronger backbones

| Backbone | Model | Backbone Norm | Heads | Context | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R101-FPN | Mask | fixBN | 2fc (w/o BN) | - | 1x | 5.8 | 0.571 | 9.5 | 39.4 | 35.9 | model |
| R101-FPN | Mask | fixBN | 2fc (w/o BN) | GC(c3-c5, r16) | 1x | 7.0 | 0.731 | 8.6 | 40.8 | 37.0 | model |
| R101-FPN | Mask | fixBN | 2fc (w/o BN) | GC(c3-c5, r4) | 1x | 7.1 | 0.747 | 8.6 | 40.8 | 36.9 | model |
| R101-FPN | Mask | SyncBN | 2fc (w/o BN) | - | 1x | 5.8 | 0.665 | 9.2 | 39.8 | 36.0 | model |
| R101-FPN | Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r16) | 1x | 7.0 | 0.778 | 9.0 | 41.1 | 37.4 | model |
| R101-FPN | Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r4) | 1x | 7.1 | 0.786 | 8.9 | 41.7 | 37.6 | model |
| X101-FPN | Mask | SyncBN | 2fc (w/o BN) | - | 1x | 7.1 | 0.912 | 8.5 | 41.2 | 37.3 | model |
| X101-FPN | Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r16) | 1x | 8.2 | 1.055 | 7.7 | 42.4 | 38.0 | model |
| X101-FPN | Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r4) | 1x | 8.3 | 1.037 | 7.6 | 42.9 | 38.5 | model |
| X101-FPN | Cascade Mask | SyncBN | 2fc (w/o BN) | - | 1x | - | - | - | 44.7 | 38.3 | model |
| X101-FPN | Cascade Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r16) | 1x | - | - | - | 45.9 | 39.3 | model |
| X101-FPN | Cascade Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r4) | 1x | - | - | - | 46.5 | 39.7 | model |
| X101-FPN | DCN Cascade Mask | SyncBN | 2fc (w/o BN) | - | 1x | - | - | - | 47.1 | 40.4 | model |
| X101-FPN | DCN Cascade Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r16) | 1x | - | - | - | 47.9 | 40.9 | model |
| X101-FPN | DCN Cascade Mask | SyncBN | 2fc (w/o BN) | GC(c3-c5, r4) | 1x | - | - | - | 47.9 | 40.8 | model |

Notes

  • GC denotes that a Global Context (GC) block is inserted after the 1x1 conv of the backbone.
  • DCN denotes replacing the 3x3 conv with a 3x3 deformable convolution in the c3-c5 stages of the backbone.
  • r4 and r16 denote ratio 4 and ratio 16 in the GC block, respectively.
  • Some models are trained on 4 GPUs with 4 images per GPU.
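
To make the ratio note concrete, here is a back-of-the-envelope count of the weights one GC block adds (a sketch that ignores biases and the LayerNorm; the function name is just for illustration):

```python
def gc_block_extra_params(channels, ratio):
    """Approximate number of weights added by one GC block."""
    planes = channels // ratio         # bottleneck width: r4 -> C/4, r16 -> C/16
    mask = channels * 1                # 1x1 conv producing the attention mask
    transform = 2 * channels * planes  # two 1x1 convs of the bottleneck transform
    return mask + transform

# e.g. the c5 stage of a ResNet-50 backbone has 2048 channels
print(gc_block_extra_params(2048, 16))  # r16
print(gc_block_extra_params(2048, 4))   # r4, roughly 4x the cost of r16
```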

Requirements

  • Linux (tested on Ubuntu 16.04)
  • Python 3.6+
  • PyTorch 1.1.0
  • Cython
  • apex (Sync BN)

Install

a. Install PyTorch 1.1 and torchvision following the official instructions.

b. Install the latest apex with CUDA and C++ extensions following these instructions. The Sync BN implemented by apex is required.

c. Clone the GCNet repository.

 git clone https://github.com/xvjiarui/GCNet.git 

d. Compile cuda extensions.

cd GCNet
pip install cython  # or "conda install cython" if you prefer conda
./compile.sh  # or "PYTHON=python3 ./compile.sh" if you use system python3 without virtual environments

e. Install the GCNet version of mmdetection (other dependencies will be installed automatically).

python(3) setup.py install  # add --user if you want to install it locally
# or "pip install ."

Note: You need to rerun the last step each time you pull updates from GitHub. Alternatively, you can run python(3) setup.py develop or pip install -e . to install mmdetection if you want to modify it frequently.

Please refer to the mmdetection install instructions for more details.

Environment

Hardware

  • 8 NVIDIA Tesla V100 GPUs
  • Intel Xeon 4114 CPU @ 2.20GHz

Software environment

  • Python 3.6.7
  • PyTorch 1.1.0
  • CUDA 9.0
  • CUDNN 7.0
  • NCCL 2.3.5

Usage

Train

As in the original mmdetection, distributed training is recommended for both single-machine and multi-machine setups.

./tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> [optional arguments]

Supported arguments are:

  • --validate: perform evaluation every k epochs (default: k=1) during training.
  • --work_dir <WORK_DIR>: if specified, overrides the path in the config file.
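
For example, to train the R50-FPN Mask R-CNN with a GC(r16) context block on 8 GPUs with validation after each epoch (the config file name below is illustrative; check configs/gcnet for the exact names):

```shell
./tools/dist_train.sh configs/gcnet/mask_rcnn_r50_fpn_gcb_r16_1x.py 8 --validate
```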

Evaluation

To evaluate trained models, an output file is required.

python tools/test.py <CONFIG_FILE> <MODEL_PATH> [optional arguments]

Supported arguments are:

  • --gpus: number of GPUs used for evaluation.
  • --out: output file name, usually ending with .pkl.
  • --eval: type of evaluation needed; for Mask R-CNN, bbox segm evaluates both bounding box and mask AP.
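
For example, to evaluate a trained Mask R-CNN checkpoint on both box and mask AP (the config and checkpoint paths below are illustrative):

```shell
python tools/test.py configs/gcnet/mask_rcnn_r50_fpn_gcb_r16_1x.py \
    work_dirs/mask_rcnn_r50_fpn_gcb_r16_1x/latest.pth \
    --gpus 8 --out results.pkl --eval bbox segm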