InternImage for Object Detection
March 4, 2025
This folder contains the implementation of InternImage for object detection.
Our detection code is developed on top of MMDetection v2.28.1.
Installation
- Clone this repository:
git clone https://github.com/OpenGVLab/InternImage.git
cd InternImage
- Create a conda virtual environment and activate it:
conda create -n internimage python=3.9
conda activate internimage
- Install `CUDA>=10.2` with `cudnn>=7` following the official installation instructions.
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:
For example, to install torch==1.11 with CUDA==11.3:
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
- Install other requirements:
Note: installing opencv via conda can break torchvision's GPU support, so install opencv via pip.
conda install -c conda-forge termcolor yacs pyyaml scipy pip -y
pip install opencv-python
- Install `timm`, `mmcv-full`, and `mmsegmentation`:
pip install -U openmim
mim install mmcv-full==1.5.0
mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
- Install other requirements:
pip install opencv-python termcolor yacs pyyaml scipy
# Please use a version of numpy lower than 2.0
pip install numpy==1.26.4
pip install pydantic==1.10.13
pip install yapf==0.40.1
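The version pins above matter at runtime (e.g. numpy 2.x breaks older mmcv builds). As a quick sanity check, a small helper like the following (hypothetical, not part of the InternImage repo) can compare installed versions against the pins:

```python
# Hypothetical helper (not part of the InternImage repo): compare installed
# package versions against the pins recommended above.
from importlib.metadata import version, PackageNotFoundError

PINS = {"numpy": "1.26.4", "pydantic": "1.10.13", "yapf": "0.40.1"}

def find_mismatches(pins, get_version=None):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    if get_version is None:
        def get_version(name):
            try:
                return version(name)
            except PackageNotFoundError:
                return None
    mismatches = {}
    for name, expected in pins.items():
        installed = get_version(name)
        if installed != expected:
            mismatches[name] = (installed, expected)
    return mismatches
```

`find_mismatches(PINS)` returns an empty dict when the environment matches the pins exactly.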
- Compile CUDA operators
Before compiling, please run `nvcc -V` to check that your nvcc version matches the CUDA version PyTorch was built with.
cd ./ops_dcnv3
sh ./make.sh
# unit test (should see all checking is True)
python test.py
- You can also install the operator using the precompiled `.whl` files: DCNv3-1.0-whl
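To automate the `nvcc -V` check mentioned above, one could parse the release number out of nvcc's output and compare it with `torch.version.cuda`. This is a sketch; the parsing helper is an assumption, not repo code:

```python
# Sketch of the nvcc-vs-PyTorch CUDA version check. parse_nvcc_release is a
# hypothetical helper; it extracts e.g. "11.3" from typical `nvcc -V` output.
import re
import subprocess

def parse_nvcc_release(output):
    """Return the CUDA release string (e.g. '11.3') from `nvcc -V` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None

def nvcc_matches_torch():
    import torch  # deferred import so the parser is usable without torch
    out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
    return parse_nvcc_release(out) == torch.version.cuda
```

If `nvcc_matches_torch()` returns False, rebuilding the DCNv3 operators will likely fail or produce a broken extension.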
Data Preparation
Prepare datasets according to the guidelines in MMDetection v2.28.1.
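By default, MMDetection v2.x looks for COCO under `data/coco` relative to the working directory. A typical layout (the standard COCO arrangement; adjust `data_root` in the config if yours differs) looks like:

```
data/
└── coco/
    ├── annotations/
    │   ├── instances_train2017.json
    │   └── instances_val2017.json
    ├── train2017/
    └── val2017/
```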
Released Models
Dataset: COCO
| method | backbone | schd | box mAP | mask mAP | #param | FLOPs | Config | Download |
|---|---|---|---|---|---|---|---|---|
| Mask R-CNN | InternImage-T | 1x | 47.2 | 42.5 | 49M | 270G | config | ckpt \| log |
| Mask R-CNN | InternImage-T | 3x | 49.1 | 43.7 | 49M | 270G | config | ckpt \| log |
| Mask R-CNN | InternImage-S | 1x | 47.8 | 43.3 | 69M | 340G | config | ckpt \| log |
| Mask R-CNN | InternImage-S | 3x | 49.7 | 44.5 | 69M | 340G | config | ckpt \| log |
| Mask R-CNN | InternImage-B | 1x | 48.8 | 44.0 | 115M | 501G | config | ckpt \| log |
| Mask R-CNN | InternImage-B | 3x | 50.3 | 44.8 | 115M | 501G | config | ckpt \| log |
| Cascade Mask R-CNN | InternImage-L | 1x | 54.9 | 47.7 | 277M | 1399G | config | ckpt |
| Cascade Mask R-CNN | InternImage-L | 3x | 56.1 | 48.5 | 277M | 1399G | config | ckpt \| log |
| Cascade Mask R-CNN | InternImage-XL | 1x | 55.3 | 48.1 | 387M | 1782G | config | ckpt \| log |
| Cascade Mask R-CNN | InternImage-XL | 3x | 56.2 | 48.8 | 387M | 1782G | config | ckpt \| log |

| method | backbone | schd | box mAP | #param | Config | Download |
|---|---|---|---|---|---|---|
| DINO | InternImage-T | 1x | 53.9 | 49M | config | ckpt \| log |
| DINO | InternImage-L | 1x | 57.6 | 241M | config | ckpt \| log |
| DINO | InternImage-H | 1x | 63.4 | 1.1B | config | ckpt |
| DINO | CB-InternImage-H | 1x | 64.5 | 2.2B | config | ckpt |
| DINO (TTA) | CB-InternImage-H | 1x | 65.0 | 2.2B | - | ckpt |
| DINO | InternImage-G | 1x | 64.2 | 3.1B | config | ckpt |
| DINO | CB-InternImage-G | 1x | 65.1 | 6B | - | - |
| DINO (TTA) | CB-InternImage-G | 1x | 65.3 | 6B | - | - |
Dataset: LVIS
Dataset: OpenImages
Dataset: VOC 2007 & 2012
Evaluation
To evaluate our InternImage on COCO val, run:
sh dist_test.sh <config-file> <checkpoint> <gpu-num> --eval bbox segm
For example, to evaluate InternImage-T on a single GPU:
python test.py configs/coco/mask_rcnn_internimage_t_fpn_1x_coco.py pretrained/mask_rcnn_internimage_t_fpn_1x_coco.pth --eval bbox segm
For example, to evaluate InternImage-B on a single node with 8 GPUs:
sh dist_test.sh configs/coco/mask_rcnn_internimage_b_fpn_1x_coco.py pretrained/mask_rcnn_internimage_b_fpn_1x_coco.pth 8 --eval bbox segm
Training
To train an InternImage on COCO, run:
sh dist_train.sh <config-file> <gpu-num>
For example, to train InternImage-T with 8 GPUs on 1 node, run:
sh dist_train.sh configs/coco/mask_rcnn_internimage_t_fpn_1x_coco.py 8
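MMDetection 2.x configs are typically tuned for a total batch size of 16 (8 GPUs x 2 images per GPU). If you train with a different GPU count, a common rule of thumb is the linear scaling rule for the learning rate; the helper below is a sketch of that rule, not something the configs apply automatically:

```python
# Rule-of-thumb sketch (not repo code): linearly scale the learning rate
# with the total batch size (the "linear scaling rule").
def scaled_lr(base_lr, base_batch_size, actual_batch_size):
    return base_lr * actual_batch_size / base_batch_size

# Configs tuned for 8 GPUs x 2 imgs (batch 16); training on 32 GPUs x 2 imgs:
print(scaled_lr(1e-4, 16, 64))  # -> 0.0004
```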
Manage Jobs with Slurm
For example, to train InternImage-XL with 32 GPUs on 4 nodes, run:
GPUS=32 sh slurm_train.sh <partition> <job-name> configs/coco/cascade_internimage_xl_fpn_3x_coco.py work_dirs/cascade_internimage_xl_fpn_3x_coco
Export
First, install mmdeploy:
pip install mmdeploy==0.14.0
To export a detection model from PyTorch to TensorRT, run:
MODEL="model_name"
CKPT_PATH="/path/to/model/ckpt.pth"
python deploy.py \
"./deploy/configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py" \
"./configs/coco/${MODEL}.py" \
"${CKPT_PATH}" \
"./deploy/demo.jpg" \
--work-dir "./work_dirs/mmdet/instance-seg/${MODEL}" \
--device cuda \
--dump-info
For example, to export mask_rcnn_internimage_t_fpn_1x_coco from PyTorch to TensorRT, run:
MODEL="mask_rcnn_internimage_t_fpn_1x_coco"
CKPT_PATH="/path/to/model/ckpt/mask_rcnn_internimage_t_fpn_1x_coco.pth"
python deploy.py \
"./deploy/configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py" \
"./configs/coco/${MODEL}.py" \
"${CKPT_PATH}" \
"./deploy/demo.jpg" \
--work-dir "./work_dirs/mmdet/instance-seg/${MODEL}" \
--device cuda \
--dump-info
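The two invocations above differ only in `MODEL` and `CKPT_PATH`. A tiny helper (hypothetical, for convenience only; paths mirror the shell example) can assemble the `deploy.py` argument list for any model name:

```python
# Hypothetical convenience helper: build the deploy.py argv for a given model.
# Defaults mirror the shell example above; adjust if your layout differs.
def build_deploy_args(model, ckpt_path,
                      deploy_cfg="./deploy/configs/mmdet/instance-seg/"
                                 "instance-seg_tensorrt_dynamic-320x320-1344x1344.py",
                      demo_img="./deploy/demo.jpg"):
    return [
        deploy_cfg,
        f"./configs/coco/{model}.py",
        ckpt_path,
        demo_img,
        "--work-dir", f"./work_dirs/mmdet/instance-seg/{model}",
        "--device", "cuda",
        "--dump-info",
    ]
```

It can then be invoked as `subprocess.run(["python", "deploy.py", *build_deploy_args(model, ckpt)])`.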