InternImage for Object Detection
March 4, 2025
This folder contains the implementation of InternImage for object detection.
Our detection code is developed on top of MMDetection v2.28.1.
Installation
- Clone this repository:
git clone https://github.com/OpenGVLab/InternImage.git
cd InternImage
- Create a conda virtual environment and activate it:
conda create -n internimage python=3.9
conda activate internimage
- Install `CUDA>=10.2` with `cudnn>=7` following the official installation instructions.
- Install `PyTorch>=1.10.0` and `torchvision>=0.9.0` with `CUDA>=10.2`:
For example, to install torch==1.11 with CUDA==11.3:
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
- Install other requirements:
Note: installing opencv via conda can break torchvision's GPU support, so install opencv via pip.
conda install -c conda-forge termcolor yacs pyyaml scipy pip -y
pip install opencv-python
- Install `timm`, `mmcv-full`, and `mmsegmentation`:
pip install -U openmim
mim install mmcv-full==1.5.0
mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
- Install other requirements:
pip install opencv-python termcolor yacs pyyaml scipy
# Please use a version of numpy lower than 2.0
pip install numpy==1.26.4
pip install pydantic==1.10.13
pip install yapf==0.40.1
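The version pins above matter at runtime (e.g. numpy 2.x breaks older mmcv builds). As a quick sanity check, a small helper like the following (hypothetical, not part of the InternImage repo) can compare installed versions against the pins:

```python
# Hypothetical helper (not part of the InternImage repo): compare installed
# package versions against the pins recommended above.
from importlib.metadata import version, PackageNotFoundError

PINS = {"numpy": "1.26.4", "pydantic": "1.10.13", "yapf": "0.40.1"}

def find_mismatches(pins, get_version=None):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    if get_version is None:
        def get_version(name):
            try:
                return version(name)
            except PackageNotFoundError:
                return None
    mismatches = {}
    for name, expected in pins.items():
        installed = get_version(name)
        if installed != expected:
            mismatches[name] = (installed, expected)
    return mismatches
```

`find_mismatches(PINS)` returns an empty dict when the environment matches the pins exactly.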
- Compile CUDA operators
Before compiling, please run `nvcc -V` to check that your nvcc version matches the CUDA version PyTorch was built with.
cd ./ops_dcnv3
sh ./make.sh
# unit test (should see all checking is True)
python test.py
- You can also install the operator using the precompiled `.whl` files: DCNv3-1.0-whl
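To automate the `nvcc -V` check mentioned above, one could parse the release number out of nvcc's output and compare it with `torch.version.cuda`. This is a sketch; the parsing helper is an assumption, not repo code:

```python
# Sketch of the nvcc-vs-PyTorch CUDA version check. parse_nvcc_release is a
# hypothetical helper; it extracts e.g. "11.3" from typical `nvcc -V` output.
import re
import subprocess

def parse_nvcc_release(output):
    """Return the CUDA release string (e.g. '11.3') from `nvcc -V` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None

def nvcc_matches_torch():
    import torch  # deferred import so the parser is usable without torch
    out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
    return parse_nvcc_release(out) == torch.version.cuda
```

If `nvcc_matches_torch()` returns False, rebuilding the DCNv3 operators will likely fail or produce a broken extension.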
Data Preparation
Prepare datasets according to the guidelines in MMDetection v2.28.1.
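By default, MMDetection v2.x looks for COCO under `data/coco` relative to the working directory. A typical layout (the standard COCO arrangement; adjust `data_root` in the config if yours differs) looks like:

```
data/
└── coco/
    ├── annotations/
    │   ├── instances_train2017.json
    │   └── instances_val2017.json
    ├── train2017/
    └── val2017/
```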
Released Models
Dataset: COCO
| method | backbone | schd | box mAP | mask mAP | #param | FLOPs | Config | Download |
|---|---|---|---|---|---|---|---|---|
| Mask R-CNN | InternImage-T | 1x | 47.2 | 42.5 | 49M | 270G | config | ckpt \| log |
| Mask R-CNN | InternImage-T | 3x | 49.1 | 43.7 | 49M | 270G | config | ckpt \| log |
| Mask R-CNN | InternImage-S | 1x | 47.8 | 43.3 | 69M | 340G | config | ckpt \| log |
| Mask R-CNN | InternImage-S | 3x | 49.7 | 44.5 | 69M | 340G | config | ckpt \| log |
| Mask R-CNN | InternImage-B | 1x | 48.8 | 44.0 | 115M | 501G | config | ckpt \| log |
| Mask R-CNN | InternImage-B | 3x | 50.3 | 44.8 | 115M | 501G | config | ckpt \| log |
| Cascade Mask R-CNN | InternImage-L | 1x | 54.9 | 47.7 | 277M | 1399G | config | ckpt |
| Cascade Mask R-CNN | InternImage-L | 3x | 56.1 | 48.5 | 277M | 1399G | config | ckpt \| log |
| Cascade Mask R-CNN | InternImage-XL | 1x | 55.3 | 48.1 | 387M | 1782G | config | ckpt \| log |
| Cascade Mask R-CNN | InternImage-XL | 3x | 56.2 | 48.8 | 387M | 1782G | config | ckpt \| log |

| method | backbone | schd | box mAP | #param | Config | Download |
|---|---|---|---|---|---|---|
| DINO | InternImage-T | 1x | 53.9 | 49M | config | ckpt \| log |
| DINO | InternImage-L | 1x | 57.6 | 241M | config | ckpt \| log |
| DINO | InternImage-H | 1x | 63.4 | 1.1B | config | ckpt |
| DINO | CB-InternImage-H | 1x | 64.5 | 2.2B | config | ckpt |
| DINO (TTA) | CB-InternImage-H | 1x | 65.0 | 2.2B | - | ckpt |
| DINO | InternImage-G | 1x | 64.2 | 3.1B | config | ckpt |
| DINO | CB-InternImage-G | 1x | 65.1 | 6B | - | - |
| DINO (TTA) | CB-InternImage-G | 1x | 65.3 | 6B | - | - |
Dataset: LVIS
Dataset: OpenImages
Dataset: VOC 2007 & 2012
Evaluation
To evaluate our InternImage on COCO val, run:
sh dist_test.sh <config-file> <checkpoint> <gpu-num> --eval bbox segm
For example, to evaluate InternImage-T on a single GPU:
python test.py configs/coco/mask_rcnn_internimage_t_fpn_1x_coco.py pretrained/mask_rcnn_internimage_t_fpn_1x_coco.pth --eval bbox segm
For example, to evaluate InternImage-B on a single node with 8 GPUs:
sh dist_test.sh configs/coco/mask_rcnn_internimage_b_fpn_1x_coco.py pretrained/mask_rcnn_internimage_b_fpn_1x_coco.pth 8 --eval bbox segm
Training
To train an InternImage on COCO, run:
sh dist_train.sh <config-file> <gpu-num>
For example, to train InternImage-T with 8 GPUs on 1 node, run:
sh dist_train.sh configs/coco/mask_rcnn_internimage_t_fpn_1x_coco.py 8
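MMDetection 2.x configs are typically tuned for a total batch size of 16 (8 GPUs x 2 images per GPU). If you train with a different GPU count, a common rule of thumb is the linear scaling rule for the learning rate; the helper below is a sketch of that rule, not something the configs apply automatically:

```python
# Rule-of-thumb sketch (not repo code): linearly scale the learning rate
# with the total batch size (the "linear scaling rule").
def scaled_lr(base_lr, base_batch_size, actual_batch_size):
    return base_lr * actual_batch_size / base_batch_size

# Configs tuned for 8 GPUs x 2 imgs (batch 16); training on 32 GPUs x 2 imgs:
print(scaled_lr(1e-4, 16, 64))  # -> 0.0004
```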
Manage Jobs with Slurm
For example, to train InternImage-XL with 32 GPUs on 4 nodes, run:
GPUS=32 sh slurm_train.sh <partition> <job-name> configs/coco/cascade_internimage_xl_fpn_3x_coco.py work_dirs/cascade_internimage_xl_fpn_3x_coco
Export
First, install mmdeploy:
pip install mmdeploy==0.14.0
To export a detection model from PyTorch to TensorRT, run:
MODEL="model_name"
CKPT_PATH="/path/to/model/ckpt.pth"
python deploy.py \
"./deploy/configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py" \
"./configs/coco/${MODEL}.py" \
"${CKPT_PATH}" \
"./deploy/demo.jpg" \
--work-dir "./work_dirs/mmdet/instance-seg/${MODEL}" \
--device cuda \
--dump-info
For example, to export mask_rcnn_internimage_t_fpn_1x_coco from PyTorch to TensorRT, run:
MODEL="mask_rcnn_internimage_t_fpn_1x_coco"
CKPT_PATH="/path/to/model/ckpt/mask_rcnn_internimage_t_fpn_1x_coco.pth"
python deploy.py \
"./deploy/configs/mmdet/instance-seg/instance-seg_tensorrt_dynamic-320x320-1344x1344.py" \
"./configs/coco/${MODEL}.py" \
"${CKPT_PATH}" \
"./deploy/demo.jpg" \
--work-dir "./work_dirs/mmdet/instance-seg/${MODEL}" \
--device cuda \
--dump-info
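The two invocations above differ only in `MODEL` and `CKPT_PATH`. A tiny helper (hypothetical, for convenience only; paths mirror the shell example) can assemble the `deploy.py` argument list for any model name:

```python
# Hypothetical convenience helper: build the deploy.py argv for a given model.
# Defaults mirror the shell example above; adjust if your layout differs.
def build_deploy_args(model, ckpt_path,
                      deploy_cfg="./deploy/configs/mmdet/instance-seg/"
                                 "instance-seg_tensorrt_dynamic-320x320-1344x1344.py",
                      demo_img="./deploy/demo.jpg"):
    return [
        deploy_cfg,
        f"./configs/coco/{model}.py",
        ckpt_path,
        demo_img,
        "--work-dir", f"./work_dirs/mmdet/instance-seg/{model}",
        "--device", "cuda",
        "--dump-info",
    ]
```

It can then be invoked as `subprocess.run(["python", "deploy.py", *build_deploy_args(model, ckpt)])`.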