InternImage for Semantic Segmentation

January 24, 2025

This folder contains the implementation of InternImage for semantic segmentation.

Our segmentation code is developed on top of MMSegmentation v0.27.0.

Installation

  • Clone this repository:
git clone https://github.com/OpenGVLab/InternImage.git
cd InternImage
  • Create a conda virtual environment and activate it:
conda create -n internimage python=3.9
conda activate internimage

  • Install PyTorch and torchvision. For example, to install torch==1.11 with CUDA==11.3:

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113  -f https://download.pytorch.org/whl/torch_stable.html
  • Install other requirements:

    note: installing opencv via conda breaks torchvision's GPU support, so install opencv with pip instead.

conda install -c conda-forge termcolor yacs pyyaml scipy pip -y
pip install opencv-python
  • Install timm, mmcv-full and mmsegmentation:
pip install -U openmim
mim install mmcv-full==1.5.0
mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
  • Pin numpy and pydantic to compatible versions:
# numpy must be lower than 2.0
pip install numpy==1.26.4
pip install pydantic==1.10.13
  • Compile CUDA operators

Before compiling, please use the nvcc -V command to check whether your nvcc version matches the CUDA version of PyTorch.

cd ./ops_dcnv3
sh ./make.sh
# unit test (all checks should print True)
python test.py
  • Alternatively, you can install the operator from the precompiled .whl files: DCNv3-1.0-whl
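The nvcc/PyTorch CUDA match required before compiling can also be checked programmatically. A minimal sketch, assuming the usual `nvcc -V` output format; `torch.version.cuda` is how an installed PyTorch reports the CUDA version it was built against:

```python
import re

def cuda_versions_match(nvcc_output: str, torch_cuda: str) -> bool:
    """Compare the major.minor CUDA version from `nvcc -V` output with
    the CUDA version PyTorch was built against (e.g. torch.version.cuda)."""
    # nvcc -V prints a line like: "Cuda compilation tools, release 11.3, V11.3.109"
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if m is None:
        raise ValueError("could not parse nvcc output")
    nvcc_mm = (int(m.group(1)), int(m.group(2)))
    torch_mm = tuple(int(x) for x in torch_cuda.split(".")[:2])
    return nvcc_mm == torch_mm

# With the versions used in this README (torch 1.11 built for CUDA 11.3):
sample = "Cuda compilation tools, release 11.3, V11.3.109"
print(cuda_versions_match(sample, "11.3"))  # True
```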

Data Preparation

Prepare datasets according to the guidelines in MMSegmentation.
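As a quick sanity check after downloading, a small sketch can verify the dataset layout. The subdirectory names below are an assumption based on MMSegmentation's ADE20K conventions and the `data/ade/ADEChallengeData2016/...` path used by the demo command later in this README:

```python
from pathlib import Path
from typing import List

# Assumed ADE20K layout under MMSegmentation conventions:
EXPECTED = [
    "images/training",
    "images/validation",
    "annotations/training",
    "annotations/validation",
]

def missing_ade20k_dirs(root: str) -> List[str]:
    """Return the expected ADE20K subdirectories missing under `root`."""
    base = Path(root) / "ADEChallengeData2016"
    return [d for d in EXPECTED if not (base / d).is_dir()]

# Usage: missing_ade20k_dirs("data/ade") -> [] once the dataset is in place.
```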

Released Models

Dataset: ADE20K
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
| ------ | -------- | ---------- | ------------ | ------- | ----- | ------ | -------- |
| UperNet | InternImage-T | 512x512 | 47.9 / 48.1 | 59M | 944G | config | ckpt \| log |
| UperNet | InternImage-S | 512x512 | 50.1 / 50.9 | 80M | 1017G | config | ckpt \| log |
| UperNet | InternImage-B | 512x512 | 50.8 / 51.3 | 128M | 1185G | config | ckpt \| log |
| UperNet | InternImage-L | 640x640 | 53.9 / 54.1 | 256M | 2526G | config | ckpt \| log |
| UperNet | InternImage-XL | 640x640 | 55.0 / 55.3 | 368M | 3142G | config | ckpt \| log |
| UperNet | InternImage-H | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | config | ckpt \| log |
| Mask2Former | InternImage-H | 896x896 | 62.6 / 62.9 | 1.31B | 4635G | config | ckpt \| log |
Dataset: Cityscapes
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
| ------ | -------- | ---------- | ------------ | ------- | ----- | ------ | -------- |
| UperNet | InternImage-T | 512x1024 | 82.58 / 83.40 | 59M | 1889G | config | ckpt \| log |
| UperNet | InternImage-S | 512x1024 | 82.74 / 83.45 | 80M | 2035G | config | ckpt \| log |
| UperNet | InternImage-B | 512x1024 | 83.18 / 83.97 | 128M | 2369G | config | ckpt \| log |
| UperNet | InternImage-L | 512x1024 | 83.68 / 84.41 | 256M | 3234G | config | ckpt \| log |
| UperNet* | InternImage-L | 512x1024 | 85.94 / 86.22 | 256M | 3234G | config | ckpt \| log |
| UperNet | InternImage-XL | 512x1024 | 83.62 / 84.28 | 368M | 4022G | config | ckpt \| log |
| UperNet* | InternImage-XL | 512x1024 | 86.20 / 86.42 | 368M | 4022G | config | ckpt \| log |
| SegFormer* | InternImage-L | 512x1024 | 85.16 / 85.67 | 220M | 1580G | config | ckpt \| log |
| SegFormer* | InternImage-XL | 512x1024 | 85.41 / 85.93 | 330M | 2364G | config | ckpt \| log |
| Mask2Former* | InternImage-H | 1024x1024 | 86.37 / 86.96 | 1094M | 7878G | config | ckpt \| log |

\* denotes that the model is trained with the extra Mapillary dataset.

Dataset: COCO-Stuff-164K
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
| ------ | -------- | ---------- | ------------ | ------- | ----- | ------ | -------- |
| Mask2Former | InternImage-H | 896x896 | 52.6 / 52.8 | 1.31B | 4635G | config | ckpt \| log |
Dataset: COCO-Stuff-10K
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
| ------ | -------- | ---------- | ------------ | ------- | ----- | ------ | -------- |
| Mask2Former | InternImage-H | 512x512 | 59.2 / 59.6 | 1.28B | 1528G | config | ckpt \| log |
Dataset: Pascal-Context-59
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
| ------ | -------- | ---------- | ------------ | ------- | ----- | ------ | -------- |
| Mask2Former | InternImage-H | 480x480 | 69.7 / 70.3 | 1.07B | 867G | config | ckpt \| log |
Dataset: NYU-Depth-V2
| method | backbone | resolution | mIoU (ss/ms) | #params | FLOPs | Config | Download |
| ------ | -------- | ---------- | ------------ | ------- | ----- | ------ | -------- |
| Mask2Former | InternImage-H | 480x480 | 67.1 / 68.1 | 1.07B | 867G | config | ckpt \| log |
Dataset: Mapillary
| method | backbone | resolution | #params | FLOPs | Config | Download |
| ------ | -------- | ---------- | ------- | ----- | ------ | -------- |
| UperNet | InternImage-L | 512x1024 | 256M | 3234G | config | ckpt |
| UperNet | InternImage-XL | 512x1024 | 368M | 4022G | config | ckpt |
| SegFormer | InternImage-L | 512x1024 | 220M | 1580G | config | ckpt |
| SegFormer | InternImage-XL | 512x1024 | 330M | 2364G | config | ckpt |
| Mask2Former | InternImage-H | 896x896 | 1094M | 7878G | config | ckpt |

Evaluation

To evaluate our InternImage on ADE20K val, run:

sh dist_test.sh <config-file> <checkpoint> <gpu-num> --eval mIoU

For example, to evaluate InternImage-T with a single GPU:

python test.py configs/ade20k/upernet_internimage_t_512_160k_ade20k.py pretrained/upernet_internimage_t_512_160k_ade20k.pth --eval mIoU

For example, to evaluate InternImage-B on a single node with 8 GPUs:

sh dist_test.sh configs/ade20k/upernet_internimage_b_512_160k_ade20k.py pretrained/upernet_internimage_b_512_160k_ade20k.pth 8 --eval mIoU
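For intuition about what `--eval mIoU` reports: per-class IoU is intersection over union of predicted and ground-truth pixels, averaged over classes. A standalone illustrative sketch (the actual evaluation is done inside MMSegmentation, not by this function):

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over label maps, computed from a confusion matrix."""
    mask = gt != ignore_index          # drop ignored pixels
    pred, gt = pred[mask], gt[mask]
    # Confusion matrix via bincount over flattened (gt, pred) pairs.
    conf = np.bincount(gt * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(0) + conf.sum(1) - inter
    ious = inter[union > 0] / union[union > 0]  # skip absent classes
    return float(ious.mean())

# Toy example: 3 classes, one of four pixels misclassified.
gt = np.array([[0, 1], [2, 2]])
pred = np.array([[0, 1], [2, 1]])
print(round(mean_iou(pred, gt, 3), 3))  # 0.667
```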

Training

To train an InternImage on ADE20K, run:

sh dist_train.sh <config-file> <gpu-num>

For example, to train InternImage-T with 8 GPUs on 1 node (total batch size 16), run:

sh dist_train.sh configs/ade20k/upernet_internimage_t_512_160k_ade20k.py 8
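The total batch size is the per-GPU batch size times the GPU count, so the 8-GPU setup above implies 2 samples per GPU. If you train with a different GPU count, one common convention (not specific to this repo) is to scale the learning rate linearly with the total batch size:

```python
def scaled_lr(base_lr: float, base_batch: int, gpus: int,
              samples_per_gpu: int = 2) -> float:
    """Linear LR scaling: LR grows in proportion to the total batch size.
    samples_per_gpu=2 matches the 8-GPU / batch-16 setup in this README."""
    total_batch = gpus * samples_per_gpu
    return base_lr * total_batch / base_batch

# 8 GPUs x 2 samples -> batch 16, LR unchanged vs. a batch-16 baseline:
print(scaled_lr(1e-4, 16, gpus=8))  # 0.0001
# 4 GPUs x 2 samples -> batch 8, LR halved:
print(scaled_lr(1e-4, 16, gpus=4))  # 5e-05
```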

Manage Jobs with Slurm

For example, to train InternImage-XL with 8 GPUs on 1 node (total batch size 16), run:

GPUS=8 sh slurm_train.sh <partition> <job-name> configs/ade20k/upernet_internimage_xl_640_160k_ade20k.py

Image Demo

To run inference on a single image (or on multiple images), use the command below. If you pass a directory instead of a single image, all images in that directory are processed.

CUDA_VISIBLE_DEVICES=0 python image_demo.py \
  data/ade/ADEChallengeData2016/images/validation/ADE_val_00000591.jpg \
  configs/ade20k/upernet_internimage_t_512_160k_ade20k.py  \
  checkpoint_dir/seg/upernet_internimage_t_512_160k_ade20k.pth  \
  --palette ade20k
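The `--palette ade20k` flag selects the color table used to render the predicted class map. A minimal sketch of that rendering step, roughly what the demo script does when blending the colorized prediction into the input image (the three RGB entries below are placeholders; the real ADE20K palette has 150 classes):

```python
import numpy as np

# Toy 3-class palette (placeholder values, not the full ADE20K palette):
PALETTE = np.array([[120, 120, 120], [180, 120, 120], [6, 230, 230]],
                   dtype=np.uint8)

def colorize(seg: np.ndarray, image: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend a per-pixel class map with the input image."""
    color = PALETTE[seg]                              # (H, W) -> (H, W, 3)
    return (alpha * color + (1 - alpha) * image).astype(np.uint8)

seg = np.array([[0, 1], [2, 2]])
img = np.zeros((2, 2, 3), dtype=np.uint8)
out = colorize(seg, img)
print(out.shape)  # (2, 2, 3)
```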

Export

First, install mmdeploy:

pip install mmdeploy==0.14.0

To export a segmentation model from PyTorch to TensorRT, run:

MODEL="model_name"
CKPT_PATH="/path/to/model/ckpt.pth"

python deploy.py \
    "./deploy/configs/mmseg/segmentation_tensorrt_static-512x512.py" \
    "./configs/ade20k/${MODEL}.py" \
    "${CKPT_PATH}" \
    "./deploy/demo.png" \
    --work-dir "./work_dirs/mmseg/${MODEL}" \
    --device cuda \
    --dump-info

For example, to export upernet_internimage_t_512_160k_ade20k from PyTorch to TensorRT, run:

MODEL="upernet_internimage_t_512_160k_ade20k"
CKPT_PATH="/path/to/model/ckpt/upernet_internimage_t_512_160k_ade20k.pth"

python deploy.py \
    "./deploy/configs/mmseg/segmentation_tensorrt_static-512x512.py" \
    "./configs/ade20k/${MODEL}.py" \
    "${CKPT_PATH}" \
    "./deploy/demo.png" \
    --work-dir "./work_dirs/mmseg/${MODEL}" \
    --device cuda \
    --dump-info
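Note that the deploy config name (`segmentation_tensorrt_static-512x512`) indicates the exported TensorRT engine has a fixed 512x512 input, so images must be brought to that size before inference. A dependency-light sketch of a nearest-neighbor resize for that purpose (illustrative only; a real pipeline would reuse the preprocessing defined in the model config):

```python
import numpy as np

def to_static_input(img: np.ndarray, size: int = 512) -> np.ndarray:
    """Nearest-neighbor resize of an (H, W, C) image to (size, size, C),
    matching the fixed input shape of a static TensorRT engine."""
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size   # source row for each output row
    xs = np.arange(size) * w // size   # source column for each output column
    return img[ys[:, None], xs]

img = np.zeros((300, 400, 3), dtype=np.uint8)
print(to_static_input(img).shape)  # (512, 512, 3)
```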