SimPLR

February 19, 2025 · View on GitHub

By Duy-Kien Nguyen, Martin R. Oswald, Cees Snoek.

This repository is an official implementation of the paper SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation.

We also provide the implementation of a strong baseline, DeformableDETR + ViT in our repo.

Introduction

TL; DR. SimPLR is a plain architecture which learns multi-scale information in its attention computation.

SimPLR

License

This project is released under the Apache 2.0 license.

Citing SimPLR

If you find SimPLR useful in your research, please consider citing:

@article{nguyen2025simplr,
  title={SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation},
  author={Duy{-}Kien Nguyen and Martin R. Oswald and Cees G. M. Snoek},
  journal={arXiv preprint arXiv:2310.05920},
  year={2025}
}

Installation

Requirements

Linux, CUDA>=11, GCC>=5.4
Python>=3.8

We recommend you to use Anaconda to create a conda environment:
```
conda create -n simplr python=3.8
```
Then, activate the environment:
```
conda activate simplr
```
PyTorch>=2.0, torchvision>=0.11.2 (following instructions here)

For example, you could install pytorch and torchvision as following:
```
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
```
Other requirements & Compilation
```
pip install -e ./
```

The datasets are assumed to exist in a directory specified by the environment variable $E2E_DATASETS. If the environment variable is not specified, it will be set to be .data. Under this directory, detectron2 will look for datasets in the structure described below.

For COCO Detection and Instance Segmentation, please download COCO 2017 dataset and organize them as following:

$E2E_DATASETS/
└── coco/
	├── annotation/
		├── instances_train2017.json
		├── instances_val2017.json
		└── image_info_test-dev2017.json
	├── image/
		├── train2017/
		├── val2017/
		└── test2017/
	└── vocabs/
		└── coco_categories.txt - the mapping from coco categories to indices.

You can download coco_categories.txt here.

Training

Our script is able to automatically detect the number of available gpus on a single node. It works best with Slurm system when it can auto-detect the number of available gpus along with nodes. The command for training SimPLR is simple as following:

python tools/run.py --config ${CONFIG_PATH} --model ${MODEL_TYPE} --task ${TASK_TYPE}

For example,

COCO Detection

python tools/run.py --config e2edet/config/COCO-Detection/deformable_detr_vit_b_w16_4g_5x_lsj.yaml --model deformable_detr_vit --task detection

COCO Instance Segmentation

python tools/run.py --config e2edet/config/COCO-InstanceSegmentation/base_simplr_vit_detection.yaml --model simplr --task detection

Some tips to speed-up training

If your file system is slow to read images but your memory is huge, you may consider enabling 'cache_mode' option to load whole dataset into memory at the beginning of training:

python tools/run.py --config ${CONFIG_PATH} --model ${MODEL_TYPE} --task ${TASK_TYPE} dataset_config.${TASK_TYPE}.cache_mode=True

If your GPU memory does not fit the batch size, you may consider to use 'iter_per_update' to perform gradient accumulation:

python tools/run.py --config ${CONFIG_PATH} --model ${MODEL_TYPE} --task ${TASK_TYPE} training.iter_per_update=2

Our code also supports mixed precision training. It is recommended to use when you GPUs architecture can perform fast FP16 operations:

python tools/run.py --config ${CONFIG_PATH} --model ${MODEL_TYPE} --task ${TASK_TYPE} training.use_fp16=(float16 or bfloat16)

SimPLR

Introduction

License

Citing SimPLR

Installation

Requirements

Usage

Dataset preparation

Training

Some tips to speed-up training