June 25, 2025
# BOOTPLACE: Bootstrapped Object Placement with Detection Transformers

BOOTPLACE is a paradigm that formulates object placement as a placement-by-detection problem. It first identifies regions of interest suitable for object placement by training a specialized detection transformer on object-subtracted backgrounds with multi-object supervision. It then semantically associates each target compositing object with the detected regions based on their complementary characteristics. Through a bootstrapped training approach applied to randomly object-subtracted images, it enforces meaningful placements via extensive paired data augmentation.
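The association step above can be pictured as matching compositing objects to detected candidate regions by a compatibility score. The sketch below uses a simple greedy assignment over made-up scores purely for intuition; the actual model learns this association end-to-end, and the function and score matrix here are illustrative, not part of the released code:

```python
def greedy_associate(scores):
    # scores[i][j]: compatibility of compositing object i with detected region j.
    # Greedily give each object the highest-scoring region that is still free,
    # processing objects in order of their best available score.
    assignments = {}
    used = set()
    for i, _ in sorted(enumerate(scores), key=lambda p: -max(p[1])):
        best = max((j for j in range(len(scores[i])) if j not in used),
                   key=lambda j: scores[i][j], default=None)
        if best is not None:
            assignments[i] = best
            used.add(best)
    return assignments

scores = [[0.9, 0.2, 0.4],
          [0.8, 0.7, 0.1]]
print(greedy_associate(scores))  # → {0: 0, 1: 1}
```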
Check out our Project Page for more visual demos!
## Updates

**03/20/2025**
- Release training code and pretrained models.

**06/24/2025**
- Release inference code and data.
## Installation

### Prerequisites

- **System:** The code is currently tested only on Linux.
- **Hardware:** An NVIDIA GPU with at least 16 GB of memory is required. The code has been verified on NVIDIA A6000 GPUs.
- **Software:**
  - Conda is recommended for managing dependencies.
  - Python 3.6 or higher is required.

Create a new conda environment named `BOOTPLACE` and install the dependencies:

```shell
conda env create --file=BOOTPLACE.yml
```

Download the DETR-R50 pretrained model for fine-tuning here and place it at `weights/detr-r50-e632da11.pth`.
## Pretrained Models
We provide the following pretrained models:
| Model | Description | #Params | Download |
|---|---|---|---|
| BOOTPLACE_Cityscapes | Multiple supervision | 523M | Download |
## Dataset

We provide a large-scale street-scene vehicle placement dataset (Download) curated from Cityscapes. The file structure is as follows:
```
├── train
│   ├── backgrounds:
│   │   ├── imgID.png
│   │   └── ……
│   ├── objects:
│   │   ├── imgID:
│   │   │   ├── object_name_ID.png
│   │   │   └── ……
│   │   └── ……
│   ├── location:
│   │   ├── imgID:
│   │   │   ├── object_name_ID.txt
│   │   │   └── ……
│   │   └── ……
│   └── annotations.json
└── test
    ├── backgrounds:
    │   ├── imgID.png
    │   └── ……
    ├── backgrounds_single:
    │   ├── imgID.png
    │   └── ……
    ├── objects:
    │   ├── imgID:
    │   │   ├── object_name_ID.png
    │   │   └── ……
    │   └── ……
    ├── objects_single:
    │   ├── imgID:
    │   │   ├── object_name_ID.png
    │   │   └── ……
    │   └── ……
    ├── location:
    │   ├── imgID:
    │   │   ├── object_name_ID.txt
    │   │   └── ……
    │   └── ……
    ├── location_single:
    │   ├── imgID:
    │   │   ├── object_name_ID.txt
    │   │   └── ……
    │   └── ……
    └── annotations.json
```
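As a sketch of how this layout can be consumed, the snippet below pairs each background with its object crops and location files by the shared `imgID`. That pairing convention is an assumption inferred from the matching names in the tree above, and `index_split` is a hypothetical helper, not the repository's own loader; the demo builds a throwaway miniature copy of the layout so it is self-contained:

```python
import json
import os
import tempfile

def index_split(split_dir):
    # Pair each background image with its object crops and location files,
    # keyed by the shared imgID (assumed convention based on the layout).
    samples = {}
    bg_dir = os.path.join(split_dir, "backgrounds")
    for fname in sorted(os.listdir(bg_dir)):
        img_id = os.path.splitext(fname)[0]
        obj_dir = os.path.join(split_dir, "objects", img_id)
        loc_dir = os.path.join(split_dir, "location", img_id)
        samples[img_id] = {
            "background": os.path.join(bg_dir, fname),
            "objects": sorted(os.listdir(obj_dir)) if os.path.isdir(obj_dir) else [],
            "locations": sorted(os.listdir(loc_dir)) if os.path.isdir(loc_dir) else [],
        }
    return samples

# Build a miniature train/ split matching the documented structure.
root = tempfile.mkdtemp()
train = os.path.join(root, "train")
os.makedirs(os.path.join(train, "backgrounds"))
os.makedirs(os.path.join(train, "objects", "0001"))
os.makedirs(os.path.join(train, "location", "0001"))
open(os.path.join(train, "backgrounds", "0001.png"), "w").close()
open(os.path.join(train, "objects", "0001", "car_01.png"), "w").close()
open(os.path.join(train, "location", "0001", "car_01.txt"), "w").close()
with open(os.path.join(train, "annotations.json"), "w") as f:
    json.dump({}, f)

print(index_split(train))
```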
## Training

To train a model on Cityscapes:

```shell
python -m main \
    --epochs 200 \
    --batch_size 2 \
    --save_freq 10 \
    --set_cost_class 1 \
    --ce_loss_coef 1 \
    --num_queries 120 \
    --eos_coef 0.1 \
    --lr 1e-4 \
    --data_path data/Cityscapes \
    --output_dir results/Cityscapes_ckpt \
    --resume weights/detr-r50-e632da11.pth
```
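Several of these flags mirror DETR's loss configuration. In particular, `--eos_coef 0.1` downweights the trailing "no-object" class in the classification cross-entropy, so the many unmatched queries (here, up to 120) do not swamp the few matched ones. A minimal sketch of that weighting scheme (the class count below is illustrative):

```python
def no_object_class_weights(num_classes, eos_coef=0.1):
    # DETR-style cross-entropy class weights: each real class weighs 1.0,
    # and the extra trailing "no-object" class is scaled by eos_coef so
    # predictions on unmatched queries contribute less to the loss.
    return [1.0] * num_classes + [eos_coef]

print(no_object_class_weights(num_classes=2, eos_coef=0.1))  # → [1.0, 1.0, 0.1]
```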
## Inference

```shell
python test.py \
    --num_queries 120 \
    --data_path data/Cityscapes \
    --pretrained_model 'results/Cityscapes_ckpt/checkpoint.pth' \
    --im_root 'data/Cityscapes/test' \
    --output_dir 'results/Cityscape_inference'
```
## License
This project is licensed under the terms of the MIT license.
## Citation
If you find this work helpful, please consider citing our paper:
```bibtex
@inproceedings{zhou2025bootplace,
  title={BOOTPLACE: Bootstrapped Object Placement with Detection Transformers},
  author={Zhou, Hang and Zuo, Xinxin and Ma, Rui and Cheng, Li},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={19294--19303},
  year={2025}
}
```