README.md

June 25, 2025

BOOTPLACE: Bootstrapped Object Placement with Detection Transformers

BOOTPLACE is a paradigm that formulates object placement as a placement-by-detection problem. It first identifies suitable regions of interest for object placement by training a specialized detection transformer on object-subtracted backgrounds with multi-object supervision. It then semantically associates each target compositing object with the detected regions based on their complementary characteristics. Through a bootstrapped training approach applied to randomly object-subtracted images, it enforces meaningful placements via extensive paired data augmentation.
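The association step described above can be sketched as follows. This is an illustrative sketch only, not the actual BOOTPLACE implementation: the function name and the assumption that objects and detected regions are compared via feature similarity are hypothetical simplifications.

```python
import numpy as np

def associate_objects_to_regions(object_feats, region_feats):
    """Hypothetical sketch of the object-region association step:
    each compositing object is matched to the detected placement region
    whose feature embedding is most similar to the object's.

    object_feats: (num_objects, d) array of object embeddings
    region_feats: (num_regions, d) array of detected-region embeddings
    Returns: best region index for each object.
    """
    # cosine similarity between every object and every detected region
    obj = object_feats / np.linalg.norm(object_feats, axis=1, keepdims=True)
    reg = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    sim = obj @ reg.T  # shape (num_objects, num_regions)
    return sim.argmax(axis=1)
```

In the real model the association is learned end-to-end; this snippet only conveys the shape of the matching problem.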

Check out our Project Page for more visual demos!

โฉ Updates

03/20/2025

  • Release training code and pretrained models.

06/24/2025

  • Release inference code and data.

📦 Installation

Prerequisites

  • System: The code is currently tested only on Linux.

  • Hardware: An NVIDIA GPU with at least 16GB of memory is necessary. The code has been verified on NVIDIA A6000 GPUs.

  • Software:

    • Conda is recommended for managing dependencies.
    • Python version 3.6 or higher is required.

    Create a new conda environment named BOOTPLACE and install the dependencies:

    conda env create --file=BOOTPLACE.yml
    

Download the DETR-R50 pretrained model for fine-tuning here and place it at weights/detr-r50-e632da11.pth.
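For example, the checkpoint can be fetched from the public DETR release. The URL below is an assumption based on the standard DETR distribution, not a link from this README; prefer the link above if it differs.

```shell
# Create the weights directory and fetch the DETR-R50 checkpoint.
# URL is the standard public DETR release location (assumption).
mkdir -p weights
wget -q https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth \
     -O weights/detr-r50-e632da11.pth
```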

🤖 Pretrained Models

We provide the following pretrained models:

| Model | Description | #Params | Download |
| --- | --- | --- | --- |
| BOOTPLACE_Cityscapes | Multiple supervision | 523M | Download |

📚 Dataset

We provide a large-scale street-scene vehicle placement dataset, curated from Cityscapes, available via the Download link. The file structure is:

├── train
    ├── backgrounds:
        ├── imgID.png
        ├── ……
    ├── objects:
        ├── imgID:
            ├── object_name_ID.png
            ├── ……
        ├── ……
    ├── location:
        ├── imgID:
            ├── object_name_ID.txt
            ├── ……
        ├── ……
    ├── annotations.json
├── test
    ├── backgrounds:
        ├── imgID.png
        ├── ……
    ├── backgrounds_single:
        ├── imgID.png
        ├── ……
    ├── objects:
        ├── imgID:
            ├── object_name_ID.png
            ├── ……
        ├── ……
    ├── objects_single:
        ├── imgID:
            ├── object_name_ID.png
            ├── ……
        ├── ……
    ├── location:
        ├── imgID:
            ├── object_name_ID.txt
            ├── ……
        ├── ……
    ├── location_single:
        ├── imgID:
            ├── object_name_ID.txt
            ├── ……
        ├── ……
    ├── annotations.json
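A minimal sketch for indexing one split of this layout, pairing each background with its per-image object crops and location files. The function name is hypothetical, and parsing of annotations.json is omitted since its schema is not described here.

```python
from pathlib import Path

def index_split(split_dir):
    """Index one dataset split (train/ or test/) following the layout above.

    Returns {img_id: {"background": Path,
                      "objects": [Path, ...],
                      "locations": [Path, ...]}}.
    """
    split = Path(split_dir)
    samples = {}
    for bg in sorted((split / "backgrounds").glob("*.png")):
        img_id = bg.stem  # backgrounds/imgID.png -> imgID
        obj_dir = split / "objects" / img_id
        loc_dir = split / "location" / img_id
        samples[img_id] = {
            "background": bg,
            "objects": sorted(obj_dir.glob("*.png")) if obj_dir.is_dir() else [],
            "locations": sorted(loc_dir.glob("*.txt")) if loc_dir.is_dir() else [],
        }
    return samples
```

The same helper works for the `*_single` variants of the test split by pointing it at directories renamed accordingly.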

Training

To train a model on Cityscapes:

python -m main \
    --epochs 200 \
    --batch_size 2 \
    --save_freq 10 \
    --set_cost_class 1 \
    --ce_loss_coef 1 \
    --num_queries 120 \
    --eos_coef 0.1 \
    --lr 1e-4 \
    --data_path data/Cityscapes \
    --output_dir results/Cityscapes_ckpt \
    --resume weights/detr-r50-e632da11.pth

Inference

To run inference with a trained checkpoint:

python test.py \
    --num_queries 120 \
    --data_path data/Cityscapes \
    --pretrained_model 'results/Cityscapes_ckpt/checkpoint.pth' \
    --im_root 'data/Cityscapes/test' \
    --output_dir 'results/Cityscapes_inference'

โš–๏ธ License

This project is licensed under the terms of the MIT license.

📜 Citation

If you find this work helpful, please consider citing our paper:

@inproceedings{zhou2025bootplace,
  title={BOOTPLACE: Bootstrapped Object Placement with Detection Transformers},
  author={Zhou, Hang and Zuo, Xinxin and Ma, Rui and Cheng, Li},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={19294--19303},
  year={2025}
}