Modeling Image Composition for Complex Scene Generation
July 17, 2023
Official PyTorch implementation of TwFA.
Modeling Image Composition for Complex Scene Generation (CVPR 2022)
Zuopeng Yang, Daqing Liu, Chaoyue Wang, Jie Yang, Dacheng Tao
Overview
The overview of the proposed Transformer with Focal Attention (TwFA) framework.

The illustration of different attention mechanisms with connectivity matrix.

Requirements
A suitable conda environment named twfa can be created
and activated with:
conda env create -f environment.yaml
conda activate twfa
Data Preparation
COCO
Create a symlink data/coco containing the images from the 2017 split in
train2017 and val2017, and their annotations in annotations. Files can be
obtained from the COCO webpage.
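Before sampling or training, the layout described above can be checked with a few lines of Python. This is a minimal sketch and not part of the repository: the helper name `missing_coco_dirs` is hypothetical, and it only verifies that the three subdirectories named above exist under data/coco.

```python
from pathlib import Path

# Subdirectories expected under data/coco, as listed above.
EXPECTED_COCO_DIRS = ("train2017", "val2017", "annotations")


def missing_coco_dirs(root="data/coco"):
    """Return the expected subdirectories that are absent under `root`.

    Hypothetical helper for illustration. Path.is_dir() follows symlinks,
    so this works whether data/coco is a real directory or a symlink.
    """
    root = Path(root)
    return [name for name in EXPECTED_COCO_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_coco_dirs()
    if missing:
        print("Missing under data/coco:", ", ".join(missing))
    else:
        print("data/coco layout looks complete.")
```

An empty list means all three directories were found; it does not confirm that the images or annotation files inside them are complete.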
VG
Create a symlink data/vg containing the images from Visual Genome. Files can be
obtained from the VG webpage. Unzip the remaining VG annotation files in the data directory.
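The data/coco and data/vg symlinks can be created with `ln -s` or from Python. The sketch below assumes the datasets were downloaded elsewhere on disk; `link_dataset` is a hypothetical helper and the source path in the usage note is a placeholder, not a location the repository prescribes.

```python
import os
from pathlib import Path


def link_dataset(source, link="data/vg"):
    """Create `link` as a symlink to the existing directory `source`.

    Hypothetical helper for illustration; `source` is wherever the
    dataset was downloaded.
    """
    source = Path(source).expanduser().resolve()
    link = Path(link)
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists; remove it first")
    link.parent.mkdir(parents=True, exist_ok=True)  # create data/ if needed
    os.symlink(source, link, target_is_directory=True)
    return link
```

For example, `link_dataset("/path/to/visual_genome", "data/vg")` would point data/vg at a local Visual Genome download. The helper refuses to overwrite an existing path, so a stale link has to be removed by hand first.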
Sampling
COCO
Download the checkpoint (code: 5ipt) and place it in the pretrained/checkpoints directory. Then run:
python scripts/sample_coco.py --base configs/coco.yaml --save_path SAVE_DIR
VG
Download checkpoint1 (code: 1gzu) or checkpoint2 (code: t1qv) and place it in the pretrained/checkpoints directory. Then run:
python scripts/sample_vg.py --base configs/VG_CONFIG_FILE --save_path SAVE_DIR
Training models
COCO
python main.py --base configs/coco.yaml -t True --gpus 0,1,2,3,4,5,6,7,
VG
python main.py --base configs/vg.yaml -t True --gpus 0,1,2,3,4,5,6,7,
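The trailing comma in `--gpus` looks deliberate rather than a typo: Taming-Transformers, which this repository acknowledges, documents `--gpus 0,` (with a trailing comma) for selecting a single GPU. Assuming `main.py` here keeps that convention, the sketch below builds the training command for an arbitrary set of GPU ids. `build_train_command` is a hypothetical helper, not something the repository provides.

```python
def build_train_command(config, gpu_ids):
    """Return the training command as an argument list.

    Hypothetical helper. It assumes main.py accepts a comma-separated list
    of device ids with a trailing comma, as in the commands above.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    # "0,1," style: every id, including the last, is followed by a comma.
    gpus = "".join(f"{int(i)}," for i in gpu_ids)
    return ["python", "main.py", "--base", config, "-t", "True", "--gpus", gpus]


if __name__ == "__main__":
    # Print the command for a two-GPU COCO run instead of launching it.
    print(" ".join(build_train_command("configs/coco.yaml", [0, 1])))
```

The returned list can be passed to `subprocess.run` to launch training. Changing the number of GPUs may also call for adjusting the batch size or learning rate in the config, which this sketch does not attempt.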
Results
Comparison of different models.

Acknowledgement
Huge thanks to Taming-Transformers!
@misc{esser2020taming,
title={Taming Transformers for High-Resolution Image Synthesis},
author={Patrick Esser and Robin Rombach and Björn Ommer},
year={2020},
eprint={2012.09841},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
BibTeX
@inproceedings{yang2022modeling,
title={Modeling image composition for complex scene generation},
author={Yang, Zuopeng and Liu, Daqing and Wang, Chaoyue and Yang, Jie and Tao, Dacheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7764--7773},
year={2022}
}