Segmenting Moving Objects via an Object-Centric Layered Representation

March 1, 2023 · View on GitHub

Junyu Xie, Weidi Xie, Andrew Zisserman

Visual Geometry Group, Department of Engineering Science, University of Oxford

In NeurIPS, 2022.

Requirements

python=3.8.8, pytorch=1.9.1, Pillow, opencv, einops (for tensor manipulation), tensorboardX (for data logging)

Dataset preparation

DAVIS2016 can be used directly after download.
For DAVIS2017-motion, RGB sequences are the same as those within the DAVIS2017 dataset. The curated annotations can be downloaded from here.
Other datasets such as SegTrackv2, FBMS-59 and MoCA_filter are required to be preprocessed. We follow the same preprocessing protocol in motiongrouping.
Synthetic datasets (Syn-Train and Syn-Val) can be downloaded from here. (Modal annotations are not provided, as they can be generated from amodal annotations during dataloading).

Optical flows are estimated by RAFT method. Flow estimation codes are also provided in flow folder.

Once finished, in config.py, modify dataset paths in setup_dataset and set corresponding logging paths in setup_path.

To setup your own data:

Add you own dataset information in setup_dataset in config.py .
Add you dataset name to the choices in parser.add_argument('--dataset') in train.py and eval.py
Add colour palette information for input/output annotations to data/colour_palette.json

Training

python train.py --queries 3 --gaps 1,-1 --batch_size 2 --frames 30 --dataset Syn

The flow-only OCLR model pretrained on our synthetic dataset (Syn-train) can be downloaded from here.

Inference

python eval.py --queries 3 --gaps 1,-1 --batch_size 1 --frames 30 --dataset DAVIS17m \
               --resume_path /path/to/ckpt --save_path /path/to/savepath

where --resume_path indicates the checkpoint path, and --save_path corresponds to the saving path of segmentation results.

Our segmentation results on several datasets (DAVIS2016, DAVIS2017-motion, SegTrackv2, FBMS-59, MoCA) can be also found here.

Evaluation benchmarks:

For DAVIS2016, use the DAVIS2016 official evaluator.
For DAVIS2017-motion, once our curated annotations are downloaded from here, simply replace Annotations_unsupervised folder in the DAVIS2017 dataset. Then, DAVIS2017 official evaluator can be used to evaluate the unsupervised VOS performance.
For MoCA, use the evaluator provided in motiongrouping.

Test-time adaptation

The test-time adaptation process refines flow-predicted masks by a RGB-based mask propagation process based on DINO features. More information can be found in dino folder.

Citation

If you find the code helpful in your research, please consider citing our work:

@inproceedings{xie2022segmenting,
    title     = {Segmenting Moving Objects via an Object-Centric Layered Representation}, 
    author    = {Junyu Xie and Weidi Xie and Andrew Zisserman},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2022}
}