Motion-DeepLab
July 12, 2022
Motion-DeepLab is a unified model for video panoptic segmentation, a task that requires segmenting and tracking every pixel. It is built on top of Panoptic-DeepLab and adds a branch that regresses each pixel to the location of its center in the previous frame. Instead of a single RGB image, the network takes two consecutive frames (the current and the previous frame) together with the center heatmap from the previous frame, similar to CenterTrack [1]. The output is used to assign consistent track IDs to all instances throughout a video sequence.
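As a rough illustration of the description above, the sketch below stacks the two frames and the previous center heatmap into a single 7-channel input, then links current instances to previous tracks using the regressed offsets. It is not the repository's implementation: the function names, the greedy nearest-center matching, and the `max_dist` threshold are assumptions made for this example.

```python
import numpy as np


def build_network_input(curr_rgb, prev_rgb, prev_heatmap):
    """Stack current frame, previous frame, and previous center heatmap.

    Shapes: (H, W, 3) + (H, W, 3) + (H, W) -> (H, W, 7).
    """
    return np.concatenate(
        [curr_rgb, prev_rgb, prev_heatmap[..., np.newaxis]], axis=-1)


def assign_track_ids(curr_centers, offsets, prev_centers, max_dist=5.0):
    """Greedily match current instances to previous tracks (hypothetical helper).

    curr_centers: {instance_label: (y, x)} centers in the current frame.
    offsets:      (H, W, 2) array of (dy, dx) pointing to the previous-frame center.
    prev_centers: {track_id: (y, x)} centers in the previous frame.
    """
    next_id = max(prev_centers, default=-1) + 1
    unmatched = dict(prev_centers)
    assignment = {}
    for label, (y, x) in curr_centers.items():
        dy, dx = offsets[y, x]
        projected = np.array([y + dy, x + dx], dtype=float)
        best_id, best_dist = None, max_dist
        for track_id, center in unmatched.items():
            dist = float(np.linalg.norm(projected - np.asarray(center, dtype=float)))
            if dist <= best_dist:
                best_id, best_dist = track_id, dist
        if best_id is None:
            # No previous center is close enough: start a new track.
            best_id, next_id = next_id, next_id + 1
        else:
            del unmatched[best_id]
        assignment[label] = best_id
    return assignment
```

An instance whose projected center lands near a previous center inherits that track ID; any other instance starts a new track.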
Prerequisite
- Make sure the software is properly installed.
- Make sure the target dataset is correctly prepared (e.g., KITTI-STEP).
- Download the Cityscapes pretrained checkpoints listed below, and update the initial_checkpoint path in the config files.
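The last step can also be scripted. The helper below is a minimal sketch that assumes the config is a plain-text file containing a field of the form `initial_checkpoint: "..."`; check the field layout against your actual config file before relying on it. The function name and the example path are hypothetical.

```python
import re


def set_initial_checkpoint(config_text, checkpoint_path):
    """Return config_text with every initial_checkpoint value replaced.

    Assumes the field is written as: initial_checkpoint: "<some path>"
    """
    pattern = re.compile(r'(initial_checkpoint\s*:\s*)"[^"]*"')
    new_text, count = pattern.subn(
        lambda m: f'{m.group(1)}"{checkpoint_path}"', config_text)
    if count == 0:
        raise ValueError("no initial_checkpoint field found in config")
    return new_text


# Example on an in-memory snippet (the path is a placeholder).
example = 'model_options {\n  initial_checkpoint: "${INIT_CHECKPOINT}"\n}\n'
updated = set_initial_checkpoint(example, "/path/to/downloaded/ckpt")
```

The replacement is built with a function rather than a template string so that backslashes in the checkpoint path are not interpreted as regex escapes.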
Model Zoo
KITTI-STEP Video Panoptic Segmentation
Initial checkpoint: We provide several Cityscapes pretrained checkpoints
for KITTI-STEP experiments. Please download them and update the
initial_checkpoint path in the config files.
| Model | Download | Note |
|---|---|---|
| Panoptic-DeepLab | initial_checkpoint | The initial checkpoint for single-frame baseline. |
| Motion-DeepLab | initial_checkpoint | The initial checkpoint for two-frame baseline. |
We also provide checkpoints pretrained on KITTI-STEP below. To train these models yourself, use the corresponding config files under configs/kitti/panoptic_deeplab (single-frame-baseline) or configs/kitti/motion_deeplab (two-frame-baseline).
Panoptic-DeepLab (single-frame-baseline):
| Backbone | Output stride | Dataset split | PQ† | APMask† | mIoU |
|---|---|---|---|---|---|
| ResNet-50 (config, ckpt) | 32 | KITTI-STEP train set | 48.31 | 42.22 | 71.16 |
| ResNet-50 (config, ckpt) | 32 | KITTI-STEP trainval set | - | - | - |
†: See Q4 in FAQ.
This single-frame baseline could be used together with other state-of-the-art optical flow methods (e.g., RAFT [2]) for propagating mask predictions from one frame to another, as shown in our STEP paper.
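To make the propagation idea concrete, here is a generic sketch of carrying an instance-ID map from the previous frame into the current frame with a dense flow field. It is not the procedure from the STEP paper. It assumes the flow points from current-frame pixels to previous-frame pixels, that channel 0 holds the x displacement and channel 1 the y displacement, and that ID 0 means "no instance"; the function name is hypothetical.

```python
import numpy as np


def warp_ids_with_flow(prev_ids, flow_curr_to_prev):
    """Backward-warp an ID map using nearest-neighbor lookup.

    prev_ids:          (H, W) integer instance IDs from the previous frame.
    flow_curr_to_prev: (H, W, 2) displacement (dx, dy) from each current pixel
                       to its corresponding location in the previous frame.
    """
    h, w = prev_ids.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow_curr_to_prev[..., 0]).astype(int)
    src_y = np.rint(ys + flow_curr_to_prev[..., 1]).astype(int)
    # Pixels whose source falls outside the previous frame keep ID 0.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(prev_ids)
    warped[valid] = prev_ids[src_y[valid], src_x[valid]]
    return warped
```

The warped IDs can then be compared with the current frame's single-frame predictions, for example by mask overlap, to decide which current instance continues which track.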
Motion-DeepLab (two-frame-baseline):
| Backbone | Output stride | Dataset split | PQ† | APMask† | mIoU | STQ |
|---|---|---|---|---|---|---|
| ResNet-50 (config, ckpt) | 32 | KITTI-STEP train set | 42.08 | 37.52 | 63.15 | 57.7 |
| ResNet-50 (config, ckpt) | 32 | KITTI-STEP trainval set | - | - | - | - |
†: See Q4 in FAQ.
MOTChallenge-STEP Video Panoptic Segmentation
Initial checkpoint: We provide several Cityscapes pretrained checkpoints
for MOTChallenge-STEP experiments. Please download them and update the
initial_checkpoint path in the config files.
| Model | Download | Note |
|---|---|---|
| Panoptic-DeepLab | initial_checkpoint | The initial checkpoint for single-frame baseline. |
| Motion-DeepLab | initial_checkpoint | The initial checkpoint for two-frame baseline. |
We also provide checkpoints pretrained on MOTChallenge-STEP below. To train these models yourself, use the corresponding config files under configs/motchallenge/panoptic_deeplab (single-frame-baseline) or configs/motchallenge/motion_deeplab (two-frame-baseline).
Panoptic-DeepLab (single-frame-baseline):
TODO: Add pretrained checkpoint.
| Backbone | Output stride | Dataset split | PQ† | APMask† | mIoU |
|---|---|---|---|---|---|
| ResNet-50 (config) | 32 | MOTChallenge-STEP train set | ? | ? | ? |
| ResNet-50 | 32 | MOTChallenge-STEP trainval set | - | - | - |
†: See Q4 in FAQ.
This single-frame baseline could be used together with other state-of-the-art optical flow methods (e.g., RAFT [2]) for propagating mask predictions from one frame to another, as shown in our STEP paper.
Motion-DeepLab (two-frame-baseline):
TODO: Add pretrained checkpoint.
| Backbone | Output stride | Dataset split | PQ† | APMask† | mIoU | STQ |
|---|---|---|---|---|---|---|
| ResNet-50 (config) | 32 | MOTChallenge-STEP train set | ? | ? | ? | ? |
| ResNet-50 | 32 | MOTChallenge-STEP trainval set | - | - | - | - |
†: See Q4 in FAQ.
Citing Motion-DeepLab
If you find this code helpful in your research or wish to refer to the baseline results, please use the following BibTeX entry.
- STEP (Motion-DeepLab):
@article{step_2021,
author = {Weber, Mark and Xie, Jun and Collins, Maxwell and Zhu, Yukun and Voigtlaender, Paul and Adam, Hartwig and Green, Bradley and Geiger, Andreas and Leibe, Bastian and Cremers, Daniel and O\v{s}ep, Aljo\v{s}a and Leal-Taix\'{e}, Laura and Chen, Liang-Chieh},
journal = {Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
title = {{STEP}: Segmenting and Tracking Every Pixel},
year = {2021}
}
References
1. Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl. Tracking objects as points. In ECCV, 2020.
2. Zachary Teed and Jia Deng. RAFT: Recurrent all-pairs field transforms for optical flow. In ECCV, 2020.