ClusterNet
January 24, 2024
This is an official implementation of "Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering" (IEEE TCSVT).
Prerequisites
The training and testing experiments were conducted using PyTorch 1.8.1 on a single NVIDIA TITAN RTX GPU with 24 GB of memory.
- python 3.8
- pytorch 1.8.1
- torchvision 0.9.1
```shell
conda create -n ClusterNet python=3.8
conda activate ClusterNet
conda install pytorch==1.8.1 torchvision==0.9.1 cudatoolkit=10.2 -c pytorch
```
Other minor Python modules can be installed by running:
```shell
pip install opencv-python einops
```
Datasets
- DAVIS16: We perform online clustering and evaluation on the validation set. Note that you should download DAVIS17 (Unsupervised 480p) so the directory layout matches what the code expects.
- FBMS: This dataset contains videos of multiple moving objects, providing test cases for multiple object segmentation.
- SegTrackV2: Each sequence contains 1-6 moving objects.
Following the evaluation protocol in CIS, we merge multiple objects into a single foreground and use the region similarity to measure segmentation performance on FBMS and SegTrackV2. Binary masks: [FBMS][SegTrackV2]
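As a minimal sketch (not the repo's actual evaluation code), merging a multi-object annotation into the single binary foreground used for this protocol can look like:

```python
import numpy as np

def to_binary_foreground(instance_mask: np.ndarray) -> np.ndarray:
    """Collapse a multi-object instance mask (0 = background,
    1..N = object ids) into a single binary foreground mask."""
    return (instance_mask > 0).astype(np.uint8)

# Example: two objects (ids 1 and 2) become one foreground region.
mask = np.array([[0, 1, 1],
                 [0, 2, 0]])
binary = to_binary_foreground(mask)
```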
- Path configuration: the dataset path is set via `--data_dir` in `main.py`:
```python
parser.add_argument('--data_dir', default=None, type=str, help='dataset root dir')
```
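Only `--data_dir` above is taken from the repo; as an illustration, the hyperparameters used later (`batch_size`, `n_clusters`, `threshold`) could be exposed the same way. The extra flags below are hypothetical, not the repo's actual interface:

```python
import argparse

parser = argparse.ArgumentParser()
# This flag is from the repo's main.py:
parser.add_argument('--data_dir', default=None, type=str, help='dataset root dir')
# The following flags are assumed for illustration only:
parser.add_argument('--batch_size', default=16, type=int)
parser.add_argument('--n_clusters', default=30, type=int)
parser.add_argument('--threshold', default=0.1, type=float)

# Parse an example command line.
args = parser.parse_args(['--data_dir', '/data/DAVIS2017'])
```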
- The dataset directory structure should be as follows:
```
|--DAVIS2017
|  |--Annotations_unsupervised
|  |  |--480p
|  |--ImageSets
|  |  |--2016
|  |--Flows_gap_1_${flow_method}
|  |--Full-Resolution
|--FBMS
|  |--Annotations_Binary
|  |--Flows_gap_1_${flow_method}
|--SegTrackv2
|  |--Annotations_Binary
|  |--Flows_gap_1_${flow_method}
```
Precompute optical flow
- The optical flow is estimated using PWCNet, RAFT, and FlowFormer. In the dataset directories, the variable `flow_method` is `PWC`, `RAFT`, or `FlowFormer`, respectively.
- The flows are resized to the size of the original image (same as Motion Grouping), with each input frame having a size of $480\times854$ for DAVIS16 and $480\times640$ for FBMS and SegTrackV2. The flow values are normalized to $[-1, 1]$, and only the previous frames are used for optical flow estimation in the online setting.
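A minimal sketch of the normalization step, assuming each flow component is divided by half the corresponding image dimension and clipped to $[-1, 1]$ (the exact scheme in the repo may differ):

```python
import numpy as np

def normalize_flow(flow: np.ndarray) -> np.ndarray:
    """Normalize an (H, W, 2) optical-flow field to roughly [-1, 1].

    Assumption (not verified against the repo): horizontal displacements
    are divided by W/2, vertical ones by H/2, then values are clipped.
    """
    h, w = flow.shape[:2]
    out = flow.astype(np.float32).copy()
    out[..., 0] /= w / 2.0  # horizontal component
    out[..., 1] /= h / 2.0  # vertical component
    return np.clip(out, -1.0, 1.0)

# A DAVIS16-sized flow field where every pixel moves half the width right.
flow = np.zeros((480, 854, 2), dtype=np.float32)
flow[..., 0] = 427.0
norm = normalize_flow(flow)
```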
Train & Inference
To train the ClusterNet model on a GPU, run:
```shell
bash scripts/main.sh
```
In `main.sh`, first activate your Python environment and set `gpu_id` and `data_dir`. Then set the hyperparameters `batch_size`, `n_clusters`, and `threshold` to 16, 30, and 0.1, respectively.
Outputs
The model files and checkpoints will be saved in `./checkpoints/${exp_id}`.
`.pth` files with the suffix `_${sequence_name}` store the network weights that initialize our autoencoder, trained on DAVIS16 with an optical flow reconstruction loss.
The segmentation results will be saved in `./results/${exp_id}`. The evaluation criterion is the mean region similarity.
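The region similarity is commonly defined as the Jaccard index $\mathcal{J}$ between predicted and ground-truth masks, averaged over frames. A minimal numpy sketch (not the repo's evaluation code):

```python
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index J = |pred & gt| / |pred | gt| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: define J = 1
        return 1.0
    inter = np.logical_and(pred, gt).sum()
    return float(inter) / float(union)

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
j = region_similarity(pred, gt)  # intersection 1, union 2 -> 0.5
```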
| Optical flow prediction | Method | Mean |
|---|---|---|
| PWC-Net | MG | 63.7 |
| PWC-Net | ClusterNet | 67.9 (+4.2) |
| RAFT | MG | 68.3 |
| RAFT | ClusterNet | 72.0 (+3.7) |
| FlowFormer | MG | 70.3 |
| FlowFormer | ClusterNet | 75.4 (+5.1) |
Citation
If you find our work useful in your research, please consider citing our paper!
```bibtex
@ARTICLE{ClusterNet,
  author={Xi, Lin and Chen, Weihai and Wu, Xingming and Liu, Zhong and Li, Zhengguo},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  title={Online Unsupervised Video Object Segmentation via Contrastive Motion Clustering},
  year={2023}
}
```
Contact
If you have any questions, please feel free to contact Lin Xi (xilin1991@buaa.edu.cn).
Acknowledgement
This project would not have been possible without relying on some awesome repos: Motion Grouping, PWCNet, RAFT and FlowFormer. We thank the original authors for their excellent work.