Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

July 5, 2024

COMBO-AVS

Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan and Shiming Xiang

This repository provides the PyTorch implementation for the paper "Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation" accepted by CVPR 2024 (Highlight).

🔥What's New

(2024. 4.06) Our paper (COMBO) has been selected as a Highlight paper! 😮
(2024. 3.19) Our checkpoints are publicly available at YannQi/COMBO-AVS-checkpoints · Hugging Face!
(2024. 3.14) Our code is available to the public on $\pi$ day!
(2024. 3.12) Our code is ready to be shared with the public 🌲🌲🌲!
(2024. 2.27) Our paper (COMBO) was accepted to CVPR 2024!
(2023.11.17) We completed the implementation of COMBO and pushed the code.

🪵 TODO List

Upload the pre-masks and the checkpoints to YannQi/COMBO-AVS-checkpoints · Hugging Face.

🛠️ Getting Started

2. Datasets

Please refer to AVSBenchmark to download the datasets. You can put the data under the data folder or use a folder of your own; in either case, remember to update the paths in the config files. The data directory is laid out as below:

|--AVS_dataset
   |--AVSBench_semantic/
   |--AVSBench_object/Multi-sources/
   |--AVSBench_object/Single-source/
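
Before preprocessing, it can help to confirm that the three sub-folders from the listing above are where the config files expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` and the example root `data/AVS_dataset` are assumptions made for illustration.

```python
from pathlib import Path

# Sub-folders copied from the directory listing above.
EXPECTED_SUBDIRS = (
    "AVSBench_semantic",
    "AVSBench_object/Multi-sources",
    "AVSBench_object/Single-source",
)


def missing_dataset_dirs(root, expected=EXPECTED_SUBDIRS):
    """Return the expected sub-folders that do not exist under `root`."""
    root = Path(root)
    return [sub for sub in expected if not (root / sub).is_dir()]


if __name__ == "__main__":
    # "data/AVS_dataset" is an assumed location; use the path set in your config files.
    missing = missing_dataset_dirs("data/AVS_dataset")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

An empty result only means the folders exist; it does not verify their contents.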

Preprocess the AVSS dataset for efficient training.

python3 avs_tools/preprocess_avss_audio.py
python3 avs_tools/process_avssimg2fixsize.py

3. Download Pre-Trained Models

The pretrained backbones are available from the benchmark (AVSBench pretrained backbones) or from YannQi/COMBO-AVS-checkpoints · Hugging Face. Place them under the pretrained folder as follows:

|--pretrained
   |--detectron2/R-50.pkl
   |--detectron2/d2_pvt_v2_b5.pkl
   |--vggish-10086976.pth
   |--vggish_pca_params-970ea276.pth
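
One possible way to fetch the files and check the layout above is sketched here. It assumes the `huggingface_hub` package is installed; the helper names and the `hf_download` target folder are hypothetical, and the internal layout of the Hugging Face repository is not described in this README, so files may still have to be moved into `pretrained/` by hand.

```python
from pathlib import Path

# Relative paths copied from the listing above.
PRETRAINED_FILES = (
    "detectron2/R-50.pkl",
    "detectron2/d2_pvt_v2_b5.pkl",
    "vggish-10086976.pth",
    "vggish_pca_params-970ea276.pth",
)


def missing_pretrained(root="pretrained"):
    """Return the listed files that are not present under `root`."""
    root = Path(root)
    return [rel for rel in PRETRAINED_FILES if not (root / rel).is_file()]


def fetch_checkpoint_repo(target_dir="hf_download"):
    """Download the whole Hugging Face repository into `target_dir`.

    Assumption: the repository layout may differ from the `pretrained/`
    listing, so the downloaded files may need to be rearranged afterwards.
    """
    # Imported lazily so the layout check works without the package.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(
        repo_id="YannQi/COMBO-AVS-checkpoints",
        local_dir=target_dir,
    )
```

Calling `missing_pretrained()` after arranging the files returns an empty list once all four are in place.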

4. Maskiges pregeneration

Generate class-agnostic masks (Optional)

sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh train # or ms3, avss
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh val 
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh test

Generate Maskiges (Optional)

python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split train # or ms3, avss
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split val
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split test
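
The six S4 commands above can also be driven from a single script. This is a rough sketch under the assumption that it is run from the repository root; the function names are made up for illustration, and only the S4 script paths shown above are used because the MS3 and AVSS script names are not spelled out here.

```python
import subprocess

SPLITS = ("train", "val", "test")


def pregeneration_commands(splits=SPLITS):
    """Build the S4 commands listed above: all masks first, then all Maskiges."""
    commands = []
    for split in splits:
        commands.append(
            ["sh", "avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh", split]
        )
    for split in splits:
        commands.append(
            ["python3", "avs_tools/pre_mask2rgb/mask_precess_s4.py", "--split", split]
        )
    return commands


def run_all(commands, dry_run=True):
    """Print each command and, unless `dry_run` is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(pregeneration_commands(), dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to review them before a long run.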

Move the Maskiges to the folders listed below. Note: For convenience, we provide pre-generated Maskiges for the S4, MS3, and AVSS subsets at YannQi/COMBO-AVS-checkpoints · Hugging Face.

|--AVS_dataset
    |--AVSBench_semantic/pre_SAM_mask/
    |--AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask/
    |--AVSBench_object/Single-source/s4_data/pre_SAM_mask/
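
The move into the folders above might be scripted as follows. This is a sketch only: the README does not say where the generation scripts write their output, so `generated_dir` is a placeholder the caller must supply, and the `place_maskiges` helper is hypothetical.

```python
import shutil
from pathlib import Path

# Destination folders copied from the listing above, relative to AVS_dataset.
MASKIGE_DIRS = {
    "avss": "AVSBench_semantic/pre_SAM_mask",
    "ms3": "AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask",
    "s4": "AVSBench_object/Single-source/s4_data/pre_SAM_mask",
}


def place_maskiges(generated_dir, dataset_root, subset):
    """Move a folder of generated Maskiges to its expected location."""
    src = Path(generated_dir)
    dst = Path(dataset_root) / MASKIGE_DIRS[subset]
    if not src.is_dir():
        raise FileNotFoundError(f"no generated Maskiges at {src}")
    if dst.exists():
        # Refuse to overwrite or merge into an existing folder.
        raise FileExistsError(f"{dst} already exists")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst
```

The function refuses to touch an existing `pre_SAM_mask` folder, so the downloaded pre-generated Maskiges are never silently replaced.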

5. Train

# ResNet-50
sh scripts/res_train_avs4.sh # or ms3, avss

# PVTv2
sh scripts/pvt_train_avs4.sh # or ms3, avss

6. Test

# ResNet-50
sh scripts/res_test_avs4.sh # or ms3, avss

# PVTv2
sh scripts/pvt_test_avs4.sh # or ms3, avss

7. Results and Download Links

We provide the checkpoints of the S4 Subset at YannQi/COMBO-AVS-checkpoints · Hugging Face.

Method	Backbone	Subset	Config	mIoU	F-score
COMBO-R50	ResNet-50	S4	config	81.7	90.1
COMBO-PVTv2	PVTv2-B5	S4	config	84.7	91.9
COMBO-R50	ResNet-50	MS3	config	54.5	66.6
COMBO-PVTv2	PVTv2-B5	MS3	config	59.2	71.2
COMBO-R50	ResNet-50	AVSS	config	33.3	37.3
COMBO-PVTv2	PVTv2-B5	AVSS	config	42.1	46.1
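
For readers unfamiliar with the two metrics in the table, here is a simplified sketch of how they can be computed on binary masks. It is not the repository's evaluation code. It assumes masks are already binarized and flattened, uses the AVSBench convention of β² = 0.3 for the F-score, and treats two empty masks as a perfect IoU match (a convention chosen for this sketch). The semantic AVSS setting averages IoU per class, which this sketch does not cover.

```python
def iou(pred, gt):
    """Jaccard index of two binary masks given as flat sequences of 0/1."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def f_score(pred, gt, beta2=0.3):
    """F-measure with precision weighted by beta^2 (0.3 assumed, as in AVSBench)."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    if tp == 0:
        return 0.0  # also avoids division by zero for empty masks
    precision = tp / sum(1 for p in pred if p)
    recall = tp / sum(1 for g in gt if g)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


def mean_iou(preds, gts):
    """Average the per-frame IoU over a list of mask pairs."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)
```

The table reports both metrics as percentages, so a returned value of 0.817 would correspond to 81.7.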

🤝 Citing COMBO

@misc{yang2023cooperation,
      title={Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation},
      author={Qi Yang and Xing Nie and Tong Li and Pengfei Gao and Ying Guo and Cheng Zhen and Pengfei Yan and Shiming Xiang},
      year={2023},
      eprint={2312.06462},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
