DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation
February 26, 2025
This repository is the official implementation of "DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation".
Update
[07/02/2024] Our DHR has been accepted to ECCV 2024. 🔥🔥🔥
[04/02/2024] Released initial commits.
Citation
Please cite our paper if the code is helpful to your research.
@inproceedings{jo2024dhr,
title={DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation},
author={Sanghyun Jo and Fei Pan and In-Jae Yu and Kyungsu Kim},
booktitle={European Conference on Computer Vision (ECCV)},
year={2024}
}
Abstract
Weakly-supervised semantic segmentation (WSS) ensures high-quality segmentation with limited data and excels when employed as input seed masks for large-scale vision models such as Segment Anything. However, WSS struggles with minor classes, which are often overlooked in images containing multiple adjacent classes; this limitation originates from the overfitting of traditional expansion methods such as Random Walk. We first address this by employing unsupervised and weakly-supervised feature maps instead of conventional methodologies, allowing for hierarchical mask enhancement. This method first distinguishes higher-level classes and then separates their associated lower-level classes, ensuring that all classes, including minor ones, are correctly restored in the mask. Validated through extensive experiments, our approach significantly improves WSS across five benchmarks (VOC: 79.8%, COCO: 53.9%, Context: 49.0%, ADE: 32.9%, Stuff: 37.4%), reducing the gap with fully supervised methods by over 84% on the VOC validation set.

Setup
Setting up this project involves installing dependencies and preparing datasets. The code has been tested on Ubuntu 20.04 with NVIDIA GPUs and CUDA installed.
Installing dependencies
To install all dependencies, please run the following:
pip install -U "ray[default]"
pip install git+https://github.com/lucasb-eyer/pydensecrf.git
python3 -m pip install -r requirements.txt
Alternatively, you can reproduce our results using Docker:
docker build -t dhr_pytorch:v1.13.1 .
docker run --gpus all -it --rm \
--shm-size 32G --volume="$(pwd):$(pwd)" --workdir="$(pwd)" \
dhr_pytorch:v1.13.1
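After installing the dependencies (or inside the container), a quick way to confirm the setup is to check that the required packages can be found. This is a minimal sketch with hypothetical helper names (`check_environment`, `cuda_available`); `ray` and `pydensecrf` come from the install commands above, while `torch` is an assumption based on the Docker image tag `dhr_pytorch:v1.13.1`, since the contents of `requirements.txt` are not reproduced here.

```python
import importlib.util
from typing import Dict, Iterable

# "ray" and "pydensecrf" come from the install commands; "torch" is ASSUMED
# from the Docker tag dhr_pytorch:v1.13.1 (requirements.txt is not shown).
DEFAULT_MODULES = ("ray", "pydensecrf", "torch")


def check_environment(modules: Iterable[str] = DEFAULT_MODULES) -> Dict[str, bool]:
    """Report whether each top-level module can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also works without PyTorch

    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
    print("CUDA visible to PyTorch:", cuda_available())
```

`find_spec` only locates a module, so a missing package is reported instead of raising an `ImportError` halfway through the check.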
Preparing datasets
Please download the following datasets: VOC, COCO, Context, ADE, and COCO-Stuff. Because each dataset originally has a different directory structure, we reorganized all of them into a common structure for convenience.
1. PASCAL VOC 2012
Download PASCAL VOC 2012 dataset from our [Google Drive].
2. MS COCO 2014
Download MS COCO 2014 dataset from our [Google Drive].
3. Pascal Context
Download Pascal Context dataset from our [Google Drive].
4. ADE 2016
Download ADE 2016 dataset from our [Google Drive].
5. COCO-Stuff
Download COCO-Stuff dataset from our [Google Drive].
6. Open-vocabulary Segmentation Models
Download [all results] and [the reproduced project] for a fair comparison with WSS.
Create a directory for each dataset in the parent directory (e.g., "../VOC2012/") and place its files so that everything matches the following directory structure.
../                               # parent directory
├── ./                            # current (project) directory
│   ├── core/                     # (dir.) implementation of our DHR (e.g., OT)
│   ├── tools/                    # (dir.) helper functions
│   ├── experiments/              # (dir.) checkpoints and WSS masks
│   ├── README.md                 # instructions for reproduction
│   └── ... some python files ...
│
├── WSS/                          # WSS masks across all training and testing datasets
│   ├── VOC2012/
│   │   ├── RSEPM/
│   │   ├── MARS/
│   │   └── DHR/
│   ├── COCO2014/
│   │   └── DHR/
│   ├── PascalContext/
│   │   └── DHR/
│   ├── ADE2016/
│   │   └── DHR/
│   └── COCO-Stuff/
│       └── DHR/
│
├── GroundingDINO_Ferret_SAM/     # reproduced project for Grounding DINO and Ferret with SAM
│   ├── core/                     # (dir.) implementation details
│   ├── tools/                    # (dir.) helper functions
│   ├── weights/                  # (dir.) checkpoints of Grounding DINO and Ferret
│   ├── README.md                 # instructions for implementing Grounding DINO and Ferret
│   └── ... some python files ...
│
├── OVSeg/                        # SAM-based outputs of Grounding DINO and Ferret for a fair comparison
│   ├── VOC2012/
│   │   ├── GroundingDINO+SAM/
│   │   └── Ferret+SAM/
│   ├── COCO2014/
│   │   ├── GroundingDINO+SAM/
│   │   └── Ferret+SAM/
│   ├── PascalContext/
│   │   ├── GroundingDINO+SAM/
│   │   └── Ferret+SAM/
│   ├── ADE2016/
│   │   ├── GroundingDINO+SAM/
│   │   └── Ferret+SAM/
│   └── COCO-Stuff/
│       ├── GroundingDINO+SAM/
│       └── Ferret+SAM/
│
├── VOC2012/                      # PASCAL VOC 2012
│   ├── train_aug/
│   │   ├── image/
│   │   ├── mask/
│   │   └── xml/
│   ├── validation/
│   │   ├── image/
│   │   ├── mask/
│   │   └── xml/
│   └── test/
│       └── image/
│
├── COCO2014/                     # MS COCO 2014
│   ├── train/
│   │   ├── image/
│   │   ├── mask/
│   │   └── xml/
│   └── validation/
│       ├── image/
│       ├── mask/
│       └── xml/
│
├── PascalContext/                # Pascal Context
│   ├── train/
│   │   ├── image/
│   │   ├── mask/
│   │   └── xml/
│   └── validation/
│       ├── image/
│       ├── mask/
│       └── xml/
│
├── ADE2016/                      # ADE 2016
│   ├── train/
│   │   ├── image/
│   │   ├── mask/
│   │   └── xml/
│   └── validation/
│       ├── image/
│       ├── mask/
│       └── xml/
│
└── COCO-Stuff/                   # COCO-Stuff
    ├── train/
    │   ├── image/
    │   ├── mask/
    │   └── xml/
    └── validation/
        ├── image/
        ├── mask/
        └── xml/
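Before running anything, it can help to verify that the dataset folders match the layout above. The following is a minimal sketch, not part of the released code: the helper `find_missing_dirs` and the `EXPECTED_LAYOUT` table are hypothetical, transcribed by hand from the tree for the five dataset folders only (the `WSS/` and `OVSeg/` folders are not checked), and only folder existence is tested, not file contents.

```python
from pathlib import Path
from typing import Dict, List

SUBDIRS = ["image", "mask", "xml"]

# Split folders transcribed from the directory tree in this README (hypothetical table).
EXPECTED_LAYOUT: Dict[str, Dict[str, List[str]]] = {
    "VOC2012": {"train_aug": SUBDIRS, "validation": SUBDIRS, "test": ["image"]},
    **{
        name: {"train": SUBDIRS, "validation": SUBDIRS}
        for name in ("COCO2014", "PascalContext", "ADE2016", "COCO-Stuff")
    },
}


def find_missing_dirs(root: str = "../") -> List[str]:
    """Return dataset sub-folders (relative to root, POSIX style) that do not exist."""
    root_path = Path(root)
    missing = []
    for dataset, splits in EXPECTED_LAYOUT.items():
        for split, subdirs in splits.items():
            for sub in subdirs:
                path = root_path / dataset / split / sub
                if not path.is_dir():
                    missing.append(path.relative_to(root_path).as_posix())
    return missing


if __name__ == "__main__":
    missing = find_missing_dirs("../")
    print("Layout OK" if not missing else "Missing:\n  " + "\n  ".join(missing))
```

Running it from the project directory with the default `root="../"` prints every expected folder that is still absent, which makes a misplaced download easy to spot.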
Our WSS Masks on Five Benchmarks
We release our final DHR pseudo masks for VOC 2012, COCO 2014, Context, ADE, and Stuff, covering both the train and validation sets.
You can download them from the link below:
[Google Drive]
Preprocessing
1. Training the USS method
Please download the CAUSE weights that we trained from scratch on the other datasets: [CAUSE weights]. We follow the official CAUSE implementation to train CAUSE from scratch on all five datasets.
2. Training the WSS method
Please download and prepare the WSS masks: [WSS labels]. You can replace the existing WSS masks with those of other WSS methods by following the current directory structure.
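Following the directory tree above, masks from another WSS method would sit in a sibling folder next to `RSEPM/`, `MARS/`, and `DHR/` under `../WSS/<dataset>/`. The sketch below lists which method folders are currently present for each dataset. The helper `list_wss_methods` and the method name `MyMethod` used in the check are hypothetical; the internal layout of each method folder, and how the scripts choose which method to read, are not documented in this README, so only method-level folders are inspected.

```python
from pathlib import Path
from typing import Dict, List


def list_wss_methods(root: str = "../") -> Dict[str, List[str]]:
    """Map each dataset folder under <root>/WSS/ to the method folders it holds."""
    wss_root = Path(root) / "WSS"
    if not wss_root.is_dir():
        return {}  # WSS masks have not been downloaded yet
    return {
        dataset.name: sorted(m.name for m in dataset.iterdir() if m.is_dir())
        for dataset in sorted(wss_root.iterdir())
        if dataset.is_dir()
    }


if __name__ == "__main__":
    for dataset, methods in list_wss_methods("../").items():
        print(f"{dataset}: {', '.join(methods) or '(none)'}")
```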
Training
Our code is coming soon.
Evaluation
We release our checkpoint and official VOC results (anonymous links).
| Method | Backbone | Checkpoints | VOC val | VOC test |
|---|---|---|---|---|
| DHR | ResNet-101 | Google Drive | link | link |
The commands below reproduce our results. Additionally, we follow the official Mask2Former implementation to train Swin-L+Mask2Former with our DHR masks on all five datasets.
# Generate the final segmentation outputs with CRF
python3 produce_wss_masks.py --gpus 0 --cpus 64 --root ../ --data VOC2012 --domain validation \
--backbone resnet101 --decoder deeplabv3+ --tag "ResNet-101@VOC2012@DeepLabv3+@DHR" --checkpoint "last"
# Calculate the mIoU
python3 evaluate.py --fix --data VOC2012 --gt ../VOC2012/validation/mask/ \
--tag "DHR" --pred "./experiments/results/VOC2012/ResNet-101@VOC2012@DeepLabv3+@DHR@last/validation/"
# Reproduce WSS performance related to official VOC results
# DHR (Ours, DeepLabv3+) | mIoU: 79.6%, mFPR: 0.127, mFNR: 0.077
# DHR (Ours, Mask2Former) | mIoU: 81.7%, mFPR: 0.131, mFNR: 0.052
python3 evaluate.py --fix --data VOC2012 --gt ../VOC2012/validation/mask/ \
--tag "DHR (Ours, DeepLabv3+)" --pred "./submissions_DHR@DeepLabv3+/validation/results/VOC2012/Segmentation/comp5_val_cls/"
python3 evaluate.py --fix --data VOC2012 --gt ../VOC2012/validation/mask/ \
--tag "DHR (Ours, Mask2Former)" --pred "./submissions_DHR@Mask2Former/validation/results/VOC2012/Segmentation/comp5_val_cls/"
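The reported scores sum to one (0.796 + 0.127 + 0.077 = 1.000 and 0.817 + 0.131 + 0.052 = 1.000), which is consistent with normalizing a class's false positives and false negatives by the same union used for IoU: IoU = TP / (TP + FP + FN), FPR = FP / (TP + FP + FN), FNR = FN / (TP + FP + FN). The sketch below computes the three means under that assumption; `evaluate.py` remains the authority. It further assumes VOC's convention of labeling void/boundary pixels with 255 (ignored) and skips classes absent from both prediction and ground truth. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

IGNORE_INDEX = 255  # ASSUMED: PASCAL VOC marks void/boundary pixels with 255


def count_errors(
    pred: Iterable[int],
    gt: Iterable[int],
    num_classes: int,
    ignore_index: int = IGNORE_INDEX,
) -> Tuple[List[int], List[int], List[int]]:
    """Count per-class TP, FP, FN over flattened labels (preds in [0, num_classes))."""
    tp, fp, fn = [0] * num_classes, [0] * num_classes, [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # void pixels count neither for nor against any class
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # pixel wrongly assigned to class p
            fn[g] += 1  # pixel of class g that was missed
    return tp, fp, fn


def summarize(tp: List[int], fp: List[int], fn: List[int]) -> Dict[str, float]:
    """Mean IoU, FP ratio and FN ratio over classes, each divided by TP + FP + FN."""
    ious, fprs, fnrs = [], [], []
    for tp_c, fp_c, fn_c in zip(tp, fp, fn):
        union = tp_c + fp_c + fn_c
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp_c / union)
        fprs.append(fp_c / union)
        fnrs.append(fn_c / union)
    if not ious:
        raise ValueError("no class appears in the prediction or ground truth")
    n = len(ious)
    return {"mIoU": sum(ious) / n, "mFPR": sum(fprs) / n, "mFNR": sum(fnrs) / n}


# Toy example: 3 classes; the last ground-truth pixel is void (255) and ignored.
gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
scores = summarize(*count_errors(pred, gt, num_classes=3))
print({k: round(v, 4) for k, v in scores.items()})
# {'mIoU': 0.7222, 'mFPR': 0.1111, 'mFNR': 0.1667}
```

Because each class's IoU, FPR and FNR share one denominator, they add up to one per class, and therefore so do their means. For a full validation set, accumulate the counts over all images (for example by chaining the flattened masks with `itertools.chain`) before calling `summarize`; averaging per-image scores would give a different number.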