README.md
February 3, 2026
Debiased Orthogonal Boundary-Driven Efficient Noise Mitigation
The official implementation of the paper "Debiased Orthogonal Boundary-Driven Efficient Noise Mitigation".
Requirements
We recommend the following dependencies.
- Python 3.8
- PyTorch 1.13.1
Then, please install other environment dependencies through:
pip install -r requirements.txt
The recommended GPU memory is at least 24 GB.
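To compare a local setup against these recommendations, a small check along the following lines may help. This is a minimal sketch: `check_environment` is a hypothetical helper that is not part of this repository, and it only reports what it finds without enforcing any version.

```python
import importlib.util
import sys


def check_environment():
    """Report Python/PyTorch versions and GPU memory (hypothetical helper)."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,    # stays None when PyTorch is not installed
        "gpu_gib": None,  # stays None when no CUDA device is visible
    }
    # Import torch only if it is installed, so the check itself never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        if torch.cuda.is_available():
            total_bytes = torch.cuda.get_device_properties(0).total_memory
            report["gpu_gib"] = round(total_bytes / 1024 ** 3, 1)
    return report


if __name__ == "__main__":
    print(check_environment())
```

Note that a card sold as 24 GB usually reports slightly less than 24 GiB, so the value is best read as a rough guide.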
⚙️ Dataset Preparation
Annotation Preparation
We follow the same split provided by NPC.
You should first download and extract the formulated datasets directory, which contains all the annotation files. You can access them here.
Our formulated datasets directory tree looks like:
${DATASETS}/
├── MSCOCO/
│ ├── annotations/
│ │ ├── 0.0_clean_index.npy
│ │ ├── 0.0_noise_train_caps.txt
│ │ ├── 0.2_clean_index.npy
│ │ ├── ...
│ │ ├── train_caps.txt
│ │ ├── train_ids.txt
│ │ ├── dev_caps.txt
│ │ └── ...
│ └── images/ # empty
│
├── Flickr30K/
│ ├── annotations/
│ │ ├── 0.0_clean_index.npy
│ │ ├── 0.0_noise_train_caps.txt
│ │ ├── 0.2_clean_index.npy
│ │ ├── ...
│ │ ├── train_caps.txt
│ │ ├── train_ids.txt
│ │ ├── dev_caps.txt
│ │ └── ...
│ └── images/
│
└── CC120K/
  ├── annotations/
  │ ├── train_caps.txt
  │ ├── train_ids.txt
  │ ├── dev_caps.txt
  │ └── ...
  └── images/ # empty
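Before training, it can be useful to confirm that the annotation files shown in the tree above are actually in place. The sketch below is one way to do that; `missing_annotation_files` is a hypothetical helper (not part of this repository), and the file names are taken from the tree above.

```python
from pathlib import Path

# File names copied from the directory tree; the helper itself is hypothetical.
BASE_FILES = ["train_caps.txt", "train_ids.txt", "dev_caps.txt"]


def missing_annotation_files(dataset_root, noise_ratio=None):
    """Return the expected annotation files that are absent under dataset_root."""
    ann_dir = Path(dataset_root) / "annotations"
    expected = list(BASE_FILES)
    if noise_ratio is not None:
        # Synthetic-noise splits ship a clean index and noisy captions per ratio.
        expected += [
            f"{noise_ratio}_clean_index.npy",
            f"{noise_ratio}_noise_train_caps.txt",
        ]
    return [name for name in expected if not (ann_dir / name).is_file()]
```

For example, `missing_annotation_files("/path/to/datasets/MSCOCO", 0.2)` returns an empty list when the base files and the 0.2 noise files are all present. The path here is a placeholder.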
Image Preparation
The images directories are still empty. You need to download the images separately and place them directly in the corresponding dataset's images folder. We adopt the same image download and processing method as NPC.
Download Link
- MSCOCO. We unified the image name format of the MSCOCO dataset for easier use. You can use util.py to rename the images in MSCOCO.
- Flickr30K.
- CC120K. You can download the dataset from this link with the extraction code "3ble".
🔥 Training
The following are the training instructions for each dataset. Set ${DATASETS} to the previously configured datasets folder, and set ${SAVE_PATH} to the path where you would like to save the model.
Training on MS-COCO:
You can adjust the noise ratio of the training set by changing ${NOISE_RATIO}, which must be one of the following values: [0.0, 0.2, 0.4, 0.5, 0.6].
python main_clip.py --batch_size 256 --epochs 5 --lr 1e-5 --warmup 500 --vision_model ViT-B/32 --dataset coco --dataset_root ${DATASETS}/MSCOCO --checkpoint_path ${SAVE_PATH} --noise_ratio ${NOISE_RATIO}
Training on Flickr30K:
You can adjust the noise ratio of the training set by changing ${NOISE_RATIO}, which must be one of the following values: [0.0, 0.2, 0.4, 0.6].
python main_clip.py --batch_size 256 --epochs 5 --lr 1e-5 --warmup 500 --vision_model ViT-B/32 --dataset f30k --dataset_root ${DATASETS}/Flickr30K --checkpoint_path ${SAVE_PATH} --noise_ratio ${NOISE_RATIO}
Training on CC120K:
CC120K is a real-world noisy dataset, so the noise ratio does not need to be specified.
python main_clip.py --batch_size 256 --epochs 10 --lr 1e-5 --warmup 500 --vision_model ViT-B/32 --dataset cc --dataset_root ${DATASETS}/CC120K --checkpoint_path ${SAVE_PATH}
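When launching several runs, the three commands above can be assembled programmatically. The following is a minimal sketch, assuming only the flags and values shown in this section; `build_train_command` and the lookup tables are hypothetical names, not part of this repository.

```python
import shlex

# Values copied from the training commands above; the names are hypothetical.
NOISE_RATIOS = {
    "coco": [0.0, 0.2, 0.4, 0.5, 0.6],
    "f30k": [0.0, 0.2, 0.4, 0.6],
    "cc": [],  # real-world noise, so no --noise_ratio flag
}
SUBDIRS = {"coco": "MSCOCO", "f30k": "Flickr30K", "cc": "CC120K"}
EPOCHS = {"coco": 5, "f30k": 5, "cc": 10}


def build_train_command(dataset, datasets_dir, save_path, noise_ratio=None):
    """Return the training command for one dataset as an argument list."""
    if dataset not in SUBDIRS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    allowed = NOISE_RATIOS[dataset]
    if allowed and noise_ratio not in allowed:
        raise ValueError(f"noise_ratio for {dataset} must be one of {allowed}")
    if not allowed and noise_ratio is not None:
        raise ValueError(f"{dataset} does not take a noise ratio")
    cmd = [
        "python", "main_clip.py",
        "--batch_size", "256",
        "--epochs", str(EPOCHS[dataset]),
        "--lr", "1e-5",
        "--warmup", "500",
        "--vision_model", "ViT-B/32",
        "--dataset", dataset,
        "--dataset_root", f"{datasets_dir}/{SUBDIRS[dataset]}",
        "--checkpoint_path", save_path,
    ]
    if allowed:
        cmd += ["--noise_ratio", str(float(noise_ratio))]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; print the command instead of running it.
    print(shlex.join(build_train_command("coco", "/path/to/datasets", "/path/to/save", 0.2)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`. Rejecting an unsupported ratio early avoids a run that fails later because the matching annotation files do not exist.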
📋 Evaluation
Experimental Results
Evaluation Instructions
You can evaluate your trained model by executing the following commands. Set ${DATASETS} to the previously configured datasets folder, and set ${MODEL_PATH} to the model to be evaluated.
Evaluation on MSCOCO:
python main_clip.py --eval --vision_model ViT-B/32 --dataset coco --dataset_root ${DATASETS}/MSCOCO --resume ${MODEL_PATH}
Evaluation on Flickr30K:
python main_clip.py --eval --vision_model ViT-B/32 --dataset f30k --dataset_root ${DATASETS}/Flickr30K --resume ${MODEL_PATH}
Evaluation on CC120K:
python main_clip.py --eval --vision_model ViT-B/32 --dataset cc --dataset_root ${DATASETS}/CC120K --resume ${MODEL_PATH}
Reference
If you find this work useful, please cite the following paper:
@inproceedings{OSA,
author = {Hao Li and
Jiayang Gu and
Jingkuan Song and
An Zhang and
Lianli Gao},
title = {Debiased Orthogonal Boundary-Driven Efficient Noise Mitigation},
journal = {arXiv preprint: 2410.01944},
year = {2024}
}