Dropout Reduces Underfitting
May 6, 2023
Official PyTorch implementation for Dropout Reduces Underfitting
Dropout Reduces Underfitting, ICML 2023
Zhuang Liu*, Zhiqiu Xu*, Joseph Jin, Zhiqiang Shen, Trevor Darrell (* equal contribution)
Meta AI, UC Berkeley and MBZUAI
Figure: We propose early dropout and late dropout. Early dropout helps underfitting models fit the data better and achieve lower training loss. Late dropout helps improve the generalization performance of overfitting models.
Results on ImageNet-1K
Model weights are released as links on the corresponding results in the tables below.
Early Dropout
results with basic recipe (s.d. = stochastic depth)
| model | ViT-T | Mixer-S | Swin-F | ConvNeXt-F |
|---|---|---|---|---|
| no dropout | 73.9 | 71.0 | 74.3 | 76.1 |
| standard dropout | 67.9 | 67.1 | 71.6 | - |
| standard s.d. | 72.6 | 70.5 | 73.7 | 75.5 |
| early dropout | 74.3 | 71.3 | 74.7 | - |
| early s.d. | 74.4 | 71.7 | 75.2 | 76.3 |
results with improved recipe
| model | ViT-T | Swin-F | ConvNeXt-F |
|---|---|---|---|
| no dropout | 76.3 | 76.1 | 77.5 |
| standard dropout | 71.5 | 73.5 | - |
| standard s.d. | 75.6 | 75.6 | 77.4 |
| early dropout | 76.7 | 76.6 | - |
| early s.d. | 76.7 | 76.6 | 77.7 |
Late Dropout
results with basic recipe
| model | ViT-B | Mixer-B |
|---|---|---|
| standard s.d. | 81.6 | 78.0 |
| late s.d. | 82.3 | 78.6 |
Installation
Please check INSTALL.md for installation instructions.
Training
Basic Recipe
We list commands for early dropout, early stochastic depth on ViT-T and late stochastic depth on ViT-B.
- For training other models, change `--model` accordingly, e.g., to `vit_tiny`, `mixer_s32`, `convnext_femto`, `mixer_b16`, `vit_base`.
- Our results were produced with 4 nodes, each with 8 GPUs. Below we give example commands for both multi-node and single-machine setups.
Early dropout
multi-node
python run_with_submitit.py --nodes 4 --ngpus 8 \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 1 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
single-machine
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
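The multi-node and single-machine commands above differ only in `--update_freq` (1 vs. 4). Assuming `--batch_size` is the per-GPU batch size and `--update_freq` is the number of gradient-accumulation steps (as in the ConvNeXt codebase this repository builds on), both setups should reach the same effective batch size. The helper name below is hypothetical and only illustrates the arithmetic:

```python
def effective_batch_size(nodes: int, gpus_per_node: int,
                         batch_size: int, update_freq: int) -> int:
    """Samples contributing to one optimizer step.

    Assumes batch_size is per GPU and update_freq counts
    gradient-accumulation steps (an assumption, see above).
    """
    return nodes * gpus_per_node * batch_size * update_freq


# Multi-node command: 4 nodes x 8 GPUs, --update_freq 1
multi_node = effective_batch_size(nodes=4, gpus_per_node=8,
                                  batch_size=128, update_freq=1)

# Single-machine command: 1 node x 8 GPUs, --update_freq 4
single_machine = effective_batch_size(nodes=1, gpus_per_node=8,
                                      batch_size=128, update_freq=4)

print(multi_node, single_machine)  # 4096 4096
```

When running on fewer GPUs, scaling `--update_freq` by the same factor would keep this product unchanged under the same assumptions.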
Early stochastic depth
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.5 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Late stochastic depth
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_base --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.4 --drop_mode late --drop_schedule constant --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Standard dropout / no dropout (replace $p with 0.1 / 0.0)
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 300 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout $p --drop_mode standard \
--data_path /path/to/data/ \
--output_dir /path/to/results/
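The commands above combine `--drop_mode` (`standard`, `early`, `late`), `--drop_schedule` (`constant`, `linear`) and `--cutoff_epoch`. The following is a minimal sketch of how the drop rate could evolve under these flags, not the repository's implementation. It assumes that early mode applies the drop rate only before the cutoff epoch, that late mode applies it only from the cutoff epoch onward, and that the linear schedule decays the rate from its initial value to 0 at the cutoff. The function name `drop_rate_at` is hypothetical, and the rate is computed per epoch for simplicity, whereas an actual training loop may update it per iteration.

```python
def drop_rate_at(epoch: int, rate: float, mode: str,
                 cutoff_epoch: int = 50, schedule: str = "constant") -> float:
    """Illustrative drop rate for a given epoch (assumed semantics)."""
    if mode == "standard":
        # Same rate for the whole of training.
        return rate
    if mode == "early":
        if epoch >= cutoff_epoch:
            return 0.0  # dropout is switched off after the cutoff
        if schedule == "linear":
            # Decay linearly from `rate` at epoch 0 to 0 at the cutoff.
            return rate * (1 - epoch / cutoff_epoch)
        return rate
    if mode == "late":
        # No dropout before the cutoff, constant rate afterwards.
        return rate if epoch >= cutoff_epoch else 0.0
    raise ValueError(f"unknown drop mode: {mode}")


# Early stochastic depth: --drop_path 0.5 --drop_schedule linear --cutoff_epoch 50
print([drop_rate_at(e, 0.5, "early", 50, "linear") for e in (0, 25, 50)])
# [0.5, 0.25, 0.0]

# Late stochastic depth: --drop_path 0.4 --drop_schedule constant --cutoff_epoch 50
print([drop_rate_at(e, 0.4, "late", 50) for e in (49, 50)])
# [0.0, 0.4]
```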
Improved Recipe
Our improved recipe extends training epochs from 300 to 600, and reduces both mixup and cutmix to 0.3.
Early dropout
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 600 --mixup 0.3 --cutmix 0.3 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--dropout 0.1 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Early stochastic depth
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --epochs 600 --mixup 0.3 --cutmix 0.3 \
--batch_size 128 --lr 4e-3 --update_freq 4 \
--drop_path 0.5 --drop_mode early --drop_schedule linear --cutoff_epoch 50 \
--data_path /path/to/data/ \
--output_dir /path/to/results/
Evaluation
single-GPU
python main.py --model vit_tiny --eval true \
--resume /path/to/model \
--data_path /path/to/data
multi-GPU
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model vit_tiny --eval true \
--resume /path/to/model \
--data_path /path/to/data
Acknowledgement
This repository is built using the timm library and the ConvNeXt codebase.
License
This project is released under the CC-BY-NC 4.0 license. Please see the LICENSE file for more information.
Citation
If you find this repository helpful, please consider citing:
@inproceedings{liu2023dropout,
title={Dropout Reduces Underfitting},
author={Zhuang Liu and Zhiqiu Xu and Joseph Jin and Zhiqiang Shen and Trevor Darrell},
year={2023},
booktitle={International Conference on Machine Learning},
}