GETTING_STARTED.md
March 24, 2024
How to run the code
Run `python main.py +experiment=dit` for experiments that use DiT as the diffusion backbone. The following arguments can be adapted for different datasets and hyperparameters:
- Classifier backbone: set `model.class_arch` to `convnext_large`/`convnext_tiny`/`resnet18`/`vit_b_32`/`vit_b_16`/`vit_l_14` for ImageNet-trained classifiers
- Larger DiT: use `+experiment=dit` and set `input.sd_img_res=512`
- Optimizer: set `tta.gradient_descent.optimizer` to `adam`/`sgd`
- Learning rate: set `tta.gradient_descent.base_learning_rate` to any numeric value
- Dataset: set `input.dataset_name` to `ImageNetDataset`/`ImageNetv2Dataset`/`ImageNetRDataset`/`ImageNetCDataset`/`ImageNetADataset`/`ImageNetStyleDataset`
- Total batch size: set `input.batch_size` and `tta.gradient_descent.accum_iter`; the total batch size is the product of these two parameters
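The arguments above are plain `key=value` overrides appended to the command, so a command line can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `total_batch_size` are hypothetical, and only the override keys and values come from this guide.

```python
import shlex


def total_batch_size(batch_size: int, accum_iter: int) -> int:
    """Total batch size: samples per step times accumulation steps."""
    return batch_size * accum_iter


def build_command(experiment: str, overrides: dict) -> list:
    """Assemble the argv list for main.py from key=value overrides (hypothetical helper)."""
    argv = ["python", "main.py", f"+experiment={experiment}"]
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


overrides = {
    "model.class_arch": "convnext_tiny",
    "input.batch_size": 15,
    "tta.gradient_descent.accum_iter": 12,
    "input.dataset_name": "ImageNetv2Dataset",
    "tta.gradient_descent.base_learning_rate": "1e-5",
    "tta.gradient_descent.optimizer": "adam",
}

command = build_command("dit", overrides)
print(shlex.join(command))        # the command as it would be typed in a shell
print(total_batch_size(15, 12))   # 180
```

Keeping the learning rate as a string preserves the `1e-5` spelling used in the commands in this guide.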
How to improve performance
Empirically, we found that a larger total batch size gives more stable classification improvements, at the cost of longer TTA time. We also found that some backbones work better with the sgd optimizer than with the adam optimizer.
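Since the total batch size is the product of `input.batch_size` and `tta.gradient_descent.accum_iter`, the same total can be reached with different splits depending on how many samples fit in memory per step. The sketch below lists the valid splits; `batch_splits` is a hypothetical helper, and the total of 180 is simply the product used by the example commands in this guide (15×12, 12×15, 20×9), not a recommended value.

```python
def batch_splits(total: int, max_batch_size: int) -> list:
    """Return (batch_size, accum_iter) pairs whose product equals `total`."""
    return [
        (batch_size, total // batch_size)
        for batch_size in range(1, max_batch_size + 1)
        if total % batch_size == 0
    ]


# Print the override pair for every split that fits a per-step batch of at most 20.
for batch_size, accum_iter in batch_splits(180, max_batch_size=20):
    print(f"input.batch_size={batch_size} tta.gradient_descent.accum_iter={accum_iter}")
```

A smaller `input.batch_size` with a larger `accum_iter` keeps the total unchanged while lowering the per-step memory footprint.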
Commands to get started
CLIP/Stable Diffusion
Single-sample TTA on FGVC-Aircraft and other datasets
```
python main.py +experiment=sd model.class_arch=clipb32 input.dataset_name=FGVCAircraftSubset
```
ConvNext-Large/DiT
ConvNext-Large works better with the adam optimizer
Online TTA on ImageNet-C
python main.py +experiment=dit model.class_arch=convnext_large input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetCDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam tta.online=True input.subsample=1 log_freq=1
ConvNext-Tiny/DiT
ConvNext-Tiny works better with the adam optimizer
Single-sample TTA on ImageNet-R
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=12 tta.gradient_descent.accum_iter=15 input.dataset_name=ImageNetRDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
Single-sample TTA on ImageNet-C
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetCDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam input.subsample=null
Single-sample TTA on ImageNet-A
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetADataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam input.subsample=null
Single-sample TTA on ImageNet-v2
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetv2Dataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
Single-sample TTA on ImageNet
python main.py +experiment=dit model.class_arch=convnext_tiny input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
ResNet-18/DiT
ResNet-18 works better with the adam optimizer
Single-sample TTA on ImageNet-R
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=12 tta.gradient_descent.accum_iter=15 input.dataset_name=ImageNetRDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
Single-sample TTA on ImageNet-C
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetCDataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
Single-sample TTA on ImageNet-A
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetADataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam input.subsample=null
Single-sample TTA on ImageNet-v2
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetv2Dataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
Single-sample TTA on ImageNet
python main.py +experiment=dit model.class_arch=resnet18 input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetDataset tta.gradient_descent.base_learning_rate=1e-5 tta.gradient_descent.optimizer=adam
ViT-B-32/DiT
ViT-B-32 works better with the sgd optimizer
Single-sample TTA on ImageNet-R
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetRDataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
Single-sample TTA on ImageNet-C
```
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetCDataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
```
Single-sample TTA on ImageNet-A
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetADataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd input.subsample=null
Single-sample TTA on ImageNet-v2
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=15 tta.gradient_descent.accum_iter=12 input.dataset_name=ImageNetv2Dataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
Single-sample TTA on ImageNet
python main.py +experiment=dit model.class_arch=vit_b_32 input.batch_size=20 tta.gradient_descent.accum_iter=9 input.dataset_name=ImageNetDataset tta.gradient_descent.base_learning_rate=5e-3 tta.gradient_descent.optimizer=sgd
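To run the ViT-B-32 commands above back to back, they can be generated from a small table of per-dataset settings. This is a sketch only: the table `VIT_B_32_RUNS` and the helper `vit_b_32_command` are hypothetical names, the values are copied from the commands above, and the script prints the commands as a dry run rather than executing them.

```python
import shlex

# Per-dataset settings copied from the ViT-B-32 commands above:
# (dataset_name, batch_size, accum_iter, extra overrides)
VIT_B_32_RUNS = [
    ("ImageNetRDataset", 20, 9, []),
    ("ImageNetCDataset", 20, 9, []),
    ("ImageNetADataset", 20, 9, ["input.subsample=null"]),
    ("ImageNetv2Dataset", 15, 12, []),
    ("ImageNetDataset", 20, 9, []),
]


def vit_b_32_command(dataset, batch_size, accum_iter, extra):
    """Build the argv list for one ViT-B-32/DiT run (hypothetical helper)."""
    return [
        "python", "main.py", "+experiment=dit",
        "model.class_arch=vit_b_32",
        f"input.batch_size={batch_size}",
        f"tta.gradient_descent.accum_iter={accum_iter}",
        f"input.dataset_name={dataset}",
        "tta.gradient_descent.base_learning_rate=5e-3",
        "tta.gradient_descent.optimizer=sgd",
        *extra,
    ]


commands = [vit_b_32_command(*run) for run in VIT_B_32_RUNS]
for argv in commands:
    print(shlex.join(argv))  # dry run: print each command instead of executing it
```

To actually launch the runs, each `argv` list could be passed to `subprocess.run(argv, check=True)` from the repository root.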