Adversarial Training + Rotational Invariance of Transformers

May 15, 2022 · View on GitHub

Introduction

We provide evidence against the unexpected trend of Vision Transformers (ViTs) not being perceptually aligned with human visual representations: a dual-stream Transformer (CrossViT) trained with a joint rotationally-invariant and adversarial optimization procedure placed 2nd in the aggregate 2022 Brain-Score competition averaged across all visual categories, and currently (March 1st, 2022) holds 1st place for the highest explained variance of area V4. Against our initial expectations, these results provide tentative support for an "All roads lead to Rome" argument, enforced via a joint optimization rule, even for non-biologically-motivated models of vision such as Vision Transformers.

For more details, please see our BSW 2022 paper.

Setup

  1. Install Python (>=3.7), PyTorch, and the other required Python libraries with:
    pip install -r requirements.txt
    
  2. Download the ImageNet dataset and valprep.sh to prepare the validation set:
    mkdir -p ./Dataset
    # Unzip data inside "Dataset"
    cd ./Dataset/val
    bash valprep.sh
    

Usage

  • Generate or choose a config file from "Configs" folder and run the experiments:
python -u train_adv.py --data path/to/dataset --config path/to/config.yaml
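To make the joint objective concrete, here is a minimal, hypothetical sketch of one training step combining rotational invariance (random 90° rotations of the input batch) with adversarial training via a PGD inner loop. The function names (`rotate_batch`, `pgd_attack`, `joint_step`) and hyperparameters are illustrative placeholders, not the repo's actual API; see `train_adv.py` and the config files for the real implementation.

```python
import torch
import torch.nn.functional as F

def rotate_batch(x):
    """Rotate every image in an NCHW batch by a random multiple of 90 degrees."""
    k = int(torch.randint(0, 4, (1,)))
    return torch.rot90(x, k, dims=(2, 3))

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=3):
    """Projected gradient descent inside the L-infinity ball of radius eps around x."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()          # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)     # project back into the ball
        x_adv = x_adv.clamp(0, 1)                    # keep valid pixel range
    return x_adv.detach()

def joint_step(model, optimizer, x, y):
    """One optimization step on rotated, adversarially perturbed inputs."""
    x = rotate_batch(x)                 # rotational invariance
    x_adv = pgd_attack(model, x, y)     # adversarial robustness
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Applying the rotation before the attack means the model is trained to be robust on rotated views as well, which is one plausible way to couple the two objectives in a single loss.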

Pretrained weights

| ID   | Description              | Val. Acc. (%) | Avg   | V1    | V2    | V4    | IT    | Behavior |
|------|--------------------------|---------------|-------|-------|-------|-------|-------|----------|
| 1057 | CrossViT-18†             | 83.05         | 0.442 | 0.473 | 0.274 | 0.478 | 0.484 | 0.500    |
| 1095 | CrossViT-18†+Rotation    | 79.22         | 0.458 | 0.458 | 0.288 | 0.495 | 0.503 | 0.547    |
| 1084 | CrossViT-18†+Adv         | 64.60         | 0.462 | 0.497 | 0.343 | 0.508 | 0.519 | 0.441    |
| 991  | CrossViT-18†+Rotation+Adv| 73.53         | 0.488 | 0.493 | 0.342 | 0.514 | 0.531 | 0.562    |

Citation

If you find this useful for your work, please consider citing:

@inproceedings{berrios2022joint,
  title={Joint rotational invariance and adversarial training of a dual-stream Transformer yields state of the art Brain-Score for Area V4},
  author={William Berrios and Arturo Deza},
  booktitle={Brain-Score Workshop},
  year={2022},
  url={https://openreview.net/forum?id=SOulrWP-Xb5}
}