EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention

August 21, 2025


This is the official PyTorch implementation of EViTs.

We draw inspiration from eagle vision and, based on the unique physiological and visual characteristics of eagle eyes, derive a Bi-Fovea Visual Interaction (BFVI) structure. Following this design, we propose a novel Bi-Fovea Self-Attention (BFSA) and a Bi-Fovea Feedforward Network (BFFN). They mimic how the biological visual cortex processes information both hierarchically and in parallel, which helps networks learn feature representations of targets from coarse to fine. A Bionic Eagle Vision (BEV) block, built from BFSA and BFFN, serves as the basic building unit. Stacking BEV blocks yields a unified and efficient family of pyramid backbone networks called Eagle Vision Transformers (EViTs).

The overall pipeline of EViTs is illustrated in this figure.

[Figure: overall pipeline of EViT]

Installation

Requirements

  • Linux with Python ≥ 3.6
  • PyTorch ≥ 1.8.1
  • timm ≥ 0.3.2
  • CUDA 11.1
  • An NVIDIA GPU
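
As a quick sanity check of the minimum versions listed above, the following is a minimal sketch and not part of the repository. The helper names `version_tuple`, `meets`, and `check_requirements` are hypothetical, and the script assumes Python 3.8 or newer because it relies on `importlib.metadata`. It only compares installed package versions against the stated minimums; it does not verify the CUDA toolkit or the GPU.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUMS = {"torch": "1.8.1", "timm": "0.3.2"}


def version_tuple(version):
    """Turn '1.13.1+cu117' into (1, 13, 1), ignoring local or pre-release suffixes."""
    public = version.split("+")[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS):
    """Map each package name to 'ok', 'missing', or 'too old (<version>)'."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        report[package] = "ok" if meets(installed, minimum) else "too old (%s)" % installed
    return report


if __name__ == "__main__":
    print("python >= 3.6:", sys.version_info >= (3, 6))
    for package, status in check_requirements().items():
        print(package, status)
```

Running the script inside the activated environment prints one status line per package.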

Conda environment setup

conda create -n EViT python=3.9
conda activate EViT

# Install PyTorch, TorchVision, and TorchAudio
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

pip install timm
pip install ninja
pip install tensorboard

# Install NVIDIA apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ../
rm -rf apex/

# Install the remaining dependencies
pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8

Model Zoo

  • EViT on ImageNet-1K
Method      Size  Acc@1  #Params (M)  Download
EViT_Tiny   224   79.9   11.8         45M [Google] [BaiduNetdisk]
EViT_Small  224   82.6   24.9         95M [Google] [BaiduNetdisk]
EViT_Base   224   83.9   42.8         164M [Google] [BaiduNetdisk]
EViT_Large  224   84.4   61.9         237M [Google] [BaiduNetdisk]
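
Acc@1 in the table above denotes top-1 accuracy: the percentage of validation images for which the highest-scoring class matches the ground-truth label. The following is a minimal, framework-free sketch of that metric on plain Python lists. The function name `top1_accuracy` is hypothetical and is not taken from the repository, whose evaluation code may compute the metric differently (for example on batched tensors).

```python
def top1_accuracy(logits, labels):
    """Return the percentage of samples whose arg-max class equals the label.

    logits: list of per-sample score lists, one score per class.
    labels: list of integer class indices, one per sample.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the highest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    toy_logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    toy_labels = [1, 0, 0, 0]
    print(top1_accuracy(toy_logits, toy_labels))
```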

Evaluation

To evaluate a pre-trained EViT_Tiny on the ImageNet validation set with 8 GPUs, run:

python -m torch.distributed.run --nproc_per_node=8 --master_port 18875 train.py --eval True --model EViT_Tiny --datasets_path /home/ubuntu/Datasets/ImageNet --resume /home/ubuntu/Datasets/EViT-main/save_path/EViT_Tiny.pth
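
The paths in the command above are specific to the authors' machine. As a convenience, the following sketch assembles the same command from parameters so that the model name, dataset path, and checkpoint can be swapped. The helper `build_eval_command` is hypothetical and only reuses the flags shown above. Whether `train.py` behaves correctly with a GPU count other than 8 is an assumption that has not been checked against the repository. `shlex.join` requires Python 3.8 or newer.

```python
import shlex
import sys


def build_eval_command(model, datasets_path, checkpoint, gpus=8, master_port=18875):
    """Assemble the evaluation command as an argument list (hypothetical helper)."""
    return [
        sys.executable, "-m", "torch.distributed.run",
        "--nproc_per_node=%d" % gpus,
        "--master_port", str(master_port),
        "train.py",
        "--eval", "True",
        "--model", model,
        "--datasets_path", datasets_path,
        "--resume", checkpoint,
    ]


if __name__ == "__main__":
    cmd = build_eval_command(
        model="EViT_Tiny",
        datasets_path="/home/ubuntu/Datasets/ImageNet",
        checkpoint="/home/ubuntu/Datasets/EViT-main/save_path/EViT_Tiny.pth",
    )
    # Print the command for inspection; to launch it, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.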

If you use this code for a paper, please cite:

@article{shi2025evit,
  title={{EViT}: An Eagle Vision Transformer with Bi-Fovea Self-Attention},
  author={Shi, Yulong and Sun, Mingwei and Wang, Yongshuai and Ma, Jiahao and Chen, Zengqiang},
  journal={IEEE Transactions on Cybernetics},
  year={2025},
  volume={55},
  number={3},
  pages={1288-1300},
  publisher={IEEE}
}