EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
August 21, 2025
This is the official PyTorch implementation of EViTs.
We draw inspiration from eagle vision and summarize a Bi-Fovea Visual Interaction (BFVI) structure based on the unique physiological and visual characteristics of eagle eyes. Building on this structure, we propose a novel Bi-Fovea Self-Attention (BFSA) and a Bi-Fovea Feedforward Network (BFFN). They mimic how the biological visual cortex processes information hierarchically and in parallel, helping networks learn feature representations of targets from coarse to fine. Furthermore, a Bionic Eagle Vision (BEV) block is designed as the basic building unit based on BFSA and BFFN. By stacking BEV blocks, we develop a unified and efficient family of pyramid backbone networks called Eagle Vision Transformers (EViTs).
The overall pipeline of EViTs is illustrated in this figure.

Installation
Requirements
- Linux with Python >= 3.6
- PyTorch >= 1.8.1
- timm >= 0.3.2
- CUDA 11.1
- An NVIDIA GPU
Conda environment setup
conda create -n EViT python=3.9
conda activate EViT
# Install PyTorch, TorchVision, and TorchAudio
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install timm
pip install ninja
pip install tensorboard
# Install NVIDIA apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ../
rm -rf apex/
# Install the remaining dependencies
pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8
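After installation, it can be useful to confirm that the environment satisfies the minimum versions in the Requirements list. The following is a minimal sketch, not part of the official repository: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical, and the version parser is deliberately simple (it ignores local tags such as `+cu117` and does not handle pre-release ordering).

```python
import re
from importlib import metadata

# Minimum versions copied from the Requirements list above.
MINIMUMS = {"torch": "1.8.1", "timm": "0.3.2"}


def parse_version(text):
    """Turn '1.13.1+cu117' into (1, 13, 1); the local '+cu117' tag is ignored."""
    public = text.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", public))


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, not alphabetically."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.8' and '1.8.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS, get_version=metadata.version):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report
```

Calling `check_requirements()` inside the activated `EViT` environment returns one entry per package; an entry whose second element is `False` points to a missing or outdated install.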
Model Zoo
- EViT on ImageNet-1K
| Method | Input size | Acc@1 (%) | #Params (M) | Download |
|---|---|---|---|---|
| EViT_Tiny | 224 | 79.9 | 11.8 | 45M [Google] [BaiduNetdisk] |
| EViT_Small | 224 | 82.6 | 24.9 | 95M [Google] [BaiduNetdisk] |
| EViT_Base | 224 | 83.9 | 42.8 | 164M [Google] [BaiduNetdisk] |
| EViT_Large | 224 | 84.4 | 61.9 | 237M [Google] [BaiduNetdisk] |
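As a rough sanity check on the table, the download sizes can be compared against the parameter counts. This sketch assumes that the figure before each download link (for example `45M`) is the checkpoint file size, that weights are stored as 32-bit floats, and that "M" means MiB; none of these is stated in the README.

```python
# (parameters in millions, listed download size) copied from the table above.
MODELS = {
    "EViT_Tiny": (11.8, 45),
    "EViT_Small": (24.9, 95),
    "EViT_Base": (42.8, 164),
    "EViT_Large": (61.9, 237),
}

BYTES_PER_PARAM = 4  # assumption: weights stored as 32-bit floats


def estimated_size_mib(params_millions, bytes_per_param=BYTES_PER_PARAM):
    """Estimate weight storage in MiB (2**20 bytes) from a parameter count."""
    return params_millions * 1e6 * bytes_per_param / 2**20


for name, (params_m, listed) in MODELS.items():
    estimate = estimated_size_mib(params_m)
    print(f"{name}: ~{estimate:.1f} MiB estimated, {listed}M listed")
```

Under these assumptions each estimate lands within about 1 MiB of the listed size, which suggests the checkpoints contain little beyond the fp32 model weights.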
Evaluation
To evaluate a pre-trained EViT_Tiny on the ImageNet validation set with 8 GPUs, run:
python -m torch.distributed.run --nproc_per_node=8 --master_port 18875 train.py --eval True --model EViT_Tiny --datasets_path /home/ubuntu/Datasets/ImageNet --resume /home/ubuntu/Datasets/EViT-main/save_path/EViT_Tiny.pth
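The command above hard-codes the model name, paths, and GPU count. A small helper can assemble the same command from parameters, which is convenient when evaluating several checkpoints in a loop. This is a sketch: `build_eval_command` is a hypothetical helper, it uses only the flags that appear in the command above, and it assumes that `--model` accepts the other names from the Model Zoo table.

```python
import shlex
import sys


def build_eval_command(model, datasets_path, checkpoint, num_gpus=8, master_port=18875):
    """Assemble the evaluation command as an argument list for subprocess.run."""
    return [
        sys.executable, "-m", "torch.distributed.run",
        f"--nproc_per_node={num_gpus}",
        "--master_port", str(master_port),
        "train.py",
        "--eval", "True",
        "--model", model,
        "--datasets_path", str(datasets_path),
        "--resume", str(checkpoint),
    ]


command = build_eval_command(
    model="EViT_Small",
    datasets_path="/home/ubuntu/Datasets/ImageNet",
    checkpoint="/path/to/EViT_Small.pth",  # placeholder path, replace with your own
)
print(shlex.join(command))  # shell-quoted form, useful for logging
```

Passing the list to `subprocess.run(command, check=True)` from the repository root would launch the evaluation. Building a list rather than a single string avoids quoting problems when paths contain spaces.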
If you use this code for a paper, please cite:
@article{shi2025evit,
title={{EViT}: An Eagle Vision Transformer with Bi-Fovea Self-Attention},
author={Shi, Yulong and Sun, Mingwei and Wang, Yongshuai and Ma, Jiahao and Chen, Zengqiang},
journal={IEEE Transactions on Cybernetics},
year={2025},
volume={55},
number={3},
pages={1288-1300},
publisher={IEEE}
}