
November 3, 2025

🚀 PiDiViT: When Pixel Difference Patterns Meet ViT for Few-Shot Object Detection (ICCV 2025)

🔥We propose PiDiViT, which enables a pretrained ViT to excel at few-shot detection by introducing explicit prior modules that model pixel-wise differences and multiscale variations in the ViT's low-level features.

🔥PiDiViT achieves state-of-the-art (SOTA) performance on COCO for few-shot, one-shot, and open-vocabulary object detection, setting new benchmarks and offering a valuable reference for future few-shot object detection research.

🛠️ Updates

  • (11/2025) Code released with full training/evaluation scripts.
  • (10/2025) Paper published in the ICCV 2025 proceedings.
  • (06/2025) PiDiViT accepted at ICCV 2025.

🕸️ Dataset and Model Initialization Checkpoints

Download the datasets and model initialization checkpoints from the baseline project, DE-ViT.

📽️ Getting Started

Installation

git clone https://github.com/Seaz9/PiDiViT.git
conda create -n PiDiViT python=3.9
conda activate PiDiViT
pip install -r PiDiViT/requirements.txt
pip install -e ./PiDiViT
cd PiDiViT  # the training and evaluation scripts below are run from the repository root

🔍 Training

vit=l task=ovd dataset=coco bash scripts/train.sh  # train open-vocabulary COCO with ViT-L

# task = ovd (open-vocabulary) / fsod (few-shot) / osod (one-shot)
# dataset = coco / voc
# vit = l
# split = 1 / 2 / 3 / 4 for COCO one-shot; 1 / 2 / 3 for VOC few-shot

# few-shot env var `shot = 5 / 10 / 30`
vit=l task=fsod shot=10 bash scripts/train.sh 

# one-shot env var `split = 1 / 2 / 3 / 4`
vit=l task=osod split=1 bash scripts/train.sh

# detectron2 options can be passed as extra arguments after the script, e.g.,
task=ovd dataset=coco bash scripts/train.sh <detectron2 options>

# another env var, `num_gpus = 1 / 2 / ...`, sets the number of GPUs used
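The environment variables above can be combined into a small sweep so that every few-shot or one-shot setting is trained with one call. The sketch below is a hypothetical helper: the names `launch`, `sweep_fsod_coco`, `sweep_osod_coco`, `DRY_RUN`, and `NUM_GPUS` are illustrative and not part of the repository. It assumes it is run from the repository root, that `scripts/train.sh` reads `vit`, `task`, `dataset`, `shot`, `split`, and `num_gpus` as described above, and that `num_gpus` may safely default to 1.

```shell
#!/usr/bin/env bash
# Hypothetical sweep helpers (not part of the PiDiViT repository).
# Assumptions: run from the repository root; scripts/train.sh reads the
# vit / task / dataset / shot / split / num_gpus env vars; num_gpus defaults
# to 1 here. Set DRY_RUN=1 to print each command instead of running it.

launch() {
  # "$@" = VAR=value assignments followed by the command to execute.
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    env "$@"
  fi
}

# Few-shot COCO: one training run per shot setting (5 / 10 / 30).
# Extra arguments are passed through to scripts/train.sh
# (assumed to be treated as detectron2 options).
sweep_fsod_coco() {
  local shot
  for shot in 5 10 30; do
    launch num_gpus="${NUM_GPUS:-1}" vit=l task=fsod dataset=coco shot="$shot" \
      bash scripts/train.sh "$@"
  done
}

# One-shot COCO: one training run per split (1 / 2 / 3 / 4).
sweep_osod_coco() {
  local split
  for split in 1 2 3 4; do
    launch num_gpus="${NUM_GPUS:-1}" vit=l task=osod dataset=coco split="$split" \
      bash scripts/train.sh "$@"
  done
}
```

If saved as `sweep_train.sh` (a hypothetical file name), `source sweep_train.sh && DRY_RUN=1 NUM_GPUS=2 sweep_fsod_coco` prints the three few-shot commands without launching anything; dropping `DRY_RUN=1` runs them one after another.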

🔍 Evaluation

vit=l task=ovd dataset=coco bash scripts/eval.sh # evaluate COCO OVD with ViT-L/14

# evaluate 5-shot detection on Pascal VOC split 3 with ViT-L/14
vit=l task=fsod dataset=voc split=3 shot=5 bash scripts/eval.sh 
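To report one shot setting across all Pascal VOC few-shot splits, the evaluation command can be looped over `split = 1 / 2 / 3`. The sketch below is a hypothetical helper (`eval_voc_splits` and `DRY_RUN` are illustrative names, not part of the repository). It assumes it is run from the repository root, that `scripts/eval.sh` reads the same env vars as the example above, and that trained weights for each split already exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PiDiViT repository): evaluate one
# shot setting on every Pascal VOC few-shot split (1 / 2 / 3).
# Assumptions: run from the repository root; scripts/eval.sh reads the
# vit / task / dataset / split / shot env vars; weights already exist.
# Set DRY_RUN=1 to print each command instead of running it.

eval_voc_splits() {
  local shot="${1:?usage: eval_voc_splits <shot>}"
  local split
  for split in 1 2 3; do
    local cmd=(env vit=l task=fsod dataset=voc split="$split" shot="$shot" bash scripts/eval.sh)
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      # Drop the leading "env" so the printed line matches the README style.
      echo "${cmd[*]:1}"
    else
      "${cmd[@]}"
    fi
  done
}
```

For example, `DRY_RUN=1 eval_voc_splits 5` prints the three 5-shot evaluation commands; without `DRY_RUN=1` they run one after another, and a failure on one split does not stop the remaining ones.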

See Tools.md for instructions on building prototypes and preparing weights for your custom datasets.

📜 Citation

@InProceedings{Zhou_2025_ICCV,
    author    = {Zhou, Hongliang and Liu, Yongxiang and Mo, Canyu and Li, Weijie and Peng, Bowen and Liu, Li},
    title     = {When Pixel Difference Patterns Meet ViT: PiDiViT for Few-Shot Object Detection},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {24309-24318}
}

📜 License

This project is released under the Apache-2.0 license.

⚙️ Acknowledgement

PiDiViT builds upon the good work of DE-ViT. Special thanks to the DE-ViT team for their exceptional open-source contributions.

⭐ Support the Project

If PiDiViT accelerates your research, please ⭐ the repository and cite it to support future development.

Your stars fuel the next breakthrough in few-shot detection! 🔥🚀