README.md
October 6, 2025 · View on GitHub
This repository provides the official implementation of DitHub, accepted at NeurIPS 2025.
DitHub is a modular and scalable framework designed for Incremental Open-Vocabulary Object Detection. Inspired by version control systems, DitHub enables the seamless management, update, and composition of class-specific adaptation modules.
⚙️ Setup
Prerequisites
- Python >= 3.9
- CUDA >= 11.8, GCC >= 11.4
Installation
- Clone the DitHub repository from GitHub and change the current directory
git clone https://github.com/chiara-cap/DitHub.git
cd DitHub
- Create and activate a conda environment
conda create -n dithub python=3.9 pip
conda activate dithub
- Install the requirements
pip install -r requirements.txt
- Clone the Deformable-DETR repository and compile CUDA operators
git clone https://github.com/fundamentalvision/Deformable-DETR.git
cd Deformable-DETR/models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py
- Download the pre-trained GroundingDINO model weights to be used as the starting point for training.
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
📁 Datasets
To reproduce the experiments in the paper, download the following datasets and place them inside the datasets/ folder:
- COCO — used for zero-shot evaluation
- ODinW-35 — required because it includes the 13 subsets used in the paper (ODinW-13) for incremental training
To download ODinW-35, run the following script:
mkdir datasets
cd datasets
python tools/download_odinw.py
ℹ️ Note: ODinW-13 is automatically configured when downloading ODinW-35.
To test on the ODinW-O benchmark (for overlapping classes), after downloading ODinW-35, run:
python tools/overlapped_classes_dataset.py
🚀 Running Dithub
Training
To start the training process, run the following command. This will use the specified configuration, load the pre-trained checkpoint (if available), and save the outputs in the designated directory:
sh train_dithub.sh
Evalutation
Before running evaluation, download the pre-trained DitHub model weights:
wget -q https://github.com/chiara-cap/DitHub/releases/download/v.0.1/model_final.pth
Then, to evaluate the model, run:
sh eval_dithub.sh
Command Line Arguments
| Parameter | Description |
|---|---|
--config-file | Path to the training configuration file. |
--model-config-file | Path to the model architecture configuration file. |
--model-checkpoint-path | Path to the pre-trained model checkpoint file (Grounding DINO or DitHub). |
--output-dir | Directory where outputs (logs, checkpoints) will be saved. |
--dithub | Flag to integrate Dithub into Grounding DINO. |
--eval-only | Run the model in evaluation mode without training. |
📜 Citation
If you find DitHub useful for your research, please consider citing our paper:
@article{cappellino2025dithub,
title={DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection},
author={Cappellino, Chiara and Mancusi, Gianluca and Mosconi, Matteo and Porrello, Angelo and Calderara, Simone and Cucchiara, Rita},
journal={Advances in Neural Information Processing Systems},
year={2025}
}