
February 6, 2025

🦖 OV-DINO

Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Hao Wang1,2, Pengzhen Ren1, Zequn Jie3, Xiao Dong1, Chengjian Feng3, Yinlong Qian3,

Lin Ma3, Dongmei Jiang2, Yaowei Wang2,4, Xiangyuan Lan2:email:, Xiaodan Liang1,2:email:

1 Sun Yat-sen University, 2 Pengcheng Lab, 3 Meituan Inc, 4 HIT, Shenzhen

:email: corresponding author.

[Paper] [HuggingFace] [Demo] [BibTex]


:fire: Updates

  • 15/08/2024: :sparkles: Have a look! We have released the pre-training code and training log for the O365 dataset. You can try to reproduce our results.

  • 06/08/2024: :sparkler: Awesome! OV-SAM = OV-DINO + SAM2. We have updated the online demo with OV-SAM, which combines OV-DINO with SAM2.

  • 16/07/2024: We provide an online demo; click and enjoy! NOTE: Your uploaded images will be stored for failure analysis.

  • 16/07/2024: We have released the web inference demo; try deploying it yourself.

  • 15/07/2024: We have released the fine-tuning code; try fine-tuning on your custom dataset. Feel free to open an issue if you run into problems.

  • 15/07/2024: We have released the local inference demo; try deploying OV-DINO on your local machine and running inference on images.

  • 14/07/2024: We have released the pre-trained models and the evaluation code.

  • 11/07/2024: We have released the OV-DINO paper on arXiv. Code and pre-trained models are coming soon.

:rocket: Introduction

This project contains the official PyTorch implementation, pre-trained models, fine-tuning code, and inference demo for OV-DINO.

  • OV-DINO is a novel unified open-vocabulary detection approach that offers strong performance and effectiveness for practical real-world applications.

  • OV-DINO entails a Unified Data Integration pipeline that integrates diverse data sources for end-to-end pre-training, and a Language-Aware Selective Fusion module to improve the vision-language understanding of the model.

  • OV-DINO significantly outperforms previous methods on the COCO and LVIS benchmarks, achieving relative improvements of +2.5% AP on COCO and +12.7% AP on LVIS over G-DINO in zero-shot evaluation.

:page_facing_up: Overview

:sparkles: Model Zoo

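| Model | Pre-Train Data | APmv | APr | APc | APf | APval | APr | APc | APf | APcoco | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OV-DINO1 | O365 | 24.4 | 15.5 | 20.3 | 29.7 | 18.7 | 9.3 | 14.5 | 27.4 | 49.5 / 57.5 | CKPT / LOG 🤗 |
| OV-DINO2 | O365,GoldG | 39.4 | 32.0 | 38.7 | 41.3 | 32.2 | 26.2 | 30.1 | 37.3 | 50.6 / 58.4 | CKPT 🤗 |
| OV-DINO3 | O365,GoldG,CC1M‡ | 40.1 | 34.5 | 39.5 | 41.5 | 32.9 | 29.1 | 30.4 | 37.4 | 50.2 / 58.2 | CKPT 🤗 |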

NOTE: APmv denotes the zero-shot evaluation results on LVIS MiniVal, APval denotes the zero-shot evaluation results on LVIS Val, and APcoco denotes the (zero-shot / fine-tuned) evaluation results on COCO.
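
The checkpoint filenames used later in this README (for example `ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth`) appear to encode the zero-shot COCO, LVIS MiniVal, and LVIS Val AP values from the Model Zoo table. The sketch below recovers those numbers from a filename. The naming pattern is an assumption inferred from the two filenames in this README, and `parse_checkpoint_metrics` is a hypothetical helper, not part of the OV-DINO codebase.

```python
import re

# Assumption: names follow "<prefix>-coco<AP>_lvismv<AP>_lvis<AP>.pth",
# as in the two checkpoint filenames that appear in this README.
_PATTERN = re.compile(
    r"-coco(?P<coco>\d+(?:\.\d+)?)"
    r"_lvismv(?P<lvis_minival>\d+(?:\.\d+)?)"
    r"_lvis(?P<lvis_val>\d+(?:\.\d+)?)\.pth$"
)


def parse_checkpoint_metrics(filename: str) -> dict:
    """Return the AP values embedded in a checkpoint filename (hypothetical helper)."""
    match = _PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {filename}")
    return {key: float(value) for key, value in match.groupdict().items()}


if __name__ == "__main__":
    print(parse_checkpoint_metrics("ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth"))
```

Comparing the parsed values against the table is a quick way to confirm which Model Zoo row a downloaded file corresponds to.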

:checkered_flag: Getting Started

1. Project Structure

OV-DINO
├── datas
│   ├── o365
│   │   ├── annotations
│   │   ├── train
│   │   ├── val
│   │   └── test
│   ├── coco
│   │   ├── annotations
│   │   ├── train2017
│   │   └── val2017
│   ├── lvis
│   │   ├── annotations
│   │   ├── train2017
│   │   └── val2017
│   └── custom
│       ├── annotations
│       ├── train
│       └── val
├── docs
├── inits
│   ├── huggingface
│   ├── ovdino
│   ├── sam2
│   └── swin
├── ovdino
│   ├── configs
│   ├── demo
│   ├── detectron2-717ab9
│   ├── detrex
│   ├── projects
│   ├── scripts
│   └── tools
├── wkdrs
│   ├── ...
│

2. Installation

# clone this project
git clone https://github.com/wanghao9610/OV-DINO.git
cd OV-DINO
export root_dir=$(realpath ./)
cd $root_dir/ovdino

# Optional: set CUDA_HOME for CUDA 11.6.
# OV-DINO uses CUDA 11.6 by default. If your default CUDA version differs, export CUDA_HOME manually first.
export CUDA_HOME="your_cuda11.6_path"
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
echo -e "cuda version:\n$(nvcc -V)"

# create conda env for ov-dino
conda create -n ovdino -y
conda activate ovdino
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia -y
conda install gcc=9 gxx=9 -c conda-forge -y # Optional: install gcc9
python -m pip install -e detectron2-717ab9
pip install -e ./

# Optional: create a separate conda env for ov-sam, since its dependencies may not be compatible with ov-dino.
# ov-sam = ov-dino + sam2
conda create -n ovsam -y
conda activate ovsam
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia -y
# Install SAM2 by following the instructions of the sam2 project;
# please refer to https://github.com/facebookresearch/segment-anything-2.git
# Download the sam2 checkpoints and put them in inits/sam2.
python -m pip install -e detectron2-717ab9
pip install -e ./
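
After installation, it can help to confirm that the active environment matches the versions pinned above. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and that the conda packages expose standard package metadata; `find_mismatches` and `installed_versions` are hypothetical helpers, not part of OV-DINO. The expected versions are those of the `ovdino` environment; for `ovsam`, pass the 2.3.1 / 0.18.1 / 2.3.1 pins instead.

```python
from importlib import metadata

# Versions pinned for the ovdino environment in the installation steps.
EXPECTED = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu116' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed: dict, expected: dict = EXPECTED) -> dict:
    """Map each package whose version differs (or is missing) to (found, wanted)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = installed.get(name)
        if found is None or base_version(found) != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


def installed_versions(names) -> dict:
    """Look up installed package versions; packages that are not found are skipped."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return versions


if __name__ == "__main__":
    problems = find_mismatches(installed_versions(EXPECTED))
    for name, (found, wanted) in problems.items():
        print(f"{name}: found {found}, expected {wanted}")
    if not problems:
        print("all pinned packages match")
```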

2. Data Preparing

COCO

  • Download COCO from the official website and put the files in the datas/coco folder.
    cd $root_dir
    mkdir -p datas/coco
    wget http://images.cocodataset.org/zips/train2017.zip -O datas/coco/train2017.zip
    wget http://images.cocodataset.org/zips/val2017.zip -O datas/coco/val2017.zip
    wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O datas/coco/annotations_trainval2017.zip
    
  • Extract the zipped files, then remove the archives:
    cd $root_dir
    unzip datas/coco/train2017.zip -d datas/coco
    unzip datas/coco/val2017.zip -d datas/coco
    unzip datas/coco/annotations_trainval2017.zip -d datas/coco
    rm datas/coco/train2017.zip datas/coco/val2017.zip datas/coco/annotations_trainval2017.zip
    

LVIS

  • Download LVIS annotation files:
    cd $root_dir
    mkdir -p datas/lvis/annotations
    wget https://huggingface.co/hao9610/OV-DINO/resolve/main/lvis_v1_minival_inserted_image_name.json -O datas/lvis/annotations/lvis_v1_minival_inserted_image_name.json
    wget https://huggingface.co/hao9610/OV-DINO/resolve/main/lvis_v1_val_inserted_image_name.json -O datas/lvis/annotations/lvis_v1_val_inserted_image_name.json
    
  • Soft-link COCO to LVIS:
    cd $root_dir
    ln -s $(realpath datas/coco/train2017) datas/lvis
    ln -s $(realpath datas/coco/val2017) datas/lvis
    

Objects365

  • Refer to OpenDataLab to download Objects365V1; it provides detailed download instructions.
    cd $root_dir
    mkdir -p datas/o365/annotations
    # Assuming the raw Objects365 files were downloaded to datas/o365/raw, extract the archives and rearrange them.
    cd datas/o365/raw
    tar -xvf Objects365_v1.tar.gz
    cd 2019-08-02
    for file in *.zip; do unzip -o "$file"; done
    mv *.json $root_dir/datas/o365/annotations
    mv train val test $root_dir/datas/o365
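
Once the data preparation steps are done, the resulting layout can be checked against the project structure shown earlier. This is a small sketch under the assumption that the directory names match that structure exactly; `missing_dirs` is a hypothetical helper. `Path.is_dir()` follows symbolic links, so the LVIS soft links to COCO are accepted.

```python
import os
from pathlib import Path

# Relative paths taken from the project structure and the data preparation steps.
EXPECTED_DIRS = [
    "datas/coco/annotations", "datas/coco/train2017", "datas/coco/val2017",
    "datas/lvis/annotations", "datas/lvis/train2017", "datas/lvis/val2017",
    "datas/o365/annotations", "datas/o365/train", "datas/o365/val", "datas/o365/test",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Uses the root_dir variable exported during installation, else the current directory.
    root = os.environ.get("root_dir", ".")
    for rel in missing_dirs(root):
        print(f"missing: {rel}")
```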
    

3. Evaluation

Download the pre-trained models from the Model Zoo and put them in the inits/ovdino directory.

cd $root_dir/ovdino
bash scripts/eval.sh path_to_eval_config_file path_to_pretrained_model output_directory

Zero-Shot Evaluation on COCO Benchmark

cd $root_dir/ovdino
# Evaluation of mean AP on the COCO dataset.
bash scripts/eval.sh \
  projects/ovdino/configs/ovdino_swin_tiny224_bert_base_eval_coco.py \
  ../inits/ovdino/ovdino_swint_og-coco50.6_lvismv39.4_lvis32.2.pth \
  ../wkdrs/eval_ovdino

Zero-Shot Evaluation on LVIS Benchmark

cd $root_dir/ovdino
# Evaluation of fixed_ap on LVIS MiniVal dataset.
bash scripts/eval.sh \
  projects/ovdino/configs/ovdino_swin_tiny224_bert_base_eval_lvismv.py \
  ../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
  ../wkdrs/eval_ovdino

# Evaluation of fixed_ap on the LVIS Val dataset. 
# It will require about 250GB of memory due to the large number of samples in the LVIS Val dataset, so please ensure that your machine has enough memory.
bash scripts/eval.sh \
  projects/ovdino/configs/ovdino_swin_tiny224_bert_base_eval_lvis.py \
  ../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
  ../wkdrs/eval_ovdino
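
Before launching the LVIS Val evaluation, it may be worth checking the machine against the roughly 250GB figure quoted above. The sketch below assumes that figure refers to host RAM and treats it as GiB, both of which are assumptions; it relies on POSIX `os.sysconf`, so it works on Linux but not on Windows. The helper names are hypothetical.

```python
import os

REQUIRED_GB = 250  # approximate figure quoted for LVIS Val evaluation


def total_memory_bytes() -> int:
    """Total physical memory via POSIX sysconf (Linux; not available on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_memory(total_bytes: int, required_gb: float = REQUIRED_GB) -> bool:
    """Compare total memory against the requirement, treating 1 GB as 1024**3 bytes."""
    return total_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    total = total_memory_bytes()
    print(f"total memory: {total / 1024**3:.0f} GiB")
    if not has_enough_memory(total):
        print(f"warning: less than {REQUIRED_GB} GB of memory; LVIS Val evaluation may fail")
```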

4. Fine-Tuning

Fine-Tuning on COCO Dataset

cd $root_dir/ovdino
bash scripts/finetune.sh \
  projects/ovdino/configs/ovdino_swin_tiny224_bert_base_ft_coco_24ep.py \
  ../inits/ovdino/ovdino_swint_og-coco50.6_lvismv39.4_lvis32.2.pth

Fine-Tuning on Custom Dataset

  • Prepare your custom dataset in the COCO annotation format, following the instructions in custom_ovd.py.

  • Use the following command to run fine-tuning.

    cd $root_dir/ovdino
    bash scripts/finetune.sh \
      projects/ovdino/configs/ovdino_swin_tiny224_bert_base_ft_custom_24ep.py \
      ../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth
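
For reference, a COCO-style detection annotation file contains `images`, `annotations`, and `categories` lists, with boxes given as `[x_min, y_min, width, height]`. The sketch below builds such a structure from plain tuples. It is a minimal illustration only: `build_coco_annotations`, the example file name, and the 1-based category ids are assumptions, and custom_ovd.py remains the authority on what OV-DINO actually expects.

```python
import json


def build_coco_annotations(images, boxes, class_names):
    """Build a COCO-style dict (hypothetical helper).

    images: list of (file_name, width, height)
    boxes:  list of (image_index, class_index, x_min, y_min, box_width, box_height)
    """
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(class_names)]
    image_entries = [
        {"id": i + 1, "file_name": file_name, "width": width, "height": height}
        for i, (file_name, width, height) in enumerate(images)
    ]
    annotation_entries = [
        {
            "id": i + 1,
            "image_id": image_index + 1,
            "category_id": class_index + 1,
            "bbox": [x, y, w, h],  # COCO boxes are [x_min, y_min, width, height]
            "area": w * h,
            "iscrowd": 0,
        }
        for i, (image_index, class_index, x, y, w, h) in enumerate(boxes)
    ]
    return {
        "images": image_entries,
        "annotations": annotation_entries,
        "categories": categories,
    }


if __name__ == "__main__":
    # Illustrative values only; replace with your own images, boxes, and classes.
    coco = build_coco_annotations(
        images=[("img0.jpg", 640, 480)],
        boxes=[(0, 1, 10.0, 20.0, 100.0, 50.0)],
        class_names=["class0", "class1"],
    )
    print(json.dumps(coco, indent=2))
```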
    

5. Pre-Training

Pre-Training on Objects365 dataset

  • Download the dataset following the Objects365 instructions in Data Preparing.

  • Use the following commands to run pre-training.

    On the first machine:

    cd $root_dir/ovdino
    # Replace $MASTER_PORT and $MASTER_ADDR with your actual machine settings.
    NNODES=2 NODE_RANK=0 MASTER_PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR \
    bash scripts/pretrain.sh \
      projects/ovdino/configs/ovdino_swin_tiny224_bert_base_pretrain_o365_24ep.py
    

    On the second machine:

    cd $root_dir/ovdino
    # Replace $MASTER_PORT and $MASTER_ADDR with your actual machine settings.
    NNODES=2 NODE_RANK=1 MASTER_PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR \
    bash scripts/pretrain.sh \
      projects/ovdino/configs/ovdino_swin_tiny224_bert_base_pretrain_o365_24ep.py
    

    NOTE: The default batch size for O365 pre-training is 64 in our experiments, running on 2 nodes with 8 A100 GPUs per node. If you encounter an out-of-memory error, you can reduce the batch size and scale the learning rate and total steps linearly to match.
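
One common reading of this linear adjustment is the linear scaling rule: the learning rate scales in proportion to the batch size, and the total number of steps scales inversely so that the number of training samples seen stays the same. The sketch below follows that reading. Only the base batch size of 64 comes from this README; the base learning rate and step count are placeholders, and `scale_schedule` is a hypothetical helper. Take the real values from the pre-training config file.

```python
def scale_schedule(base_batch_size, base_lr, base_steps, new_batch_size):
    """Scale learning rate linearly and total steps inversely with batch size."""
    ratio = new_batch_size / base_batch_size
    new_lr = base_lr * ratio            # smaller batch -> proportionally smaller lr
    new_steps = round(base_steps / ratio)  # smaller batch -> proportionally more steps
    return new_lr, new_steps


if __name__ == "__main__":
    # base_lr and base_steps are placeholder values, not the OV-DINO defaults.
    lr, steps = scale_schedule(
        base_batch_size=64, base_lr=1e-4, base_steps=100_000, new_batch_size=32
    )
    print(f"lr={lr:g}, total_steps={steps}")
```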

Pre-Training on [Objects365, GoldG] datasets

Coming soon ...

Pre-Training on [Objects365, GoldG, CC1M‡] datasets

Coming soon ...

NOTE: We will release all the pre-training code after our paper is accepted.

:computer: Demo

  • Local inference on an image or a folder, given the category names.

    # for ovdino: conda activate ovdino
    # for ovsam: conda activate ovsam
    cd $root_dir/ovdino
    bash scripts/demo.sh demo_config.py pretrained_model category_names input_images_or_directory output_directory
    

    Examples:

    cd $root_dir/ovdino
    # single image inference
    bash scripts/demo.sh \
      projects/ovdino/configs/ovdino_swin_tiny224_bert_base_infer_demo.py \
      ../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
      "class0 class1 class2 ..." img0.jpg output_dir/img0_vis.jpg
    
    # multi images inference
    bash scripts/demo.sh \
      projects/ovdino/configs/ovdino_swin_tiny224_bert_base_infer_demo.py \
      ../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
      "class0 long_class1 long_class2 ..." "img0.jpg img1.jpg" output_dir
    
    # image folder inference
    bash scripts/demo.sh \
      projects/ovdino/configs/ovdino_swin_tiny224_bert_base_infer_demo.py \
      ../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth \
      "class0 long_class1 long_class2 ..." image_dir output_dir
    

    NOTE: The input category_names are separated by spaces, and the words of a multi-word class are joined by underscores (_).
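
The formatting rule in this note can be sketched as a small helper that turns a list of human-readable class names into the single string passed to scripts/demo.sh. `format_category_names` is a hypothetical helper for illustration, not part of the OV-DINO scripts.

```python
def format_category_names(names):
    """Join the words of each class with '_' and separate classes with spaces."""
    formatted = []
    for name in names:
        words = name.split()
        if not words:
            continue  # skip empty or whitespace-only entries
        formatted.append("_".join(words))
    return " ".join(formatted)


if __name__ == "__main__":
    print(format_category_names(["person", "traffic light", "fire hydrant"]))
```

The resulting string should be quoted when passed on the command line so that the shell treats it as a single argument.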

  • Web inference demo.

    cd $root_dir/ovdino
    bash scripts/app.sh \
      projects/ovdino/configs/ovdino_swin_tiny224_bert_base_infer_demo.py \
      ../inits/ovdino/ovdino_swint_ogc-coco50.2_lvismv40.1_lvis32.9.pth
    

    Once the web demo is deployed, you can open it in your browser.

    We also provide the online demo, click and enjoy.

:white_check_mark: TODO

  • Release the pre-trained model.
  • Release the fine-tuning and evaluation code.
  • Support the local inference demo.
  • Support the web inference demo.
  • Release the pre-training code on O365 dataset.
  • Support ONNX exporting OV-DINO model.
  • Support OV-DINO in 🤗 transformers.
  • Release all the pre-training code.

:blush: Acknowledgement

This project builds on several excellent open-source repositories (Detectron2, detrex, GLIP, G-DINO, YOLO-World). We thank their authors for their wonderful work and contributions to the community.

:pushpin: Citation

If you find OV-DINO helpful for your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@article{wang2024ovdino,
  title={OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion},
  author={Wang, Hao and Ren, Pengzhen and Jie, Zequn and Dong, Xiao and Feng, Chengjian and Qian, Yinlong and Ma, Lin and Jiang, Dongmei and Wang, Yaowei and Lan, Xiangyuan and others},
  journal={arXiv preprint arXiv:2407.07844},
  year={2024}
}