README.md

December 9, 2025

RefDrone: A Challenging Benchmark for Drone Scene Referring Expression Comprehension

TODO list

  • ✅ Release RefDrone test dataset
  • ✅ Release RefDrone train/val dataset
  • ✅ Release RDAnnotator
  • ✅ Release NGDINO

RefDrone Dataset

Please download the RefDrone dataset from Hugging Face.

RDAnnotator

The code of RDAnnotator is available on GitHub.

Installation

The recommended configuration is 4 A100 GPUs with CUDA 12.1. Other configurations supported by MMDetection should also work.

Please follow the guide to install and set up MMDetection.

conda create --name openmmlab python=3.10.6 -y
conda activate openmmlab

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

pip install -U openmim
mim install mmengine
mim install "mmcv==2.2.0"
git clone git@github.com:sunzc-sunny/refdrone.git
cd refdrone
pip install -v -e .
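
After installation, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch, not part of the repository: the function name `check_versions` is hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision`, `torchaudio`, and `mmcv`. It only reads installed package metadata and does not import the packages or probe the GPU.

```python
from importlib import metadata

# Versions pinned by the installation commands in this README.
PINNED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "mmcv": "2.2.0",
}


def check_versions(pinned=PINNED):
    """Return {package: (installed_or_None, expected, ok)} for each pinned package."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Wheels from the cu121 index may carry a local suffix such as
        # "2.1.0+cu121", so compare only the part before "+".
        ok = installed is not None and installed.split("+")[0] == expected
        report[name] = (installed, expected, ok)
    return report


if __name__ == "__main__":
    for name, (installed, expected, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12s} installed={installed} expected={expected} {status}")
```

Run it inside the activated `openmmlab` environment. A `MISMATCH` line is a hint to re-check the corresponding install command rather than a definitive failure.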

Preparation

After downloading and unzipping the images and annotations, place the dataset (or create a symbolic link to it) inside the datasets/ directory. For convenience, it is recommended to store all images together in an all_image/ subdirectory. Your directory structure should look like this:

refdrone
├── configs
├── datasets
│   ├── VisDrone2019
│   │   ├── RefDrone_train_mdetr.json
│   │   ├── RefDrone_test_mdetr.json
│   │   ├── RefDrone_val_mdetr.json
│   │   ├── all_image
│   │   │   ├── xxx.jpg
│   │   │   ├── ...
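
The layout above can be checked before running any conversion or training. This is a minimal sketch under the assumption that the three annotation files and the `all_image` directory shown in the tree are all that is required; `missing_entries` is a hypothetical helper, not a script shipped with the repository.

```python
from pathlib import Path

# Expected entries, taken from the directory tree in this README.
EXPECTED_FILES = [
    "RefDrone_train_mdetr.json",
    "RefDrone_val_mdetr.json",
    "RefDrone_test_mdetr.json",
]
IMAGE_DIR = "all_image"


def missing_entries(root):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    missing = [root / name for name in EXPECTED_FILES if not (root / name).is_file()]
    if not (root / IMAGE_DIR).is_dir():
        missing.append(root / IMAGE_DIR)
    return missing


if __name__ == "__main__":
    problems = missing_entries("datasets/VisDrone2019")
    if problems:
        for path in problems:
            print(f"missing: {path}")
    else:
        print("dataset layout looks complete")
```

Run it from the repository root. `Path.is_dir()` follows symbolic links, so a linked dataset directory passes the check as well.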

Then use refcoco2odvg.py to convert RefDrone_train_mdetr.json into the ODVG format required for training:

python tools/dataset_converters/refcoco2odvg.py datasets/VisDrone2019
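
Before converting, a quick look at the annotation file can catch a truncated download or a wrong path. The sketch below assumes the MDETR-style JSON follows the usual COCO layout, with top-level `images` and `annotations` lists, `id` on each image entry, and `image_id` on each annotation. That layout is an assumption about the file, not something this README states, and `summarize_annotations` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_annotations(path):
    """Count entries and boxes in a COCO-style annotation file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    images = data.get("images", [])
    annotations = data.get("annotations", [])
    image_ids = {img["id"] for img in images}
    boxes_per_image = Counter(ann["image_id"] for ann in annotations)
    return {
        "num_images": len(images),
        "num_annotations": len(annotations),
        # Image entries that no annotation points to.
        "images_without_boxes": sum(1 for i in image_ids if boxes_per_image[i] == 0),
        # Annotations whose image_id matches no image entry.
        "orphan_annotations": sum(
            n for i, n in boxes_per_image.items() if i not in image_ids
        ),
    }


if __name__ == "__main__":
    anno = Path("datasets/VisDrone2019/RefDrone_train_mdetr.json")
    if anno.is_file():
        print(summarize_annotations(anno))
    else:
        print(f"annotation file not found: {anno}")
```

A non-zero `orphan_annotations` count would suggest a mismatched or corrupted file. The other numbers are informational only.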

Usage

Visualize

Visualize the dataset. The visualizations are saved to work_dirs/vis_results/.

python visualize.py 

Train

# Single GPU
python tools/train.py configs/NGDINO/ngdino_swin-t_refdrone.py        # 5 epochs

python tools/train.py configs/NGDINO/ngdino_swin-t_refdrone_e50.py    # 50 epochs

# Multiple GPUs (replace NUM_GPUs with the number of GPUs to use)
bash tools/dist_train.sh configs/NGDINO/ngdino_swin-t_refdrone.py NUM_GPUs        # 5 epochs

bash tools/dist_train.sh configs/NGDINO/ngdino_swin-t_refdrone_e50.py NUM_GPUs    # 50 epochs
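
When training is launched from a script, the choice between the single-GPU and multi-GPU commands above can be made programmatically. This is a minimal sketch: `build_train_command` is a hypothetical helper that only assembles the argument list from the commands in this README and does not start training by itself.

```python
import shlex


def build_train_command(config, num_gpus=1):
    """Return the argument list for single-GPU or multi-GPU training."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return ["python", "tools/train.py", config]
    # dist_train.sh takes the config followed by the number of GPUs.
    return ["bash", "tools/dist_train.sh", config, str(num_gpus)]


if __name__ == "__main__":
    cmd = build_train_command("configs/NGDINO/ngdino_swin-t_refdrone.py", num_gpus=4)
    print(shlex.join(cmd))
```

To actually launch the run, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.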

Inference

Download the checkpoint from Hugging Face.

# single gpu 
python tools/test.py configs/NGDINO/ngdino_swin-t_refdrone.py  CHECKPOINT

# multi gpu
bash tools/dist_test.sh configs/NGDINO/ngdino_swin-t_refdrone.py  CHECKPOINT NUM_GPUs

Single image inference

python demo/image_demo.py \
    all_image/0000189_00297_d_0000198.jpg \
    configs/NGDINO/ngdino_swin-t_refdrone.py \
    --weights NGDINO_T.pth \
    --texts 'The white vans parked on the left side of the road.' \
    --tokens-positive -1

Parameter Description

  • First parameter: Input image path
  • Second parameter: Configuration file path
  • --weights: Pre-trained model weight file
  • --texts: Text description for detection

The inference results (including visualization images and detection results) will be saved in the outputs/ directory.
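
To run the demo on several image and expression pairs, the same command can be assembled in Python, which sidesteps shell quoting and line-continuation issues. This is a sketch under stated assumptions: `build_demo_command` is a hypothetical helper, the default config and weights paths are the ones used in the example above, and the second image name and expression in the usage loop are placeholders.

```python
import shlex

DEFAULT_CONFIG = "configs/NGDINO/ngdino_swin-t_refdrone.py"
DEFAULT_WEIGHTS = "NGDINO_T.pth"


def build_demo_command(image, text, config=DEFAULT_CONFIG, weights=DEFAULT_WEIGHTS):
    """Return the argument list for one demo/image_demo.py call."""
    return [
        "python", "demo/image_demo.py",
        image,
        config,
        "--weights", weights,
        "--texts", text,
        "--tokens-positive", "-1",
    ]


if __name__ == "__main__":
    pairs = [
        ("all_image/0000189_00297_d_0000198.jpg",
         "The white vans parked on the left side of the road."),
        # Placeholder entry: replace with a real image path and expression.
        ("all_image/xxx.jpg", "The people walking on the sidewalk."),
    ]
    for image, text in pairs:
        # shlex.join quotes the expression so the printed line is safe to paste.
        print(shlex.join(build_demo_command(image, text)))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` directly.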

License

This project is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.