NADA

December 10, 2024

Official code for No Annotations for Object Detection in Art through Stable Diffusion (WACV 2025)

Setup

This repository is composed of three folders, each corresponding to a different part of training or evaluating NADA. The code is organized this way to prevent conflicting dependencies.

prompt-to-prompt

This folder contains code for the class proposers not based on LLaVA and the class-conditioned detector. This uses code from Google's prompt-to-prompt repository and DAAM.

Create a Python virtual environment and pip install the corresponding requirements file to set up the folder.
- For the class-conditioned detector and weakly-supervised class proposer
```
cd prompt-to-prompt
python -m venv env
source env/bin/activate
pip install -r requirements.txt
```
- For the non-LLaVA zero-shot class proposers
```
cd prompt-to-prompt
python -m venv cp_env
source cp_env/bin/activate
pip install -r cp_requirements.txt
```
detectron2

Code for evaluating predictions made by NADA. Bounding boxes are saved in the COCO format, so we use Meta's Detectron2 library to evaluate them.

Create a virtual environment and pip install from requirements.txt to set it up.
```
cd detectron2
python -m venv env
source env/bin/activate
pip install -r requirements.txt
```
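For reference, the sketch below shows what a single detection looks like in the COCO results layout, where a box is stored as `[x, y, width, height]`. It is an illustration only: `to_coco_result` is a hypothetical helper, the values are made up, and it is an assumption that NADA's saved predictions follow this results layout rather than another COCO variant.

```python
import json


def to_coco_result(image_id, category_id, box_xyxy, score):
    """Convert one (x_min, y_min, x_max, y_max) box into a COCO-style result entry."""
    x_min, y_min, x_max, y_max = box_xyxy
    return {
        "image_id": image_id,
        "category_id": category_id,
        # COCO stores boxes as [x, y, width, height], with (x, y) the top-left corner.
        "bbox": [x_min, y_min, x_max - x_min, y_max - y_min],
        "score": score,
    }


if __name__ == "__main__":
    # Made-up values for illustration only.
    results = [to_coco_result(1, 3, (10.0, 20.0, 110.0, 220.0), 0.9)]
    print(json.dumps(results))
```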
LLaVA

Code for generating outputs with LLaVA. We use LLaVA for our zero-shot class proposer and for caption prompt construction. This uses code from the official LLaVA repository.

Create a Python virtual environment and install the package from the folder to set it up.
```
cd LLaVA
python -m venv env
source env/bin/activate
pip install -e .
```

Preparing data

Download ArtDL and IconArt and place the ArtDL and IconArt_v1 folders in a data folder at the root of the repository.
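A quick way to confirm the layout before running anything is a small check like the one below. This is a sketch rather than part of the repository: `check_data_layout` is a hypothetical helper, and it relies only on the two folder names given above.

```python
from pathlib import Path

# Folder names taken from the instructions above; everything else is illustrative.
EXPECTED_FOLDERS = ("ArtDL", "IconArt_v1")


def check_data_layout(repo_root="."):
    """Return the expected dataset folders that are missing under <repo_root>/data."""
    data_dir = Path(repo_root) / "data"
    return [name for name in EXPECTED_FOLDERS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    missing = check_data_layout()
    if missing:
        print("Missing dataset folders in data/:", ", ".join(missing))
    else:
        print("data/ contains ArtDL and IconArt_v1")
```

Run it from the root of the repository, or pass the repository path as `repo_root`.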

Using NADA

Using the class proposer

Weakly-supervised class proposer

Run prompt-to-prompt/classify/fc.py to train the weakly-supervised class proposer and to perform inference with it. Inference creates the labels used by the class-conditioned detector.

cd prompt-to-prompt
python classify/fc.py \
--dataset {artdl, iconart} \
--classification-type {single, multi} \
--data-type images \
--modes {train, eval, label} \
--num-layers {2, 3} \
--checkpoint checkpoints/{artdl, iconart}/checkpoint.ckpt \
--save-dir labels/{ex. artdl_wscp}

Specify --eval-label-split {} when eval or label (inference) is included in --modes. Refer to prompt-to-prompt/data/classify_with_labels.py for the splits available per dataset. Items in {} are options or examples.
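The rule about --eval-label-split can be enforced before launching a run. The sketch below assembles the argument list from the flags listed above and refuses to build a command that would violate the rule. `build_fc_command` is a hypothetical helper, not part of the repository, and passing several modes as space-separated values is an assumption about how the script parses --modes.

```python
import shlex


def build_fc_command(dataset, classification_type, modes, num_layers,
                     checkpoint, save_dir, eval_label_split=None):
    """Assemble an argument list for classify/fc.py from the flags listed above."""
    modes = list(modes)
    if {"eval", "label"} & set(modes) and eval_label_split is None:
        raise ValueError("--eval-label-split is required when modes include eval or label")
    cmd = [
        "python", "classify/fc.py",
        "--dataset", dataset,
        "--classification-type", classification_type,
        "--data-type", "images",
        # Assumption: several modes are passed as space-separated values.
        "--modes", *modes,
        "--num-layers", str(num_layers),
        "--checkpoint", checkpoint,
        "--save-dir", save_dir,
    ]
    if eval_label_split is not None:
        cmd += ["--eval-label-split", eval_label_split]
    return cmd


if __name__ == "__main__":
    cmd = build_fc_command("artdl", "single", ["train"], 2,
                           "checkpoints/artdl/checkpoint.ckpt", "labels/artdl_wscp")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="prompt-to-prompt", check=True)` so the script runs from the folder it expects.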

Zero-shot class proposer

Run LLaVA/classify.py to run the zero-shot class proposer.

cd LLaVA
python classify.py \
--dataset {artdl, iconart} \
--prompt {who, score} \
--dataset-split {} \
--save-dir ../prompt-to-prompt/labels/{ex. artdl_zscp}

Use --prompt who (the choice prompt in the paper) for artdl and --prompt score (the score prompt in the paper) for iconart.

Using the class-conditioned detector

The class-conditioned detector uses the labels inferred by the class proposer to perform detection and requires no training. The detector relies on a text prompt, and we support two kinds of prompt construction.

Template prompt construction

Template prompt construction inserts the labels into templates à la CLIP. Run prompt-to-prompt/generate.py:

cd prompt-to-prompt
python generate.py \
--dataset {artdl, iconart} \
--dataset-split {} \
--prompt-type {} \
--save-dir annotations/{ex. artdl_wscp} \
--label-dir labels/{ex. artdl_wscp}

In the paper, we use --prompt-type wikipedia for artdl and --prompt-type custom_1 for iconart.
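The idea behind template prompt construction can be sketched in a few lines. The templates and the example label below are hypothetical and chosen only for illustration; the templates the repository actually uses are selected with --prompt-type and are not reproduced here.

```python
# Hypothetical templates for illustration; not the repository's own templates.
TEMPLATES = {
    "plain": "a painting of {label}",
    "artwork": "an artwork depicting {label}",
}


def build_template_prompt(label, prompt_type="plain"):
    """Insert a class label into the chosen template string."""
    return TEMPLATES[prompt_type].format(label=label)


if __name__ == "__main__":
    # Example label chosen for illustration.
    print(build_template_prompt("Saint Sebastian"))
    print(build_template_prompt("Saint Sebastian", prompt_type="artwork"))
```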

Caption prompt construction

Caption prompt construction uses a caption containing the label as a prompt. First, create captions using LLaVA/caption.py:

cd LLaVA
python caption.py \
--dataset {artdl, iconart} \
--dataset-split {} \
--prompt-type {} \
--label-dir {ex. ../prompt-to-prompt/labels/artdl_wscp} \
--save-dir {ex. ../prompt-to-prompt/captions/artdl_wscp}

Then run LLaVA/check_captions.py to check if the captions contain the labels at indices within the maximum input length of the diffusion model, and modify them if necessary.

cd LLaVA
python check_captions.py \
--dataset {artdl, iconart} \
--dataset-split {} \
--prompt-type {} \
--save-dir {ex. ../prompt-to-prompt/captions/artdl_wscp}
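The kind of check described above can be approximated as follows. This is a simplified sketch, not the logic of check_captions.py: it counts whitespace-separated words instead of tokenizer tokens, `label_within_limit` is a hypothetical helper, and the default of 77 assumes the context length of the CLIP text encoder used by Stable Diffusion. A real tokenizer splits words into subwords and adds start and end tokens, so actual positions will be later than the word count suggests.

```python
def label_within_limit(caption, label, max_tokens=77):
    """Return True if `label` occurs as a contiguous word sequence that ends
    within the first `max_tokens` whitespace-separated words of `caption`."""
    # Lowercase and strip surrounding punctuation so "Sebastian," matches "sebastian".
    words = [w.strip(".,;:!?\"'()") for w in caption.lower().split()]
    target = label.lower().split()
    if not target:
        return False
    for start in range(len(words) - len(target) + 1):
        if words[start:start + len(target)] == target:
            # The first occurrence decides: its last word must fall inside the limit.
            return start + len(target) <= max_tokens
    return False


if __name__ == "__main__":
    # Example caption and label chosen for illustration.
    caption = "A painting of Saint Sebastian tied to a tree."
    print(label_within_limit(caption, "Saint Sebastian"))
```

A caption that fails this check would need to be modified, for example by moving the label closer to the start.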

Once the captions are ready, run prompt-to-prompt/generate.py as in template prompt construction, but pass --caption-dir instead of --label-dir.

Evaluation

Use the nada_eval.ipynb notebook in LLaVA.

Citation

@InProceedings{Ramos_2025_WACV,
    author    = {Ramos, Patrick and Gonthier, Nicolas and Khan, Selina and Nakashima, Yuta and Garcia, Noa},
    title     = {No Annotations for Object Detection in Art through Stable Diffusion},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025}
}