NADA
December 10, 2024
Official code for No Annotations for Object Detection in Art through Stable Diffusion (WACV 2025)
[Paper] [Project Page]
Setup
This repository is composed of three folders, each corresponding to a different part of training or evaluating NADA. The code is organized this way to prevent conflicting dependencies.
- prompt-to-prompt: This folder contains code for the class proposers that are not based on LLaVA and for the class-conditioned detector. It uses code from Google's prompt-to-prompt repository and from DAAM.
  Create a Python virtual environment and pip install the corresponding requirements file to set up the folder.
  - For the class-conditioned detector and the weakly-supervised class proposer:
    cd prompt-to-prompt
    python -m venv env
    source env/bin/activate
    pip install -r requirements.txt
  - For the non-LLaVA zero-shot class proposers:
    cd prompt-to-prompt
    python -m venv cp_env
    source cp_env/bin/activate
    pip install -r cp_requirements.txt
- detectron2: Code for evaluating predictions made by NADA. Bounding boxes are saved in the COCO format, so we use Meta's Detectron2 library to evaluate them.
  Create a Python virtual environment and pip install requirements.txt to set it up:
    cd detectron2
    python -m venv env
    source env/bin/activate
    pip install -r requirements.txt
- LLaVA: Code for generating outputs with LLaVA. We use LLaVA for our zero-shot class proposer and for caption prompt construction. It uses code from the official LLaVA repository.
  Create a Python virtual environment and install the folder itself as an editable package to set it up:
    cd LLaVA
    python -m venv env
    source env/bin/activate
    pip install -e .
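Since the evaluation code consumes boxes in the COCO format, it may help to see what one detection looks like there. In the COCO results format each detection is a record with image_id, category_id, bbox given as [x, y, width, height] in pixels, and score. The sketch below converts corner-style boxes into such records; the helper name, the (x_min, y_min, x_max, y_max) input layout and the example IDs are assumptions for illustration, not NADA's own saving code.

```python
import json
from typing import Dict, List, Sequence


def to_coco_detections(
    image_id: int,
    boxes_xyxy: Sequence[Sequence[float]],
    category_ids: Sequence[int],
    scores: Sequence[float],
) -> List[Dict]:
    """Convert corner-format boxes into COCO-style detection records.

    Hypothetical helper: assumes boxes arrive as (x_min, y_min, x_max, y_max)
    in pixel coordinates; NADA's actual output code may differ.
    """
    records = []
    for (x0, y0, x1, y1), category_id, score in zip(boxes_xyxy, category_ids, scores):
        records.append({
            "image_id": image_id,
            "category_id": category_id,
            # COCO stores the top-left corner followed by width and height.
            "bbox": [x0, y0, x1 - x0, y1 - y0],
            "score": float(score),
        })
    return records


if __name__ == "__main__":
    detections = to_coco_detections(1, [(10, 20, 110, 220)], [3], [0.9])
    print(json.dumps(detections))
```

A list of such records, dumped to JSON, is the shape COCO-style evaluators expect for predictions.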
Preparing data
Download ArtDL and IconArt and place the ArtDL and IconArt_v1 folders in a data folder at the root of the repository.
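Before running anything, a quick check like the one below can confirm the expected layout. It is a hypothetical helper that is not part of the repository; it only tests that data/ArtDL and data/IconArt_v1 exist under the repository root and says nothing about their contents.

```python
from pathlib import Path
from typing import List

# Folder names taken from the instructions above.
EXPECTED_DATASET_DIRS = ("ArtDL", "IconArt_v1")


def missing_dataset_dirs(repo_root: Path) -> List[str]:
    """Return the expected dataset folders that are absent under <repo_root>/data."""
    data_dir = repo_root / "data"
    return [name for name in EXPECTED_DATASET_DIRS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_dataset_dirs(Path.cwd())
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("Found data/ArtDL and data/IconArt_v1")
```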
Using NADA
Using the class proposer
Weakly-supervised class proposer
Run prompt-to-prompt/classify/fc.py to train the weakly-supervised class proposer and to run inference with it, which creates the labels used by the class-conditioned detector.
cd prompt-to-prompt
python classify/fc.py \
--dataset {artdl, iconart} \
--classification-type {single, multi} \
--data-type images \
--modes {train, eval, label} \
--num-layers {2, 3} \
--checkpoint checkpoints/{artdl, iconart}/checkpoint.ckpt \
--save-dir labels/{ex. artdl_wscp}
Specify --eval-label-split {} when eval or label (inference) is included in --modes. Refer to prompt-to-prompt/data/classify_with_labels.py for the available splits per dataset. Items in {} are options or examples.
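The rule about --eval-label-split can be made explicit by assembling the command programmatically. This is a minimal sketch that uses only the flags listed above; the helper name is hypothetical, it assumes --modes accepts several space-separated values (for example train label), and "<split>" is a placeholder for one of the splits defined in classify_with_labels.py.

```python
import shlex
from typing import List, Optional, Sequence


def build_fc_command(
    dataset: str,
    classification_type: str,
    modes: Sequence[str],
    num_layers: int,
    checkpoint: str,
    save_dir: str,
    eval_label_split: Optional[str] = None,
) -> List[str]:
    """Assemble a classify/fc.py invocation (hypothetical helper).

    Assumption: --modes takes several space-separated values.
    """
    if dataset not in ("artdl", "iconart"):
        raise ValueError(f"unknown dataset: {dataset}")
    cmd = [
        "python", "classify/fc.py",
        "--dataset", dataset,
        "--classification-type", classification_type,
        "--data-type", "images",
        "--modes", *modes,
        "--num-layers", str(num_layers),
        "--checkpoint", checkpoint,
        "--save-dir", save_dir,
    ]
    # The split flag is only needed when eval or label (inference) runs.
    if {"eval", "label"} & set(modes):
        if eval_label_split is None:
            raise ValueError("--eval-label-split is required for eval/label modes")
        cmd += ["--eval-label-split", eval_label_split]
    return cmd


if __name__ == "__main__":
    cmd = build_fc_command(
        "artdl", "single", ["train", "label"], 2,
        "checkpoints/artdl/checkpoint.ckpt", "labels/artdl_wscp",
        eval_label_split="<split>",  # placeholder, not a real split name
    )
    # Run from inside prompt-to-prompt.
    print(shlex.join(cmd))
```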
Zero-shot class proposer
Run LLaVA/classify.py to generate labels with the zero-shot class proposer.
cd LLaVA
python classify.py \
--dataset {artdl, iconart} \
--prompt {who, score} \
--dataset-split {} \
--save-dir ../prompt-to-prompt/labels/{ex. artdl_zscp}
Use --prompt who (the choice prompt in the paper) for artdl and --prompt score (the score prompt in the paper) for iconart.
Using the class-conditioned detector
The class-conditioned detector uses the labels inferred by the class proposer to perform detection and requires no training. The detector relies on a text prompt, and we support two kinds of prompt construction.
Template prompt construction
Template prompt construction inserts the labels into templates à la CLIP. Run prompt-to-prompt/generate.py:
cd prompt-to-prompt
python generate.py \
--dataset {artdl, iconart} \
--dataset-split {} \
--prompt-type {} \
--save-dir annotations/{ex. artdl_wscp} \
--label-dir labels/{ex. artdl_wscp}
In the paper, we use --prompt-type wikipedia for artdl and --prompt-type custom_1 for iconart.
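CLIP-style template prompting wraps a class name in a fixed sentence. The sketch below shows only that mechanism for one label; the template strings and the example label are hypothetical placeholders, not the wikipedia or custom_1 prompt types that generate.py selects with --prompt-type, and how several labels for one image are combined is not shown.

```python
from typing import List, Sequence

# Hypothetical templates for illustration only; NADA's real prompt types
# are chosen with generate.py's --prompt-type flag.
EXAMPLE_TEMPLATES = (
    "a painting of {}.",
    "an artwork depicting {}.",
)


def fill_templates(label: str, templates: Sequence[str] = EXAMPLE_TEMPLATES) -> List[str]:
    """Insert a class label into each template, CLIP style."""
    return [template.format(label) for template in templates]


if __name__ == "__main__":
    for prompt in fill_templates("Saint Sebastian"):  # example label
        print(prompt)
```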
Caption prompt construction
Caption prompt construction uses a caption containing the label as a prompt. First, create captions using LLaVA/caption.py:
cd LLaVA
python caption.py \
--dataset {artdl, iconart} \
--dataset-split {} \
--prompt-type {} \
--label-dir {ex. ../prompt-to-prompt/labels/artdl_wscp} \
--save-dir {ex. ../prompt-to-prompt/captions/artdl_wscp}
Then run LLaVA/check_captions.py to check whether each caption contains its label at token indices within the maximum input length of the diffusion model, and to modify the captions if necessary.
python check_captions.py \
--dataset {artdl, iconart} \
--dataset-split {} \
--prompt-type {} \
--save-dir {ex. ../prompt-to-prompt/captions/artdl_wscp}
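Stable Diffusion's CLIP text encoder reads at most 77 tokens, including the start and end tokens, so a label that lands after the first 75 content tokens is cut off by truncation. The sketch below illustrates that kind of check; it is not check_captions.py's actual logic, the function name is hypothetical, and it splits on words as a stand-in for CLIP's BPE tokenizer, which can break one word into several tokens.

```python
import re
from typing import List, Optional

# CLIP's text encoder context length, including start and end tokens.
MAX_LENGTH = 77


def _words(text: str) -> List[str]:
    # Word-level split: a simplifying stand-in for CLIP's BPE tokenizer.
    return re.findall(r"\w+", text.lower())


def label_positions(caption: str, label: str, max_length: int = MAX_LENGTH) -> Optional[List[int]]:
    """Return encoder positions of the first occurrence of `label` in `caption`,
    or None if it is absent or would be cut off by truncation."""
    words, target = _words(caption), _words(label)
    if not target:
        return None
    for start in range(len(words) - len(target) + 1):
        if words[start:start + len(target)] == target:
            # Position 0 is the start token, so word i sits at position i + 1.
            positions = [start + 1 + k for k in range(len(target))]
            # The last slot (max_length - 1) is kept for the end token.
            return positions if positions[-1] <= max_length - 2 else None
    return None


if __name__ == "__main__":
    print(label_positions("A painting of Saint Sebastian.", "Saint Sebastian"))
```

A caption that returns None would need to be shortened or rewritten so the label appears earlier.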
Once the captions are ready, run prompt-to-prompt/generate.py as in template prompt construction, but pass --caption-dir instead of --label-dir.
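The only difference between the two prompt constructions at this step is which directory flag is passed. The sketch below encodes that swap; the helper name is hypothetical, it assumes the other generate.py flags stay the same as in template prompt construction, and "<split>" and "<type>" are placeholders.

```python
import shlex
from typing import List, Optional


def build_generate_command(
    dataset: str,
    dataset_split: str,
    prompt_type: str,
    save_dir: str,
    label_dir: Optional[str] = None,
    caption_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a generate.py invocation (hypothetical helper).

    Exactly one of label_dir (template prompts) or caption_dir (caption prompts)
    must be given.
    """
    if (label_dir is None) == (caption_dir is None):
        raise ValueError("pass exactly one of label_dir or caption_dir")
    cmd = [
        "python", "generate.py",
        "--dataset", dataset,
        "--dataset-split", dataset_split,
        "--prompt-type", prompt_type,
        "--save-dir", save_dir,
    ]
    if label_dir is not None:
        cmd += ["--label-dir", label_dir]
    else:
        cmd += ["--caption-dir", caption_dir]
    return cmd


if __name__ == "__main__":
    # Run from inside prompt-to-prompt, where caption.py saved the captions.
    cmd = build_generate_command(
        "artdl", "<split>", "<type>", "annotations/artdl_wscp",
        caption_dir="captions/artdl_wscp",
    )
    print(shlex.join(cmd))
```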
Evaluation
Use the nada_eval.ipynb notebook in the LLaVA folder to evaluate the predictions.
Citation
@InProceedings{Ramos_2025_WACV,
author = {Ramos, Patrick and Gonthier, Nicolas and Khan, Selina and Nakashima, Yuta and Garcia, Noa},
title = {No Annotations for Object Detection in Art through Stable Diffusion},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {February},
year = {2025}
}