February 1, 2024

FGVP: Fine-Grained Visual Prompting

Official code for Fine-Grained Visual Prompting (NeurIPS 2023).

Install

Our code is built upon ReCLIP. Installation and dataset preparation follow the instructions in the ReCLIP repository.

FGVP

Figure: A summary of visual prompts for the caption "elephant on the left".

Results

| Method | VLM | Visual Prompt | Post Processing | Command | RefCOCO val | RefCOCO+ val | RefCOCOg val |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CPT-adapted | ViT-B/32, RN50x16 | B2 | R | link | 41.3 | 41.3 | 51.3 |
| ReCLIP | ViT-B/32, RN50x16 | P \| B4 | R | link | 45.8 | 47.9 | 59.3 |
| RedCircle | ViT-B/32, RN50x16 | P \| C1 | R | link | 43.9 | 45.3 | 57.3 |
| FGVP (ours) | ViT-B/32, RN50x16 | P \| D4 | R | link | 52.0 | 53.3 | 62.1 |
| RedCircle (reported in paper) | ViT-L/14@336px, RN50x16 | C1 \| C3 \| C4 | S | -- | 49.8 | 55.3 | 59.4 |
| RedCircle | ViT-L/14@336px, RN50x16 | C1 \| C3 \| C4 | S | link | 51.4 | 56.3 | 58.3 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | D1 \| D3 \| D4 | S | link | 52.9 | 57.4 | 58.1 |
| RedCircle | ViT-L/14@336px, RN50x16 | P \| C1 \| C3 \| C4 | S | link | 51.6 | 58.1 | 60.0 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | P \| D1 \| D3 \| D4 | S | link | 53.9 | 59.3 | 61.0 |
| RedCircle | ViT-L/14@336px, RN50x16 | P \| C1 \| C3 \| C4 | RS | link | 56.8 | 58.6 | 62.2 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | P \| D1 \| D3 \| D4 | RS | link | 59.6 | 60.0 | 63.3 |
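
To make the comparison in the table easier to read, the following sketch computes the accuracy difference between FGVP and our RedCircle runs in the four settings where both share the same VLM ensemble and post-processing. The numbers are copied from the table; the setting labels and the `gains` helper are illustrative names and are not part of this repository.

```python
# Accuracy (%) on (RefCOCO val, RefCOCO+ val, RefCOCOg val), copied from the table.
# The setting labels are illustrative shorthand: VLMs; visual prompts; post-processing.
RESULTS = {
    "ViT-B/32, RN50x16; P + 1 prompt; R": {
        "RedCircle": (43.9, 45.3, 57.3),
        "FGVP": (52.0, 53.3, 62.1),
    },
    "ViT-L/14@336px, RN50x16; 3 prompts; S": {
        "RedCircle": (51.4, 56.3, 58.3),
        "FGVP": (52.9, 57.4, 58.1),
    },
    "ViT-L/14@336px, RN50x16; P + 3 prompts; S": {
        "RedCircle": (51.6, 58.1, 60.0),
        "FGVP": (53.9, 59.3, 61.0),
    },
    "ViT-L/14@336px, RN50x16; P + 3 prompts; RS": {
        "RedCircle": (56.8, 58.6, 62.2),
        "FGVP": (59.6, 60.0, 63.3),
    },
}


def gains(results):
    """Return FGVP minus RedCircle per benchmark, rounded to one decimal."""
    return {
        setting: tuple(
            round(fgvp - red, 1)
            for fgvp, red in zip(rows["FGVP"], rows["RedCircle"])
        )
        for setting, rows in results.items()
    }


if __name__ == "__main__":
    for setting, delta in gains(RESULTS).items():
        print(f"{setting}: {delta}")
```

In these matched settings FGVP scores higher in every column except RefCOCOg val in the second setting (58.1 vs. 58.3).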

Single-Image Inference

We provide a simple inference script for a single image. It does not apply any post-processing.

# example 1
python fgvp-reclip/simple_inference.py \
    --img_dir demo/exp1/ori.png \
    --text 'apple on the left' 'apple in the middle' 'broccoli' 'raspberries' 'grossum' 'glass bowl' \
    --out_dir demo/exp1 \
    --sam_prompt grid

# example 2
python fgvp-reclip/simple_inference.py \
    --img_dir demo/exp2/ori.png \
    --out_dir demo/exp2 \
    --text 'photo on the wall' \
    --sam_prompt grid
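
To run the script over several images, the command lines above can be assembled programmatically. The sketch below only builds and prints the commands, using the flags shown in the two examples; the `build_command` helper and the job list are illustrative and are not part of this repository.

```python
import shlex


def build_command(image, texts, out_dir, sam_prompt="grid"):
    """Assemble the argv list for fgvp-reclip/simple_inference.py.

    Only the flags shown in the examples above are used.
    """
    if not texts:
        raise ValueError("at least one text query is required")
    return [
        "python", "fgvp-reclip/simple_inference.py",
        "--img_dir", image,
        "--text", *texts,
        "--out_dir", out_dir,
        "--sam_prompt", sam_prompt,
    ]


if __name__ == "__main__":
    # Each job is (image path, list of text queries, output directory).
    jobs = [
        ("demo/exp1/ori.png", ["apple on the left", "glass bowl"], "demo/exp1"),
        ("demo/exp2/ori.png", ["photo on the wall"], "demo/exp2"),
    ]
    for image, texts, out_dir in jobs:
        cmd = build_command(image, texts, out_dir)
        # shlex.join quotes multi-word queries so the line can be pasted into a shell.
        print(shlex.join(cmd))
        # To execute instead of printing: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with multi-word queries such as 'apple on the left'.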

You can provide proposal boxes derived from other detectors to achieve better localization. Save your bounding boxes in a JSON file and specify it with --candidate_boxes.

# example
python fgvp-reclip/simple_inference.py \
    --img_dir demo/exp1/ori.png \
    --text 'apple on the left' 'apple in the middle' 'broccoli' 'raspberries' 'grossum' 'glass bowl' \
    --out_dir demo/exp1 \
    --sam_prompt box \
    --candidate_boxes demo/exp1/candidate_boxes.json
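
The file passed to --candidate_boxes can be written with a few lines of Python. The sketch below is illustrative only: the schema (a flat list of [x1, y1, x2, y2] boxes in pixel coordinates), the `save_candidate_boxes` helper, and the output path are assumptions, not taken from this repository. Compare the result against demo/exp1/candidate_boxes.json before relying on it.

```python
import json
from pathlib import Path


def save_candidate_boxes(boxes, path):
    """Write proposal boxes to a JSON file and return what was written.

    ASSUMPTION: each box is [x1, y1, x2, y2] in pixels and the file is a
    flat list of such boxes. The real schema expected by
    simple_inference.py may differ.
    """
    cleaned = []
    for x1, y1, x2, y2 in boxes:
        # Reject boxes with zero or negative width/height.
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"degenerate box: {(x1, y1, x2, y2)}")
        cleaned.append([float(x1), float(y1), float(x2), float(y2)])
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cleaned, indent=2))
    return cleaned


if __name__ == "__main__":
    # Hypothetical detector output for one image.
    detections = [(34, 50, 210, 240), (250, 60, 420, 255)]
    save_candidate_boxes(detections, "demo/my_image/candidate_boxes.json")
```

The resulting file would then be supplied together with --sam_prompt box, as in the example above.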