Evaluation

March 18, 2024

ViP-Bench

Extract the contents of ViP-Bench to ./playground/data/eval/ViP-Bench.
Run single-GPU inference and evaluation for bounding-box (bbox) and human-drawn visual prompts, respectively.

CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vipbench.sh bbox
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vipbench.sh human

Optionally, change the model name from vip-llava-7b to another LLaVA or ViP-LLaVA model.

Submit the results to the evaluation server: ./playground/data/eval/ViP-Bench/results/vip-llava-7b-human.json.
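Before submitting, it can help to confirm that the results file exists and parses as JSON. The following is a minimal sketch, not part of the repository: `summarize_results` is a hypothetical helper, and it makes no assumption about the file's schema beyond it being a non-empty JSON object or array.

```python
import json
from pathlib import Path


def summarize_results(path):
    """Load a results file and return (top-level type name, number of entries).

    Hypothetical helper: it checks only that the file is a non-empty
    JSON object or array, not that it matches the server's schema.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, (dict, list)):
        raise ValueError(f"expected a JSON object or array, got {type(data).__name__}")
    if len(data) == 0:
        raise ValueError("results file is empty")
    return type(data).__name__, len(data)


if __name__ == "__main__":
    # Path taken from the submission step above.
    results_path = Path("./playground/data/eval/ViP-Bench/results/vip-llava-7b-human.json")
    if results_path.exists():
        print(summarize_results(results_path))
    else:
        print(f"not found: {results_path}")
```

If you changed the model name or ran the bbox variant, adjust the path to match the file your run produced.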

Optionally, see here for an evaluation script that uses your own OpenAI API key.

Source annotation

In source_image, we provide the plain source images along with their bounding box and mask annotations. Researchers can use this grounding information to resolve special tokens such as <obj> in the "question" entry of vip-bench-meta-data.json. For example, <obj> can be replaced with textual coordinates to evaluate region-level multimodal models.
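The replacement of <obj> with textual coordinates could look roughly like the sketch below. It rests on several assumptions that are not confirmed by the dataset: boxes are given as (x1, y1, x2, y2) pixel tuples, there is exactly one box per <obj> token in order of appearance, and the target model accepts coordinates written as "[x1, y1, x2, y2]". The function name `fill_obj_tokens` is hypothetical; adapt the box format to whatever the annotations in source_image actually contain.

```python
OBJ_TOKEN = "<obj>"


def fill_obj_tokens(question, boxes):
    """Replace each <obj> token, in order, with a textual bounding box.

    Assumes (hypothetically) that each box is an (x1, y1, x2, y2) tuple
    and that boxes are listed in the order the tokens appear.
    """
    n_tokens = question.count(OBJ_TOKEN)
    if n_tokens != len(boxes):
        raise ValueError(f"question has {n_tokens} tokens but {len(boxes)} boxes were given")
    parts = question.split(OBJ_TOKEN)
    out = [parts[0]]
    for (x1, y1, x2, y2), tail in zip(boxes, parts[1:]):
        out.append(f"[{x1}, {y1}, {x2}, {y2}]")  # assumed coordinate text format
        out.append(tail)
    return "".join(out)


question = "What is the object <obj> next to <obj>?"
boxes = [(10, 20, 110, 220), (300, 40, 380, 160)]
print(fill_obj_tokens(question, boxes))
# What is the object [10, 20, 110, 220] next to [300, 40, 380, 160]?
```

Raising an error on a token/box count mismatch is a deliberate choice here: silently dropping a region would change the question being evaluated.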

Academic Benchmarks

Please download the evaluation JSON dataset here.

Visual7W

CUDA_VISIBLE_DEVICES=0 bash scripts/eval/v7w.sh

PointQA-LookTwice

CUDA_VISIBLE_DEVICES=0 bash scripts/eval/pointQA.sh

Visual Commonsense Reasoning

For Q -> A:

CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vcr-qa.sh

For QA -> R:

CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vcr-qar.sh