Evaluation
March 18, 2024
ViP-Bench
- Extract contents of ViP-Bench to ./playground/data/eval/ViP-Bench.
- Run single-GPU inference and evaluation for bbox and human-drawn visual prompts, respectively.
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vipbench.sh bbox
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vipbench.sh human
Optionally, change the model name from vip-llava-7b to other LLaVA or ViP-LLaVA models.
- Submit the results to the evaluation server: ./playground/data/eval/ViP-Bench/results/vip-llava-7b-human.json.
Optionally, see here for an evaluation script that uses your own OpenAI key.
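Before submitting, it can help to confirm that the results file exists and parses as JSON. The following is a minimal sketch, not part of the repository: the helper names `results_path` and `load_results` are hypothetical, and the `<model>-<prompt type>.json` naming is inferred from the single example path above, so the bbox file name in particular is an assumption.

```python
import json
from pathlib import Path


def results_path(model_name, prompt_type, root="./playground/data/eval/ViP-Bench"):
    """Build the expected results path (naming pattern is an assumption)."""
    return Path(root) / "results" / f"{model_name}-{prompt_type}.json"


def load_results(path):
    """Load a results file, failing early if it is missing or not valid JSON."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No results file at {path}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)  # raises json.JSONDecodeError on malformed content


if __name__ == "__main__":
    for prompt_type in ("bbox", "human"):
        path = results_path("vip-llava-7b", prompt_type)
        try:
            load_results(path)
            print(f"{path}: OK")
        except (FileNotFoundError, json.JSONDecodeError) as err:
            print(f"{path}: {err}")
```

This only checks that the file is present and well-formed; it makes no assumption about the fields the evaluation server expects.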
Source annotation
In source_image, we provide the plain source images along with their bounding box/mask annotations. Researchers can use this grounding information to resolve special tokens such as <obj> in the "question" entry of vip-bench-meta-data.json. For example, <obj> can be replaced by textual coordinates to evaluate region-level multimodal models.
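As an illustration of the <obj> replacement, here is a minimal sketch that substitutes each <obj> token with a box written out as text. It assumes one box per token, given in order as (x1, y1, x2, y2) pixel coordinates, and a `[x1, y1, x2, y2]` text format. These are assumptions for illustration only: the actual annotation layout in source_image and the coordinate format your model expects may differ, and the function names are hypothetical.

```python
import re

OBJ_TOKEN = "<obj>"


def format_box(box):
    """Render an (x1, y1, x2, y2) box as text, rounding to whole pixels."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    return f"[{x1}, {y1}, {x2}, {y2}]"


def replace_obj_tokens(question, boxes):
    """Replace the i-th <obj> token in `question` with the i-th box as text."""
    n_tokens = question.count(OBJ_TOKEN)
    if n_tokens != len(boxes):
        raise ValueError(
            f"Question has {n_tokens} {OBJ_TOKEN} token(s) but {len(boxes)} box(es) were given"
        )
    box_iter = iter(boxes)
    # re.sub calls the function once per match, left to right,
    # so boxes are consumed in the same order as the tokens appear.
    return re.sub(re.escape(OBJ_TOKEN), lambda _m: format_box(next(box_iter)), question)


if __name__ == "__main__":
    question = "What is the person in <obj> handing to <obj>?"
    boxes = [(12.4, 30.0, 118.6, 240.2), (200, 45, 310, 260)]
    print(replace_obj_tokens(question, boxes))
    # What is the person in [12, 30, 119, 240] handing to [200, 45, 310, 260]?
```

Models trained on normalized coordinates would need the boxes rescaled by image width and height before formatting; that step is left out here because it depends on the model.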
Academic Benchmarks
Please download the evaluation JSON dataset here.
Visual7W
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/v7w.sh
PointQA-LookTwice
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/pointQA.sh
Visual Commonsense Reasoning
For Q -> A:
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vcr-qa.sh
For QA -> R:
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vcr-qar.sh