LLMScore
October 25, 2023 ยท View on GitHub
๐ค Demo [Coming Soon] ๐ Paper ๐ฆ Twitter
Overview
The two images are generated using Stable-Diffusion-2 based on the text prompt sampled from the Concept Conjunction dataset. Baseline section shows the scores from the existing model-based evaluation metrics, Human section is the rating score from the human evaluation, LLMScore section is our proposed metric. The right column also shows the rationale generated by LLMScore.
Installation
Please follow install page to set up the environments and models.
Text-to-Image Synthesis Evaluation
Get score with rationale for evaluating the alignment between image and text prompt.
python llm_score.py --image sample/sample.png --text_prompt "a red car and a white sheep"
Try different LLMs by setting LLM_ID as one of ["gpt-4", "gpt-3.5-turbo", "vicuna"]:
python llm_score.py --image sample/sample.png --text_prompt "a red car and a white sheep" --llm_id LLM_ID
Notice that to use Vicuna, follow Part Install and Part Model Weights in FastChat_README to install fastchat and to obtain the Vicuna weights. To enable OpenAI-compastible APIs used in our repo, follow commands from Guideline to launch the controller, model worker and RESTful API server as below:
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-name 'vicuna-7b-v1.1' --model-path /path/to/vicuna/weights
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
LLMScore with Rationale
Human Correlation
The rank correlation (Kendall's tau) is aggregated across the compositional prompt dataset (Concept Conjunction, Attribute Binding Contrast) on the left two columns (CompBench) and the general prompt dataset (MSCOCO, DrawBench, PaintSkills) on the right two columns (GeneralBench).
Citation
If you found this repository useful, please consider cite our paper:
@misc{lu2023llmscore,
title={LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation},
author={Yujie Lu and Xianjun Yang and Xiujun Li and Xin Eric Wang and William Yang Wang},
year={2023},
eprint={2305.11116},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgement
This repo benefits from BLIP-2, GRIT, GPT-4. Thank for their awesome works!