EVAL.md

February 15, 2026

Evaluation

Only inference code is provided for now; to compute the final scores, use the official repository of each benchmark. /path/to/your/ckpt may be either a .pth folder or a .pt file.
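Because --checkpoint accepts either a folder or a .pt file, it can help to check what a path actually is before starting a long run. The helper below is a minimal sketch: ckpt_kind is a hypothetical name and is not part of this repository, and it only inspects the filesystem, not the checkpoint contents.

```shell
# Hypothetical pre-flight helper (not part of the repo): classify the path
# that will be passed to --checkpoint.
ckpt_kind() {
    # $1: candidate checkpoint path
    if [ -d "$1" ]; then
        echo "folder"
    elif [ -f "$1" ]; then
        case "$1" in
            *.pt) echo "pt-file" ;;
            *)    echo "unknown-file" ;;
        esac
    else
        echo "missing"
        return 1
    fi
}
```

For example, `ckpt_kind /path/to/your/ckpt || echo "checkpoint path does not exist"` reports a missing path before any model loading starts.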

GenEval

See GenEval for the original GenEval prompts and place them in evaluation/geneval.

export PYTHONPATH=.
accelerate launch scripts/evaluation/gen_eval.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/geneval/geneval_prompt.jsonl \
         --height 512 \
         --width 512 \
         --seed 42
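The prompt file passed to --data is a JSONL file, that is, one JSON record per line. A quick way to confirm the download is complete is to count the non-empty lines before launching. This is a sketch only: count_prompts is a hypothetical helper, and the expected number of prompts should be taken from the GenEval repository rather than from this example.

```shell
# Hypothetical helper (not part of the repo): count the non-empty lines
# of a JSONL prompt file, one record per line.
count_prompts() {
    # $1: path to the .jsonl file
    if [ ! -f "$1" ]; then
        echo "missing prompt file: $1" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when the file has no non-empty lines.
    grep -c '[^[:space:]]' "$1" || true
}
```

Usage: `count_prompts evaluation/geneval/geneval_prompt.jsonl`.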

DPGBench

See DPGBench for the original DPGBench prompts and place them in evaluation/DPG-Bench, the directory that the --data path below points into.

export PYTHONPATH=.
accelerate launch scripts/evaluation/dpg_bench.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/DPG-Bench/prompts \
         --height 512 \
         --width 512 \
         --seed 42

UniGenBench

Please download the benchmark data from UniGenBench and place it in evaluation/UniGenBench.

export PYTHONPATH=.
accelerate launch scripts/evaluation/unigenbench.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/UniGenBench/test_prompts_en.csv \
         --height 512 \
         --width 512 \
         --seed 42

WISE

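Please download the benchmark data from WISE and place it in evaluation/WISE. The command below reads from evaluation/wise/data, so keep the case of the directory name consistent on case-sensitive filesystems.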

export PYTHONPATH=.
# --data below selects the spatio-temporal domain
accelerate launch scripts/evaluation/wise.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/wise/data/spatio-temporal_reasoning.json \
         --height 512 \
         --width 512 \
         --seed 42
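The command above covers a single domain file. To cover every domain, one option is to generate one launch command per JSON file found in the data directory. The function below is a sketch under assumptions: wise_commands is a hypothetical helper, writing each domain to its own subdirectory of the output path is an illustrative choice rather than something the script requires, and the function only prints commands so that they can be reviewed first.

```shell
# Hypothetical helper (not part of the repo): print one launch command per
# JSON file in the WISE data directory.
wise_commands() {
    # $1: directory holding the per-domain .json files
    # $2: root directory for results (one subdirectory per domain is assumed)
    for f in "$1"/*.json; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        domain=$(basename "$f" .json)
        echo "accelerate launch scripts/evaluation/wise.py --checkpoint /path/to/your/ckpt --output $2/$domain --data $f --height 512 --width 512 --seed 42"
    done
}
```

Running `wise_commands evaluation/wise/data /path/to/save/results` lists the commands. Once they look right, and provided the paths contain no spaces, the output can be piped to `sh` after `export PYTHONPATH=.`.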

T2I-CoReBench

Please download the benchmark data from T2I-CoReBench and place it in evaluation/T2I-CoReBench-main.

export PYTHONPATH=.
accelerate launch scripts/evaluation/corebench.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/T2I-CoReBench-main/corebench.jsonl \
         --height 512 \
         --width 512 \
         --seed 42

ImgEdit

Please download the benchmark data from ImgEdit-Bench and place it in evaluation/ImgEdit.

export PYTHONPATH=.
accelerate launch scripts/evaluation/img_edit.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/ImgEdit/Benchmark/singleturn \
         --height 512 \
         --width 512 \
         --seed 42

GEdit

Please download the benchmark data from GEdit-Bench and place it in evaluation/GEdit-Bench.

export PYTHONPATH=.
accelerate launch scripts/evaluation/gedit.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/GEdit-Bench \
         --height 512 \
         --width 512 \
         --seed 42

RISE

Please download the benchmark data from RISE and place it in evaluation/RISEBench-full.

export PYTHONPATH=.
accelerate launch scripts/evaluation/rise_bench.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/RISEBench-full \
         --height 512 \
         --width 512 \
         --seed 42

UniREditBench

Please download the benchmark data from UniREditBench and place it in evaluation/UniREditBench.

export PYTHONPATH=.
accelerate launch scripts/evaluation/unireditbench.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/UniREditBench \
         --height 512 \
         --width 512 \
         --seed 42

CVTG

Please download the benchmark data from CVTG and place it in evaluation/CVTG-2K.

export PYTHONPATH=.
accelerate launch scripts/evaluation/CVTG.py \
         --checkpoint /path/to/your/ckpt \
         --output /path/to/save/results \
         --data evaluation/CVTG-2K \
         --height 512 \
         --width 512 \
         --seed 42
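All of the commands in this document share the same flags and differ only in the script name and the --data path, so several benchmarks can be driven from one small wrapper. The following is a sketch, not a script shipped with the repository: eval_cmd and print_all are hypothetical names, the per-benchmark output subdirectory is an assumption, and only a subset of the benchmarks is listed. It prints the commands instead of running them.

```shell
# Shared settings, using the placeholder paths from this document.
CKPT=/path/to/your/ckpt
OUT_ROOT=/path/to/save/results

# Hypothetical helper: print the launch command for one benchmark.
# $1: script file under scripts/evaluation, $2: value for --data
eval_cmd() {
    name=$(basename "$1" .py)
    echo "accelerate launch scripts/evaluation/$1 --checkpoint $CKPT --output $OUT_ROOT/$name --data $2 --height 512 --width 512 --seed 42"
}

# Script/data pairs copied from the sections of this document (subset).
print_all() {
    eval_cmd gen_eval.py   evaluation/geneval/geneval_prompt.jsonl
    eval_cmd gedit.py      evaluation/GEdit-Bench
    eval_cmd rise_bench.py evaluation/RISEBench-full
    eval_cmd CVTG.py       evaluation/CVTG-2K
}
```

Calling `print_all` lists the four commands for review; remember to run `export PYTHONPATH=.` from the repository root before executing any of them.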