EVAL.md
February 15, 2026
Evaluation
For now, we provide inference code only. To compute final scores, please use the official repository of each benchmark.
The /path/to/your/ckpt argument can be either a .pth folder or a .pt file.
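Before launching a long run, it can help to confirm which of the two checkpoint forms a path actually is. The following is a minimal sketch only: `checkpoint_kind` is a hypothetical helper written for illustration, not part of this repository, and it checks the filesystem without loading any weights.

```python
from pathlib import Path


def checkpoint_kind(ckpt):
    """Classify a checkpoint path as 'folder', 'file', or 'missing'.

    Hypothetical helper for illustration; the evaluation scripts
    handle the actual checkpoint loading themselves.
    """
    path = Path(ckpt)
    if path.is_dir():
        return "folder"
    if path.is_file():
        return "file"
    return "missing"


# Replace the placeholder with your own checkpoint path.
print(checkpoint_kind("/path/to/your/ckpt"))
```

A result of "missing" usually means the placeholder path was not replaced or the checkpoint has not been downloaded yet.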
GenEval
- See GenEval for the original GenEval prompts and put them in evaluation/geneval.
export PYTHONPATH=.
accelerate launch scripts/evaluation/gen_eval.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/geneval/geneval_prompt.jsonl \
--height 512 \
--width 512 \
--seed 42
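A quick sanity check of the prompt file can catch a wrong download before any GPU time is spent. The sketch below assumes each line of geneval_prompt.jsonl is a JSON object with a "prompt" field, as in the original GenEval metadata; the file used here may differ, so treat the field name as an assumption. `load_prompts` is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path


def load_prompts(lines):
    """Parse JSONL lines and return the prompt strings.

    Assumes every record has a 'prompt' field (an assumption based on
    the original GenEval metadata format); raises ValueError otherwise.
    """
    prompts = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if "prompt" not in record:
            raise ValueError(f"line {number}: no 'prompt' field")
        prompts.append(record["prompt"])
    return prompts


data = Path("evaluation/geneval/geneval_prompt.jsonl")
if data.is_file():
    with data.open(encoding="utf-8") as handle:
        print(f"{len(load_prompts(handle))} prompts found in {data}")
else:
    print(f"{data} not found; download the prompts first")
```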
DPGBench
- See DPGBench for the original DPGBench prompts and put them in evaluation/DPG-Bench.
export PYTHONPATH=.
accelerate launch scripts/evaluation/dpg_bench.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/DPG-Bench/prompts \
--height 512 \
--width 512 \
--seed 42
UniGenBench
- Please download the benchmark data from UniGenBench and place it in evaluation/UniGenBench.
export PYTHONPATH=.
accelerate launch scripts/evaluation/unigenbench.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/UniGenBench/test_prompts_en.csv \
--height 512 \
--width 512 \
--seed 42
WISE
- Please download the benchmark data from WISE and place it in evaluation/wise.
export PYTHONPATH=.
# The --data file below covers the spatio-temporal domain.
accelerate launch scripts/evaluation/wise.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/wise/data/spatio-temporal_reasoning.json \
--height 512 \
--width 512 \
--seed 42
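The command above covers a single domain file. To cover every domain, one option is to build one command per JSON file in the data folder. This is a sketch under assumptions: `wise_commands` is a hypothetical helper, each .json file in evaluation/wise/data is taken to be one domain, and writing each domain to its own output subfolder is a choice made here, not something the script requires.

```python
import shlex
from pathlib import Path


def wise_commands(data_dir, checkpoint, output_root):
    """Build one launch command per JSON file found in data_dir."""
    commands = []
    for data_file in sorted(Path(data_dir).glob("*.json")):
        commands.append([
            "accelerate", "launch", "scripts/evaluation/wise.py",
            "--checkpoint", checkpoint,
            # Assumption: a separate output folder per domain keeps results apart.
            "--output", str(Path(output_root) / data_file.stem),
            "--data", str(data_file),
            "--height", "512", "--width", "512", "--seed", "42",
        ])
    return commands


# Dry run: print the commands instead of executing them.
for command in wise_commands(
    "evaluation/wise/data", "/path/to/your/ckpt", "/path/to/save/results"
):
    print(shlex.join(command))
```

The printed lines can be pasted into a shell after `export PYTHONPATH=.`; nothing is printed if the data folder is missing or empty.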
T2I-CoReBench
- Please download the benchmark data from T2I-CoReBench and place it in evaluation/T2I-CoReBench-main.
export PYTHONPATH=.
accelerate launch scripts/evaluation/corebench.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/T2I-CoReBench-main/corebench.jsonl \
--height 512 \
--width 512 \
--seed 42
ImgEdit
- Please download the benchmark data from ImgEdit-Bench and place it in evaluation/ImgEdit.
export PYTHONPATH=.
accelerate launch scripts/evaluation/img_edit.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/ImgEdit/Benchmark/singleturn \
--height 512 \
--width 512 \
--seed 42
GEdit
- Please download the benchmark data from GEdit-Bench and place it in evaluation/GEdit-Bench.
export PYTHONPATH=.
accelerate launch scripts/evaluation/gedit.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/GEdit-Bench \
--height 512 \
--width 512 \
--seed 42
RISE
- Please download the benchmark data from RISE and place it in evaluation/RISEBench-full.
export PYTHONPATH=.
accelerate launch scripts/evaluation/rise_bench.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/RISEBench-full \
--height 512 \
--width 512 \
--seed 42
UniREditBench
- Please download the benchmark data from UniREditBench and place it in evaluation/UniREditBench.
export PYTHONPATH=.
accelerate launch scripts/evaluation/unireditbench.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/UniREditBench \
--height 512 \
--width 512 \
--seed 42
CVTG
- Please download the benchmark data from CVTG and place it in evaluation/CVTG-2K.
export PYTHONPATH=.
accelerate launch scripts/evaluation/CVTG.py \
--checkpoint /path/to/your/ckpt \
--output /path/to/save/results \
--data evaluation/CVTG-2K \
--height 512 \
--width 512 \
--seed 42
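All of the commands in this document share the same shape and differ only in the script name and the --data path, so they can be generated from one table. The sketch below is illustrative only: `BENCHMARKS` and `build_command` are hypothetical names, the script and data paths are copied from the commands in this document, and the per-benchmark output subfolder is an assumption made here to keep results apart.

```python
import shlex

# Script name and --data path for each benchmark, as used in this document.
BENCHMARKS = {
    "GenEval": ("gen_eval.py", "evaluation/geneval/geneval_prompt.jsonl"),
    "DPGBench": ("dpg_bench.py", "evaluation/DPG-Bench/prompts"),
    "UniGenBench": ("unigenbench.py", "evaluation/UniGenBench/test_prompts_en.csv"),
    "WISE": ("wise.py", "evaluation/wise/data/spatio-temporal_reasoning.json"),
    "T2I-CoReBench": ("corebench.py", "evaluation/T2I-CoReBench-main/corebench.jsonl"),
    "ImgEdit": ("img_edit.py", "evaluation/ImgEdit/Benchmark/singleturn"),
    "GEdit": ("gedit.py", "evaluation/GEdit-Bench"),
    "RISE": ("rise_bench.py", "evaluation/RISEBench-full"),
    "UniREditBench": ("unireditbench.py", "evaluation/UniREditBench"),
    "CVTG": ("CVTG.py", "evaluation/CVTG-2K"),
}


def build_command(name, checkpoint, output_root, size=512, seed=42):
    """Return the accelerate launch command for one benchmark as a list."""
    script, data = BENCHMARKS[name]
    return [
        "accelerate", "launch", f"scripts/evaluation/{script}",
        "--checkpoint", checkpoint,
        # Assumption: one output subfolder per benchmark.
        "--output", f"{output_root}/{name}",
        "--data", data,
        "--height", str(size), "--width", str(size), "--seed", str(seed),
    ]


# Dry run: print every command rather than executing it.
for name in BENCHMARKS:
    print(shlex.join(build_command(name, "/path/to/your/ckpt", "/path/to/save/results")))
```

To execute rather than print, each list could be passed to `subprocess.run(command, check=True)` with `PYTHONPATH=.` set in the environment, which mirrors the `export PYTHONPATH=.` line used throughout.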