PRISM Benchmark & FLUX-Reason-6M Dataset

January 29, 2026

[🌐 Homepage] [🤗 Huggingface Dataset] [📊 Leaderboard] [📊 Leaderboard-ZH] [📖 Paper]

Rongyao Fang1*  Aldrich Yu1*  Chengqi Duan2*  Linjiang Huang3  Shuai Bai4 

Yuxuan Cai4  Kun Wang5  Si Liu3  Xihui Liu2†  Hongsheng Li1† 

1CUHK   2HKU   3BUAA   4Alibaba   5Sensetime  

*Equal Contribution  †Corresponding Author

📖 Introduction

🌟 This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark", which contains both evaluation code and data for the PRISM Benchmark.

Teaser

We introduce FLUX-Reason-6M and PRISM-Bench. FLUX-Reason-6M is a 6-million-scale synthesized dataset designed to incorporate reasoning capabilities into the architecture of text-to-image (T2I) generation. PRISM-Bench is a comprehensive and discriminative benchmark with seven independent tracks whose scores closely align with human judgment.

💥 News

  • [2026-01-26] 🎉 Our paper has been accepted to ICLR 2026!
  • [2025-09-12] Our FLUX-Reason-6M dataset is now available on Hugging Face.
  • [2025-09-12] Our paper is now available on arXiv.

📈 Evaluation

Data

Please organize the image data as follows.

└── images
│   ├── imagination
│   │   ├── 0.png
│   │   ├── 1.png
│   │   ├── ...
│   │   ├── 99.png
│   ├── entity
│   ├── text_rendering
│   ├── style
│   ├── affection
│   ├── composition
│   ├── long_text
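
Before running an evaluation, it can help to confirm that the layout above is complete. The following is a minimal sketch, not part of the repository: the function name `find_missing_images` is hypothetical, and the default of 100 images per track is an assumption based on the `0.png` ... `99.png` listing shown for `imagination` (the other tracks are assumed to follow the same pattern).

```python
from pathlib import Path

# Track folder names copied from the directory tree above.
TRACKS = [
    "imagination",
    "entity",
    "text_rendering",
    "style",
    "affection",
    "composition",
    "long_text",
]


def find_missing_images(image_root, images_per_track=100):
    """Return the expected image paths that do not exist under image_root.

    images_per_track=100 is an assumption inferred from the 0.png ... 99.png
    listing; adjust it if a track holds a different number of images.
    """
    root = Path(image_root)
    missing = []
    for track in TRACKS:
        for index in range(images_per_track):
            path = root / track / f"{index}.png"
            if not path.is_file():
                missing.append(path)
    return missing
```

Calling `find_missing_images("images")` returns an empty list when every expected file is present, so the result can be printed or used to abort early.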

PRISM-Bench Evaluation

Eval with GPT4.1:

python evaluation/eval_gpt41.py --image_path <path to image data> --api_key <OpenAI API key> --base_url <OpenAI base URL for custom or proxy endpoints>

Eval with Qwen2.5-VL-72B:

python evaluation/eval_qwen25.py --image_path <path to image data> --model_path <path to qwen model> --output_dir <path to save results>

PRISM-Bench-ZH Evaluation

Eval with GPT4.1:

python evaluation/eval_gpt41.py --image_path <path to image data> --api_key <OpenAI API key> --base_url <OpenAI base URL for custom or proxy endpoints> --zh

Eval with Qwen2.5-VL-72B:

python evaluation/eval_qwen25.py --image_path <path to image data> --model_path <path to qwen model> --output_dir <path to save results> --zh
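
The four commands above differ only in the script, the judge-specific options, and the `--zh` switch. As a rough sketch of how they could be assembled programmatically, the helper below builds the argument list; `build_eval_command` and the `judge` keys are hypothetical names introduced here, while the script paths and flags are the ones shown above.

```python
import shlex

# Script paths as they appear in the commands above.
SCRIPTS = {
    "gpt41": "evaluation/eval_gpt41.py",
    "qwen25": "evaluation/eval_qwen25.py",
}


def build_eval_command(judge, image_path, zh=False, **options):
    """Build the argument list for one evaluation run.

    `options` maps flag names without the leading dashes (for example
    api_key, base_url, model_path, output_dir) to their values.
    """
    command = ["python", SCRIPTS[judge], "--image_path", str(image_path)]
    for flag, value in options.items():
        command += [f"--{flag}", str(value)]
    if zh:
        # --zh selects the PRISM-Bench-ZH evaluation.
        command.append("--zh")
    return command


if __name__ == "__main__":
    cmd = build_eval_command(
        "qwen25", "images", zh=True, model_path="qwen_model", output_dir="results"
    )
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps paths with spaces intact when the command is later handed to `subprocess.run(cmd, check=True)`.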

📊 Benchmark

The leaderboard is available here.

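PRISM-Bench (GPT4.1)

| # | Model | Source | Date | Overall (Align) | Overall (Aes) | Overall (Avg) | Imagination (Align) | Imagination (Aes) | Imagination (Avg) | Entity (Align) | Entity (Aes) | Entity (Avg) | Text rendering (Align) | Text rendering (Aes) | Text rendering (Avg) | Style (Align) | Style (Aes) | Style (Avg) | Affection (Align) | Affection (Aes) | Affection (Avg) | Composition (Align) | Composition (Aes) | Composition (Avg) | Long text (Align) | Long text (Aes) | Long text (Avg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-Image-1 [High] 🥇 | Link | 2025-09-10 | 86.9 | 85.6 | 86.3 | 86.2 | 86.6 | 86.4 | 90.0 | 86.3 | 88.2 | 68.8 | 80.1 | 74.5 | 92.8 | 93.3 | 93.1 | 90.7 | 90.9 | 90.8 | 96.2 | 89.4 | 92.8 | 83.8 | 72.8 | 78.3 |
| 2 | Gemini2.5-Flash-Image 🥈 | Link | 2025-09-10 | 87.1 | 83.4 | 85.3 | 92.4 | 84.8 | 88.6 | 87.0 | 81.3 | 84.2 | 65.2 | 74.1 | 69.7 | 90.5 | 90.8 | 90.7 | 96.0 | 88.2 | 92.1 | 92.5 | 88.5 | 90.5 | 85.9 | 76.2 | 81.1 |
| 3 | Qwen-Image 🥉 | Link | 2025-09-10 | 81.1 | 78.6 | 79.9 | 80.5 | 78.6 | 79.6 | 79.3 | 73.2 | 76.3 | 54.3 | 68.9 | 61.6 | 84.5 | 88.7 | 86.6 | 91.6 | 89.1 | 90.4 | 93.7 | 86.9 | 90.3 | 83.8 | 65.1 | 74.5 |
| 4 | SEEDream 3.0 | Link | 2025-09-10 | 80.5 | 78.7 | 79.6 | 77.3 | 76.4 | 76.9 | 80.2 | 73.8 | 77.0 | 56.1 | 70.2 | 63.2 | 83.9 | 87.4 | 85.7 | 89.3 | 90.3 | 89.8 | 93.3 | 86.3 | 89.8 | 83.2 | 66.7 | 75.0 |
| 5 | HiDream-I1-Full | Link | 2025-09-10 | 76.1 | 75.6 | 75.9 | 74.4 | 75.6 | 75.0 | 74.4 | 72.4 | 73.4 | 58.2 | 70.4 | 64.3 | 81.4 | 84.8 | 83.1 | 90.1 | 88.8 | 89.5 | 90.1 | 85.4 | 87.8 | 63.8 | 52.0 | 57.9 |
| 6 | FLUX.1-Krea-dev | Link | 2025-09-10 | 74.3 | 75.1 | 74.7 | 71.5 | 73.0 | 72.3 | 69.5 | 67.5 | 68.5 | 47.5 | 61.3 | 54.4 | 80.8 | 83.5 | 82.2 | 84.0 | 90.3 | 87.2 | 90.9 | 85.8 | 88.4 | 76.2 | 64.1 | 70.2 |
| 7 | FLUX.1-dev | Link | 2025-09-10 | 72.4 | 74.9 | 73.7 | 68.1 | 74.0 | 71.1 | 70.7 | 71.2 | 71.0 | 48.1 | 64.5 | 56.3 | 72.3 | 80.5 | 76.4 | 88.3 | 91.1 | 89.7 | 89.0 | 84.6 | 86.8 | 70.6 | 58.5 | 64.6 |
| 8 | SD3.5-Large | Link | 2025-09-10 | 73.9 | 73.5 | 73.7 | 73.3 | 71.2 | 72.3 | 76.7 | 71.9 | 74.3 | 52.0 | 65.8 | 58.9 | 77.1 | 84.2 | 80.7 | 87.1 | 85.2 | 86.2 | 87.0 | 84.7 | 85.9 | 64.3 | 51.7 | 58.0 |
| 9 | HiDream-I1-Dev | Link | 2025-09-10 | 70.3 | 70.0 | 70.2 | 68.2 | 69.7 | 69.0 | 72.0 | 67.0 | 69.5 | 53.4 | 64.1 | 58.8 | 68.7 | 78.6 | 73.7 | 84.2 | 83.1 | 83.7 | 87.6 | 79.8 | 83.7 | 58.1 | 47.5 | 52.8 |
| 10 | SD3.5-Medium | Link | 2025-09-10 | 70.1 | 68.9 | 69.5 | 69.5 | 73.0 | 71.3 | 72.8 | 63.7 | 68.3 | 33.3 | 50.1 | 41.7 | 77.4 | 80.3 | 78.9 | 84.9 | 85.5 | 85.2 | 89.4 | 79.2 | 84.3 | 63.3 | 50.5 | 56.9 |
| 11 | SD3-Medium | Link | 2025-09-10 | 65.6 | 65.2 | 65.4 | 61.0 | 65.6 | 63.3 | 64.8 | 56.3 | 60.6 | 32.8 | 53.1 | 43.0 | 74.8 | 75.6 | 75.2 | 78.7 | 80.3 | 79.5 | 85.5 | 79.1 | 82.3 | 61.5 | 46.1 | 53.8 |
| 12 | Bagel-CoT | Link | 2025-09-10 | 65.4 | 65.0 | 65.2 | 68.4 | 74.2 | 71.3 | 62.4 | 60.0 | 61.2 | 23.2 | 40.1 | 31.7 | 64.4 | 70.1 | 67.3 | 87.1 | 80.5 | 83.8 | 88.5 | 77.9 | 83.2 | 64.0 | 52.0 | 58.0 |
| 13 | Bagel | Link | 2025-09-10 | 66.7 | 63.4 | 65.1 | 69.4 | 68.0 | 68.7 | 59.0 | 50.1 | 54.6 | 30.2 | 44.5 | 37.4 | 67.9 | 71.3 | 69.6 | 81.7 | 81.4 | 81.6 | 90.5 | 73.1 | 81.8 | 68.1 | 55.3 | 61.7 |
| 14 | FLUX.1-schnell | Link | 2025-09-10 | 67.1 | 61.2 | 64.2 | 63.3 | 66.2 | 64.8 | 61.8 | 51.2 | 56.5 | 46.2 | 54.1 | 50.2 | 68.6 | 70.1 | 69.4 | 75.4 | 69.9 | 72.7 | 85.1 | 67.5 | 76.3 | 69.4 | 49.7 | 59.6 |
| 15 | Playground | Link | 2025-09-10 | 62.6 | 65.6 | 64.1 | 62.3 | 70.6 | 66.5 | 72.5 | 69.1 | 70.8 | 10.4 | 37.3 | 23.9 | 77.3 | 80.9 | 79.1 | 91.8 | 83.8 | 87.8 | 77.5 | 76.5 | 77.0 | 46.7 | 41.0 | 43.9 |
| 16 | JanusPro-7B | Link | 2025-09-10 | 64.2 | 57.2 | 60.7 | 70.4 | 65.8 | 68.1 | 67.1 | 51.9 | 59.5 | 15.5 | 36.7 | 26.1 | 71.4 | 73.8 | 72.6 | 79.2 | 71.5 | 75.4 | 83.7 | 61.0 | 72.4 | 62.4 | 39.7 | 51.1 |
| 17 | SDXL | Link | 2025-09-10 | 58.9 | 61.8 | 60.4 | 55.3 | 61.1 | 58.2 | 72.5 | 67.4 | 70.0 | 13.8 | 37.0 | 25.4 | 72.4 | 75.4 | 73.9 | 78.9 | 77.1 | 78.0 | 75.5 | 75.3 | 75.4 | 44.2 | 39.6 | 41.9 |
| 18 | SD2.1 | Link | 2025-09-10 | 50.7 | 45.3 | 48.0 | 47.9 | 41.2 | 44.6 | 60.9 | 46.7 | 53.8 | 11.2 | 30.6 | 20.9 | 62.7 | 58.6 | 60.7 | 66.7 | 58.5 | 62.6 | 65.7 | 53.1 | 59.4 | 40.1 | 28.2 | 34.2 |
| 19 | SD1.5 | Link | 2025-09-10 | 44.9 | 43.5 | 44.2 | 36.6 | 36.1 | 36.4 | 53.8 | 41.1 | 47.5 | 8.0 | 33.1 | 20.6 | 55.3 | 55.3 | 55.3 | 64.4 | 57.5 | 61.0 | 61.1 | 51.0 | 56.1 | 35.3 | 30.4 | 32.9 |
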
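PRISM-Bench (Qwen2.5-VL)

| # | Model | Source | Date | Overall (Align) | Overall (Aes) | Overall (Avg) | Imagination (Align) | Imagination (Aes) | Imagination (Avg) | Entity (Align) | Entity (Aes) | Entity (Avg) | Text rendering (Align) | Text rendering (Aes) | Text rendering (Avg) | Style (Align) | Style (Aes) | Style (Avg) | Affection (Align) | Affection (Aes) | Affection (Avg) | Composition (Align) | Composition (Aes) | Composition (Avg) | Long text (Align) | Long text (Aes) | Long text (Avg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-Image-1 [High] 🥇 | Link | 2025-09-10 | 82.7 | 78.7 | 80.7 | 79.8 | 53.3 | 66.6 | 87.3 | 81.0 | 84.1 | 66.7 | 86.8 | 76.8 | 87.3 | 87.8 | 87.5 | 88.1 | 79.8 | 84.0 | 92.2 | 84.9 | 88.5 | 77.2 | 77.5 | 77.4 |
| 2 | Gemini2.5-Flash-Image 🥈 | Link | 2025-09-10 | 85.0 | 75.8 | 80.4 | 84.7 | 38.1 | 61.4 | 86.0 | 76.7 | 81.3 | 72.8 | 84.3 | 78.5 | 89.5 | 87.8 | 88.6 | 94.3 | 74.8 | 84.5 | 91.2 | 88.2 | 89.7 | 76.3 | 80.6 | 78.4 |
| 3 | SEEDream 3.0 🥉 | Link | 2025-09-10 | 80.1 | 72.3 | 76.2 | 75.8 | 38.0 | 56.9 | 81.3 | 74.2 | 77.7 | 58.8 | 74.0 | 66.4 | 84.4 | 84.1 | 84.2 | 90.5 | 74.6 | 82.5 | 93.6 | 85.1 | 89.3 | 76.2 | 76.4 | 76.3 |
| 4 | Qwen-Image | Link | 2025-09-10 | 80.0 | 68.3 | 74.1 | 75.5 | 37.4 | 56.5 | 79.5 | 64.5 | 72.0 | 57.9 | 71.2 | 64.5 | 86.6 | 84.4 | 85.5 | 89.9 | 70.4 | 80.1 | 93.9 | 79.5 | 86.7 | 76.8 | 70.9 | 73.8 |
| 5 | FLUX.1-Krea-dev | Link | 2025-09-10 | 74.4 | 73.7 | 74.0 | 69.6 | 43.1 | 56.3 | 72.2 | 70.7 | 71.4 | 51.7 | 76.1 | 63.9 | 80.0 | 86.6 | 83.3 | 82.6 | 78.7 | 80.6 | 90.8 | 87.1 | 88.9 | 73.6 | 73.4 | 73.5 |
| 6 | HiDream-I1-Full | Link | 2025-09-10 | 76.6 | 68.6 | 72.6 | 73.0 | 44.0 | 58.5 | 76.3 | 72.8 | 74.5 | 60.5 | 76.4 | 68.4 | 81.4 | 81.5 | 81.4 | 90.0 | 76.6 | 83.3 | 88.5 | 80.3 | 84.4 | 66.3 | 48.6 | 57.4 |
| 7 | SD3.5-Large | Link | 2025-09-10 | 73.4 | 67.8 | 70.6 | 66.7 | 43.4 | 55.0 | 76.8 | 72.7 | 74.8 | 53.6 | 73.1 | 63.3 | 77.3 | 78.2 | 77.7 | 85.6 | 73.9 | 79.7 | 87.8 | 80.9 | 84.3 | 65.8 | 52.2 | 59.0 |
| 8 | HiDream-I1-Dev | Link | 2025-09-10 | 72.3 | 67.0 | 69.6 | 68.8 | 45.8 | 57.3 | 73.5 | 68.1 | 70.8 | 56.7 | 75.7 | 66.2 | 70.2 | 77.4 | 73.8 | 88.2 | 74.3 | 81.2 | 84.7 | 78.5 | 81.6 | 64.0 | 49.3 | 56.6 |
| 9 | FLUX.1-dev | Link | 2025-09-10 | 72.1 | 64.9 | 68.5 | 65.5 | 42.9 | 54.2 | 70.6 | 61.9 | 66.2 | 52.3 | 73.0 | 62.6 | 72.6 | 74.2 | 73.4 | 86.0 | 72.9 | 79.4 | 87.4 | 75.8 | 81.6 | 70.5 | 53.8 | 62.1 |
| 10 | SD3.5-Medium | Link | 2025-09-10 | 68.6 | 65.1 | 66.8 | 65.1 | 34.7 | 49.9 | 72.5 | 70.9 | 71.7 | 36.6 | 64.5 | 50.5 | 75.5 | 80.0 | 77.7 | 81.8 | 73.9 | 77.9 | 85.4 | 81.0 | 83.2 | 63.5 | 50.6 | 57.0 |
| 11 | SD3-Medium | Link | 2025-09-10 | 68.0 | 64.2 | 66.1 | 64.3 | 37.7 | 51.0 | 69.4 | 63.3 | 66.3 | 38.5 | 63.3 | 50.9 | 74.6 | 79.5 | 77.0 | 80.5 | 75.5 | 78.0 | 85.6 | 79.5 | 82.5 | 63.4 | 50.3 | 56.8 |
| 12 | FLUX.1-schnell | Link | 2025-09-10 | 68.3 | 61.1 | 64.7 | 62.8 | 35.6 | 49.2 | 64.8 | 56.8 | 60.8 | 54.3 | 68.1 | 61.2 | 70.3 | 71.5 | 70.9 | 75.4 | 65.9 | 70.6 | 81.7 | 75.6 | 78.6 | 68.7 | 54.4 | 61.5 |
| 13 | JanusPro-7B | Link | 2025-09-10 | 64.9 | 59.4 | 62.1 | 65.0 | 38.8 | 51.9 | 68.6 | 63.5 | 66.0 | 23.1 | 50.3 | 36.7 | 70.7 | 75.2 | 72.9 | 80.7 | 68.0 | 74.3 | 82.4 | 71.1 | 76.7 | 63.9 | 49.0 | 56.4 |
| 14 | Bagel-CoT | Link | 2025-09-10 | 67.5 | 56.5 | 62.0 | 68.0 | 44.1 | 56.0 | 67.6 | 53.4 | 60.5 | 29.4 | 42.3 | 35.8 | 69.0 | 69.7 | 69.3 | 87.1 | 66.7 | 76.9 | 86.6 | 69.2 | 77.9 | 64.5 | 50.2 | 57.3 |
| 15 | Bagel | Link | 2025-09-10 | 67.5 | 56.6 | 62.0 | 68.0 | 45.0 | 56.5 | 67.6 | 53.4 | 60.5 | 29.4 | 42.3 | 35.8 | 69.0 | 69.7 | 69.3 | 87.1 | 66.7 | 76.9 | 86.6 | 69.2 | 77.9 | 64.5 | 50.2 | 57.3 |
| 16 | Playground | Link | 2025-09-10 | 62.2 | 52.1 | 57.1 | 59.0 | 39.0 | 49.0 | 69.4 | 56.7 | 63.0 | 15.3 | 31.9 | 23.6 | 74.6 | 74.6 | 74.6 | 88.8 | 66.0 | 77.4 | 72.2 | 61.3 | 66.7 | 56.0 | 35.3 | 45.6 |
| 17 | SDXL | Link | 2025-09-10 | 60.1 | 54.0 | 57.0 | 54.5 | 34.1 | 44.3 | 71.1 | 65.0 | 68.0 | 18.6 | 37.3 | 27.9 | 71.7 | 72.6 | 72.1 | 78.7 | 66.5 | 72.6 | 72.2 | 67.8 | 70.0 | 54.1 | 34.5 | 44.3 |
| 18 | SD2.1 | Link | 2025-09-10 | 54.0 | 47.7 | 50.8 | 48.9 | 28.4 | 38.6 | 66.0 | 57.6 | 61.8 | 16.7 | 31.4 | 24.0 | 62.7 | 66.5 | 64.6 | 68.5 | 62.1 | 65.3 | 64.8 | 58.3 | 61.5 | 50.7 | 29.8 | 40.2 |
| 19 | SD1.5 | Link | 2025-09-10 | 48.8 | 43.3 | 46.0 | 40.7 | 23.7 | 32.2 | 61.2 | 52.7 | 56.9 | 11.4 | 24.1 | 17.8 | 56.7 | 61.5 | 59.1 | 66.9 | 60.7 | 63.8 | 57.5 | 53.4 | 55.4 | 47.3 | 26.8 | 37.0 |
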
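PRISM-Bench-ZH (GPT4.1)

| # | Model | Source | Date | Overall (Align) | Overall (Aes) | Overall (Avg) | Imagination (Align) | Imagination (Aes) | Imagination (Avg) | Entity (Align) | Entity (Aes) | Entity (Avg) | Text rendering (Align) | Text rendering (Aes) | Text rendering (Avg) | Style (Align) | Style (Aes) | Style (Avg) | Affection (Align) | Affection (Aes) | Affection (Avg) | Composition (Align) | Composition (Aes) | Composition (Avg) | Long text (Align) | Long text (Aes) | Long text (Avg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-Image-1 [High] 🥇 | Link | 2025-09-10 | 87.7 | 87.2 | 87.5 | 88.8 | 90.4 | 89.6 | 85.9 | 92.4 | 89.2 | 83.9 | 67.7 | 75.8 | 93.9 | 91.7 | 92.8 | 91.5 | 86.5 | 89.0 | 92.4 | 97.3 | 94.9 | 77.2 | 84.3 | 80.8 |
| 2 | SEEDream 3.0 🥈 | Link | 2025-09-10 | 81.9 | 82.0 | 82.0 | 77.2 | 77.8 | 77.5 | 77.6 | 78.6 | 78.1 | 79.7 | 71.9 | 75.8 | 87.8 | 83.2 | 85.5 | 88.7 | 85.1 | 86.9 | 87.7 | 94.4 | 91.1 | 74.3 | 82.7 | 78.5 |
| 3 | Qwen-Image 🥉 | Link | 2025-09-10 | 80.8 | 81.3 | 81.1 | 80.1 | 79.6 | 79.9 | 75.6 | 79.7 | 77.7 | 76.9 | 62.9 | 69.9 | 90.2 | 84.3 | 87.3 | 87.4 | 84.9 | 86.2 | 86.6 | 93.4 | 90.0 | 68.9 | 84.2 | 76.6 |
| 4 | Bagel | Link | 2025-09-10 | 65.5 | 65.2 | 65.4 | 72.8 | 64.7 | 68.8 | 53.9 | 62.2 | 58.1 | 49.2 | 29.0 | 39.1 | 73.9 | 68.4 | 71.2 | 81.4 | 73.5 | 77.5 | 69.0 | 89.8 | 79.4 | 58.1 | 68.7 | 63.4 |
| 5 | Bagel-CoT | Link | 2025-09-10 | 64.4 | 62.4 | 63.4 | 75.1 | 69.3 | 72.2 | 53.3 | 58.8 | 56.1 | 42.6 | 16.3 | 29.5 | 73.6 | 66.6 | 70.1 | 81.2 | 78.0 | 79.6 | 74.0 | 83.6 | 78.8 | 50.7 | 64.3 | 57.5 |
| 6 | HiDream-I1-Full | Link | 2025-09-10 | 60.8 | 54.9 | 57.9 | 53.6 | 47.3 | 50.5 | 63.1 | 60.8 | 62.0 | 34.6 | 16.3 | 25.5 | 74.1 | 65.5 | 69.8 | 80.9 | 67.3 | 74.1 | 73.8 | 76.1 | 75.0 | 45.4 | 50.8 | 48.1 |
| 7 | HiDream-I1-Dev | Link | 2025-09-10 | 55.0 | 48.3 | 51.7 | 47.3 | 41.1 | 44.2 | 52.8 | 49.0 | 50.9 | 35.2 | 14.5 | 24.9 | 64.5 | 52.4 | 58.5 | 76.3 | 66.5 | 71.4 | 67.6 | 68.3 | 68.0 | 41.1 | 46.4 | 43.8 |
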
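PRISM-Bench-ZH (Qwen2.5-VL)

| # | Model | Source | Date | Overall (Align) | Overall (Aes) | Overall (Avg) | Imagination (Align) | Imagination (Aes) | Imagination (Avg) | Entity (Align) | Entity (Aes) | Entity (Avg) | Text rendering (Align) | Text rendering (Aes) | Text rendering (Avg) | Style (Align) | Style (Aes) | Style (Avg) | Affection (Align) | Affection (Aes) | Affection (Avg) | Composition (Align) | Composition (Aes) | Composition (Avg) | Long text (Align) | Long text (Aes) | Long text (Avg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-Image-1 [High] 🥇 | Link | 2025-09-10 | 78.0 | 77.4 | 77.7 | 73.0 | 37.6 | 55.3 | 80.4 | 82.1 | 81.3 | 73.1 | 89.9 | 81.5 | 77.1 | 92.4 | 84.8 | 78.0 | 77.8 | 77.9 | 91.9 | 85.7 | 88.8 | 72.4 | 76.3 | 74.4 |
| 2 | SEEDream 3.0 🥈 | Link | 2025-09-10 | 76.2 | 73.2 | 74.7 | 71.4 | 36.6 | 54.0 | 74.8 | 73.8 | 74.3 | 70.7 | 88.0 | 79.4 | 74.1 | 88.0 | 81.1 | 79.0 | 71.4 | 75.2 | 90.3 | 83.2 | 86.8 | 73.0 | 71.2 | 72.1 |
| 3 | Qwen-Image 🥉 | Link | 2025-09-10 | 75.0 | 65.5 | 70.3 | 71.4 | 29.9 | 50.7 | 74.7 | 67.8 | 71.3 | 64.3 | 73.1 | 68.7 | 75.2 | 83.2 | 79.2 | 77.3 | 64.5 | 70.9 | 89.8 | 74.1 | 82.0 | 72.6 | 65.8 | 69.2 |
| 4 | Bagel-CoT | Link | 2025-09-10 | 62.0 | 57.4 | 59.7 | 64.4 | 36.6 | 50.5 | 62.6 | 53.8 | 58.2 | 25.2 | 51.9 | 38.6 | 65.4 | 76.7 | 71.1 | 74.0 | 65.0 | 69.5 | 81.3 | 71.3 | 76.3 | 61.4 | 46.6 | 54.0 |
| 5 | Bagel | Link | 2025-09-10 | 61.5 | 54.3 | 57.9 | 64.6 | 36.3 | 50.5 | 62.7 | 55.5 | 59.1 | 18.6 | 26.3 | 22.5 | 66.0 | 76.6 | 71.3 | 74.9 | 66.2 | 70.6 | 81.3 | 72.2 | 76.8 | 62.4 | 47.3 | 54.9 |
| 6 | HiDream-I1-Full | Link | 2025-09-10 | 55.9 | 55.3 | 55.6 | 51.2 | 30.8 | 41.0 | 60.1 | 61.3 | 60.7 | 20.7 | 40.6 | 30.7 | 64.5 | 73.8 | 69.2 | 65.2 | 69.1 | 67.2 | 72.4 | 69.0 | 70.7 | 57.1 | 42.8 | 50.0 |
| 7 | HiDream-I1-Dev | Link | 2025-09-10 | 52.2 | 49.7 | 50.9 | 48.3 | 24.6 | 36.5 | 52.6 | 54.1 | 53.4 | 18.6 | 35.3 | 27.0 | 59.0 | 68.3 | 63.7 | 65.9 | 62.3 | 64.1 | 66.5 | 64.6 | 65.6 | 54.2 | 38.6 | 46.4 |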
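
The README does not state how the Overall and Avg columns are derived. As an observation only, the GPT-Image-1 [High] row of the PRISM-Bench (GPT4.1) table is consistent with Overall being the unweighted mean over the seven tracks and Avg being the mean of Align and Aes. The sketch below checks that assumption on that one row; the function name `overall_scores` is hypothetical, and because the published values are rounded to one decimal, recomputing from them can differ from the table by about 0.1.

```python
from statistics import mean

# (Align, Aes) per track for GPT-Image-1 [High], PRISM-Bench (GPT4.1) table.
GPT_IMAGE_1 = {
    "imagination": (86.2, 86.6),
    "entity": (90.0, 86.3),
    "text_rendering": (68.8, 80.1),
    "style": (92.8, 93.3),
    "affection": (90.7, 90.9),
    "composition": (96.2, 89.4),
    "long_text": (83.8, 72.8),
}


def overall_scores(track_scores):
    """Return (align, aes, avg) under the assumed aggregation.

    Assumption: Overall is the unweighted mean over tracks, and Avg is the
    mean of the Align and Aes overall scores. The source does not confirm this.
    """
    align = mean(a for a, _ in track_scores.values())
    aes = mean(e for _, e in track_scores.values())
    return align, aes, (align + aes) / 2


if __name__ == "__main__":
    align, aes, avg = overall_scores(GPT_IMAGE_1)
    # Compare against the table's Overall columns for this row: 86.9, 85.6, 86.3.
    print(f"{align:.2f} {aes:.2f} {avg:.2f}")
```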

πŸ“ Citation

If you find this work helpful, please consider citing:

@article{fang2025flux,
      title={FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark}, 
      author={Fang, Rongyao and Yu, Aldrich and Duan, Chengqi and Huang, Linjiang and Bai, Shuai and Cai, Yuxuan and Wang, Kun and Liu, Si and Liu, Xihui and Li, Hongsheng},
      journal={arXiv preprint arXiv:2509.09680},
      year={2025}
}