VideoEspresso

July 28, 2025 ยท View on GitHub

[Paper] [Test Set] [Train Set]

News:

[2025/4/12] ๐Ÿ”ฅ We fixed the video version training annotation.

[2025/4/4] ๐Ÿ”ฅ This paper was accepted as an oral presentation at CVPR'25!

[2025/3/29] ๐Ÿ”ฅ The training set (video version) has been updated! [Train Set (video)]

[2025/3/24] ๐Ÿ”ฅ The training set (multi-image version) has been updated! [Train Set (multi-image)]

[2025/2/27] ๐Ÿ”ฅ This paper has been accepted by CVPR'25!

[2025/1/16] ๐Ÿ”ฅ The close-ended Leaderboard has been updated!

[2024/12/17] ๐Ÿ”ฅ The close-ended benchmark has been updated! [Close-Ended Evaluation]

[2024/12/16] ๐Ÿ”ฅ The test set has been released! Please check our huggingface repo. [Test Set]

Leaderboard

ModelParamsFramesOverallNarrative AnalysisEvent DynamicPreparation StepsCausal AnalysisTheme AnalysisContextual AnalysisInfluence AnalysisRole AnalysisInteraction AnalysisBehavior AnalysisEmotion AnalysisCooking ProcessTraffic AnalysisSituation Analysis
LLaVA-Video72B6466.3%68.4%66.2%74.5%62.7%62.3%71.6%62.5%63.5%67.7%63.2%60.0%75.5%76.7%74.0%
LLaVA-OneVision72B6463.2%76.0%61.8%71.4%57.5%62.3%68.8%62.5%55.6%58.1%56.1%63.1%77.4%70.0%74.0%
InternVL2.538B1659.9%65.8%54.1%66.3%57.3%55.7%63.3%56.9%54.0%53.2%63.2%60.0%73.6%70.0%72.0%
gemini-1.5-pro-12844.2%55.7%42.0%50.0%41.3%34.4%53.2%29.2%39.7%40.3%38.6%47.7%58.5%50.0%54.0%
Kangaroo8B6444.1%41.8%43.3%49.0%42.7%34.4%44.0%61.1%52.4%41.9%33.3%38.5%52.8%53.3%38.0%
Qwen-Max-442.7%44.3%35.7%45.9%39.7%44.3%54.1%43.1%47.6%35.5%45.6%41.5%49.1%46.7%46.0%
gemini-1.5-flash-12839.8%59.5%45.2%38.8%34.7%32.8%45.9%30.6%42.9%43.6%33.3%38.5%41.5%36.7%46.0%
LongVA7B12839.7%40.5%33.8%43.9%35.9%42.6%42.2%51.4%47.6%40.3%35.1%32.3%39.6%56.7%48.0%
Qwen-VL-Chat7B2436.2%49.4%28.7%35.7%32.4%44.3%39.5%47.2%31.8%30.7%40.4%36.9%34.0%43.3%44.0%
VideoChat2-Mistral7B1632.1%31.7%28.7%27.6%34.3%36.1%27.5%31.9%31.8%43.6%28.1%38.5%20.8%36.7%30.0%
Chat-UniVi-v1.57B6425.5%24.1%22.9%21.4%24.2%27.9%30.3%30.6%25.4%27.4%22.8%30.8%18.9%36.7%28.0%
SliME8B6424.8%19.0%24.2%26.5%27.0%19.7%21.1%30.6%28.6%29.0%19.3%21.5%30.2%20.0%16.0%
Video-XL7B6424.6%25.3%28.0%22.5%26.5%23.0%21.1%26.4%20.6%27.4%28.1%18.5%13.2%36.7%18.0%
Long-LLava7B6413.8%8.9%16.6%19.4%13.9%16.4%12.8%13.9%14.3%12.9%1.8%29.2%7.6%3.3%8.0%
ShareGPT4Video8B168.0%8.9%10.8%12.2%8.0%11.5%8.3%6.9%7.9%8.1%0.0%7.7%3.8%3.3%4.0%

How You Can Participate:

  • Use our benchmark: Feel free to test your models using our benchmark and share your results.
  • Submit checkpoints: Alternatively, you can provide your model checkpoints, and we will evaluate them and update the leaderboard for you.

We look forward to your participation and contributions! ๐ŸŒŸ

Overall View:

Contact Us ๐Ÿ“ง
If you have any questions or want to submit your checkpoints, feel free to reach out to us via email:

Citation:

@InProceedings{Han_2025_CVPR,
    author    = {Han, Songhao and Huang, Wei and Shi, Hairong and Zhuo, Le and Su, Xiu and Zhang, Shifeng and Zhou, Xu and Qi, Xiaojuan and Liao, Yue and Liu, Si},
    title     = {VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {26181-26191}
}

Star History

Star History Chart