Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning
August 27, 2025
Daeun Lee*, Jaehong Yoon*, Jaemin Cho, Mohit Bansal
EMNLP 2025 Findings
TL;DR: Video-Skill-CoT is a skill-aware CoT reasoning framework that constructs domain-specific multi-step rationales and trains expert modules for adaptive video understanding.
Setup
OpenAI/Gemini API Setup
Video-Skill-CoT relies on the OpenAI and Gemini APIs, so you need to provide your Azure OpenAI and Gemini API credentials.
Set your API information in ./skill_cot_generation/config.ini:
[openai]
azure_endpoint = your endpoint
api_key = your key
api_version = your version
[gemini]
gemini_api_key = your gemini_api_key
gemini_application_credentials = your credentials
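Before running the generation scripts, it can help to confirm that the config file parses and contains every field. The following is a minimal sketch, assuming the section and key names shown in the template above; `load_api_config` and `REQUIRED_KEYS` are hypothetical names used for illustration and are not part of the repository.

```python
import configparser

# Section and key names mirror the config.ini template shown above.
REQUIRED_KEYS = {
    "openai": ("azure_endpoint", "api_key", "api_version"),
    "gemini": ("gemini_api_key", "gemini_application_credentials"),
}


def load_api_config(path="./skill_cot_generation/config.ini"):
    """Hypothetical helper: read config.ini and return {section: {key: value}}."""
    # Disable interpolation so keys containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(f"Could not read config file: {path}")

    config = {}
    for section, keys in REQUIRED_KEYS.items():
        if not parser.has_section(section):
            raise KeyError(f"Missing [{section}] section in {path}")
        missing = [key for key in keys if not parser.has_option(section, key)]
        if missing:
            raise KeyError(f"Missing keys in [{section}]: {', '.join(missing)}")
        config[section] = {key: parser.get(section, key) for key in keys}
    return config
```

Calling `load_api_config()` from the repository root returns a nested dictionary, or raises an error naming the missing section or key.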
Download datasets
Place all downloaded datasets in the ./video_instruction_datasets directory. The expected directory structure is:
./video_instruction_datasets
├── cinepile
├── ET_164k
└── VSI-Bench
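A quick way to verify this layout before generation is to check that each dataset folder exists. This is a rough sketch, assuming only the three folder names listed above; `find_missing_datasets` is a hypothetical helper, and it does not inspect the contents of each folder.

```python
from pathlib import Path

# Folder names taken from the directory listing above.
EXPECTED_DATASETS = ("cinepile", "ET_164k", "VSI-Bench")


def find_missing_datasets(root="./video_instruction_datasets",
                          expected=EXPECTED_DATASETS):
    """Hypothetical helper: return the expected dataset folders absent under root."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_dir()]
```

An empty list means all three folders are present; otherwise the returned names show which datasets still need to be downloaded or moved.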
Skill-CoT Generation
Using the video understanding datasets above, you can generate Skill-CoT data as follows:
# [Step 1] Skill clustering
python ./skill_cot_generation/clustering.py --dataset='cine'
# [Step 2] Skill-CoT generation
python ./skill_cot_generation/skill_cot_generation.py --dataset='cine' --mode='skill_cot'
# [Step 3] Skill-CoT filtering
python ./skill_cot_generation/filtering.py --dataset='cine'
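The three steps can also be chained from a single script so that a failure in one step stops the rest. This is a sketch under stated assumptions: it uses only the script paths and flags shown above, `build_commands` and `run_pipeline` are hypothetical names, and `'cine'` is the only dataset value the commands above confirm.

```python
import subprocess
import sys


def build_commands(dataset="cine"):
    """Hypothetical helper: argument lists for the three steps, in order."""
    return [
        # Step 1: skill clustering
        [sys.executable, "./skill_cot_generation/clustering.py",
         f"--dataset={dataset}"],
        # Step 2: Skill-CoT generation
        [sys.executable, "./skill_cot_generation/skill_cot_generation.py",
         f"--dataset={dataset}", "--mode=skill_cot"],
        # Step 3: Skill-CoT filtering
        [sys.executable, "./skill_cot_generation/filtering.py",
         f"--dataset={dataset}"],
    ]


def run_pipeline(dataset="cine"):
    """Run each step in sequence; check=True raises if a step exits non-zero."""
    for command in build_commands(dataset):
        subprocess.run(command, check=True)
```

Running `run_pipeline("cine")` from the repository root executes the same commands as above, using the current Python interpreter.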
TODO List
- Release Multi-LoRA training code
BibTeX
If you find Video-Skill-CoT useful, please consider citing our paper:
@article{lee2025videoskillcot,
title={Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning},
author={Lee, Daeun and Yoon, Jaehong and Cho, Jaemin and Bansal, Mohit},
journal={arXiv preprint arXiv:2506.03525},
year={2025}
}