StreamingCoT
July 7, 2025
Overview
StreamingCoT is the first dataset explicitly designed for temporally evolving reasoning in streaming Video Question Answering (VideoQA) and multimodal Chain-of-Thought (CoT) tasks. Addressing critical limitations in current VideoQA benchmarks, StreamingCoT features:
- Dynamic temporal understanding: Captures evolving answers in video streams
- Explicit reasoning chains: Provides annotated multimodal reasoning paths
- Temporal dependency modeling: Tracks semantic evolution across video timelines
- Spatiotemporal grounding: Links reasoning steps to visual evidence
This dataset establishes a new foundation for research in streaming video understanding, complex temporal reasoning, and multimodal inference.
Key Features
🎥 Curated Video Corpus
- 5,745 high-quality short videos (≤60 seconds)
- Global representation through stratified geographic sampling
- Rigorous multimodal filtering:
  - Social validation (>5,000 interactions)
  - Lexical density constraints
  - HD resolution (≥720p)
  - Motion dynamics analysis
  - Aesthetic scoring (≥7/10)
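The numeric thresholds in the filtering list can be expressed as a simple predicate. This is a minimal sketch, not the toolkit's actual filter: the `VideoMeta` record and its field names are assumptions made for illustration, and the lexical density and motion dynamics checks are left out because no thresholds are given for them.

```python
from dataclasses import dataclass


@dataclass
class VideoMeta:
    """Hypothetical per-video metadata record; all field names are assumptions."""
    duration_s: float       # clip length in seconds
    interactions: int       # social interaction count
    height_px: int          # vertical resolution in pixels
    aesthetic_score: float  # aesthetic score on a 0-10 scale


def passes_stated_filters(v: VideoMeta) -> bool:
    """Apply only the thresholds that the README states numerically."""
    return (
        v.duration_s <= 60           # short videos (<= 60 seconds)
        and v.interactions > 5000    # social validation (> 5,000 interactions)
        and v.height_px >= 720       # HD resolution (>= 720p)
        and v.aesthetic_score >= 7   # aesthetic scoring (>= 7/10)
    )
```

For example, `passes_stated_filters(VideoMeta(45.0, 12000, 1080, 8.2))` evaluates to `True`, while a clip with exactly 5,000 interactions is rejected because that bound is strict.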
⏱️ Hierarchical Temporal Annotation
- Per-second dense captions aligned with visual content
- Adaptive temporal segmentation via Dynamic Semantic Fusion (DSF)
- Context-aware narration generation with inter-segment coherence
- Expert-validated semantic completeness and temporal alignment
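This README does not specify how Dynamic Semantic Fusion works, so the sketch below is not DSF. It only illustrates the general idea of adaptive temporal segmentation: consecutive per-second captions are grouped into one segment until adjacent captions stop resembling each other. The similarity measure (Jaccard overlap of word sets), the function names, and the `threshold` default are all assumptions chosen for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two captions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty captions are treated as identical
    return len(ta & tb) / len(ta | tb)


def segment_captions(captions: list[str], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Group per-second captions into inclusive (start_sec, end_sec) segments.

    A new segment starts whenever the similarity between a caption and the
    one before it drops below `threshold`.
    """
    if not captions:
        return []
    segments = []
    start = 0
    for t in range(1, len(captions)):
        if jaccard(captions[t - 1], captions[t]) < threshold:
            segments.append((start, t - 1))
            start = t
    segments.append((start, len(captions) - 1))
    return segments
```

Segment lengths here adapt to the content rather than following a fixed window, which is the property the annotation list describes; a real system would likely use learned embeddings rather than word overlap.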
❓ Dynamic QA Construction
- 6 specialized question types:
  - Cumulative counting
  - Periodic pattern recognition
  - Sequential step recognition
  - State duration measurement
  - Object state recognition
  - Clue-revealing responses
- Distractor-aware option design targeting temporal misperceptions
- Human-verified temporal consistency and answer validity
🧠 Multimodal Chain-of-Thought
- Spatiotemporally grounded reasoning chains:
  - Temporally-aware CoT initialization
  - Key object extraction and spatial grounding
  - Multimodal reasoning fusion
- Iterative human validation protocol ensuring:
  - Spatiotemporal consistency
  - Temporal causality
  - Evidence completeness
  - Answer derivation soundness
Dataset Structure
StreamingCoT/
├── bbox/                     # Per-second bounding box annotations
│   └── VIDEO_ID/             # Directory per video (YouTube ID)
│       ├── sec_0_idx_48.json # Bounding boxes at second 0 (frame 48)
│       ├── sec_1_idx_17.json # Second 1 annotations
│       └── ...
├── final_cot/                # Verified reasoning chains
│   ├── VIDEO_ID.jsonl        # Final CoT in JSON Lines format
│   └── ...
├── initial_cot/              # Preliminary reasoning chains
│   ├── VIDEO_ID.jsonl        # Initial CoT annotations
│   └── ...
└── key_frames/               # Temporally significant frames
    └── VIDEO_ID/             # Directory per video
        └── metadata.json     # Key frame positions and features
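The layout above can be navigated with a few standard-library helpers. This is a sketch under assumptions: the function names are illustrative and not part of the released toolkit, and the record schema inside the `.jsonl` and bounding box files is not documented here, so records are returned as plain dictionaries without interpreting their fields.

```python
import json
import re
from pathlib import Path

# Matches bbox file names such as "sec_0_idx_48.json" (second 0, frame index 48).
BBOX_NAME = re.compile(r"^sec_(\d+)_idx_(\d+)\.json$")


def parse_bbox_name(filename: str) -> tuple[int, int]:
    """Return (second, frame_index) encoded in a bbox file name."""
    m = BBOX_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected bbox file name: {filename!r}")
    return int(m.group(1)), int(m.group(2))


def list_bbox_files(root: Path, video_id: str) -> list[tuple[int, int, Path]]:
    """List (second, frame_index, path) for one video, ordered by second."""
    video_dir = root / "bbox" / video_id
    entries = []
    for p in video_dir.glob("sec_*_idx_*.json"):
        second, frame_idx = parse_bbox_name(p.name)
        entries.append((second, frame_idx, p))
    return sorted(entries, key=lambda e: e[0])


def read_jsonl(path: Path) -> list[dict]:
    """Read a JSON Lines file, one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A typical call would be `read_jsonl(root / "final_cot" / f"{video_id}.jsonl")`, where `root` points at the `StreamingCoT/` directory. Sorting by the parsed second avoids the lexicographic ordering problem in which `sec_10` would otherwise precede `sec_2`.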
Construction Pipeline
Our hierarchical annotation framework:
- Video Collection & Filtering
  YouTube API → Geographic balancing → Multimodal quality screening
- Hierarchical Captioning
  Per-second captioning → Dynamic segmentation → Context-aware narration
- Dynamic QA Generation
  Question typing → Distractor design → Temporal realignment
- Multimodal CoT Synthesis
  Keyframe selection → Object grounding → Reasoning fusion
- Iterative Validation
  Expert verification → Error taxonomy → Corrective regeneration
Applications
StreamingCoT enables research in:
- Temporal reasoning in video understanding
- Multimodal chain-of-thought development
- Streaming video question answering
- Spatiotemporally grounded inference
- Dynamic distractor analysis
- Video-based logical deduction systems
Access
The StreamingCoT dataset and construction toolkit are available at:
https://anonymous.4open.science/
License
StreamingCoT is released for non-commercial research purposes. All videos are sourced from YouTube and remain subject to original content creators' rights. Users must comply with YouTube's Terms of Service.
Citation
@article{streamingcot2024,
  title={StreamingCoT: Advancing Temporal Reasoning in VideoQA through Dynamic Multimodal Chain-of-Thought},
  author={Anonymous},
  journal={Submitted to Preprint},
  year={2024},
  note={Dataset available at \url{https://anonymous.4open.science/}}
}
Contact
For dataset inquiries, please open an issue on our repository or contact the maintainers through the anonymous submission portal.