🎬 Video Streaming Thinking

May 21, 2026

VideoLLMs Can Watch and Think Simultaneously

Video Streaming Thinking introduces a new paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption, enabling amortized test-time scaling with real-time responsiveness.

🔍 Overview

Existing online VideoLLMs focus on efficient streaming perception but lack explicit analytical reasoning. Offline VideoLLMs with Chain-of-Thought (CoT) can reason deeply, but incur high query-answer (QA) latency that violates real-time constraints. VST bridges this gap by shifting the LLM backend from passive waiting to active, intermittent reasoning during video consumption, implementing a thinking-while-watching mechanism inspired by human neural coupling.
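To make the latency argument concrete, here is a toy cost model, not a measurement of VST. The function names, token counts, and per-token decode time are all hypothetical and chosen only for illustration. The sketch assumes that streaming thoughts are generated while the video is still playing, so they do not add to the wait after a query arrives.

```python
# Toy latency model (illustrative assumptions only, not VST benchmark numbers).

def offline_cot_latency_ms(reasoning_tokens: int, answer_tokens: int, ms_per_token: int) -> int:
    """Offline CoT: reasoning starts only after the query, so the user waits for all of it."""
    return (reasoning_tokens + answer_tokens) * ms_per_token


def streaming_thinking_latency_ms(answer_tokens: int, ms_per_token: int) -> int:
    """Thinking-while-watching: reasoning was spent during playback; only the answer remains."""
    return answer_tokens * ms_per_token


# Hypothetical workload: 400 reasoning tokens, 20 answer tokens, 20 ms per decoded token.
REASONING_TOKENS = 400
ANSWER_TOKENS = 20
MS_PER_TOKEN = 20

offline_ms = offline_cot_latency_ms(REASONING_TOKENS, ANSWER_TOKENS, MS_PER_TOKEN)
streaming_ms = streaming_thinking_latency_ms(ANSWER_TOKENS, MS_PER_TOKEN)
```

Under these made-up numbers the total reasoning budget is the same in both cases. What changes is when it is spent, which is what determines the delay the user perceives after asking.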

https://github.com/user-attachments/assets/49846db5-bf76-4cf8-b923-4b9b88117482

✨ Key Idea

Instead of deferring all reasoning until a user query arrives, VST continuously processes incoming video clips and produces intermediate streaming thoughts in real time. This front-loads and amortizes the reasoning cost, so the final response is both deeply grounded and instantly available.
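The control flow described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the VST implementation: `StreamingThinker`, `think_fn`, and `answer_fn` are hypothetical names, and the two callables stand in for calls to a real VideoLLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StreamingThinker:
    """Toy stand-in for a model that reasons while clips arrive (hypothetical, not the VST API)."""

    # think_fn(clip, previous_thoughts) -> new intermediate thought
    think_fn: Callable[[str, List[str]], str]
    # answer_fn(query, all_thoughts) -> final response
    answer_fn: Callable[[str, List[str]], str]
    thoughts: List[str] = field(default_factory=list)

    def watch(self, clip: str) -> None:
        # Reason about the new clip right away, conditioned on earlier thoughts.
        self.thoughts.append(self.think_fn(clip, self.thoughts))

    def answer(self, query: str) -> str:
        # The reasoning is already stored, so only the answer is produced at query time.
        return self.answer_fn(query, self.thoughts)


def demo() -> str:
    # Placeholder callables; a real system would invoke the model here.
    def think(clip: str, history: List[str]) -> str:
        return f"t{len(history)}: saw {clip}"

    def answer(query: str, thoughts: List[str]) -> str:
        return f"{query} -> based on {len(thoughts)} thoughts"

    model = StreamingThinker(think, answer)
    for clip in ["clip_a", "clip_b", "clip_c"]:
        model.watch(clip)  # thinking happens during video consumption
    return model.answer("What happened?")
```

Keeping the accumulated thoughts as explicit state is one simple way to show the front-loading idea: each `watch` call pays a small reasoning cost, and `answer` reuses that work instead of starting from scratch.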

🏗️ Model Zoo

Model	HuggingFace	OVO-Bench	StreamingBench	VideoMME	LongVideoBench	VideoHolmes
VST-3B	🤗 Link	56.2	75.5	59.5	54.1	36.1
VST-7B	🤗 Link	59.3	79.5	64.9	58.0	41.9
VST-32B	🤗 Link	63.5	80.7	67.2	60.7	45.1

📦 Training Data

We release the full training data used for both SFT and RL stages on HuggingFace and ModelScope:

Dataset	HuggingFace	ModelScope	Description
vst_sft_data	🤗 Link	🤖 Link	SFT data including video-text pairs from multiple sources
vst_rl_data	🤗 Link	🤖 Link	RL data for the reinforcement learning stage

📅 TODO

Release the paper.
Release checkpoint and eval code.
Release training code.
Release training data.

👍 Acknowledgement

We thank the following great works and open-source repositories:

📖 Citation

@article{guan2026videostreamingthinking,
      title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously}, 
      author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
      journal={arXiv preprint arXiv:2603.12262},
      year={2026},
}