February 25, 2026

🌊 Accelerating Streaming Video Large Language Models via Hierarchical Token Compression 🚀

Yiyu Wang¹, Xuyang Liu¹,²†, Xiyan Gui¹,³, Xinying Lin⁴, Boxue Yang¹,
Chenfei Liao¹,⁵, Tailai Chen¹, Linfeng Zhang¹✉

¹ EPIC Lab, Shanghai Jiao Tong University ² Sichuan University
³ Huazhong University of Science and Technology ⁴ Sun Yat-sen University
⁵ Hong Kong University of Science and Technology (Guangzhou)

⚡ The first plug-and-play token compression framework for streaming video understanding.

🔥 News

2026.02.21 🎊🎊 Our STC has been accepted by CVPR 2026! The codebase is under comprehensive cleanup. Stay tuned!
2025.12.02 🤗🤗 We release our latest work STC, the first plug-and-play inference acceleration framework for streaming video understanding! Code is available!
2025.08.21 🎉🎉 Our VidCom² has been accepted by EMNLP 2025 main conference!
2025.05.21 🤗🤗 We release VidCom², a plug-and-play inference acceleration method for VideoLLMs. Code is available!

📌 Highlights

STC is the first plug-and-play token compression framework for accelerating streaming video understanding:

⚡ Streaming-First Design: Optimized for latency-sensitive applications (e.g., live sports, AR glasses) where frames arrive continuously.
🧩 STC-Cacher: Exploits temporal redundancy by caching visual features for similar frames (cosine similarity $> 0.85$), significantly reducing ViT encoding overhead.
✂️ STC-Pruner: Compresses visual tokens after encoding to shorten the LLM prefill sequence while preserving spatiotemporal saliency.
🔌 Plug-and-Play: Seamlessly integrates with SOTA VideoLLMs like ReKV, Dispider, StreamForest, and LiveCC.
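To make the caching idea concrete, here is a minimal sketch of similarity-gated feature reuse. It is not the repository's `STC_CACHE` implementation: the class name `FrameFeatureCache`, the `encode` callable (a stand-in for the ViT forward pass), and the choice to compare whole frames as flat vectors are all assumptions made for illustration. Only the 0.85 cosine-similarity threshold comes from the description above.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class FrameFeatureCache:
    """Hypothetical cache that reuses features for near-duplicate frames."""

    def __init__(self, encode: Callable[[Vector], List[float]], threshold: float = 0.85):
        self.encode = encode        # stand-in for the expensive visual encoder
        self.threshold = threshold  # reuse only when similarity is strictly above this
        self.ref_frame: Optional[List[float]] = None
        self.ref_features: Optional[List[float]] = None
        self.hits = 0
        self.misses = 0

    def __call__(self, frame: Vector) -> List[float]:
        if (
            self.ref_frame is not None
            and cosine_similarity(frame, self.ref_frame) > self.threshold
        ):
            # Similar enough to the reference frame: skip encoding
            self.hits += 1
            return self.ref_features
        # Frame differs enough: encode it and make it the new reference
        self.misses += 1
        self.ref_frame = list(frame)
        self.ref_features = self.encode(frame)
        return self.ref_features


if __name__ == "__main__":
    cache = FrameFeatureCache(encode=lambda f: [2.0 * x for x in f])
    for frame in ([1.0, 0.0, 0.0], [0.99, 0.05, 0.0], [0.0, 1.0, 0.0]):
        cache(frame)
    print(cache.hits, cache.misses)  # prints: 1 2
```

The second frame is nearly identical to the first, so its features are reused; the third is orthogonal to the reference and triggers a fresh encode. A real implementation would operate on tensors and could cache at a finer granularity than whole frames.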

🦁 Core Code

Core Implementation:

Cache Logic: model/cache.py (Class: STC_CACHE)
Prune Logic: model/prune.py (Class: STC_Pruner)
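For readers who want a feel for post-encoding token pruning before opening `model/prune.py`, the sketch below keeps the most salient fraction of a token set. It is an illustration only, not the `STC_Pruner` logic: the function name `prune_tokens`, the `keep_ratio` parameter, and the saliency score (Euclidean distance from the mean token) are assumptions chosen for simplicity.

```python
import math
from typing import List, Sequence

Token = Sequence[float]


def prune_tokens(tokens: List[Token], keep_ratio: float = 0.5) -> List[int]:
    """Return sorted indices of the tokens to keep.

    Saliency is assumed to be the Euclidean distance from the mean token,
    so tokens that deviate most from the average are retained.
    """
    if not tokens:
        return []
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    scores = [math.dist(t, mean) for t in tokens]
    num_keep = max(1, math.ceil(keep_ratio * len(tokens)))
    # Rank by saliency (ties broken by index), then restore the original order
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:num_keep])


if __name__ == "__main__":
    tokens = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [10.0, 10.0]]
    print(prune_tokens(tokens, keep_ratio=0.5))  # prints: [0, 3]
```

Returning indices in their original order keeps the surviving tokens positionally consistent, which matters when the shortened sequence is passed on to the LLM prefill.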

🛠 Preparation

STC targets the following base models. ReKV is supported now; code for the other models is coming soon.

| Model Base | Status | Code Path |
| --- | --- | --- |
| ReKV (LLaVA-OV) | ✅ Supported | `model/llava_onevision_rekv.py` |
| StreamForest | 🚧 Coming Soon | - |
| Dispider | 🚧 Coming Soon | - |
| LiveCC | 🚧 Coming Soon | - |

Environment Settings

Original Models (recommended)

We evaluated STC in the same environments as the original models, so you can set up each environment by following the requirements of the corresponding original model.

Links:

| Original Model | URL |
| --- | --- |
| ReKV | https://github.com/Becomebright/ReKV |
| StreamForest | https://github.com/MCG-NJU/StreamForest |
| Dispider | https://github.com/Mark12Ding/Dispider |
| LiveCC | https://github.com/showlab/livecc |

We also provide a replica of our environment:

Use our environment

ReKV

cd ReKV
pip install -e .
cd model/longva
pip install -e .

StreamForest

cd StreamForest
conda env create -f environment-StreamForest.yml

Dispider

cd Dispider
conda env create -f environment-Dispider.yml
pip install -v . # for development mode, `pip install -v -e .`

LiveCC

cd LiveCC
conda env create -f environment-LiveCC.yml
pip install -v . # for development mode, `pip install -v -e .`

🚀 Performance Evaluation

We evaluate STC on online (streaming) benchmarks to demonstrate real-time capability and on offline benchmarks to verify that general video understanding is preserved.

🌊 Online Benchmarks (Streaming)

These benchmarks evaluate the model's ability to understand videos in a streaming fashion, where frames are received sequentially.

1. StreamingBench

Download the dataset from mjuicem/StreamingBench.

Required files: Real_Time_Visual_Understanding.csv and Real-Time Visual Understanding_*.zip.

2. OVO-Bench

Videos: Download src_videos.tar.parta[a-e] from JoeLeelyf/OVO-Bench (HF).
Metadata: Download ovo_bench_new.json from JoeLeelyf/OVO-Bench (GitHub).
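Because the OVO-Bench videos ship as several `src_videos.tar.part*` files, they have to be joined before extraction. The sketch below shows one way to do that in Python. It assumes the parts are plain byte-wise splits of a single tar archive, so that concatenating them in name order restores it. The function name, the `joined.tar` file name, and the directory arguments are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def reassemble_and_extract(parts_dir: str, prefix: str, out_dir: str) -> Path:
    """Concatenate split archive parts in name order, then extract the archive."""
    parts = sorted(Path(parts_dir).glob(prefix + "*"))
    if not parts:
        raise FileNotFoundError(f"no files matching {prefix}* in {parts_dir}")
    archive = Path(parts_dir) / "joined.tar"  # hypothetical name for the joined file
    with open(archive, "wb") as joined:
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, joined)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # tarfile.open detects compression automatically in the default read mode
    with tarfile.open(archive) as tar:
        tar.extractall(out)
    return out
```

For example, `reassemble_and_extract("downloads", "src_videos.tar.part", "data/ovo_bench")` would join every matching part in `downloads/` and unpack the result; both directory names here are placeholders.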

💾 Offline Benchmarks (Standard)

Supported Datasets: MLVU, EgoSchema, Videomme

We use standard benchmarks to verify that STC maintains high performance on general video understanding tasks.

Download the benchmarks into data/.

Run ReKV

`MLVU`, `EgoSchema`, `Videomme`

# Example: Evaluating on MLVU
bash scripts/eval_offline_benchs.sh

To evaluate egoschema or videomme, simply change the DATASET argument to the respective dataset name.

`OVO-Bench`

Configuration: Update eval/scripts/eval_ovobench.sh:
- Set TASK_JSON to the path of ovo_bench_new.json.
- Set VIDEO_DIR to the unzipped video directory.
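A long evaluation run that fails on a bad path is costly, so it can help to sanity-check the two settings first. The following sketch is a hypothetical helper, not part of the repository: it only verifies that the value you plan to use for `TASK_JSON` is a readable JSON file and that `VIDEO_DIR` is a non-empty directory.

```python
import json
from pathlib import Path
from typing import List


def check_ovobench_inputs(task_json: str, video_dir: str) -> List[str]:
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    json_path = Path(task_json)
    if not json_path.is_file():
        problems.append(f"TASK_JSON not found: {json_path}")
    else:
        try:
            with open(json_path, encoding="utf-8") as f:
                json.load(f)
        except json.JSONDecodeError as err:
            problems.append(f"TASK_JSON is not valid JSON: {err}")
    videos = Path(video_dir)
    if not videos.is_dir():
        problems.append(f"VIDEO_DIR is not a directory: {videos}")
    elif not any(videos.iterdir()):
        problems.append(f"VIDEO_DIR is empty: {videos}")
    return problems
```

Calling `check_ovobench_inputs("data/ovo_bench_new.json", "data/src_videos")` with your own paths (the ones shown are placeholders) and printing the returned list gives a quick go/no-go signal before launching the evaluation script.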

bash scripts/ovobench_scipts/eval_rekv.sh

Then compute the metrics from the generated result file:

bash scripts/ovobench_scipts/score_rekv.sh

`StreamingBench`

Configuration: Update eval/scripts/eval_streamingbench.sh:
- Set TASK_CSV to the path of the CSV file.
- Set VIDEO_DIR to the unzipped video directory.

bash scripts/streamingbench_scripts/eval_rekv.sh

Then compute the metrics from the generated result file:

bash scripts/streamingbench_scripts/score_rekv.sh

Run StreamForest

`MLVU`, `EgoSchema`, `Videomme`, `OVO-Bench`, `StreamingBench`

TODO

Run Dispider

`OVO-Bench`

TODO

Run LiveCC

`OVO-Bench`

TODO

👍 Acknowledgment

Thanks to ReKV, StreamForest, Dispider, and LiveCC for their great work and codebases.

✏️ Citation

Please consider citing our paper in your publications if our findings help your research.

@article{wang2025stc,
  title={Accelerating Streaming Video Large Language Models via Hierarchical Token Compression},
  author={Wang, Yiyu and Liu, Xuyang and Gui, Xiyan and Lin, Xinying and Yang, Boxue and Liao, Chenfei and Chen, Tailai and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2512.00891},
  year={2025}
}

📩 Contact

For any questions about our paper or code, please email liuxuyang@stu.scu.edu.cn or ustywan8@ljmu.ac.uk.