Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
Yiyu Wang1*, Xuyang Liu1,2*†, Xiyan Gui1,3, Xinying Lin4, Boxue Yang1,
Chenfei Liao1,5, Tailai Chen1, Linfeng Zhang1†
1 EPIC Lab, Shanghai Jiao Tong University · 2 Sichuan University
3 Huazhong University of Science and Technology · 4 Sun Yat-sen University
5 Hong Kong University of Science and Technology (Guangzhou)
The first plug-and-play token compression framework for streaming video understanding.
News
- 2026.02.21: Our STC has been accepted by CVPR 2026! The codebase is under comprehensive cleanup. Stay tuned!
- 2025.12.02: We release our latest work STC, the first plug-and-play inference acceleration framework for streaming video understanding! Code is available!
- 2025.08.21: Our VidCom2 has been accepted by the EMNLP 2025 main conference!
- 2025.05.21: We release VidCom2, a plug-and-play inference acceleration method for VideoLLMs. Code is available!
Highlights
STC is the first plug-and-play token compression framework for accelerating streaming video understanding:
- Streaming-First Design: Optimized for latency-sensitive applications (e.g., live sports, AR glasses) where frames arrive continuously.
- STC-Cacher: Exploits temporal redundancy by caching visual features for similar frames (measured by cosine similarity), significantly reducing ViT encoding overhead.
- STC-Pruner: Compresses visual tokens after encoding to shorten the LLM prefill sequence while preserving spatiotemporal saliency (a minimal sketch of both stages follows this list).
- Plug-and-Play: Seamlessly integrates with SOTA VideoLLMs such as ReKV, Dispider, StreamForest, and LiveCC.
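For intuition, the following minimal sketch illustrates the two stages described above. It is not the implementation in model/cache.py or model/prune.py: the ToyCacher class, the toy_prune function, the frame-level similarity test, the norm-based saliency proxy, and all thresholds are illustrative assumptions.

```python
# Illustrative sketch of the two STC stages (not the repo implementation in
# model/cache.py / model/prune.py); the threshold and keep ratio are made up.
import torch
import torch.nn.functional as F

class ToyCacher:
    """Reuse cached ViT features when consecutive frames are nearly identical."""
    def __init__(self, vit, sim_threshold=0.95):
        self.vit = vit                      # any callable: frame -> [N, D] tokens
        self.sim_threshold = sim_threshold
        self.prev_frame = None
        self.prev_tokens = None

    def encode(self, frame):                # frame: [C, H, W] tensor
        if self.prev_frame is not None:
            sim = F.cosine_similarity(frame.flatten().float(),
                                      self.prev_frame.flatten().float(), dim=0)
            if sim > self.sim_threshold:    # temporally redundant -> skip the ViT
                return self.prev_tokens
        tokens = self.vit(frame)
        self.prev_frame, self.prev_tokens = frame, tokens
        return tokens

def toy_prune(tokens, keep_ratio=0.25):
    """Keep the most salient visual tokens to shorten the LLM prefill sequence."""
    saliency = tokens.norm(dim=-1)                 # proxy saliency score, [N]
    k = max(1, int(keep_ratio * tokens.shape[0]))
    idx = saliency.topk(k).indices.sort().values   # keep original token order
    return tokens[idx]
```

In the actual framework, caching and pruning operate on the streaming VideoLLM's visual features; see the Core Code section below for the real entry points.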
Core Code
Core Implementation:
- Cache Logic: model/cache.py (class: STC_CACHE)
- Prune Logic: model/prune.py (class: STC_Pruner)
Preparation
We support the following models enhanced with STC; code for the remaining models is coming soon.
| Model Base | Status | Code Path |
|---|---|---|
| ReKV (LLaVA-OV) | Supported | model/llava_onevision_rekv.py |
| StreamForest | Coming Soon | - |
| Dispider | Coming Soon | - |
| LiveCC | Coming Soon | - |
Environment Settings
Original Models (recommended)
We evaluated STC under the same environments as the original models, so you can set up your environment by following the requirements of the corresponding original repositories.
Links:
| Original Model | URL |
|---|---|
| ReKV | https://github.com/Becomebright/ReKV |
| StreamForest | https://github.com/MCG-NJU/StreamForest |
| Dispider | https://github.com/Mark12Ding/Dispider |
| LiveCC | https://github.com/showlab/livecc |
Alternatively, we provide a replica of our environment setup below:
Use our environment
ReKV
cd ReKV
pip install -e .
cd model/longva
pip install -e .
StreamForest
cd StreamForest
conda env create -f environment-StreamForest.yml
Dispider
cd Dispider
conda env create -f environment-Dispider.yml
pip install -v . # for development mode, `pip install -v -e .`
LiveCC
cd LiveCC
conda env create -f environment-LiveCC.yml
pip install -v . # for development mode, `pip install -v -e .`
Performance Evaluation
We evaluate STC on both Online (Streaming) benchmarks to demonstrate real-time capabilities and Offline benchmarks to ensure robust general video understanding.
Online Benchmarks (Streaming)
These benchmarks evaluate the model's ability to understand videos in a streaming fashion, where frames are received sequentially.
1. StreamingBench
Download the dataset from mjuicem/StreamingBench.
- Required files: Real_Time_Visual_Understanding.csv and Real-Time Visual Understanding_*.zip (see the download sketch below).
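If you prefer to script the download, here is a hedged sketch using huggingface_hub; the allow_patterns filter and the local directory are assumptions, and the exact file layout inside the dataset repo may differ.

```python
# Hedged sketch: fetch the StreamingBench task CSV and video zips with
# huggingface_hub. Adjust allow_patterns / local_dir to the actual repo layout.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mjuicem/StreamingBench",
    repo_type="dataset",
    allow_patterns=["*Real*Time*Visual*Understanding*"],  # CSV + video zips
    local_dir="data/StreamingBench",
)
```

Unzip the downloaded Real-Time Visual Understanding_*.zip archives into a single directory; this is the video directory used by the StreamingBench eval script below.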
2. OVO-Bench
- Videos: Download src_videos.tar.parta[a-e] from JoeLeelyf/OVO-Bench (HF) and reassemble the parts into a single archive (see the sketch below).
- Metadata: Download ovo_bench_new.json from JoeLeelyf/OVO-Bench (Github).
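A hedged sketch of the download-and-reassemble step, assuming the part names follow the src_videos.tar.parta[a-e] convention above; the local paths are placeholders.

```python
# Hedged example: download and reassemble the OVO-Bench video archive parts.
# Part names and target paths are assumptions; adjust them to the actual files
# in the JoeLeelyf/OVO-Bench dataset repo.
import shutil
import tarfile
from pathlib import Path
from huggingface_hub import hf_hub_download

out_dir = Path("data/OVO-Bench")
out_dir.mkdir(parents=True, exist_ok=True)

# Concatenate src_videos.tar.partaa ... src_videos.tar.partae into one tar file.
archive = out_dir / "src_videos.tar"
with archive.open("wb") as merged:
    for suffix in "abcde":
        part = hf_hub_download(
            repo_id="JoeLeelyf/OVO-Bench",
            repo_type="dataset",
            filename=f"src_videos.tar.parta{suffix}",
        )
        with open(part, "rb") as chunk:
            shutil.copyfileobj(chunk, merged)

# Extract the reassembled archive.
with tarfile.open(archive) as tar:
    tar.extractall(out_dir / "src_videos")
```

The extracted directory is what VIDEO_DIR should point to in eval/scripts/eval_ovobench.sh.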
Offline Benchmarks (Standard)
Supported Datasets: MLVU, EgoSchema, Video-MME
We use standard benchmarks to verify that STC maintains high performance on general video understanding tasks.
Run ReKV
MLVU, EgoSchema, Video-MME
# Example: Evaluating on MLVU
bash scripts/eval_offline_benchs.sh
To evaluate EgoSchema or Video-MME, simply change the DATASET argument to the corresponding dataset name (egoschema or videomme).
OVO-Bench
- Configuration: Update eval/scripts/eval_ovobench.sh:
  - Set TASK_JSON to the path of ovo_bench_new.json.
  - Set VIDEO_DIR to the extracted video directory.
- Run:
bash scripts/ovobench_scipts/eval_rekv.sh
Then use the generated result file to compute the evaluation scores:
bash scripts/ovobench_scipts/score_rekv.sh
StreamingBench
- Configuration: Update eval/scripts/eval_streamingbench.sh:
  - Set TASK_CSV to the path of the Real_Time_Visual_Understanding.csv file.
  - Set VIDEO_DIR to the unzipped video directory.
- Run:
bash scripts/streamingbench_scripts/eval_rekv.sh
Then use the generated result file to compute the evaluation scores:
bash scripts/streamingbench_scripts/score_rekv.sh
Run StreamForest
MLVU, EgoSchema, Video-MME, OVO-Bench, StreamingBench
TODO
Run Dispider
OVO-Bench
TODO
Run LiveCC
OVO-Bench
TODO
Acknowledgment
- Thanks to ReKV for their great work and codebase.
- Thanks to StreamForest for their great work and codebase.
- Thanks to Dispider for their great work and codebase.
- Thanks to LiveCC for their great work and codebase.
Citation
If our findings help your research, please consider citing our paper in your publications.
@article{wang2025stc,
  title={Accelerating Streaming Video Large Language Models via Hierarchical Token Compression},
  author={Wang, Yiyu and Liu, Xuyang and Gui, Xiyan and Lin, Xinying and Yang, Boxue and Liao, Chenfei and Chen, Tailai and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2512.00891},
  year={2025}
}
Contact
For any questions about our paper or code, please email liuxuyang@stu.scu.edu.cn or ustywan8@ljmu.ac.uk.