February 25, 2026

🌊 Accelerating Streaming Video Large Language Models via Hierarchical Token Compression 🚀

Yiyu Wang1*, Xuyang Liu1,2*†, Xiyan Gui1,3, Xinying Lin4, Boxue Yang1,
Chenfei Liao1,5, Tailai Chen1, Linfeng Zhang1✉

1 EPIC Lab, Shanghai Jiao Tong University   2 Sichuan University
3 Huazhong University of Science and Technology   4 Sun Yat-sen University
5 Hong Kong University of Science and Technology (Guangzhou)

⚡ The first plug-and-play token compression framework for streaming video understanding.

🔥 News

  • 2026.02.21 🎊🎊 Our STC has been accepted by CVPR 2026! The codebase is under comprehensive cleanup. Stay tuned!
  • 2025.12.02 🤗🤗 We release our latest work STC, the first plug-and-play inference acceleration framework for streaming video understanding! Code is available!
  • 2025.08.21 🎉🎉 Our VidCom2 has been accepted to the EMNLP 2025 main conference!
  • 2025.05.21 🤗🤗 We release VidCom2, a plug-and-play inference acceleration method for VideoLLMs. Code is available!

📌 Highlights

STC is the first plug-and-play token compression framework for accelerating streaming video understanding:

  • ⚡ Streaming-First Design: Optimized for latency-sensitive applications (e.g., live sports, AR glasses) where frames arrive continuously.
  • 🧩 STC-Cacher: Exploits temporal redundancy by caching visual features for similar frames (cosine similarity > 0.85), significantly reducing ViT encoding overhead.
  • ✂️ STC-Pruner: Compresses visual tokens after encoding to shorten the LLM prefill sequence while preserving spatiotemporal saliency.
  • 🔌 Plug-and-Play: Seamlessly integrates with SOTA VideoLLMs such as ReKV, Dispider, StreamForest, and LiveCC.
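
The two modules can be sketched roughly as follows. This is an illustrative sketch based only on the description above, not the released implementation: the encoder stand-in, the pixel-level similarity check, and the norm-based saliency score are simplifying assumptions; only the > 0.85 cache threshold and the cache-then-prune structure come from the text.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two flat vectors (assumes non-zero norms)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class STCCacherSketch:
    """Reuse cached visual features when consecutive frames are near-duplicates."""
    def __init__(self, encoder, threshold=0.85):
        self.encoder = encoder        # stand-in for the ViT: frame -> token features
        self.threshold = threshold    # similarity above this reuses the cache
        self.last_frame = None        # flattened pixels of the last encoded frame
        self.last_features = None

    def encode(self, frame):
        flat = frame.ravel().astype(np.float64)
        if self.last_frame is not None and cosine(flat, self.last_frame) > self.threshold:
            return self.last_features  # cache hit: skip the ViT forward pass
        self.last_frame = flat
        self.last_features = self.encoder(frame)
        return self.last_features

def prune_tokens(features, keep_ratio=0.25):
    """Keep the most 'salient' tokens (here: largest L2 norm, an assumed proxy)
    to shorten the LLM prefill sequence, preserving the original token order."""
    k = max(1, int(len(features) * keep_ratio))
    idx = np.argsort(-np.linalg.norm(features, axis=1))[:k]
    return features[np.sort(idx)]
```

In this sketch, similarity is computed on raw pixels and saliency is a simple norm heuristic; the actual STC modules operate on visual features and a learned/derived saliency criterion.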

๐Ÿฆ Core Codes

Core Implementation:

🛠 Preparation

We support the following models enhanced with STC; code for the remaining models is coming soon.

| Model Base | Status | Code Path |
| --- | --- | --- |
| ReKV (LLaVA-OV) | ✅ Supported | `model/llava_onevision_rekv.py` |
| StreamForest | 🚧 Coming Soon | - |
| Dispider | 🚧 Coming Soon | - |
| LiveCC | 🚧 Coming Soon | - |

Environment Settings

We evaluated STC under the same environments as the original models, so you can set up your environment by following the requirements of each original model.

Links:

| Original Model | URL |
| --- | --- |
| ReKV | https://github.com/Becomebright/ReKV |
| StreamForest | https://github.com/MCG-NJU/StreamForest |
| Dispider | https://github.com/Mark12Ding/Dispider |
| LiveCC | https://github.com/showlab/livecc |

Besides, we provide a replica of our environment:

**ReKV**

```bash
cd ReKV
pip install -e .
cd model/longva
pip install -e .
```

**StreamForest**

```bash
cd StreamForest
conda env create -f environment-StreamForest.yml
```

**Dispider**

```bash
cd Dispider
conda env create -f environment-Dispider.yml
pip install -v .  # for development mode: `pip install -v -e .`
```

**LiveCC**

```bash
cd LiveCC
conda env create -f environment-LiveCC.yml
pip install -v .  # for development mode: `pip install -v -e .`
```

🚀 Performance Evaluation

We evaluate STC on Online (streaming) benchmarks to demonstrate real-time capabilities, and on Offline benchmarks to ensure robust general video understanding.

🌊 Online Benchmarks (Streaming)

These benchmarks evaluate the model's ability to understand videos in a streaming fashion, where frames are received sequentially.
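
A streaming evaluation loop of this kind can be sketched as follows. This is a hypothetical illustration of the protocol, not the API of these benchmarks: `model.ingest` and `model.answer` are stand-ins for the harness, and queries are assumed to fire at fixed frame indices.

```python
import time

def run_streaming_eval(frames, queries, model):
    """Feed frames sequentially; answer each query using only frames seen so far.

    `model` is a hypothetical object with ingest(frame) and answer(question).
    Online benchmarks score the answers and, typically, the per-frame latency.
    """
    answers, latencies = [], []
    pending = sorted(queries, key=lambda q: q["t"])  # query fires at frame index t
    qi = 0
    for t, frame in enumerate(frames):
        start = time.perf_counter()
        model.ingest(frame)          # streaming: no access to future frames
        latencies.append(time.perf_counter() - start)
        while qi < len(pending) and pending[qi]["t"] == t:
            answers.append(model.answer(pending[qi]["question"]))
            qi += 1
    return answers, latencies
```

The key constraint, shared by all the benchmarks below, is that the model may only use frames received up to the query time.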

1. StreamingBench

Download the dataset from mjuicem/StreamingBench.

  • Required files: Real_Time_Visual_Understanding.csv and Real-Time Visual Understanding_*.zip.

2. OVO-Bench


💾 Offline Benchmarks (Standard)

Supported Datasets: MLVU, EgoSchema, Video-MME

We use standard benchmarks to verify that STC maintains high performance on general video understanding tasks.

Run ReKV

MLVU, EgoSchema, Video-MME

```bash
# Example: evaluating on MLVU
bash scripts/eval_offline_benchs.sh
```

To evaluate EgoSchema or Video-MME, simply change the DATASET argument to the corresponding dataset name (e.g., `egoschema` or `videomme`).

OVO-Bench

  • Configuration: Update `eval/scripts/eval_ovobench.sh`:
    • Set `TASK_JSON` to the path of `ovo_bench_new.json`.
    • Set `VIDEO_DIR` to the unzipped video directory.

```bash
bash scripts/ovobench_scipts/eval_rekv.sh
```

Then use the generated result file to compute the metrics:

```bash
bash scripts/ovobench_scipts/score_rekv.sh
```

StreamingBench

  • Configuration: Update `eval/scripts/eval_streamingbench.sh`:
    • Set `TASK_CSV` to the path of the CSV file.
    • Set `VIDEO_DIR` to the unzipped video directory.

```bash
bash scripts/streamingbench_scripts/eval_rekv.sh
```

Then use the generated result file to compute the metrics:

```bash
bash scripts/streamingbench_scripts/score_rekv.sh
```

Run StreamForest

MLVU, EgoSchema, Video-MME, OVO-Bench, StreamingBench

TODO

Run Dispider

OVO-Bench

TODO

Run LiveCC

OVO-Bench

TODO

๐Ÿ‘ Acknowledgment

  • Thanks to ReKV for their great work and codebase.
  • Thanks to StreamForest for their great work and codebase.
  • Thanks to Dispider for their great work and codebase.
  • Thanks to LiveCC for their great work and codebase.

โœ๏ธ Citation

If our findings help your research, please consider citing our paper:

```bibtex
@article{wang2025stc,
  title={Accelerating Streaming Video Large Language Models via Hierarchical Token Compression},
  author={Wang, Yiyu and Liu, Xuyang and Gui, Xiyan and Lin, Xinying and Yang, Boxue and Liao, Chenfei and Chen, Tailai and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2512.00891},
  year={2025}
}
```

📩 Contact

For any questions about our paper or code, please email liuxuyang@stu.scu.edu.cn or ustywan8@ljmu.ac.uk.