StreamChat
March 14, 2025
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" (ICLR 2025)
🔥 News
[2025.1] 🔥 Released the repo and test code.
[2025.2] 🔥 Released StreamBench.
🚩 Approach
Motivation

1. A training-free video agent with a decoupled architecture.
2. Multi-round interaction with memory-enhanced knowledge during inference.
3. Faster video processing.
Architecture

Selective Frame Stacking: reduces redundant storage of video frame features.
Memory Formation: updates memory and retrieves related information as in-context knowledge.
Contextual Summarization: reorganizes the in-context knowledge into a prompt for the MLLM.
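The three components above can be sketched roughly as follows. This is a minimal illustration of the ideas, not the actual StreamChat implementation; all function names, the cosine-similarity criterion, and the prompt format are assumptions:

```python
import numpy as np

def select_frames(features, threshold=0.9):
    """Selective Frame Stacking (sketch): keep a frame feature only when it
    differs enough from the last stored one, reducing redundant storage."""
    stored = [features[0]]
    for feat in features[1:]:
        prev = stored[-1]
        sim = float(np.dot(feat, prev) /
                    (np.linalg.norm(feat) * np.linalg.norm(prev) + 1e-8))
        if sim < threshold:  # sufficiently novel frame -> stack it
            stored.append(feat)
    return stored

def retrieve_memory(memory, query_vec, top_k=2):
    """Memory Formation (sketch): return the top-k stored entries most
    similar to the query, to serve as in-context knowledge."""
    scored = sorted(memory, key=lambda m: -float(np.dot(m["vec"], query_vec)))
    return scored[:top_k]

def build_prompt(question, retrieved):
    """Contextual Summarization (sketch): reorganize the retrieved context
    into a prompt for the MLLM."""
    context = "\n".join(m["text"] for m in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

In this decoupled view, only `select_frames` touches raw frame features; the MLLM sees just the summarized prompt, which is what keeps the pipeline training-free.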
StreamBench
StreamBench is designed to evaluate model performance on online (streaming) videos.
It covers 4 key domains and 16 sub-class video types.
The videos span a broad range of lengths, with 6 length types evenly distributed.
It includes 6 kinds of questions (Object Search, Long-term Memory Search, Short-term Memory Search, Conversational Interaction, Knowledge-based Question Answering, and Simple Factual) to provide more comprehensive evaluation results.
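With six question categories, results are naturally reported per category. A minimal aggregation sketch; the field names `type` and `correct` are assumptions for illustration, not the actual StreamBench annotation schema:

```python
from collections import defaultdict

def per_category_accuracy(results):
    """Compute accuracy for each question type.

    `results` is a list of dicts with hypothetical fields:
      "type":    one of the six StreamBench question categories
      "correct": whether the model answered this question correctly
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        totals[r["type"]] += 1
        hits[r["type"]] += int(r["correct"])
    return {t: hits[t] / totals[t] for t in totals}
```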
🏃‍♂️ Getting Started
You need at least 2×80 GB GPUs to run.
The code is still rough; we are working on cleaning it up.
Preparation
Download StreamBench.
StreamBench_v0.3
βββ Ego
β βββ all_videos
βββ WebVideo
βββ Movie
βββ streaming_bench_v0.3.json
Download the LLaMA 3, LongVA, and embedding model weights.
Environment
git clone https://github.com/hmxiong/StreamChat.git
cd StreamChat
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
Inference, scoring, and getting results
# change model setting
Change the 'embedding_model_dict -> minilm-l6' path in memory_bank/memory_retrieval/configs/model_config.py.
Change the 'embedding_model_id' in inference_streaming_longva_v2.py to the mxbai-colbert-large-v1 model save path.
Change the LLaMA 3 and LongVA model save paths in inference_streamchat_v0.3.sh.
All settings that need to be changed are marked with 'Your_xxxxx'.
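As an illustration, the edited entry in memory_bank/memory_retrieval/configs/model_config.py might look like the fragment below. The path on the right is a placeholder for your local weight directory, not a real default shipped with the repo:

```python
# Illustrative fragment of memory_bank/memory_retrieval/configs/model_config.py.
# Replace the 'Your_xxxxx'-style placeholder with your local download path.
embedding_model_dict = {
    "minilm-l6": "/path/to/your/all-MiniLM-L6-v2",
}
```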
# run script
bash inference_streamchat_v0.3.sh
You can adjust the parameters in the script; a full run takes about 28 hours to produce results.
TODO:
- Test code.
- Data for StreamBench.
- Online Demo.
- Single GPU inference.
- Support more models.
📑 Citation
If you find this work helpful for your research, please consider citing our work.
@misc{xiong2025streamingvideounderstandingmultiround,
title={Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge},
author={Haomiao Xiong and Zongxin Yang and Jiazuo Yu and Yunzhi Zhuge and Lu Zhang and Jiawen Zhu and Huchuan Lu},
year={2025},
eprint={2501.13468},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.13468},
}
🤝 Acknowledgement
StreamChat is built upon the following outstanding works: LongVA, LLaVA-NeXT, ChatUnivi, InternVL, MemoryBank, FreeVA, LLaVA-VID, Flash-VStream, Video-online. Thanks!