MR. Video: "MapReduce" is the Principle for Long Video Understanding [NeurIPS 2025]
June 18, 2026
MR. Video is an agentic long-video understanding framework built around a MapReduce pattern: dense short-clip perception followed by global aggregation and reasoning. This repository contains the released code for running the question-answering agent pipeline from precomputed MR. Video captions.
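The MapReduce pattern described above can be illustrated with a toy sketch. This is not the repository's implementation: `map_reduce`, `count_words`, and `merge_counts` are hypothetical names, and simple word counting stands in for the model calls that would perform clip-level perception and global reasoning.

```python
from collections import Counter
from typing import Callable, Iterable, List, TypeVar

C = TypeVar("C")  # a short clip (here: its caption text)
R = TypeVar("R")  # the per-clip perception result
A = TypeVar("A")  # the aggregated, video-level result


def map_reduce(clips: Iterable[C],
               map_fn: Callable[[C], R],
               reduce_fn: Callable[[List[R]], A]) -> A:
    """Map: process every clip independently. Reduce: aggregate all results."""
    mapped = [map_fn(clip) for clip in clips]  # independent, so parallelizable
    return reduce_fn(mapped)


# Toy stand-ins for model calls (hypothetical): count words per clip caption,
# then merge the per-clip counts into one video-level summary.
def count_words(caption: str) -> Counter:
    return Counter(caption.lower().split())


def merge_counts(per_clip: List[Counter]) -> Counter:
    total: Counter = Counter()
    for counts in per_clip:
        total.update(counts)
    return total


captions = ["a man opens a door", "the man enters a kitchen"]
summary = map_reduce(captions, count_words, merge_counts)
print(summary["man"])  # 2
```

The point of the split is that the map step never needs the whole video in context, while the reduce step only sees compact per-clip results.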
Code Layout
configs/ Model/provider configuration.
scripts/ Dataset runners for LVBench, EgoSchema, and VideoMME.
eval/ Evaluation scripts for saved predictions.
videoagent/ Core MR. Video agent, prompts, model clients, and video reader.
assets/ Figures used in this README.
misc/API_KEYS.example.json Template for the API-key file.
Setup
git clone https://github.com/ziqipang/MR-Video.git
cd MR-Video
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Create an API-key file:
cp misc/API_KEYS.example.json misc/API_KEYS.json
Then edit misc/API_KEYS.json:
{
  "openai": "your-openai-api-key",
  "gemini": "your-gemini-api-key"
}
You can also skip the JSON file and use environment variables:
export OPENAI_API_KEY=your-openai-api-key
export GEMINI_API_KEY=your-gemini-api-key
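As a rough illustration of how the two key sources could be combined, the sketch below reads the JSON file when it exists and falls back to the environment variables. `load_api_keys` is a hypothetical helper, and the precedence (file first, environment second) is an assumption; the repository's own loader may behave differently.

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Provider name -> environment variable, as listed in this README.
ENV_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def load_api_keys(path: Optional[str] = "misc/API_KEYS.json",
                  environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: read keys from the JSON file, fall back to env vars."""
    keys: Dict[str, str] = {}
    if path and Path(path).is_file():
        keys.update(json.loads(Path(path).read_text(encoding="utf-8")))
    for provider, var in ENV_VARS.items():
        # Assumption: the file wins; the environment only fills in missing keys.
        if not keys.get(provider) and environ.get(var):
            keys[provider] = environ[var]
    return keys


if __name__ == "__main__":
    found = load_api_keys()
    print("providers with keys:", sorted(found))
```

Printing only the provider names, not the key values, keeps secrets out of logs.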
Data Layout
By default, the scripts look for data under datasets/. You can either place files there or pass custom paths through command-line arguments or environment variables.
Recommended local layout:
datasets/
  lvbench/
    video_info.meta.jsonl
    videos/
      <lvbench_video>.mp4
      <lvbench_video>.json
    captions/
      <video_id>.json
  egoschema/
    egoschema_val_data.jsonl
    videos/
      <video_id>.mp4
    captions/
      <video_id>.json
  videomme/
    video_mme_long.jsonl
    videos/
      <video_id>.mp4
    captions/
      <video_id>.json
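Before a long run it can help to confirm that the layout is in place. The following is a minimal sketch, not part of the repository: `missing_entries` and `EXPECTED` are hypothetical names, and the check only covers the top-level metadata file and the `videos/` and `captions/` folders of each benchmark.

```python
from pathlib import Path
from typing import List

# Expected entries per benchmark, relative to datasets/<benchmark>/,
# taken from the recommended layout in this README.
EXPECTED = {
    "lvbench": ["video_info.meta.jsonl", "videos", "captions"],
    "egoschema": ["egoschema_val_data.jsonl", "videos", "captions"],
    "videomme": ["video_mme_long.jsonl", "videos", "captions"],
}


def missing_entries(root: str, benchmark: str) -> List[str]:
    """Return the expected paths that do not exist under root/benchmark."""
    base = Path(root) / benchmark
    return [str(base / name) for name in EXPECTED[benchmark]
            if not (base / name).exists()]


if __name__ == "__main__":
    for name in EXPECTED:
        for path in missing_entries("datasets", name):
            print(f"missing: {path}")
```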
Caption files are available from Hugging Face:
For custom locations, either pass arguments:
python -m scripts.run_lvbench \
  --lvbench_path /path/to/lvbench/videos \
  --lvbench_meta /path/to/video_info.meta.jsonl \
  --caption_dir /path/to/lvbench_captions
or set environment variables:
export LVBENCH_PATH=/path/to/lvbench/videos
export LVBENCH_META=/path/to/video_info.meta.jsonl
export LVBENCH_CAPTION_DIR=/path/to/lvbench_captions
Path overrides used by the runners:
| Benchmark | CLI arguments | Environment variables |
|---|---|---|
| LVBench | --lvbench_path, --lvbench_meta, --caption_dir | LVBENCH_PATH, LVBENCH_META, LVBENCH_CAPTION_DIR |
| EgoSchema | --egoschema_path, --egoschema_meta, --caption_dir | EGOSCHEMA_PATH, EGOSCHEMA_META, EGOSCHEMA_CAPTION_DIR |
| VideoMME | --videomme_path, --videomme_meta, --caption_dir | VIDEOMME_PATH, VIDEOMME_META, VIDEOMME_CAPTION_DIR |
Shared overrides include MRVIDEO_CONFIG, MRVIDEO_API_KEYS, MRVIDEO_OUTPUT_DIR, MRVIDEO_EXP_NAME, and MRVIDEO_LOG_DIR.
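One common way to support both kinds of override is to let environment variables supply the `argparse` defaults, so that an explicit flag always wins. The sketch below shows this for the LVBench options only. It is an assumption about how such a runner could be wired, not a copy of the repository's code: `build_parser` and `resolve` are hypothetical, and the fallback defaults are inferred from the recommended layout.

```python
import argparse
import os
from typing import List, Mapping, Optional


def build_parser(environ: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    """Hypothetical parser: env vars supply defaults, CLI flags override them."""
    parser = argparse.ArgumentParser()
    # Fallback paths are assumptions based on the datasets/ layout.
    parser.add_argument("--lvbench_path",
                        default=environ.get("LVBENCH_PATH", "datasets/lvbench/videos"))
    parser.add_argument("--lvbench_meta",
                        default=environ.get("LVBENCH_META",
                                            "datasets/lvbench/video_info.meta.jsonl"))
    parser.add_argument("--caption_dir",
                        default=environ.get("LVBENCH_CAPTION_DIR",
                                            "datasets/lvbench/captions"))
    return parser


def resolve(argv: Optional[List[str]], environ: Mapping[str, str]) -> argparse.Namespace:
    return build_parser(environ).parse_args(argv)


args = resolve(["--caption_dir", "/cli/captions"],
               {"LVBENCH_PATH": "/env/videos", "LVBENCH_CAPTION_DIR": "/env/captions"})
print(args.lvbench_path)  # /env/videos (from the environment)
print(args.caption_dir)   # /cli/captions (CLI flag wins)
print(args.lvbench_meta)  # datasets/lvbench/video_info.meta.jsonl (fallback default)
```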
Smoke Test
Start with one video before launching a full benchmark run:
python -m scripts.run_lvbench --limit 1 --exp_name smoke_lvbench
If you already know a valid video ID, target it directly:
python -m scripts.run_lvbench --video_ids VIDEO_ID --exp_name smoke_lvbench
The same smoke-test flags are available for the other runners:
python -m scripts.run_egoschema --limit 1 --exp_name smoke_egoschema
python -m scripts.run_videomme --limit 1 --exp_name smoke_videomme
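For intuition, the two smoke-test flags can be thought of as filters over the benchmark's video list. The sketch below is illustrative only: `select_videos` is a hypothetical function, and the semantics (filter by ID first, then keep the first `limit` entries) are an assumption about the runners.

```python
from typing import List, Optional, Sequence


def select_videos(all_ids: Sequence[str],
                  video_ids: Optional[Sequence[str]] = None,
                  limit: Optional[int] = None) -> List[str]:
    """Assumed semantics: filter to the requested IDs, then keep the first `limit`."""
    selected = list(all_ids)
    if video_ids:
        wanted = set(video_ids)
        selected = [v for v in selected if v in wanted]
    if limit is not None:
        selected = selected[:limit]
    return selected


print(select_videos(["a", "b", "c"], limit=1))          # ['a']
print(select_videos(["a", "b", "c"], video_ids=["c"]))  # ['c']
```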
Run Benchmarks
LVBench:
python -m scripts.run_lvbench --exp_name mrvideo_lvbench
EgoSchema:
python -m scripts.run_egoschema --exp_name mrvideo_egoschema
VideoMME:
python -m scripts.run_videomme --exp_name mrvideo_videomme
Useful shared options:
--config configs/gpt4o_gemini.yaml
--api_key misc/API_KEYS.json
--output_dir data/<benchmark>
--num_processes 4 --process_index 0
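The `--num_processes` and `--process_index` pair suggests that each worker handles one slice of the video list. A minimal sketch of such a split follows. The round-robin scheme and the name `shard` are assumptions; the runners may partition the work differently, for example in contiguous chunks.

```python
from typing import List, Sequence


def shard(items: Sequence[str], num_processes: int, process_index: int) -> List[str]:
    """Assumed round-robin split: worker i takes items i, i+n, i+2n, ..."""
    if not 0 <= process_index < num_processes:
        raise ValueError("process_index must be in [0, num_processes)")
    return list(items[process_index::num_processes])


video_ids = [f"video_{i}" for i in range(10)]
shards = [shard(video_ids, 4, i) for i in range(4)]
print(shards[0])  # ['video_0', 'video_4', 'video_8']
```

Under this scheme, launching the same command four times with `--process_index` set to 0 through 3 would cover every video exactly once.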
Outputs are saved as reusable intermediate files:
data/<benchmark>/<exp_name>/<video_id>/
  1_1_user_intention/
  1_2_user_intention/
  2_1_1_goal_proposal/
  2_1_2_perception_results/
  2_2_answer_generation/
Re-running the same command resumes from existing intermediate JSON files.
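Resuming of this kind is usually implemented by checking for a step's output file before doing the work. The sketch below shows the general idea under stated assumptions: `run_step` is a hypothetical helper, and the file names inside each step folder are not taken from the repository.

```python
import json
from pathlib import Path
from typing import Any, Callable


def run_step(out_file: Path, compute: Callable[[], Any]) -> Any:
    """Return the cached JSON result if present; otherwise compute and save it."""
    if out_file.is_file():
        return json.loads(out_file.read_text(encoding="utf-8"))
    result = compute()
    out_file.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted run
    # never leaves a partial JSON file behind.
    tmp = out_file.with_name(out_file.name + ".tmp")
    tmp.write_text(json.dumps(result), encoding="utf-8")
    tmp.replace(out_file)
    return result
```

With a helper like this, deleting a step's folder is enough to force that step to be recomputed on the next run, while earlier steps are still read from disk.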
Evaluate
python -m eval.lvbench --exp_name mrvideo_lvbench
python -m eval.egoschema --exp_name mrvideo_egoschema
python -m eval.videomme --exp_name mrvideo_videomme
If your predictions are outside the default data/<benchmark> folder, pass --output_dir or set MRVIDEO_OUTPUT_DIR.
Citation
If you find this work useful in your research, please cite:
@article{pang2025mrvideo,
  title={MR. Video: "MapReduce" is the Principle for Long Video Understanding},
  author={Pang, Ziqi and Wang, Yu-Xiong},
  journal={arXiv preprint arXiv:2504.16082},
  year={2025}
}