RealCam-I2V
October 19, 2025 ยท View on GitHub
[ICCV'25] Official repo of "RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control".
๐ News
- 25/07/05: Release inference code and checkpoints of RealCam-I2V on CogVideoX 1.5 for exploration. The results we report in the paper are based on DynamiCrafter, for full reproduction and evaluation, please refer to our previous repo CamI2V.
- 25/06/26: RealCam-I2V is accepted by ICCV 2025! ๐๐
- 25/05/18: Release training code of RealCam-I2V on CogVideoX 1.5.
- 25/03/26: Release our dataset RealCam-Vid v1 for metric-scale camera-controlled video generation!
- 25/02/18: Initial commit of the project, we plan to release our DiT-based real-camera I2V models (e.g., CogVideoX) in this repo.
โ๏ธ Environment
Quick Start
apt install libgl1-mesa-glx libgl1-mesa-dri xvfb # for ubuntu
yum install -y mesa-libGL mesa-dri-drivers Xvfb # for centos
conda create -n realcami2v python=3.12
conda activate realcami2v
conda install ffmpeg=7 -c conda-forge
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu126
๐ซ Inference
Download Pretrained Models
Download and put under pretrained folder the pretrained weights of CogVideoX1.5-5B-I2V, Metric3D and Qwen2.5-VL.
Download Model Checkpoints
Download our weights of RealCam-I2V and put under checkpoints folder.
Please edit demo/models.json if you have a custom model path.
Run Gradio Demo
python gradio_app.py
๐ Training
Prepare Dataset
Please access RealCam-Vid and download our dataset for training RealCam-I2V-CogVideoX-1.5. Please unzip all contents in data folder.
Launch
Edit example training script accelerate_train.sh if necessary and launch training by:
bash accelerate_train.sh
For CogVideoX 1.5, we precompute latents before training.
๐ค Related Repo
- Our dataset, the first open-sourced, combining diverse scene dynamics with metric-scale camera trajectories, is available at RealCam-Vid.
- Our previous work at CamI2V.
- We have borrowed a lot of code from the original CogVideoX repository.
๐๏ธ Citation
@article{li2025realcam,
title={RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control},
author={Li, Teng and Zheng, Guangcong and Jiang, Rui and Zhan, Shuigen and Wu, Tao and Lu, Yehao and Lin, Yining and Li, Xi},
journal={arXiv preprint arXiv:2502.10059},
year={2025},
}
@article{zheng2025realcam,
title={RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements},
author={Zheng, Guangcong and Li, Teng and Zhou, Xianpan and Li, Xi},
journal={arXiv preprint arXiv:2504.08212},
year={2025},
}
@article{zheng2024cami2v,
title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
author={Zheng, Guangcong and Li, Teng and Jiang, Rui and Lu, Yehao and Wu, Tao and Li, Xi},
journal={arXiv preprint arXiv:2410.15957},
year={2024}
}