VideoDirector: Precise Video Editing via Text-to-Video Models (CVPR2025)
November 25, 2025
Yukun Wang, Longguang Wang, Zhiyuan Ma, Qibin Hu, Kai Xu, Yulan Guo
[Project Page]
Edited results
[Video comparison: input video and edited results]
Abstract
VideoDirector harnesses the powerful temporal generation capability of text-to-video (T2V) models for precise video editing. It produces high-quality results in terms of accuracy, fidelity, motion smoothness, and realism. For more results, see the project page.
🔧 Installation (python==3.11.3 recommended)
Setup repository and conda environment
git clone https://github.com/Yukun66/Video_Director.git
cd Video_Director
conda env create -f environment.yaml
conda activate videodirector
💡 Pretrained Model Preparations
Download Stable Diffusion V1.5
Download the Stable Diffusion V1.5 weights and place them at:
models/StableDiffusion/stable-diffusion-v1-5
Prepare Community Models
Manually download the community .safetensors model from RealisticVision.
Save the checkpoint to:
models/DreamBooth_LoRA/realisticVisionV60B1_v51VAE.safetensors
Prepare AnimateDiff Motion Modules
Manually download the AnimateDiff modules from AnimateDiff. Save the modules to:
models/Motion_Module
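Before moving on, it can help to confirm that the three locations above exist. The following is a minimal sketch, not part of the repository: the helper name `missing_model_paths` is hypothetical, and it only checks that each path exists, not that the downloaded files are complete or valid.

```python
from pathlib import Path

# Paths taken from this README, relative to the repository root.
EXPECTED_MODEL_PATHS = [
    "models/StableDiffusion/stable-diffusion-v1-5",
    "models/DreamBooth_LoRA/realisticVisionV60B1_v51VAE.safetensors",
    "models/Motion_Module",
]


def missing_model_paths(repo_root="."):
    """Return the expected model paths that do not exist under repo_root.

    Hypothetical helper for illustration; it checks existence only.
    """
    root = Path(repo_root)
    return [p for p in EXPECTED_MODEL_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_model_paths()
    if missing:
        print("Missing model files or directories:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected model paths are present.")
```

Run it from the repository root (the Video_Director directory) so the relative paths resolve correctly.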
📌 Preprocess
Mask prediction
We utilize the SAM2 model (https://github.com/facebookresearch/sam2) to generate masks for our method.
Run SAM2_model/checkpoints/download_ckpts.sh to download the SAM2 weights:
cd SAM2_model/checkpoints
bash download_ckpts.sh
cd ../..
The SAM2 model is located in the SAM2_model directory and requires installation before use:
cd SAM2_model
pip install -e ".[demo]"
cd ..
We provide a usage example that generates the mask for resources/bear.mp4 in SAM2_model/notebooks/video_predictor_example.ipynb.
🚗 Editing video
Run our method:
bash run_editing.sh
Config details
The editing config file is editing_config_yaml/bear_editing_config.yaml.
Its parameters are described below.
Prompts
- inversion_prompt: prompt describing the original video. Example:
"A brown bear, walking on rocky terrain, next to a stone wall."
- new_prompt: prompt describing the target (edited) video. Example:
"A tiger, walking on rocky terrain, next to a stone wall."
- p2p_eq_params_words: the words newly inserted in new_prompt. Example:
- tiger
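To illustrate how the three prompt fields relate, the sketch below lists the words that appear in new_prompt but not in inversion_prompt. The helper `inserted_words` is hypothetical and not part of the repository; it uses a simple lowercase word comparison, so treat its output as a starting point for choosing p2p_eq_params_words rather than as the method's own logic.

```python
import re


def inserted_words(inversion_prompt, new_prompt):
    """Return words found in new_prompt but not in inversion_prompt.

    Hypothetical helper: compares lowercase word tokens and keeps the
    order in which the new words first appear.
    """
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    original = set(tokenize(inversion_prompt))
    seen = set()
    result = []
    for word in tokenize(new_prompt):
        if word not in original and word not in seen:
            seen.add(word)
            result.append(word)
    return result


if __name__ == "__main__":
    old = "A brown bear, walking on rocky terrain, next to a stone wall."
    new = "A tiger, walking on rocky terrain, next to a stone wall."
    print(inserted_words(old, new))  # prints ['tiger']
```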
STDG_guide
- Coefficients of the STDG guidance. Example:
STDG_guide:
  - 0.5
  - 0.5
  - 0.0
  - 0.5
p2p_self_replace_steps
- Described in Sec. 3.3 of the paper. Example:
p2p_self_replace_steps: 0.4
p2p_cross_replace_steps
- Described in Sec. 3.3 of the paper. Example:
p2p_cross_replace_steps: 0.8
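The two replace-step values read as fractions of the denoising schedule. The sketch below assumes the Prompt-to-Prompt convention, where a value of 0.4 means replacement is applied during the first 40% of the sampling steps; this README does not confirm that VideoDirector uses exactly this rule. The helper `replace_step_count` and the step count of 50 are hypothetical.

```python
def replace_step_count(fraction, num_inference_steps):
    """Convert a replace-step fraction into a number of sampling steps.

    Assumption: the fraction counts from the start of sampling, as in
    Prompt-to-Prompt. Hypothetical helper, not taken from the repository.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"fraction must be in [0, 1], got {fraction}")
    return round(fraction * num_inference_steps)


if __name__ == "__main__":
    # Values from the example config above; 50 steps is an assumed schedule.
    config = {"p2p_self_replace_steps": 0.4, "p2p_cross_replace_steps": 0.8}
    for name, fraction in config.items():
        print(name, replace_step_count(fraction, num_inference_steps=50))
```

Under this assumption and a 50-step schedule, self-attention replacement would cover the first 20 steps and cross-attention replacement the first 40.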
📝 Citation
If you find this work useful, please consider citing:
@inproceedings{wang2025videodirector,
title={Videodirector: Precise video editing via text-to-video models},
author={Wang, Yukun and Wang, Longguang and Ma, Zhiyuan and Hu, Qibin and Xu, Kai and Guo, Yulan},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
year={2025}
}