VideoDirector: Precise Video Editing via Text-to-Video Models (CVPR2025)

November 25, 2025

Yukun Wang, Longguang Wang, Zhiyuan Ma, Qibin Hu, Kai Xu, Yulan Guo

[Project Page]

Edited results

[Figure: input video alongside the edited results]

Abstract

VideoDirector harnesses the powerful temporal generation capability of the text-to-video (T2V) model for precise video editing. It produces high-quality results in terms of accuracy, fidelity, motion smoothness, and realism. For more results, see the project page.

Setup repository and conda environment

git clone https://github.com/Yukun66/Video_Director.git 
cd Video_Director

conda env create -f environment.yaml
conda activate videodirector

💡 Pretrained Model Preparations

Download Stable Diffusion V1.5

Download the Stable Diffusion V1.5 weights and place them at:

models/StableDiffusion/stable-diffusion-v1-5

Prepare Community Models

Manually download the community .safetensors checkpoint from RealisticVision and save it to:

models/DreamBooth_LoRA/realisticVisionV60B1_v51VAE.safetensors

Prepare AnimateDiff Motion Modules

Manually download the AnimateDiff modules from AnimateDiff. Save the modules to:

models/Motion_Module
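Before running the editing script, it can help to confirm that the three locations listed above exist. The following is a minimal sketch, not part of the repository: the helper name `missing_model_paths` is hypothetical, and it assumes the paths are relative to the repository root.

```python
from pathlib import Path

# Paths taken from the instructions above, relative to the repository root.
EXPECTED_MODEL_PATHS = (
    "models/StableDiffusion/stable-diffusion-v1-5",
    "models/DreamBooth_LoRA/realisticVisionV60B1_v51VAE.safetensors",
    "models/Motion_Module",
)


def missing_model_paths(repo_root="."):
    """Return the expected model paths that do not exist under repo_root.

    Hypothetical helper for illustration; it only checks existence,
    not whether the downloaded files are complete or valid.
    """
    root = Path(repo_root)
    return [p for p in EXPECTED_MODEL_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_model_paths()
    if missing:
        print("Missing model files or directories:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected model paths are present.")
```

Run it from the Video_Director directory. An empty result only means the paths exist, so a partially downloaded checkpoint would still pass.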

📌 Preprocess

Mask prediction

We utilize the SAM2 model (https://github.com/facebookresearch/sam2) to generate masks for our method.

Run SAM2_model/checkpoints/download_ckpts.sh to download the SAM2 weights:

cd SAM2_model/checkpoints
bash download_ckpts.sh
cd ../..

The SAM2 model is located in the SAM2_model directory and requires installation before use:

cd SAM2_model
pip install -e ".[demo]"
cd ..

We provide a usage example that generates the mask for resources/bear.mp4 in SAM2_model/notebooks/video_predictor_example.ipynb.

🚗 Editing video

Run our method:

bash run_editing.sh

Config details

Our editing config file is in editing_config_yaml/bear_editing_config.yaml. The config parameters are detailed below.

Prompts
  • inversion_prompt: prompt describing the original video. Example:
"A brown bear, walking on rocky terrain, next to a stone wall."
  • new_prompt: prompt describing the target video. Example:
"A tiger, walking on rocky terrain, next to a stone wall."
  • p2p_eq_params_words: the words newly inserted in the new prompt. Example:
- tiger
STDG_guide
  • Coefficients of the spatial-temporal decoupled guidance (STDG). Example:
STDG_guide:
- 0.5
- 0.5
- 0.0
- 0.5
p2p_self_replace_steps
  • $\tau_s$ in Sec. 3.3 of the paper. Example:
p2p_self_replace_steps: 0.4
p2p_cross_replace_steps
  • $\tau_c$ in Sec. 3.3 of the paper. Example:
p2p_cross_replace_steps: 0.8
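The parameters above can be sanity-checked before launching an edit. The sketch below is illustrative only and is not part of the repository: `check_editing_config` is a hypothetical helper that works on a plain dict mirroring the YAML keys. It makes two assumptions: that STDG_guide is a list of numbers, and that the two replace-step values are fractions of the denoising schedule in [0, 1], following the Prompt-to-Prompt convention.

```python
def check_editing_config(cfg):
    """Return a list of human-readable problems found in an editing config dict.

    Hypothetical helper for illustration; the keys follow the README,
    the validation rules are assumptions.
    """
    problems = []

    # Both prompts should be non-empty strings.
    for key in ("inversion_prompt", "new_prompt"):
        if not isinstance(cfg.get(key), str) or not cfg[key].strip():
            problems.append(f"{key} must be a non-empty string")

    # Every newly inserted word should actually occur in the new prompt.
    new_prompt = cfg.get("new_prompt")
    new_prompt = new_prompt.lower() if isinstance(new_prompt, str) else ""
    for word in cfg.get("p2p_eq_params_words") or []:
        if str(word).lower() not in new_prompt:
            problems.append(f"p2p_eq_params_words entry {word!r} is not in new_prompt")

    # Assumption: STDG_guide is a list of numeric coefficients.
    guide = cfg.get("STDG_guide")
    if not isinstance(guide, list) or not all(
        isinstance(g, (int, float)) and not isinstance(g, bool) for g in guide
    ):
        problems.append("STDG_guide must be a list of numbers")

    # Assumption: replace steps are fractions of the denoising schedule.
    for key in ("p2p_self_replace_steps", "p2p_cross_replace_steps"):
        value = cfg.get(key)
        if (
            not isinstance(value, (int, float))
            or isinstance(value, bool)
            or not 0.0 <= value <= 1.0
        ):
            problems.append(f"{key} should be a number in [0, 1]")

    return problems


# Values copied from the examples above.
example = {
    "inversion_prompt": "A brown bear, walking on rocky terrain, next to a stone wall.",
    "new_prompt": "A tiger, walking on rocky terrain, next to a stone wall.",
    "p2p_eq_params_words": ["tiger"],
    "STDG_guide": [0.5, 0.5, 0.0, 0.5],
    "p2p_self_replace_steps": 0.4,
    "p2p_cross_replace_steps": 0.8,
}
print(check_editing_config(example))  # prints []
```

To check a real file, load editing_config_yaml/bear_editing_config.yaml into a dict with a YAML parser and pass the result to the function. The actual file may contain further keys that this sketch ignores.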

📝 Citation

If you find this work useful, please consider citing:

@inproceedings{wang2025videodirector,
  title={Videodirector: Precise video editing via text-to-video models},
  author={Wang, Yukun and Wang, Longguang and Ma, Zhiyuan and Hu, Qibin and Xu, Kai and Guo, Yulan},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  year={2025}
}