AnchorSync: Global Consistency Optimization for Long Video Editing

October 12, 2025

Introduction

Official implementation of AnchorSync (ACM MM 2025). This repository contains the reproduced version of AnchorSync: Global Consistency Optimization for Long Video Editing.

Get Started

Suppose the AnchorSync codebase is located at ${AnchorSync_HOME}. Then follow the steps below.

Step 1: Prepare Environment

cd ${AnchorSync_HOME}
conda create -n anchorsync python=3.10
conda activate anchorsync
python3 -m pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118
python3 -m pip install -r requirements.txt --no-deps
python3 -m pip install xformers==0.0.25 --index-url https://download.pytorch.org/whl/cu118
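After installing, you can quickly confirm that the pinned versions actually took effect with a small Python check like the one below. This is a sketch, not a script shipped with the repository; the helper name `version_mismatches` is hypothetical. It compares only the public part of each version, because CUDA wheels from the PyTorch index usually carry a local suffix such as `+cu118`.

```python
"""Sanity-check the package versions pinned in Step 1 (sketch, not part of the repo)."""
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict

# Versions pinned by the install commands in Step 1.
PINNED: Dict[str, str] = {
    "torch": "2.2.1",
    "torchvision": "0.17.1",
    "torchaudio": "2.2.1",
    "xformers": "0.0.25",
}


def version_mismatches(
    pinned: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> Dict[str, str]:
    """Return {package: installed version or 'missing'} for every package off its pin."""
    problems: Dict[str, str] = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems[name] = "missing"
            continue
        # Drop a local suffix like "+cu118" before comparing.
        if installed.split("+", 1)[0] != wanted:
            problems[name] = installed
    return problems


if __name__ == "__main__":
    bad = version_mismatches(PINNED)
    if bad:
        for pkg, got in bad.items():
            print(f"{pkg}: expected {PINNED[pkg]}, found {got}")
    else:
        print("All pinned packages match.")
```

Run it inside the activated `anchorsync` environment; an empty report means the CUDA 11.8 builds of torch, torchvision, torchaudio and xformers are the pinned ones.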

Step 2: Prepare Panda-70M Dataset (For Training)

Download videos from the Panda-70M dataset into ${AnchorSync_HOME}/data.

The ${AnchorSync_HOME}/data/Panda-70M folder should be organized as follows:

└── data
    └── Panda-70M
        ├── train
        ├── test
        └── video_files.json

video_files.json records the storage path of each video. You can generate it with the provided script:

python get_pandas.py
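To illustrate what such an index involves, here is a hypothetical sketch that walks the train and test folders and records every .mp4 path. The JSON schema it writes ({"train": [...], "test": [...]} with paths relative to Panda-70M) and the helper names are assumptions for illustration only; the format that get_pandas.py produces and the training scripts read may differ, so use the provided script for actual training.

```python
"""Hypothetical sketch of indexing Panda-70M clips into video_files.json.

ASSUMPTION: the {"train": [...], "test": [...]} schema of relative .mp4 paths is
illustrative only; use get_pandas.py for the format the training code expects.
"""
import json
from pathlib import Path
from typing import Dict, List

SPLITS = ("train", "test")  # split folders from the Panda-70M directory layout


def build_index(root: Path) -> Dict[str, List[str]]:
    """Collect .mp4 files under each split folder, as POSIX paths relative to root."""
    index: Dict[str, List[str]] = {}
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Sort so the file is deterministic regardless of filesystem order.
        index[split] = sorted(
            p.relative_to(root).as_posix() for p in split_dir.rglob("*.mp4")
        )
    return index


def write_index(root: Path) -> Path:
    """Write the index to root/video_files.json and return that path."""
    out = root / "video_files.json"
    out.write_text(json.dumps(build_index(root), indent=2))
    return out


if __name__ == "__main__":
    print(write_index(Path("data/Panda-70M")))
```

Failing loudly on a missing split folder catches an incomplete download before training starts rather than partway through it.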

Step 3: Prepare Pretrained Models

Download stable-diffusion-v1-5, the Canny ControlNet for SD 1.5, and stable-video-diffusion-img2vid-xt, then update the corresponding checkpoint paths.

Alternatively, let them be downloaded automatically at runtime (the default).
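If you prefer to fetch the weights ahead of time, a sketch along these lines uses `snapshot_download` from `huggingface_hub`. The Hugging Face repo IDs, the `checkpoints/` folder and the helper names are assumptions, not values taken from this repository; check which IDs or paths the code actually expects before pointing the checkpoint paths at these folders. Full snapshots can be several gigabytes each.

```python
"""Sketch: pre-download the three base models with huggingface_hub.

ASSUMPTION: the repo IDs below are common public copies of the models named in
Step 3; the repository's defaults may differ, so check its config first.
"""
from pathlib import Path
from typing import Dict

# Hypothetical local folder; point the corresponding checkpoint paths here.
CHECKPOINT_ROOT = Path("checkpoints")

# local folder name -> Hugging Face repo ID (assumed, see note above)
MODELS: Dict[str, str] = {
    "stable-diffusion-v1-5": "stable-diffusion-v1-5/stable-diffusion-v1-5",
    "sd15-controlnet-canny": "lllyasviel/sd-controlnet-canny",
    "stable-video-diffusion-img2vid-xt": "stabilityai/stable-video-diffusion-img2vid-xt",
}


def local_dirs(root: Path = CHECKPOINT_ROOT) -> Dict[str, Path]:
    """Map each repo ID to the local folder it will be downloaded into."""
    return {repo_id: root / name for name, repo_id in MODELS.items()}


def download_all(root: Path = CHECKPOINT_ROOT) -> None:
    # Imported lazily so local_dirs() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, target in local_dirs(root).items():
        # Gated repos may require accepting a license and logging in first.
        snapshot_download(repo_id=repo_id, local_dir=target)
        print(f"{repo_id} -> {target}")


if __name__ == "__main__":
    download_all()
```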

Train

First, train the joint-frame diffusion LoRA for the first stage:

bash train_models/train_scripts/train_joint_frame_lora.sh

Second, train the multimodal ControlNet for SVD (Stable Video Diffusion):

bash train_models/train_scripts/train_controlnet_canny+flow.sh

Usage

If you skip training, download the pretrained joint-frame LoRA to {joint_lora_path} and the multimodal ControlNet to {multimodal_controlnet_path}.

For example, the ${AnchorSync_HOME}/output_dir folder should be organized as follows:

└── output_dir
    ├── joint_frame_lora
    └── multimodal-controlnet

Put your videos in data/ and name them "{case_name}.mp4". Then run the pipeline as follows:

  1. First stage, DDIM inversion (jointly edits the anchor frames):
python run_models/run_inference_joint_frame_video_fusion_guidance_inversion.py --case_name "mountain-new" --invert_prompt "Vast Mountain Landscape under Clear Blue Sky" --joint_lora_dir "output_dir/joint_frame_lora"
  2. First stage, forward editing (jointly edits the anchor frames):
python run_models/run_inference_joint_frame_video_fusion_guidance_forward.py --case_name "mountain-new" --invert_prompt "Vast Mountain Landscape under Clear Blue Sky" --prompt "Chinese Ink Wash Painting of Mountain Landscape under Clear Sky" --joint_lora_dir "output_dir/joint_frame_lora"
  3. Second stage, multimodal-guided interpolation:
python run_models/run_inference_trans_controlnet_canny_flow_video_fusion_guidance_pnp.py --case_name "mountain-new" --prompt "Chinese Ink Wash Painting of Mountain Landscape under Clear Sky" --multimodal_controlnet_path "output_dir/multimodal-controlnet"
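The three commands above can also be chained from Python so that a failing stage stops the run. This is a hypothetical convenience wrapper, not part of the repository: the script names and flags are copied from the commands above, while `build_commands`, `run_case` and the fixed output_dir paths are assumptions. It invokes the scripts with `sys.executable`, so they run in the same (conda) environment as the wrapper.

```python
"""Sketch: run the three documented inference commands for one case, in order.

Script names and flags are copied from the README commands; the helpers
themselves are hypothetical conveniences, not part of the repository.
"""
import subprocess
import sys
from pathlib import Path
from typing import List

JOINT_LORA_DIR = "output_dir/joint_frame_lora"
CONTROLNET_DIR = "output_dir/multimodal-controlnet"


def build_commands(case_name: str, invert_prompt: str, prompt: str) -> List[List[str]]:
    """Return argv lists for inversion, forward editing and interpolation."""
    py = sys.executable
    return [
        # First stage: DDIM inversion of the anchor frames.
        [py, "run_models/run_inference_joint_frame_video_fusion_guidance_inversion.py",
         "--case_name", case_name, "--invert_prompt", invert_prompt,
         "--joint_lora_dir", JOINT_LORA_DIR],
        # First stage: forward editing of the anchor frames with the target prompt.
        [py, "run_models/run_inference_joint_frame_video_fusion_guidance_forward.py",
         "--case_name", case_name, "--invert_prompt", invert_prompt, "--prompt", prompt,
         "--joint_lora_dir", JOINT_LORA_DIR],
        # Second stage: multimodal-guided interpolation.
        [py, "run_models/run_inference_trans_controlnet_canny_flow_video_fusion_guidance_pnp.py",
         "--case_name", case_name, "--prompt", prompt,
         "--multimodal_controlnet_path", CONTROLNET_DIR],
    ]


def run_case(case_name: str, invert_prompt: str, prompt: str) -> None:
    """Check the input video exists, then run each stage; stop on the first failure."""
    video = Path("data") / f"{case_name}.mp4"
    if not video.is_file():
        raise FileNotFoundError(f"expected input video at {video}")
    for cmd in build_commands(case_name, invert_prompt, prompt):
        # check=True raises CalledProcessError if a stage exits non-zero.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_case(
        "mountain-new",
        "Vast Mountain Landscape under Clear Blue Sky",
        "Chinese Ink Wash Painting of Mountain Landscape under Clear Sky",
    )
```

Run it from ${AnchorSync_HOME}, since both the script paths and data/ are relative to the repository root.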

We also recommend trying longer videos, for example:

python run_models/run_inference_joint_frame_video_fusion_guidance_inversion.py --case_name "forest-2" --invert_prompt "A forest path in morning sunlight with green trees and long shadows" --joint_lora_dir "output_dir/joint_frame_lora"

python run_models/run_inference_joint_frame_video_fusion_guidance_forward.py --case_name "forest-2" --invert_prompt "A forest path in morning sunlight with green trees and long shadows" --prompt "A forest path covered in snow during a winter sunrise" --joint_lora_dir "output_dir/joint_frame_lora"

python run_models/run_inference_trans_controlnet_canny_flow_video_fusion_guidance_pnp.py --case_name "forest-2" --prompt "A forest path covered in snow during a winter sunrise" --multimodal_controlnet_path "output_dir/multimodal-controlnet"

Acknowledgement

This repository builds on several excellent open-source codebases. We thank their authors for their contributions to the community.

BibTeX

If this work is helpful for your research, please consider citing the following BibTeX entry.