README.md
November 17, 2025
Jay Zhangjie Wu* Xuanchi Ren* Tianchang Shen Tianshi Cao Kai He
Yifan Lu Ruiyuan Gao Enze Xie Shiyi Lan Jose M. Alvarez
Jun Gao Sanja Fidler
Zian Wang Huan Ling*†
* equal contribution † corresponding author
📖 Project Page | 🤗 ChronoEdit-14B | 📑 Arxiv
TL;DR: ChronoEdit reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency. A temporal reasoning stage introduces reasoning tokens to ensure physically plausible edits and visualize the editing trajectory.
🫨 News
- 2025/11/16: 👋 ChronoEdit-14B-Diffusers-Paint-Brush-Lora is released on 🤗 HuggingFace. Thanks to @AK for hosting the 🤗 demo.
- 2025/11/10: 👋 ChronoEdit-14B-Diffusers-Upscaler-Lora is released on 🤗 HuggingFace. Thanks to @AK for hosting the 🤗 demo.
- 2025/11/10: 👋 ChronoEdit is officially merged into diffusers Pipeline.
- 2025/10/29: 👋 ChronoEdit-14B is released on 🤗 HuggingFace!
- 2025/10/04: 👋 ChronoEdit paper is released.
🤗 Open Source Plan
- ChronoEdit
- Inference with Diffusers
- LoRA Training with DiffSynth-Studio
- ChronoEdit-14B Checkpoints
- ChronoEdit-14B 8-Steps Distilled LoRA Checkpoints
- ChronoEdit-2B Checkpoints
- ChronoEdit-2B 4-Steps Distilled LoRA Checkpoints
- Full Model Training Infrastructure
📑 Quick Start
Installation
Clone the repo:
git clone https://github.com/nv-tlabs/ChronoEdit
cd ChronoEdit
This repo runs only on Linux and requires Python 3.10:
conda env create -f environment.yml -n chronoedit_mini
conda activate chronoedit_mini
pip install torch==2.7.1 torchvision==0.22.1
pip install -r requirements_minimal.txt
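The two constraints stated above (Linux only, Python 3.10) can be checked before creating the environment. The following is a minimal sketch; the function name `check_environment` is hypothetical and not part of this repo.

```python
import platform
import sys


def check_environment(system, version):
    """Return a list of human-readable problems; an empty list means the host looks OK."""
    problems = []
    # The README states that the repo runs only on Linux.
    if system != "Linux":
        problems.append(f"unsupported OS: {system} (Linux required)")
    # The README states that Python 3.10 is required; compare major.minor only.
    if tuple(version[:2]) != (3, 10):
        problems.append(
            f"unsupported Python: {version[0]}.{version[1]} (3.10 required)"
        )
    return problems


if __name__ == "__main__":
    issues = check_environment(platform.system(), sys.version_info)
    print("OK" if not issues else "\n".join(issues))
```

Passing the OS name and version in as arguments, rather than reading them inside the function, keeps the check easy to exercise on any machine.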
Optional: Install flash attention with cudatoolkit if you want faster inference. The model runs without it.
# You may need to limit the number of threads used during compilation to prevent OOM errors.
export MAX_JOBS=16
pip install flash-attn==2.6.3
Download diffusers checkpoint from HuggingFace:
hf download nvidia/ChronoEdit-14B-Diffusers --local-dir checkpoints/ChronoEdit-14B-Diffusers
Diffusers Inference 🤗
Note
2025/11/10 Update: ChronoEdit is officially merged into diffusers; check out the official pipeline at LINK
(1) Single GPU Inference
Run inference with default hyperparameters.
PYTHONPATH=$(pwd) python scripts/run_inference_diffusers.py \
--input assets/images/input_2.png --offload_model --use-prompt-enhancer \
--prompt "Add a sunglasses to the cat's face" \
--output output.mp4 \
--model-path ./checkpoints/ChronoEdit-14B-Diffusers
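If you launch the script from Python (for example, to loop over many images), the command above can be assembled as an argument list. This is a sketch only: `build_inference_command` is a hypothetical helper, and it uses only the flags that appear in this README, spelled exactly as shown (note that `--offload_model` uses an underscore while the other flags use hyphens).

```python
import shlex


def build_inference_command(
    input_path,
    prompt,
    output_path,
    model_path,
    offload_model=True,
    use_prompt_enhancer=True,
    temporal_reasoning=False,
):
    """Assemble the argument list for scripts/run_inference_diffusers.py."""
    cmd = [
        "python", "scripts/run_inference_diffusers.py",
        "--input", input_path,
        "--prompt", prompt,
        "--output", output_path,
        "--model-path", model_path,
    ]
    # Optional switches, appended only when requested.
    if offload_model:
        cmd.append("--offload_model")
    if use_prompt_enhancer:
        cmd.append("--use-prompt-enhancer")
    if temporal_reasoning:
        cmd.append("--enable-temporal-reasoning")
    return cmd


if __name__ == "__main__":
    cmd = build_inference_command(
        "assets/images/input_2.png",
        "Add a sunglasses to the cat's face",
        "output.mp4",
        "./checkpoints/ChronoEdit-14B-Diffusers",
    )
    # Print a shell-safe version; to execute, one could instead call
    # subprocess.run(cmd, env={**os.environ, "PYTHONPATH": os.getcwd()}, check=True)
    print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems with prompts that contain apostrophes or spaces.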
Append the --enable-temporal-reasoning flag to enable temporal reasoning for better consistency.
Note
Inference requires ~34 GB of GPU memory with the --offload_model flag turned on.
In temporal reasoning mode, the GPU memory requirement increases to ~38 GB.
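The memory figures in the note can be turned into a small pre-flight check. This is a rough sketch: the helper name `choose_mode` is hypothetical, the thresholds are the approximate values quoted in the note (read here as GB, with `--offload_model` on), and actual usage may vary with resolution and other settings.

```python
# Approximate GPU memory figures quoted in the note, in GB.
BASE_MEMORY_GB = 34
TEMPORAL_REASONING_MEMORY_GB = 38


def choose_mode(available_gb):
    """Pick the most capable mode that fits in the given GPU memory."""
    if available_gb >= TEMPORAL_REASONING_MEMORY_GB:
        return "temporal-reasoning"
    if available_gb >= BASE_MEMORY_GB:
        return "default"
    return "insufficient"


if __name__ == "__main__":
    # With PyTorch installed, the total memory of GPU 0 could be read via
    # torch.cuda.get_device_properties(0).total_memory / 1024**3
    for gb in (24, 36, 48):
        print(gb, "GB ->", choose_mode(gb))
```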
(2) Inference with Prompt Enhancer
Append the --use-prompt-enhancer flag to turn on the automatic prompt enhancer.
You can set the --prompt_enhancer_model flag to select a different model. By default, we recommend Qwen/Qwen3-VL-30B-A3B-Instruct, which delivers the best results but requires up to 60 GB of peak memory. Smaller vision-language models are supported as well, though they may produce lower-quality outputs.
Note
We strongly recommend reading the Prompt Guidance and using our provided prompt enhancer for best results.
Note
If you prefer not to host the prompt enhancer locally, you can use the provided System prompt with any modern online LLM chat agent.
(3) Inference with 8-Step Distillation LoRA
With the distillation LoRA, we recommend setting the hyperparameters to --flow-shift 2.0, --guidance-scale 1.0, and --num-inference-steps 8.
# Advanced usage with lora settings
PYTHONPATH=$(pwd) python scripts/run_inference_diffusers.py --use-prompt-enhancer --offload_model \
--input assets/images/input_2.png \
--prompt "Add a sunglasses to the cat's face" \
--output output_lora.mp4 \
--num-inference-steps 8 \
--guidance-scale 1.0 \
--flow-shift 2.0 \
--lora-scale 1.0 \
--seed 42 \
--lora-path ./checkpoints/ChronoEdit-14B-Diffusers/lora/chronoedit_distill_lora.safetensors \
--model-path ./checkpoints/ChronoEdit-14B-Diffusers
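When the distillation settings are reused across several runs, it can help to keep them in one place. The sketch below stores the three recommended values as a preset and expands it into command-line arguments; `DISTILL_LORA_PRESET` and `preset_to_args` are hypothetical names, not part of this repo.

```python
# Recommended settings for the 8-step distillation LoRA, as listed in this README.
DISTILL_LORA_PRESET = {
    "--num-inference-steps": 8,
    "--guidance-scale": 1.0,
    "--flow-shift": 2.0,
}


def preset_to_args(preset):
    """Flatten a {flag: value} mapping into a list of command-line tokens."""
    args = []
    for flag, value in preset.items():
        args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    # The tokens can be appended to any argument list for run_inference_diffusers.py.
    print(" ".join(preset_to_args(DISTILL_LORA_PRESET)))
```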
(4) Inference with other LoRAs
ChronoEdit-14B-Diffusers-Upscaler-Lora 🤗
Trigger Prompt:
The user want to enhance image clarity and resolution while keeping the content identical. super-resolution, high detail, 4K clarity, same composition, natural texture.
hf download nvidia/ChronoEdit-14B-Diffusers-Upscaler-Lora --local-dir checkpoints/ChronoEdit-14B-Diffusers-Upscaler-Lora
The model has been tested at resolutions up to 2K.
PYTHONPATH=$(pwd) python scripts/run_inference_diffusers.py \
--input assets/images/lr.png --width 1584 --height 1056 \
--prompt "The user want to enhance image clarity and resolution while keeping the content identical. super-resolution, high detail, 4K clarity, same composition, natural texture." \
--output output_sr_lora.mp4 \
--lora-scale 1.0 \
--seed 42 \
--lora-path ./checkpoints/ChronoEdit-14B-Diffusers-Upscaler-Lora/upsample_lora_diffusers.safetensors \
--model-path ./checkpoints/ChronoEdit-14B-Diffusers
ChronoEdit-14B-Diffusers-Paint-Brush-Lora 🤗
Trigger Prompt:
Turn the pencil sketch in the image into an actual object that is consistent with the image’s content. The user wants to change the sketch to a {}
{} should be filled with a simple description of what you are drawing, e.g. a crown and hat that matches the original image’s style.
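Because the trigger prompt is a template with a single `{}` placeholder, it can be filled programmatically. This is a minimal sketch; `paintbrush_prompt` is a hypothetical helper. Judging from the example command in this README, the description follows the trailing "a" of the template, so it should not start with its own article.

```python
# Trigger prompt copied from this README; {} is the placeholder for the description.
PAINTBRUSH_TEMPLATE = (
    "Turn the pencil sketch in the image into an actual object that is "
    "consistent with the image’s content. The user wants to change the "
    "sketch to a {}"
)


def paintbrush_prompt(description):
    """Fill the trigger prompt with a short description of the drawn object."""
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")
    return PAINTBRUSH_TEMPLATE.format(description)


if __name__ == "__main__":
    print(paintbrush_prompt("crown and a hat."))
```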
Note
The LoRA was trained with a black paintbrush. Sketches in other colors may also work, but results are worse than with black.
hf download nvidia/ChronoEdit-14B-Diffusers-Paint-Brush-Lora --local-dir checkpoints/ChronoEdit-14B-Diffusers-Paint-Brush-Lora
Note
We recommend using the paintbrush LoRA together with the 8-step distillation LoRA. In our test cases, this combination works better than the paintbrush LoRA alone.
PYTHONPATH=$(pwd) python scripts/run_inference_diffusers.py \
--input assets/images/input_paintbrush.png \
--prompt "Turn the pencil sketch in the image into an actual object that is consistent with the image’s content. The user wants to change the sketch to a crown and a hat." \
--output output_paintbrush_lora.png \
--num-inference-steps 8 \
--guidance-scale 1.0 \
--flow-shift 2.0 \
--lora-scale 1.0 \
--seed 42 \
--lora-path ./checkpoints/ChronoEdit-14B-Diffusers/lora/chronoedit_distill_lora.safetensors ./checkpoints/ChronoEdit-14B-Diffusers-Paint-Brush-Lora/paintbrush_lora_diffusers.safetensors \
--model-path ./checkpoints/ChronoEdit-14B-Diffusers
Gradio Demo with interactive brush:
PYTHONPATH=$(pwd) python scripts/gradio_paintbrush.py
(5) Inference with multiple LoRAs
For example, to use both distill LoRA and paintbrush LoRA:
PYTHONPATH=$(pwd) python scripts/run_inference_diffusers.py \
--input assets/images/input_paintbrush.png \
--prompt "Turn the pencil sketch in the image into an actual object that is consistent with the image’s content. The user wants to change the sketch to a crown and a hat." \
--output output_paintbrush_lora.png \
--num-inference-steps 8 \
--guidance-scale 1.0 \
--flow-shift 2.0 \
--lora-scale 1.0 \
--seed 42 \
--lora-path ./checkpoints/ChronoEdit-14B-Diffusers/lora/chronoedit_distill_lora.safetensors ./checkpoints/ChronoEdit-14B-Diffusers-Paint-Brush-Lora/paintbrush_lora_diffusers.safetensors \
--model-path ./checkpoints/ChronoEdit-14B-Diffusers
📑 LoRA Finetune with DiffSynth-Studio
Install DiffSynth-Studio:
pip install git+https://github.com/modelscope/DiffSynth-Studio.git
Train a LoRA with DiffSynth-Studio. See the Dataset Doc for dataset preparation guidance:
PYTHONPATH=$(pwd) accelerate launch scripts/train_diffsynth.py \
--dataset_base_path data/example_dataset \
--dataset_metadata_path data/example_dataset/metadata.csv \
--height 1024 \
--width 1024 \
--num_frames 5 \
--dataset_repeat 1 \
--model_paths '[["checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00001-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00002-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00003-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00004-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00005-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00006-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00007-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00008-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00009-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00010-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00011-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00012-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00013-of-00014.safetensors","checkpoints/ChronoEdit-14B-Diffusers/transformer/diffusion_pytorch_model-00014-of-00014.safetensors"]]' \
--model_id_with_origin_paths "Wan-AI/Wan2.1-I2V-14B-720P:models_t5_umt5-xxl-enc-bf16.pth,Wan-AI/Wan2.1-I2V-14B-720P:Wan2.1_VAE.pth,Wan-AI/Wan2.1-I2V-14B-720P:models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth" \
--learning_rate 1e-4 \
--num_epochs 5 \
--remove_prefix_in_ckpt "pipe.dit." \
--output_path "./models/train/ChronoEdit-14B_lora" \
--lora_base_model "dit" \
--lora_target_modules "q,k,v,o,ffn.0,ffn.2" \
--lora_rank 32 \
--extra_inputs "input_image" \
--use_gradient_checkpointing_offload
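The long `--model_paths` value in the command above follows a regular pattern (14 numbered transformer shards inside one nested JSON list), so it can be generated instead of typed by hand. The sketch below reproduces that string; the helper names are hypothetical, and the shard count of 14 is taken from the file names in the command.

```python
import json


def transformer_shard_paths(checkpoint_dir, num_shards=14):
    """List the sharded transformer weight files, numbered from 1 to num_shards."""
    return [
        f"{checkpoint_dir}/transformer/"
        f"diffusion_pytorch_model-{i:05d}-of-{num_shards:05d}.safetensors"
        for i in range(1, num_shards + 1)
    ]


def model_paths_argument(checkpoint_dir, num_shards=14):
    """Build the JSON string for --model_paths: a list holding one list of shards."""
    shards = transformer_shard_paths(checkpoint_dir, num_shards)
    # Compact separators match the formatting used in the command above.
    return json.dumps([shards], separators=(",", ":"))


if __name__ == "__main__":
    print(model_paths_argument("checkpoints/ChronoEdit-14B-Diffusers"))
```

The printed string can be passed to `--model_paths` inside single quotes, as in the training command.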
Inference with DiffSynth-Studio:
PYTHONPATH=$(pwd) python scripts/run_inference_diffsynth.py
Inference with DiffSynth-Studio (multi-GPU):
PYTHONPATH=$(pwd) torchrun --standalone --nproc_per_node=8 scripts/run_inference_diffsynth.py
📑 Full Model Training Framework
We release ChronoEdit’s full training infrastructure and codebase, enabling distributed inference and large-scale fine-tuning of pretrained video diffusion models. See Training Doc for details.
📑 Create Your Own Training Dataset
We provide an automated editing labeling script to generate high-quality editing instructions from pairs of images (before and after editing). The script uses state-of-the-art vision-language models to analyze image pairs and generate precise editing prompts with Chain-of-Thought (CoT) reasoning. See dataset guidance doc for details.
Acknowledgments
The authors would like to thank Product Managers Aditya Mahajan and Matt Cragun for their valuable guidance and support. We further acknowledge the Cosmos Team at NVIDIA, especially Qinsheng Zhang and Hanzi Mao, for their consultation on Cosmos-Pred2.5-2B. We also thank Yuyang Zhao, Junsong Chen, and Jincheng Yu for their insightful discussions. Finally, we are grateful to Ben Cashman, Yuting Yang, and Amanda Moran for their infrastructure support.
Also shout-out to Wiedemer et al., Video Models are Zero-Shot Learners and Reasoners (2025) — while the two projects were developed concurrently, several of our examples were inspired by this excellent work.
Citation
@article{wu2025chronoedit,
title={ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation},
author={Wu, Jay Zhangjie and Ren, Xuanchi and Shen, Tianchang and Cao, Tianshi and He, Kai and Lu, Yifan and Gao, Ruiyuan and Xie, Enze and Lan, Shiyi and Alvarez, Jose M. and Gao, Jun and Fidler, Sanja and Wang, Zian and Ling, Huan},
journal={arXiv preprint arXiv:2510.04290},
year={2025}
}