January 27, 2026
Vid2World: Crafting Video Diffusion Models to Interactive World Models (ICLR 2026)
This is the official code base for the paper Vid2World: Crafting Video Diffusion Models to Interactive World Models.
Give it a star if you find our work useful!

🔥 News & Updates
- 🚩 2026-01: Vid2World has been accepted to ICLR 2026, congrats!
- 🚩 2025-12: We release all model checkpoints on 🤗 Hugging Face.
- 🚩 2025-12: We release code for training, inference and evaluation.
TL;DR
We repurpose internet-scale pretrained video diffusion models into interactive world models:
- Converts non-causal video diffusion backbones into autoregressive, temporally causal architectures with frame-level action conditioning.
- Enables high-fidelity, action-conditioned video simulation and scalable world model learning across robot manipulation, 3D game simulation, and open-world navigation.
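To make "temporally causal" concrete, here is a minimal, generic sketch of a frame-level causal attention mask over a video flattened into per-frame tokens. It is an illustrative assumption, not Vid2World's actual implementation. `frame_causal_mask` and its parameters are hypothetical names.

```python
from typing import List


def frame_causal_mask(num_frames: int, tokens_per_frame: int) -> List[List[bool]]:
    """Boolean attention mask over a flattened video token sequence.

    mask[q][k] is True when query token q may attend to key token k.
    Tokens are laid out frame by frame, so token i belongs to frame
    i // tokens_per_frame. A token sees every token of its own frame and
    of all earlier frames, but nothing from later frames.
    """
    n = num_frames * tokens_per_frame
    return [
        [(k // tokens_per_frame) <= (q // tokens_per_frame) for k in range(n)]
        for q in range(n)
    ]


if __name__ == "__main__":
    # Print the mask: "x" = attention allowed, "." = masked out.
    for row in frame_causal_mask(num_frames=3, tokens_per_frame=2):
        print("".join("x" if allowed else "." for allowed in row))
```

The printed pattern is block lower-triangular. Tokens within a frame attend to each other, and each frame also attends to all earlier frames but never to later ones.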
QuickStart
Environment Setup
Note: The code is tested on Ubuntu 20.04, 22.04 and AlmaLinux 9.5.
First, create your conda environment:
```bash
conda create -n v2w python=3.8 -y
conda activate v2w
```
Then install the dependencies:
```bash
pip install -r requirements.txt
```
For training and evaluation:
- Download the base video model (DynamiCrafter, 320×512) and save it to checkpoints/dynamicrafter_512_v1/model.ckpt.
- Download the pretrained I3D model and save it to checkpoints/i3d/i3d_torchscript.pt.
At this point, your checkpoints folder should look like this:
```text
checkpoints
├── dynamicrafter_512_v1
│   └── model.ckpt
└── i3d
    └── i3d_torchscript.pt
```
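The following sanity check confirms that the two checkpoints listed above are where the training and evaluation configs expect them. It assumes you run it from the repo root. `missing_checkpoints` is a hypothetical helper, not part of the codebase.

```python
from pathlib import Path
from typing import List

# Expected files, taken from the checkpoints directory layout above.
EXPECTED_CHECKPOINTS = [
    "dynamicrafter_512_v1/model.ckpt",
    "i3d/i3d_torchscript.pt",
]


def missing_checkpoints(root: str = "checkpoints") -> List[str]:
    """Return the expected checkpoint paths (relative to root) that are not present as files."""
    base = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("Missing: " + ", ".join(missing) if missing else "All checkpoints found.")
```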
Models
At the moment, we provide the following models:
| Checkpoint | Domain | Weight Transfer Method | Action Guidance | Training Steps |
|---|---|---|---|---|
| Vid2World-RT1 | RT-1 | Extrapolative | ✔️ | 100k |
| Vid2World-CSGO | CSGO | Extrapolative | ✔️ | 100k |
| Vid2World-RECON | RECON | Extrapolative | ✔️ | 100k |
| Vid2World-RT1-NAG | RT-1 | Extrapolative | ❌ | 30k |
| Vid2World-RT1-Masked-NAG | RT-1 | Masked | ❌ | 30k |
| Vid2World-RT1-30k | RT-1 | Extrapolative | ✔️ | 30k |
| Vid2World-RT1-Masked | RT-1 | Masked | ✔️ | 30k |
| Vid2World-RT1-Shift | RT-1 | Shift | ✔️ | 30k |
Before inference, replace `<your_pretrained_checkpoint>` in the config file with the path to your local checkpoint.
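If you prefer not to edit configs by hand, a sketch like the following fills in placeholders such as `<your_pretrained_checkpoint>` or `<your_data_dir>`. It assumes the placeholders appear literally in the YAML files, which you should verify first. The helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches placeholders such as <your_data_dir> or <your_pretrained_checkpoint>.
PLACEHOLDER = re.compile(r"<[A-Za-z0-9_]+>")


def fill_placeholders(text: str, values: Dict[str, str]) -> str:
    """Replace each <name> placeholder whose name is a key of values."""
    for name, value in values.items():
        text = text.replace("<" + name + ">", value)
    return text


def remaining_placeholders(text: str) -> List[str]:
    """List the distinct placeholders that are still unfilled, sorted."""
    return sorted(set(PLACEHOLDER.findall(text)))


def write_filled_config(src: str, dst: str, values: Dict[str, str]) -> List[str]:
    """Write a filled copy of src to dst and return any placeholders left over."""
    filled = fill_placeholders(Path(src).read_text(), values)
    Path(dst).write_text(filled)
    return remaining_placeholders(filled)
```

For example, `write_filled_config("configs/manipulation/config_rt1_train.yaml", "configs/manipulation/config_rt1_train_local.yaml", {"your_data_dir": "/data/rt1"})` writes a filled copy and returns any placeholders still left. The `_local` file name and the data path are illustrative. You can then pass the new file to `--base`.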
📸 Showcases
Showcases span 🤖 Robot Manipulation 🦾, 🎮 Game Simulation 🕹️, and 🗺️ Open-World Navigation 🧭.
For more showcases, check out our Project Page.
🤖 Vid2World for Robot Manipulation
1. Prepare Data & Model
Data
To download and preprocess the dataset:
- Download the RT-1 Robot Action Dataset from Open X-Embodiment (OXE).
- Run the following command from the repo root to save the processed dataset to your desired local folder:
```bash
python lvdm/data/oxe_data_converter.py --dataset_name fractal20220817_data --input_path <path_to_downloaded_OXE> --output_path <path_to_stored_npz>
```
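To sanity-check the converter output, you can list what each saved .npz file contains. This minimal sketch assumes only that the output folder holds .npz files, as the `--output_path` argument suggests. It makes no assumption about the array names the converter uses, and `summarize_npz` is a hypothetical helper.

```python
from typing import Dict, Tuple

import numpy as np


def summarize_npz(path: str, allow_pickle: bool = False) -> Dict[str, Tuple[Tuple[int, ...], str]]:
    """Map each array stored in an .npz file to its (shape, dtype string)."""
    with np.load(path, allow_pickle=allow_pickle) as archive:
        return {key: (archive[key].shape, str(archive[key].dtype)) for key in archive.files}
```

For example, call `summarize_npz` on a file found with `Path("<path_to_stored_npz>").rglob("*.npz")`. If the converter stored object arrays, pass `allow_pickle=True`, but only for files you produced yourself.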
Model
For inference, download the corresponding pretrained model from 🤗 Hugging Face (see Models under QuickStart).
2. Training
To launch training on the RT-1 dataset, open configs/manipulation/config_rt1_train.yaml and set `<your_data_dir>` to your local data directory. To train on a single node with 4 GPUs, run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/manipulation/config_rt1_train.yaml --train --name training_512_v1.0 --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
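Every launch command in this README follows the same single-node pattern. Only the config file, `--train`/`--val`, the run name, the log directory, the GPU count and occasionally the master port change. A hypothetical helper that assembles the argument list might look like this. It keeps `--nproc_per_node` and `--devices` in sync, as every command here does.

```python
import shlex
from typing import List


def build_launch_cmd(config: str, mode: str, name: str, logdir: str,
                     gpus: int = 4, port: int = 12869) -> List[str]:
    """Assemble the single-node torch.distributed.launch command used in this README.

    mode is "train" or "val"; gpus sets both --nproc_per_node and --devices.
    """
    if mode not in ("train", "val"):
        raise ValueError("mode must be 'train' or 'val'")
    return [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}", "--nnodes=1",
        "--master_addr=127.0.0.1", f"--master_port={port}", "--node_rank=0",
        "./main/trainer.py", "--base", config, "--" + mode,
        "--name", name, "--logdir", logdir,
        "--devices", str(gpus), "lightning.trainer.num_nodes=1",
    ]


if __name__ == "__main__":
    # "logs/rt1" is an illustrative log directory.
    cmd = build_launch_cmd("configs/manipulation/config_rt1_train.yaml", "train",
                           "training_512_v1.0", "logs/rt1", gpus=2)
    print(shlex.join(cmd))
```

To launch, pass the list to `subprocess.run(cmd, check=True)`. When two jobs run on the same machine at once, give them different `port` values, as the Valorant and Delta Force commands do (12869 vs 12879).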
For ablation experiments, we provide the corresponding configurations in configs/ablation.
| Config File | Weight Transfer Method | Action Guidance | Model Checkpoint |
|---|---|---|---|
| config_rt1_*_masked_nag.yaml | Masked | ❌ | 🤗 Vid2World-RT1-Masked-NAG |
| config_rt1_*_extrp_nag.yaml | Extrapolative | ❌ | 🤗 Vid2World-RT1-NAG |
| config_rt1_*_shift.yaml | Shift | ✔️ | 🤗 Vid2World-RT1-Shift |
| config_rt1_*_masked.yaml | Masked | ✔️ | 🤗 Vid2World-RT1-Masked |
| config_rt1_*_all.yaml | Extrapolative | ✔️ | 🤗 Vid2World-RT1-30k |
3. Inference
We provide two setups. Auto-Regressive Generation produces the sequence frame by frame; Non-Auto-Regressive Generation produces the full sequence in one go.
Before running the experiments, make sure you have downloaded or trained the corresponding checkpoints, and update the data paths in the config file you use.
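The difference between the two setups can be sketched with stand-in models. This is a schematic illustration under stated assumptions. `step` and `generate` are hypothetical callables standing in for the diffusion sampler, and frames are plain lists. The bounded `context_len` window is also an assumption, not a documented Vid2World setting.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # stand-in for an image tensor


def autoregressive_rollout(step: Callable[[List[Frame], object], Frame],
                           first_frame: Frame, actions: Sequence[object],
                           context_len: int = 4) -> List[Frame]:
    """Generate one frame per action, feeding each prediction back as context."""
    frames = [first_frame]
    for action in actions:
        context = frames[-context_len:]  # only the most recent frames are visible
        frames.append(step(context, action))
    return frames[1:]  # drop the given first frame


def non_autoregressive_rollout(generate: Callable[[Frame, Sequence[object]], List[Frame]],
                               first_frame: Frame, actions: Sequence[object]) -> List[Frame]:
    """Generate the whole sequence in a single call from the first frame and all actions."""
    return generate(first_frame, actions)
```

In this sketch, the auto-regressive loop conditions each new frame on its own earlier predictions, so early errors can propagate. The non-auto-regressive call receives all actions up front and returns every frame at once.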
Auto-Regressive Generation
For auto-regressive generation, run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base code_release_configs/manipulation/config_rt1_test_ar.yaml --val --name training_512_v1.0 --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
For ablations, switch to the corresponding configuration file.
Non-Auto-Regressive Generation
For non-auto-regressive generation, run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base code_release_configs/manipulation/config_rt1_test_nar.yaml --val --name training_512_v1.0 --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
RT-1 Action Control Test
This test checks the model's ability to respond to different world_vector actions (X+, X-, Y+, Y-, Z+, Z-).
First, update the config file configs/manipulation/config_rt1_action_control_test.yaml:
- Set `pretrained_checkpoint` to your checkpoint path.
- Set `data_dir` to your RT-1 data directory.
Then run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/manipulation/config_rt1_action_control_test.yaml --val --name rt1_action_control_test --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
Results are saved to the directory given by the config file's `save_dir` parameter. Each batch visualizes 8 action variants side by side for comparison.
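The six labels denote positive and negative moves along each axis of the end-effector translation. Assume world_vector is the 3-D translation component of the RT-1 action, as in the RT-1 dataset, and pick an arbitrary magnitude. The directional variants could then be built as follows. The helper and the 0.5 magnitude are hypothetical, and the test config may construct its variants differently.

```python
from typing import Dict, List

AXES = ("X", "Y", "Z")


def directional_world_vectors(magnitude: float = 0.5) -> Dict[str, List[float]]:
    """Build one 3-D world_vector per direction label, e.g. 'X+' -> [m, 0, 0]."""
    actions = {}
    for i, axis in enumerate(AXES):
        for sign, label in ((1.0, "+"), (-1.0, "-")):
            vec = [0.0, 0.0, 0.0]
            vec[i] = sign * magnitude  # move along a single axis only
            actions[axis + label] = vec
    return actions
```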
🕹️ Vid2World for Game Simulation
1. Prepare Data & Model
Data
To download and preprocess the data, follow the steps from DIAMOND:
- Download the .tar files in the dataset_dm_scraped_dust2_tars folder from this dataset repo.
- Use the provided script to process the dataset into full- and low-resolution versions. We use only the full_res folder.
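If you want to unpack the downloaded archives yourself, here is a minimal sketch using Python's standard tarfile module. DIAMOND's own instructions take precedence, and the helper name and directories are hypothetical. Where the running Python supports it, the sketch uses tarfile's "data" extraction filter, which rejects unsafe member paths.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_all_tars(src_dir: str, dst_dir: str) -> List[str]:
    """Extract every .tar archive in src_dir into dst_dir; return the archive names."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    done = []
    for archive in sorted(Path(src_dir).glob("*.tar")):
        with tarfile.open(str(archive)) as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(str(dst), filter="data")  # safer extraction where supported
            else:
                tar.extractall(str(dst))
        done.append(archive.name)
    return done
```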
Model
For inference, download the corresponding pretrained model from 🤗 Hugging Face (see Models under QuickStart).
2. Training
To launch training on the CSGO dataset, open configs/game/config_csgo_train.yaml and set `<your_data_dir>` to your local data directory. To train on a single node with 4 GPUs, run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_train.yaml --train --name training_512_v1.0 --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
3. Inference
Standard Inference
For inference, run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test.yaml --val --name training_512_v1.0 --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
Long Rollout Inference on CSGO
For long rollout inference on CSGO, run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test_long_rollout.yaml --val --name training_512_v1.0 --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
Long Rollout Inference on OOD Games
For long rollout inference on previously unseen games (Valorant, Delta Force), run:
Valorant:
```bash
python3 -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test_long_rollout_valorant.yaml --val --name training_512_v1.0 --logdir <your_log_dir> --devices 2 lightning.trainer.num_nodes=1
```
Delta Force:
```bash
python3 -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --master_addr=127.0.0.1 --master_port=12879 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test_long_rollout_delta_force.yaml --val --name training_512_v1.0 --logdir <your_log_dir> --devices 2 lightning.trainer.num_nodes=1
```
🗺️ Vid2World for Open-World Navigation
1. Prepare Data & Model
Data
To download and preprocess the data, follow the steps from NoMaD:
- Download the RECON dataset.
- Change the preprocessing resolution to (640, 480).
- Run process_recon.py to save the processed dataset to your desired local folder.
Model
For inference, download the corresponding pretrained model from 🤗 Hugging Face (see Models under QuickStart).
2. Training
To launch training on the RECON dataset, open configs/navigation/config_recon_train.yaml and set `<your_data_dir>` to your local data directory. To train on a single node with 4 GPUs, run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/navigation/config_recon_train.yaml --train --name training_512_v1.0 --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
3. Inference
Following NWM, we evaluate under two setups: single-step generation and auto-regressive generation. In both setups our model generates auto-regressively; the setups differ in their data splits, and we support both.
Single-Step Generation
Set `<data_dir>` and `<path_to_pretrained_checkpoint>` in configs/navigation/config_recon_test_single_step.yaml, then run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/navigation/config_recon_test_single_step.yaml --val --name training_512_v1.0 --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
Auto-Regressive Generation
Set `<data_dir>` and `<path_to_pretrained_checkpoint>` in configs/navigation/config_recon_test_rollout.yaml, then run:
```bash
python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/navigation/config_recon_test_rollout.yaml --val --name training_512_v1.0 --logdir <your_log_dir> --devices 4 lightning.trainer.num_nodes=1
```
🧪 Evaluation
Note: If you encounter the following error message, check out this issue:
```text
ImportError: cannot import name 'trunc_normal_' from 'utils' (unknown location)
```
After running the inference code, compute the metrics with:
```bash
python eval.py --exp_folder <your_log_image_dir> --env <env>
```
where `<env>` is one of `rt1`, `csgo`, `recon_time`, or `recon_rollout`.
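The I3D checkpoint required for evaluation suggests that FVD (Fréchet Video Distance) is among the reported metrics. Assuming so, the core of that computation is the Fréchet distance between Gaussians fitted to real and generated I3D features: d² = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). This sketch implements only that formula on precomputed feature matrices with NumPy. Feature extraction is omitted, and `frechet_distance` is a hypothetical name, not the API of eval.py.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(0.5 * (mat + mat.T))  # symmetrize against round-off
    vals = np.clip(vals, 0.0, None)                   # drop tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (num_samples, dim) feature sets."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    sqrt_r = _sqrtm_psd(cov_r)
    # Tr((cov_r cov_g)^(1/2)) equals Tr((sqrt_r cov_g sqrt_r)^(1/2)), which is symmetric.
    cross = _sqrtm_psd(sqrt_r @ cov_g @ sqrt_r)
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2.0 * cross))
```

Rewriting the cross term in the symmetric form Σ_r^{1/2} Σ_g Σ_r^{1/2} keeps every matrix square root a symmetric eigendecomposition. It also avoids complex-valued results from a general matrix square root.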
Citation
If you find our code useful, please consider citing our paper:
```bibtex
@article{huang2025vid2world0,
  title={Vid2World: Crafting Video Diffusion Models to Interactive World Models},
  author={Siqiao Huang and Jialong Wu and Qixing Zhou and Shangchen Miao and Mingsheng Long},
  year={2025},
  journal={arXiv preprint arXiv:2505.14357}
}
```
💬 Contact
If you have any questions, please contact huang-sq23@mails.tsinghua.edu.cn.
💡 Acknowledgement
We sincerely thank the authors of the following GitHub repositories, whose code we build upon: