January 27, 2026

Vid2World: Crafting Video Diffusion Models to Interactive World Models (ICLR 2026)

arXiv Paper · License: MIT

This is the official code base for the paper Vid2World: Crafting Video Diffusion Models to Interactive World Models.

Give it a star 🌟 if you find our work useful!


🔥 News & Updates

  • 🚩 2026-01: Vid2World has been accepted to ICLR 2026, congrats!

  • 🚩 2025-12: We released all model checkpoints on 🤗 Hugging Face.

  • 🚩 2025-12: We released the code for training, inference, and evaluation.

📋 TL;DR

We repurpose internet-scale pretrained video diffusion models into interactive world models:

  • โš™๏ธ Converts non-causal video diffusion backbones into autoregressive, temporally causal architectures with frame-level action conditioning.
  • ๐Ÿฆธ Enables high-fidelity, action-conditioned video simulation and scalable world model learning across robot manipulation, 3D game simulation, and open-world navigation.
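As a rough illustration of what "temporally causal" means (a sketch, not Vid2World's actual implementation), the code below builds a frame-level causal attention mask: tokens may attend to tokens in the same or earlier frames, but never to later frames. The function names, token layout, and single-head attention are assumptions chosen for the example.

```python
import numpy as np


def causal_frame_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Boolean mask of shape (N, N), N = num_frames * tokens_per_frame.

    mask[i, j] is True when token i may attend to token j, i.e. when the
    frame of token j is not later than the frame of token i. Tokens in the
    same frame can see each other.
    """
    frame_idx = np.repeat(np.arange(num_frames), tokens_per_frame)
    return frame_idx[None, :] <= frame_idx[:, None]


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # hide disallowed positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) == 0 for masked positions
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because later frames are masked out, changing the keys and values of the last frame leaves the outputs for earlier frames unchanged, which is the property that allows frame-by-frame generation.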

🚀 QuickStart

⚙️ Environment Setup

Note

The code is tested on Ubuntu 20.04, 22.04 and AlmaLinux 9.5.

First create your conda environment:

conda create -n v2w python=3.8 -y
conda activate v2w

Then, install dependencies:

pip install -r requirements.txt

For training and evaluation:

  • Download the base video model (DynamiCrafter, 320 × 512) and save it to checkpoints/dynamicrafter_512_v1/model.ckpt.
  • Download the pretrained I3D model and save it to checkpoints/i3d/i3d_torchscript.pt.

At this point, your checkpoints folder should look like this:

checkpoints
├── dynamicrafter_512_v1
│   └── model.ckpt
└── i3d
    └── i3d_torchscript.pt
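To confirm this layout before training or evaluation, you can use a small check like the one below. It is a hypothetical helper, not part of the repository, and it only verifies that the two files exist at the paths listed above.

```python
from pathlib import Path

# Expected files, relative to the checkpoints folder, per the setup steps above.
EXPECTED_CHECKPOINTS = (
    "dynamicrafter_512_v1/model.ckpt",
    "i3d/i3d_torchscript.pt",
)


def missing_checkpoints(root="checkpoints"):
    """Return the expected checkpoint files that are absent under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", *missing, sep="\n  ")
    else:
        print("All checkpoints found.")
```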

🤗 Models

At the moment, we provide the following models:

| File | Domain | Weight Transfer Method | Action Guidance | Training Steps |
|---|---|---|---|---|
| Vid2World-RT1 | RT-1 | Extrapolative | ✔️ | 100k |
| Vid2World-CSGO | CSGO | Extrapolative | ✔️ | 100k |
| Vid2World-RECON | RECON | Extrapolative | ✔️ | 100k |
| Vid2World-RT1-NAG | RT-1 | Extrapolative | ❌ | 30k |
| Vid2World-RT1-Masked-NAG | RT-1 | Masked | ❌ | 30k |
| Vid2World-RT1-30k | RT-1 | Extrapolative | ✔️ | 30k |
| Vid2World-RT1-Masked | RT-1 | Masked | ✔️ | 30k |
| Vid2World-RT1-Shift | RT-1 | Shift | ✔️ | 30k |

Before running inference, replace |<your_pretrained_checkpoint>| in the config file with the path to your local checkpoint.
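If you prefer to script this step, the hypothetical helper below (not part of the repository) replaces a literal placeholder in a config file with a local path. Pass the placeholder exactly as it appears in your copy of the config; the checkpoint path in the usage comment is made up.

```python
from pathlib import Path


def fill_placeholder(config_path, placeholder, value, output_path=None):
    """Replace every occurrence of a literal placeholder string in a config
    file with `value`.

    Writes to output_path if given, otherwise edits the file in place.
    Returns the number of replacements made.
    """
    config_path = Path(config_path)
    text = config_path.read_text()
    count = text.count(placeholder)
    if count == 0:
        raise ValueError(f"{placeholder!r} not found in {config_path}")
    Path(output_path or config_path).write_text(text.replace(placeholder, value))
    return count


# Example (hypothetical checkpoint path):
# fill_placeholder("configs/game/config_csgo_test.yaml",
#                  "|<your_pretrained_checkpoint>|",
#                  "/path/to/Vid2World-CSGO.ckpt")
```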

📸 Showcases

🤖 Robot Manipulation 🦾
🎮 Game Simulation 🕹️
🗺️ Open-World Navigation 🧭

For more showcases, check out our Project Page.

🤖 Vid2World for Robot Manipulation

1. Prepare Data & Model

Data

To download and preprocess the dataset:

  • Download the RT-1 Robot Action Dataset from OXE.
  • Run the following command from the repository root to save the processed dataset to your local folder:
python lvdm/data/oxe_data_converter.py --dataset_name fractal20220817_data --input_path {path to downloaded OXE} --output_path {path to stored npz}
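To sanity-check the converter output, the sketch below prints the arrays stored in the first few .npz files. It makes no assumption about which key names the converter writes. The function name is hypothetical, and with the default allow_pickle=False loading fails on object arrays; enable it only for files you trust.

```python
from pathlib import Path

import numpy as np


def summarize_npz_dir(npz_dir, limit=3, allow_pickle=False):
    """Print name, shape, and dtype of every array in the first `limit`
    .npz files found (recursively) under npz_dir. Returns all files found."""
    files = sorted(Path(npz_dir).rglob("*.npz"))
    for path in files[:limit]:
        with np.load(path, allow_pickle=allow_pickle) as data:
            print(path.name)
            for key in data.files:
                arr = data[key]
                print(f"  {key}: shape={arr.shape}, dtype={arr.dtype}")
    return files
```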

Model

For inference, download the corresponding pretrained model from 🤗 Hugging Face (see QuickStart).

2. Training

To launch training on the RT-1 dataset, open configs/manipulation/config_rt1_train.yaml and replace |<your_data_dir>| with the path to your local data directory. To launch training on a single node with 4 GPUs, use the following command:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/manipulation/config_rt1_train.yaml --train  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1
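The launch commands in this README differ mainly in the config file, the mode flag (--train or --val), the run name, the GPU count, and occasionally the master port. The hypothetical helper below (not part of the repository) assembles the same arguments, copied from the commands shown here, so that --nproc_per_node and --devices always match.

```python
import shlex


def launch_command(config, mode, logdir, gpus=4, name="training_512_v1.0",
                   master_port=12869):
    """Build the single-node launch command used throughout this README.

    mode is "train" or "val"; gpus sets both --nproc_per_node and --devices
    so the two values cannot drift apart.
    """
    if mode not in ("train", "val"):
        raise ValueError("mode must be 'train' or 'val'")
    return [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}", "--nnodes=1",
        "--master_addr=127.0.0.1", f"--master_port={master_port}",
        "--node_rank=0", "./main/trainer.py",
        "--base", config, f"--{mode}",
        "--name", name, "--logdir", logdir,
        "--devices", str(gpus), "lightning.trainer.num_nodes=1",
    ]


cmd = launch_command("configs/manipulation/config_rt1_train.yaml", "train", "logs/rt1")
print(shlex.join(cmd))
```

A different master_port (as in the Delta Force command, which uses 12879) is presumably what lets two jobs run on the same machine at once.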

For ablation experiments, we provide the corresponding configurations in configs/ablation.

| File | Weight Transfer Method | Action Guidance | Model Checkpoint |
|---|---|---|---|
| config_rt1_*_masked_nag.yaml | Masked | ❌ | 🤗 Vid2World-RT1-Masked-NAG |
| config_rt1_*_extrp_nag.yaml | Extrapolative | ❌ | 🤗 Vid2World-RT1-NAG |
| config_rt1_*_shift.yaml | Shift | ✔️ | 🤗 Vid2World-RT1-Shift |
| config_rt1_*_masked.yaml | Masked | ✔️ | 🤗 Vid2World-RT1-Masked |
| config_rt1_*_all.yaml | Extrapolative | ✔️ | 🤗 Vid2World-RT1-30k |
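The * in these file names appears to be a wildcard over the split (for example train and test); this is our reading of the table rather than something stated elsewhere in the repository. Under that assumption, the matching files for one method can be listed with pathlib's glob, using a hypothetical helper:

```python
from pathlib import Path


def ablation_configs(method_suffix, config_dir="configs/ablation"):
    """Return config files matching config_rt1_*_<method_suffix>.yaml.

    The '*' is treated as a shell-style wildcard; which values it takes
    (e.g. train/test) is an assumption, so list the directory to confirm.
    """
    return sorted(Path(config_dir).glob(f"config_rt1_*_{method_suffix}.yaml"))
```

Note that the pattern for "masked" does not pick up the "masked_nag" files, since those end in _nag.yaml.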

3. Inference

We provide two setups: Auto-Regressive Generation, which generates the sequence frame by frame, and Non-Auto-Regressive Generation, which generates the full sequence in one pass.
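The difference between the two setups can be sketched with toy stand-ins: step and joint are hypothetical callables representing the model's sampler, and frames are plain floats rather than video latents. In the auto-regressive loop, each new frame is appended to the history before the next one is generated. The non-auto-regressive call produces every future frame at once.

```python
from typing import Any, Callable, List, Sequence

Frame = float  # toy stand-in for an image or latent frame


def generate_autoregressive(step: Callable[[List[Frame], Any], Frame],
                            context: Sequence[Frame],
                            actions: Sequence[Any]) -> List[Frame]:
    """Produce one frame per action, each conditioned on all frames so far."""
    history = list(context)
    out = []
    for action in actions:
        frame = step(history, action)  # only past frames are visible
        history.append(frame)          # feed the prediction back in
        out.append(frame)
    return out


def generate_non_autoregressive(
        joint: Callable[[List[Frame], Sequence[Any]], List[Frame]],
        context: Sequence[Frame],
        actions: Sequence[Any]) -> List[Frame]:
    """Produce all future frames in a single call."""
    return joint(list(context), actions)
```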

Before running the experiments, make sure you have downloaded or trained the corresponding checkpoints, and update the data paths in the config file you use.

Auto-Regressive Generation

For auto-regressive generation, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base code_release_configs/manipulation/config_rt1_test_ar.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

For ablation experiments, switch to the corresponding configuration file in configs/ablation.

Non-Auto-Regressive Generation

For non-auto-regressive generation, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base code_release_configs/manipulation/config_rt1_test_nar.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

RT-1 Action Control Test

This test evaluates the model's ability to respond to different world_vector actions (X+, X-, Y+, Y-, Z+, Z-).
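RT-1's world_vector is a 3-D translation command (x, y, z). The sketch below builds the six directional variants listed above as single-axis offsets. The step size of 0.05 and the dictionary layout are illustrative assumptions, not the values used by config_rt1_action_control_test.yaml.

```python
import numpy as np

# Hypothetical step size; the magnitude used by the test config may differ.
STEP = 0.05


def world_vector_variants(step=STEP):
    """Map each direction label (X+, X-, ..., Z-) to a 3-D translation."""
    variants = {}
    for axis, name in enumerate("XYZ"):
        for sign, suffix in ((1.0, "+"), (-1.0, "-")):
            vec = np.zeros(3)
            vec[axis] = sign * step  # move along a single axis only
            variants[name + suffix] = vec
    return variants
```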

First, update the config file configs/manipulation/config_rt1_action_control_test.yaml:

  • Set pretrained_checkpoint to your checkpoint path
  • Set data_dir to your RT-1 data directory

Then run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/manipulation/config_rt1_action_control_test.yaml --val --name rt1_action_control_test --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

Results will be saved to the directory specified in the config file's save_dir parameter. Each batch visualizes 8 action variants side-by-side for comparison.
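To build a similar side-by-side view from your own saved rollouts, the sketch below concatenates equally shaped videos along the width axis. It assumes each video is a NumPy array of shape (T, H, W, C). The function name and the gap parameter are illustrative, not the repository's visualization code.

```python
import numpy as np


def tile_side_by_side(videos, gap=4):
    """Concatenate equally shaped videos (T, H, W, C) along the width axis,
    separated by `gap` black columns: output is (T, H, n*W + (n-1)*gap, C)."""
    videos = [np.asarray(v) for v in videos]
    t, h, _, c = videos[0].shape
    spacer = np.zeros((t, h, gap, c), dtype=videos[0].dtype)
    parts = []
    for i, v in enumerate(videos):
        if v.shape != videos[0].shape:
            raise ValueError("all videos must share the same shape")
        if i:
            parts.append(spacer)  # separator between neighbouring videos
        parts.append(v)
    return np.concatenate(parts, axis=2)
```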

๐Ÿ•น๏ธ Vid2World for Game Simulation

1. Prepare Data & Model

Data

To download and preprocess data, please follow the steps from DIAMOND, specifically:

  • Download the .tar files in dataset_dm_scraped_dust2_tars from this dataset repo.
  • Use the provided script to process the dataset into full- and low-resolution versions. We use only the full_res folder.

Model

For inference, download the corresponding pretrained model from 🤗 Hugging Face (see QuickStart).

2. Training

To launch training on the CSGO dataset, open configs/game/config_csgo_train.yaml and replace |<your_data_dir>| with the path to your local data directory. To launch training on a single node with 4 GPUs, use the following command:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_train.yaml --train  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

3. Inference

Standard Inference

For inference, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

Long Rollout Inference on CSGO

For long rollout inference on CSGO, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test_long_rollout.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1
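Long rollouts repeatedly feed generated frames back in as context. One common way to keep the cost of each step bounded is a fixed-size sliding window over the most recent frames. The sketch below illustrates that general idea with a hypothetical step function and window size. It is not taken from the Vid2World code, so check config_csgo_test_long_rollout.yaml for the actual context settings.

```python
from collections import deque
from typing import Any, Callable, List, Sequence


def long_rollout(step: Callable[[List[Any], Any], Any],
                 context: Sequence[Any],
                 actions: Sequence[Any],
                 window: int = 8) -> List[Any]:
    """Roll out len(actions) frames, conditioning each step on at most the
    last `window` frames (older frames are dropped automatically)."""
    history = deque(context, maxlen=window)
    out = []
    for action in actions:
        frame = step(list(history), action)
        history.append(frame)  # oldest frame falls off once the window is full
        out.append(frame)
    return out
```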

Long Rollout Inference on OOD Games

For long rollout inference on previously unseen games (Valorant, Delta Force), run:

Valorant:

python3 -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test_long_rollout_valorant.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 2 lightning.trainer.num_nodes=1

Delta Force:

python3 -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --master_addr=127.0.0.1 --master_port=12879 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test_long_rollout_delta_force.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 2 lightning.trainer.num_nodes=1

๐Ÿ—บ๏ธ Vid2World for Open-World Navigation

1. Prepare Data & Model

Data

To download and preprocess data, please follow the steps from NoMaD, specifically:

  • Download the RECON dataset.
  • Change the preprocessing resolution to (640,480).
  • Run process_recon.py to save the processed dataset to your desired local folder.

Model

For inference, download the corresponding pretrained model from 🤗 Hugging Face (see QuickStart).

2. Training

To launch training on the RECON dataset, open configs/navigation/config_recon_train.yaml and replace |<your_data_dir>| with the path to your local data directory. To launch training on a single node with 4 GPUs, use the following command:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/navigation/config_recon_train.yaml --train --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

3. Inference

Following NWM, we evaluate under two setups: single-step generation and auto-regressive generation. In both setups our model generates auto-regressively; the two setups differ in their data splits. We support both.

Single-Step Generation

Replace |<data_dir>| and |<path_to_pretrained_checkpoint>| in configs/navigation/config_recon_test_single_step.yaml with your local paths, then run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/navigation/config_recon_test_single_step.yaml --val --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

Auto-Regressive Generation

Replace |<data_dir>| and |<path_to_pretrained_checkpoint>| in configs/navigation/config_recon_test_rollout.yaml with your local paths, then run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/navigation/config_recon_test_rollout.yaml --val --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

🧪 Evaluation

Note

Check out this issue if you encounter the following error message: ImportError: cannot import name 'trunc_normal_' from 'utils' (unknown location)

For evaluation, after running the inference code, calculate the metrics by running:

python eval.py --exp_folder |<your_log_image_dir>| --env  |<rt1/csgo/recon_time/recon_rollout>|
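This section does not list which metrics eval.py reports. Assuming it includes standard video-generation metrics, the sketch below shows two of them as background: PSNR between frames, and the Fréchet distance between Gaussians fitted to two feature sets, which is the statistic behind FVD when the features come from an I3D network such as the one in checkpoints/i3d. Feature extraction is omitted and the function names are our own; consult eval.py for the implementation actually used.

```python
import numpy as np


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally shaped arrays."""
    diff = np.asarray(pred, np.float64) - np.asarray(target, np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets of shape
    (num_samples, dim):
    ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((S_a S_b)^{1/2}) is the sum of square roots of the eigenvalues of
    # S_a S_b, which are real and non-negative up to numerical error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```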

📜 Citation

If you find our code useful, please consider citing our paper:

@article{huang2025vid2world0,
  title={Vid2World: Crafting Video Diffusion Models to Interactive World Models},
  author={Siqiao Huang and Jialong Wu and Qixing Zhou and Shangchen Miao and Mingsheng Long},
  year={2025},
  journal={arXiv preprint arXiv:2505.14357}
}

📬 Contact

If you have any questions, please contact huang-sq23@mails.tsinghua.edu.cn.

💡 Acknowledgement

We sincerely thank the authors of the following GitHub repositories, whose codebases we build upon: