Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

August 29, 2025

[Website] [Paper] [Talk]

Chuning Zhu¹, Raymond Yu¹, Siyuan Feng², Benjamin Burchfiel², Paarth Shah², Abhishek Gupta¹

¹University of Washington ²Toyota Research Institute

This repository provides a PyTorch implementation of the Unified World Model (UWM). UWM combines action diffusion and video diffusion to enable scalable pretraining on large, heterogeneous robotics datasets.

Code structure

configs: Configuration files for pretraining and finetuning experiments.
datasets: Dataset wrappers for DROID, Robomimic, and LIBERO. We standardize all datasets using compressed Zarr buffers.
environments: Interface wrappers for Robomimic and LIBERO environments.
experiments: Training and evaluation scripts.
models: Model definitions for UWM and baselines.
scripts: Bash scripts for running DROID experiments.

Setup

Install the package via

pip install -e .

Note: if you encounter issues using tensorflow-datasets with DROID, consider installing tensorflow-datasets from source.

Robomimic Experiments

To run a Robomimic single-task experiment,

Install the Robomimic dataset.
Update hdf5_path and buffer_path in the config (e.g., configs/dataset/robomimic_can_ph.yaml).
Run:

python experiments/uwm/train_robomimic.py --config-name train_uwm_robomimic.yaml dataset=robomimic_can_ph exp_id=singletask

This command will generate a Zarr compressed buffer at the buffer_path specified in the config file.
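Because the run depends on both hdf5_path and buffer_path, a quick check before launching can catch a mistyped path early. The following is a minimal sketch, not part of this repository: the function name check_dataset_paths is hypothetical, and requiring the parent directory of buffer_path to exist beforehand is an assumption, since the README does not say whether the training script creates it.

```python
from pathlib import Path


def check_dataset_paths(hdf5_path: str, buffer_path: str) -> list[str]:
    """Return human-readable problems with the two config paths (empty if none)."""
    problems = []
    hdf5 = Path(hdf5_path).expanduser()
    buffer = Path(buffer_path).expanduser()

    # The source demonstrations must already be on disk.
    if not hdf5.is_file():
        problems.append(f"hdf5_path does not point to a file: {hdf5}")

    # Assumption: the directory that will hold the Zarr buffer should exist.
    if not buffer.parent.is_dir():
        problems.append(f"parent directory of buffer_path is missing: {buffer.parent}")

    return problems
```

Calling it with the two values copied from the dataset config returns an empty list when both paths look usable, and one message per problem otherwise.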

LIBERO Experiments

The LIBERO experiments share most infrastructure with the Robomimic experiments.

Pretraining

To pretrain a UWM on LIBERO-90,

Install the LIBERO dataset.
Update hdf5_path and buffer_path in configs/dataset/libero_90.yaml.
Run:

python experiments/uwm/train_robomimic.py --config-name train_uwm_robomimic.yaml dataset=libero_90 exp_id=pretrain

Finetuning

To finetune a pretrained UWM on a downstream LIBERO task (e.g., Book-Caddy),

Update hdf5_path and buffer_path in configs/dataset/libero_book_caddy.yaml.
Run:

python experiments/uwm/train_robomimic.py --config-name finetune_uwm_robomimic.yaml dataset=libero_book_caddy exp_id=finetune pretrain_checkpoint_path="logdir/uwm/libero_90/pretrain/0/models.pt"

We release the pretrained LIBERO-90 checkpoint here. You can download and directly finetune from this checkpoint.
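When launching from a Python job script, the finetuning command above can be assembled as an argument list instead of a hand-edited string. This is only a sketch: the helper finetune_command is hypothetical, and it simply reproduces the script path, config name, and key=value overrides shown in this README without adding any options.

```python
import shlex


def finetune_command(dataset: str, exp_id: str, checkpoint: str) -> list[str]:
    """Assemble the finetuning invocation from this README as an argv list."""
    return [
        "python",
        "experiments/uwm/train_robomimic.py",
        "--config-name",
        "finetune_uwm_robomimic.yaml",
        # The key=value pairs are passed through unchanged as config overrides.
        f"dataset={dataset}",
        f"exp_id={exp_id}",
        f"pretrain_checkpoint_path={checkpoint}",
    ]


argv = finetune_command(
    dataset="libero_book_caddy",
    exp_id="finetune",
    checkpoint="logdir/uwm/libero_90/pretrain/0/models.pt",
)
# shlex.join quotes arguments only where the shell needs it.
print(shlex.join(argv))
```

The list can be passed to subprocess.run(argv, check=True) from the repository root. To start from the released checkpoint, pass the path of the downloaded file as checkpoint.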

DROID Experiments

We provide shell scripts for DROID pretraining / cotraining / finetuning experiments in the scripts directory. Each script runs a dataset conversion pipeline to create a Zarr buffer for the corresponding DROID TFDS dataset and then launches training.

Pretraining

To launch a DROID pretraining experiment,

Install the DROID dataset.
Update DATA_DIR and BUFFER_PATH in scripts/launch_droid_pretrain.sh.
Run:

source scripts/launch_droid_pretrain.sh

Cotraining

To launch a video cotraining experiment,

Install the DROID dataset.
Update DATA_DIR, ROBOT_BUFFER_PATH, and VIDEO_BUFFER_PATH in scripts/launch_droid_cotrain.sh.
Run:

source scripts/launch_droid_cotrain.sh
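Since the launch scripts are configured by editing variables in place, it can be useful to confirm that every required variable has a value before sourcing the script. The sketch below assumes the variables are set with plain NAME=value lines (optionally prefixed by export); the actual script layout is not described here, and the helper names and example paths are hypothetical.

```python
import re

# Matches simple shell assignments such as NAME=value or export NAME="value".
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_assignments(script_text: str) -> dict[str, str]:
    """Collect simple NAME=value assignments from shell script text."""
    values = {}
    for line in script_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            name, raw = match.groups()
            # Drop surrounding whitespace and one layer of matching quotes.
            raw = raw.strip()
            if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
                raw = raw[1:-1]
            values[name] = raw
    return values


def missing_variables(script_text: str, required: list[str]) -> list[str]:
    """Return the required variables that are unset or empty in the script."""
    values = read_assignments(script_text)
    return [name for name in required if not values.get(name)]


# Hypothetical script contents, for illustration only.
example = '''
DATA_DIR="/data/droid"
ROBOT_BUFFER_PATH=/data/buffers/robot.zarr
VIDEO_BUFFER_PATH=
'''
required = ["DATA_DIR", "ROBOT_BUFFER_PATH", "VIDEO_BUFFER_PATH"]
print(missing_variables(example, required))  # prints ['VIDEO_BUFFER_PATH']
```

In practice the text would come from reading scripts/launch_droid_cotrain.sh, for example with pathlib.Path(...).read_text(). The check does not expand shell variables or command substitutions, so it only covers literal values.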

Finetuning

To finetune a pretrained model on a downstream task,

Collect demonstrations using the DROID interface.
Convert them into a TFDS dataset (via this pipeline).
Modify and run:

source scripts/launch_droid_finetune.sh

We release the pretrained and cotrained DROID UWM checkpoints here. You can download and directly finetune from these checkpoints.

BibTeX

If you find this code useful, please cite:

@inproceedings{zhu2025uwm,
    author    = {Zhu, Chuning and Yu, Raymond and Feng, Siyuan and Burchfiel, Benjamin and Shah, Paarth and Gupta, Abhishek},
    title     = {Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets},
    booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
    year      = {2025},
}