Predictive but Not Plannable: RC-aux for Latent World Models

May 15, 2026

This repository contains the code for the Reachability-Correction auxiliary objective (RC-aux). RC-aux is a lightweight training and planning correction for reconstruction-free latent world models: it keeps the LeWM backbone unchanged, trains open-loop multi-horizon prediction, adds a budget-conditioned reachability head, and optionally uses that reachability signal during planning.
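
To make "open-loop multi-horizon prediction" concrete, here is a minimal sketch under stated assumptions: a toy scalar latent, a hypothetical `predictor(z, a)` callable, and a plain mean of squared errors over horizons. It is not the objective implemented in rcaux.py; it only illustrates feeding the predictor's own output back in at every step instead of the encoded ground-truth latent.

```python
from typing import Callable, Sequence


def open_loop_multi_horizon_loss(
    predictor: Callable[[float, float], float],
    latents: Sequence[float],
    actions: Sequence[float],
) -> float:
    """Average squared error of an open-loop rollout started at latents[0].

    latents has H + 1 entries and actions has H entries (toy scalars here).
    """
    if not actions or len(latents) != len(actions) + 1:
        raise ValueError("need H >= 1 actions and H + 1 latents")
    z_hat = latents[0]
    total = 0.0
    for step, action in enumerate(actions, start=1):
        # Open loop: the input is the previous prediction, not latents[step - 1].
        z_hat = predictor(z_hat, action)
        total += (z_hat - latents[step]) ** 2
    return total / len(actions)


# Toy check: true dynamics are z' = z + a; the second model is slightly biased.
true_latents = [0.0, 1.0, 2.0, 3.0]
acts = [1.0, 1.0, 1.0]
exact = open_loop_multi_horizon_loss(lambda z, a: z + a, true_latents, acts)
biased = open_loop_multi_horizon_loss(lambda z, a: z + a + 0.1, true_latents, acts)
print(exact, biased)
```

In this toy case the biased model's squared error grows with the horizon (roughly 0.01, 0.04, 0.09), a compounding effect that a one-step loss on ground-truth inputs would not expose.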

Paper: arXiv:2605.07278

What Is Included

.
├── train.py / eval.py        # five-task pixel-control training and MPC evaluation
├── jepa.py                   # LeWM/RC-aux latent world model
├── module.py                 # predictor, regularizer, reachability head
├── rcaux.py                  # multi-horizon and reachability objectives
├── config/
│   ├── train/                # LeWM and RC-aux configs
│   └── eval/                 # TwoRoom, Reacher, Push-T, Cube configs
├── scripts/
│   └── eval_dino_family_official.py
├── libero/                   # LIBERO-Goal OFT/BCRNN action-head scripts
├── tools/                    # fixed-group generation and success summaries
└── results/                  # result CSV summaries

Checkpoints and datasets are not stored in the GitHub repository. We release the main public checkpoint separately on Hugging Face.

Installation

Python 3.10 is recommended.

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

For headless MuJoCo evaluation:

export MUJOCO_GL=egl

Data and Checkpoints

The five pixel-control tasks use stable-worldmodel's cache root:

export STABLEWM_HOME=/path/to/stable-wm-cache

Expected dataset locations:

| Task | Dataset path under $STABLEWM_HOME |
| --- | --- |
| TwoRoom | tworoom.h5 |
| Reacher | dmc/reacher_random.h5 |
| Push-T | pusht_expert_train.h5 |
| Cube | ogbench/cube_single_expert.h5 |
| Wall | dino_wall.h5 |
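
Before launching a run, it can help to confirm that the files are where the table above expects them. The helper below is a hypothetical convenience script, not part of this repository; it only checks that each path exists under the cache root.

```python
import os
from pathlib import Path

# Relative paths copied from the table above.
DATASETS = {
    "TwoRoom": "tworoom.h5",
    "Reacher": "dmc/reacher_random.h5",
    "Push-T": "pusht_expert_train.h5",
    "Cube": "ogbench/cube_single_expert.h5",
    "Wall": "dino_wall.h5",
}


def missing_datasets(cache_root: Path) -> dict[str, Path]:
    """Return {task: expected_path} for every dataset file that is not found."""
    return {
        task: cache_root / rel
        for task, rel in DATASETS.items()
        if not (cache_root / rel).is_file()
    }


if __name__ == "__main__":
    # Falls back to the current directory when STABLEWM_HOME is unset.
    root = Path(os.environ.get("STABLEWM_HOME", "."))
    for task, path in missing_datasets(root).items():
        print(f"missing {task}: {path}")
```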

Checkpoint paths are resolved relative to $STABLEWM_HOME. Pass policy stems without the _object.ckpt suffix.
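
As an illustration of the stem convention, the sketch below assumes the checkpoint file is simply the stem with `_object.ckpt` appended, resolved under the cache root. The function name is hypothetical and the actual resolution logic in eval.py may differ.

```python
from pathlib import Path

SUFFIX = "_object.ckpt"


def checkpoint_path(cache_root: str, policy_stem: str) -> Path:
    """Map a policy stem to the checkpoint file it is assumed to name."""
    if policy_stem.endswith(SUFFIX):
        raise ValueError(f"pass the stem without the {SUFFIX} suffix")
    return Path(cache_root) / f"{policy_stem}{SUFFIX}"


print(checkpoint_path("/path/to/stable-wm-cache", "tworoom_rcaux/rcaux_tworoom"))
```

Under that assumption, `policy=tworoom_rcaux/rcaux_tworoom` refers to `tworoom_rcaux/rcaux_tworoom_object.ckpt` inside $STABLEWM_HOME.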

Five-Task Evaluation

Example with the released TwoRoom RC-aux checkpoint:

mkdir -p "$STABLEWM_HOME/tworoom_rcaux"
cp /path/to/hf_release/rcaux/checkpoints/pixel_control/tworoom_rcaux/* "$STABLEWM_HOME/tworoom_rcaux/"

python eval.py --config-name=tworoom.yaml \
  cache_dir="$STABLEWM_HOME" \
  policy=tworoom_rcaux/rcaux_tworoom \
  +planner_override.use_reachability_cost=true \
  +planner_override.reachability_cost_weight=0.85 \
  output.filename=tworoom_rcaux_eval.txt

To evaluate on a fixed evaluation group, append this override to the eval.py command:

eval.row_indices_file=/path/to/group_00.json
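
The repository's own group generator lives under tools/. Purely to illustrate the idea, the sketch below writes seeded, disjoint groups of row indices. The JSON layout (a flat list of integers per file), the function name, and the disjointness of groups are all assumptions, so check the real tool before relying on this format.

```python
import json
import random
from pathlib import Path


def write_fixed_groups(n_rows, out_dir, n_groups=5, group_size=50, seed=0):
    """Write disjoint, seeded index groups as group_00.json, group_01.json, ..."""
    if n_groups * group_size > n_rows:
        raise ValueError("not enough rows for disjoint groups")
    rng = random.Random(seed)  # fixed seed makes the groups reproducible
    picked = rng.sample(range(n_rows), n_groups * group_size)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for g in range(n_groups):
        rows = sorted(picked[g * group_size:(g + 1) * group_size])
        path = out_dir / f"group_{g:02d}.json"
        path.write_text(json.dumps(rows))
        paths.append(path)
    return paths
```

Because the sampling is seeded, rerunning with the same `n_rows` and `seed` reproduces the same files, which is what makes comparisons across checkpoints matched.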

Summarize logs:

python tools/summarize_group_success.py /path/to/results/*.txt

Wall uses the DINO-WM environment-native benchmark path. Provide the DINO-WM source checkout and Wall dataset explicitly:

python scripts/eval_dino_family_official.py \
  --source-root /path/to/dino_wm \
  --benchmark wall \
  --policy-kind object \
  --policy /path/to/lewm_epoch_8_object.ckpt \
  --dataset-path "$STABLEWM_HOME/dino_wall.h5" \
  --val-dataset-path "$STABLEWM_HOME/dino_wall.h5" \
  --num-samples 600 \
  --n-steps 20 \
  --topk 60 \
  --horizon 8 \
  --receding-horizon 4 \
  --action-block 5 \
  --goal-cost-reduce softmin \
  --goal-cost-softmin-temperature 1.0 \
  --use-reachability-cost on \
  --reachability-cost-weight 0.85 \
  --output wall_rcaux_eval.txt
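
The flags above suggest a sampling-based planner that scores candidate action sequences and keeps the best ones. As a rough sketch only, the code below assumes that `softmin` means a softmin-weighted average of per-step goal costs, that the reachability term is added with the given weight, and that the `topk` lowest-cost candidates are retained. None of this is taken from the planner source, and the random costs stand in for model outputs.

```python
import math
import random


def softmin_reduce(step_costs, temperature=1.0):
    """Softmin-weighted average: weights are proportional to exp(-cost / T)."""
    lowest = min(step_costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - lowest) / temperature) for c in step_costs]
    return sum(w * c for w, c in zip(weights, step_costs)) / sum(weights)


def select_elites(goal_costs, reach_costs, weight=0.85, topk=60, temperature=1.0):
    """Return indices of the topk candidates with the lowest combined cost."""
    totals = [
        softmin_reduce(per_step, temperature) + weight * reach
        for per_step, reach in zip(goal_costs, reach_costs)
    ]
    return sorted(range(len(totals)), key=totals.__getitem__)[:topk]


rng = random.Random(0)
# 600 candidates, 8 per-step goal costs each, one reachability penalty each.
goal = [[rng.random() for _ in range(8)] for _ in range(600)]
reach = [rng.random() for _ in range(600)]
elites = select_elites(goal, reach)
print(len(elites))
```

With a weight of 0, the ranking depends on the goal cost alone; raising the weight increasingly penalizes candidates whose end state the reachability signal considers hard to reach.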

Training

LeWM baseline:

python train.py --config-name=lewm.yaml \
  data=tworoom \
  wandb.enabled=false \
  output_model_name=lewm_tworoom \
  subdir=tworoom/lewm_tworoom

RC-aux:

python train.py --config-name=rcaux_default.yaml \
  data=tworoom \
  wandb.enabled=false \
  output_model_name=rcaux_tworoom \
  subdir=tworoom/rcaux_tworoom

To continue training from an existing LeWM checkpoint, append these overrides to the train.py command:

init.weights_path=/path/to/source_object.ckpt init.strict=false
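
A plausible reason for `init.strict=false` is that a LeWM checkpoint has no weights for the reachability head that RC-aux adds. Assuming the option follows the usual non-strict state-dict semantics (as in PyTorch's `load_state_dict(strict=False)`), the behaviour can be mimicked with plain dictionaries. The parameter names below are hypothetical.

```python
def load_non_strict(model_state: dict, checkpoint_state: dict):
    """Copy matching keys and report the rest, like a non-strict state-dict load."""
    missing = [k for k in model_state if k not in checkpoint_state]
    unexpected = [k for k in checkpoint_state if k not in model_state]
    for key in model_state:
        if key in checkpoint_state:
            model_state[key] = checkpoint_state[key]
    return missing, unexpected


# Hypothetical parameter names, for illustration only.
rcaux_model = {"encoder.w": 0.0, "predictor.w": 0.0, "reach_head.w": 0.5}
lewm_ckpt = {"encoder.w": 1.2, "predictor.w": -0.7}
missing, unexpected = load_non_strict(rcaux_model, lewm_ckpt)
print(missing, unexpected)
```

Here the shared weights are overwritten from the checkpoint while the head keeps its fresh initialization and shows up in `missing`; a strict load would instead fail on that key.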

LIBERO-Goal

The LIBERO-Goal extension code is under libero/. It trains an OFT-style action chunk head on top of the LeWM-family representation and evaluates with the official LIBERO success checker. LIBERO-Goal does not use the MPC planner or planner_override.* settings; there is no reachability_cost_weight argument in these scripts.

Expected local layout:

assets/benchmarks/LIBERO/
assets/datasets/libero_goal_agentview.h5
checkpoints/libero/lewm_epoch_40_object.ckpt

Train the OFT-style action head:

python libero/train_libero_goal_lewm_oft_head.py \
  --libero-root assets/benchmarks/LIBERO \
  --init-policy checkpoints/libero/lewm_epoch_40_object.ckpt \
  --tasks all \
  --image-keys agentview_rgb,eye_in_hand_rgb \
  --chunk-len 8 \
  --action-horizon 8 \
  --hidden-dim 1024 \
  --batch-size 32 \
  --max-epochs 30 \
  --train-encoder \
  --run-dir runs/libero_goal_rcaux_oft

Evaluate:

TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1 \
python libero/eval_libero_goal_lewm_oft_head.py \
  --checkpoint runs/libero_goal_rcaux_oft/lewm_libero_oft_head_epoch_30.ckpt \
  --tasks all \
  --n-eval 50 \
  --max-steps 600 \
  --output results/libero_goal_rcaux_oft_n50.json

On systems where robosuite cannot create an EGL offscreen context, run the same command with MUJOCO_GL=glx.

Main Reported Results

The CSV summaries in results/ mirror the reported tables. Local LeWM-family rows are mean±std over five fixed evaluation groups of 50 episodes each.

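| Task | LeWM | LeWM-cont | RC-aux | Matched delta |
| --- | --- | --- | --- | --- |
| TwoRoom | 88.8±3.0 | 88.8±3.0 | 98.0±1.4 | +9.2 |
| Reacher | 81.2±7.9 | 82.8±7.2 | 87.2±6.4 | +4.4 |
| Push-T | 90.4±3.0 | 91.2±3.9 | 90.8±3.3 | -0.4 |
| Wall | 50.4±6.5 | -- | 83.6±3.6 | +33.2 |
| Cube | 72.4±5.9 | 72.8±5.2 | 76.0±7.5 | +3.2 |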
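
The mean±std entries can be derived from per-group success counts along the following lines. This is a sketch, not tools/summarize_group_success.py: it assumes the values are success rates in percent and uses the population standard deviation, and whether the reported numbers use the population or the sample form is not stated here. The counts are made up.

```python
import statistics


def summarize(successes_per_group, episodes_per_group=50):
    """Return (mean, std) of per-group success rates in percent."""
    rates = [100.0 * s / episodes_per_group for s in successes_per_group]
    return statistics.mean(rates), statistics.pstdev(rates)


# Made-up success counts for five groups of 50 episodes (not real results).
mean, std = summarize([40, 42, 41, 43, 39])
print(f"{mean:.1f}±{std:.1f}")
```

Swapping `statistics.pstdev` for `statistics.stdev` gives the sample standard deviation, which is noticeably larger with only five groups.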

For TwoRoom, Reacher, Push-T, and Cube, matched deltas compare against LeWM-cont. For Wall, no continuation control is available, so the matched delta compares against local LeWM.
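
The matched-delta rule can be checked directly against the means in the table. The snippet below is a small worked example with the values copied from above; the dictionary layout and function name are illustrative.

```python
# Mean values copied from the table; None marks the missing Wall continuation.
RESULTS = {
    "TwoRoom": {"LeWM": 88.8, "LeWM-cont": 88.8, "RC-aux": 98.0},
    "Reacher": {"LeWM": 81.2, "LeWM-cont": 82.8, "RC-aux": 87.2},
    "Push-T": {"LeWM": 90.4, "LeWM-cont": 91.2, "RC-aux": 90.8},
    "Wall": {"LeWM": 50.4, "LeWM-cont": None, "RC-aux": 83.6},
    "Cube": {"LeWM": 72.4, "LeWM-cont": 72.8, "RC-aux": 76.0},
}


def matched_delta(row):
    """RC-aux minus LeWM-cont, falling back to LeWM when no continuation exists."""
    baseline = row["LeWM-cont"] if row["LeWM-cont"] is not None else row["LeWM"]
    return round(row["RC-aux"] - baseline, 1)


for task, row in RESULTS.items():
    print(f"{task}: {matched_delta(row):+.1f}")
```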

Citation

@article{li2026predictive,
  title={Predictive but Not Plannable: RC-aux for Latent World Models},
  author={Li, Wenyuan and Li, Guang and Maeda, Keisuke and Ogawa, Takahiro and Haseyama, Miki},
  journal={arXiv preprint arXiv:2605.07278},
  year={2026}
}

This project builds on LeWorldModel, stable-worldmodel, stable-pretraining, DINO-WM, and LIBERO. Please cite the corresponding original work when using those components.