March 13, 2026
An Efficient and Multi-Modal Navigation System with One-Step World Model

Training and Inference
1. Data Preparation
1.1 Preparing Datasets
To acquire and process the public datasets, follow the instructions in NoMaD. For Habitat datasets, first install Habitat:
conda install habitat-sim==0.2.4 withbullet headless -c conda-forge -c aihabitat
Then use collect_datasets.py to collect trajectories in the format required by this project. We use the Matterport3D (MP3D) dataset for data collection.
1.2 Dataset Structure
Your dataset must follow the directory structure below. If you are collecting a custom dataset, please organize it accordingly:
├── <dataset_name>
│ ├── <name_of_traj1>
│ │ ├── 0.jpg
│ │ ├── 1.jpg
│ │ ├── ...
│ │ ├── T_1.jpg
│ │ └── traj_data.pkl
│ ├── <name_of_traj2>
│ │ ├── 0.jpg
│ │ ├── ...
│ │ └── traj_data.pkl
│ ...
│ └── <name_of_trajN>
│   ├── 0.jpg
│   ├── ...
│   └── traj_data.pkl
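Before splitting or training, it can help to check a dataset against the layout above. The following is a minimal sketch, not part of this repository: the function name check_dataset is hypothetical, and it assumes that frames are numbered contiguously from 0 (reading T_1 in the tree as the last frame index of a trajectory) and that every trajectory folder contains traj_data.pkl.

```python
from pathlib import Path


def check_dataset(root):
    """Return a list of problems found under a dataset root (empty if none).

    Assumed layout: one sub-folder per trajectory, each holding
    0.jpg, 1.jpg, ... and a traj_data.pkl file.
    """
    problems = []
    traj_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    if not traj_dirs:
        problems.append(f"{root}: no trajectory folders found")
    for traj in traj_dirs:
        if not (traj / "traj_data.pkl").is_file():
            problems.append(f"{traj.name}: missing traj_data.pkl")
        # Frame files are named by integer index; other .jpg files are ignored.
        indices = sorted(
            int(f.stem) for f in traj.glob("*.jpg") if f.stem.isdecimal()
        )
        if not indices:
            problems.append(f"{traj.name}: no frames")
        elif indices != list(range(len(indices))):
            problems.append(f"{traj.name}: frame indices are not contiguous from 0")
    return problems
```

Calling check_dataset("<path_to_dataset>") returns an empty list when every trajectory folder matches the assumed layout, and one message per problem otherwise.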
1.3 Data Splitting
Use data_split.py to split your data into training and testing sets.
Split Training Data:
python data_split.py -i <path_to_train_data> -d <train_dataset_name>
Split Test Data:
python data_split.py -i <path_to_test_data> -d <test_dataset_name> -s 0
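For intuition only, a trajectory-level split can be sketched as below. This is an illustration under stated assumptions and does not reproduce data_split.py: the function name split_trajectories, the train_fraction and seed parameters, and the 0.8 default are all hypothetical, and this README does not describe what files data_split.py writes or how it interprets its flags.

```python
import random
from pathlib import Path


def split_trajectories(data_dir, train_fraction=0.8, seed=0):
    """Split trajectory folder names into (train, test) lists.

    Splitting is done per trajectory, so all frames of one trajectory
    end up on the same side. train_fraction and seed are illustrative.
    """
    names = sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())
    random.Random(seed).shuffle(names)  # deterministic shuffle for a given seed
    n_train = int(len(names) * train_fraction)
    return names[:n_train], names[n_train:]
```

With train_fraction=0, every trajectory lands in the test list, which is one way a held-out dataset could be kept entirely for testing.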
2. Training
Train the model using the provided configuration file:
python train.py --config config/config_shortcut_w_pretrain.yaml
3. Generation Performance Evaluation
Evaluation involves three steps: preparing ground truth frames, generating future frames, and calculating metrics (LPIPS, DreamSim, FID).
Step 1: Prepare Ground Truth Frames
python isolated_infer.py --exp logs/<run_name> --ckp latest --datasets <test_dataset_name> --gt 1
Step 2: Generate Future Frames
python isolated_infer.py --exp logs/<run_name> --ckp latest --datasets <test_dataset_name> --gt 0
Step 3: Calculate Metrics
python isolated_eval.py --gt_dir output/gt --exp_dir output/<run_name>_latest --datasets <test_dataset_name>
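The three steps can also be chained from a small driver script. The sketch below is a convenience wrapper and not part of this repository: the helper names build_eval_commands and run_eval are hypothetical, the commands and flags are copied from the steps above, and it assumes the script is run from the repository root with the checkpoint fixed to latest.

```python
import subprocess
import sys


def build_eval_commands(run_name, dataset):
    """Return the three evaluation commands from this section, in order."""
    infer = [
        sys.executable, "isolated_infer.py",
        "--exp", f"logs/{run_name}",
        "--ckp", "latest",
        "--datasets", dataset,
    ]
    return [
        infer + ["--gt", "1"],  # Step 1: prepare ground truth frames
        infer + ["--gt", "0"],  # Step 2: generate future frames
        [
            sys.executable, "isolated_eval.py",
            "--gt_dir", "output/gt",
            "--exp_dir", f"output/{run_name}_latest",
            "--datasets", dataset,
        ],  # Step 3: calculate metrics
    ]


def run_eval(run_name, dataset):
    """Run the three steps in sequence, stopping at the first failure."""
    for cmd in build_eval_commands(run_name, dataset):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, run_eval("<run_name>", "<test_dataset_name>") runs the steps in order; check=True raises CalledProcessError if any step exits with a non-zero status, so metrics are never computed on incomplete outputs.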
4. Inference
Predict waypoints using the trained world model.
4.1 Prerequisites
Before running inference, download the Distance Model Weights into models_dist/weights.
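A quick pre-flight check can catch a missing download before inference starts. This is a minimal sketch, not part of this repository: the function name weights_ready is hypothetical, and because the weight file names are not listed here, it only checks that the directory exists and contains at least one file.

```python
from pathlib import Path


def weights_ready(weights_dir="models_dist/weights"):
    """Return True if weights_dir exists and holds at least one file."""
    d = Path(weights_dir)
    return d.is_dir() and any(p.is_file() for p in d.iterdir())
```

If weights_ready() returns False, download the weights into models_dist/weights before continuing.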
4.2 Run Inference
We provide an intuitive inference script for testing:
python inference.py
Citation
If you find this work useful in your research, please consider citing:
@misc{shen2026efficientmultimodalnavigationonestep,
title={An Efficient and Multi-Modal Navigation System with One-Step World Model},
author={Wangtian Shen and Ziyang Meng and Jinming Ma and Mingliang Zhou and Diyun Xiang},
year={2026},
eprint={2601.12277},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2601.12277},
}