May 1, 2026
🧭 GoViG: Goal-Conditioned Visual Navigation Instruction Generation
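Fengyi Wu<sup>1,*</sup>, Yifei Dong<sup>1,*</sup>, Zhi-Qi Cheng<sup>1,†</sup>, Yilong Dai<sup>1</sup>, Guangyu Chen<sup>1</sup>, Hang Wang<sup>2</sup>, Qi Dai<sup>3</sup>, Alexander G Hauptmann<sup>4</sup>

<sup>1</sup>UW, <sup>2</sup>PolyU, <sup>3</sup>Microsoft Research, <sup>4</sup>CMU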
GoViG introduces a new task in embodied AI: generating navigation instructions directly from egocentric visual observations of the initial and goal states. Unlike previous methods that rely on semantic maps or structured annotations, GoViG operates purely on egocentric visual input, so it can be applied to unseen and unstructured environments where such maps and annotations are unavailable.
🔍 Overview
GoViG decomposes the instruction generation task into two interconnected subtasks:
- Navigation Visualization: Predicts intermediate visual states that bridge the initial and goal views.
- Instruction Generation with Visual Cues: Synthesizes linguistically coherent and spatially grounded instructions based on both observed and anticipated visuals.
Both subtasks are unified within a single autoregressive multimodal large language model (MLLM), trained with objectives tailored to spatial accuracy and linguistic clarity.
🧠 Reasoning Strategies
Inspired by human navigation behavior, GoViG supports two multimodal reasoning paradigms:
- One-Pass Reasoning: Generates the complete instruction in a single pass.
- Interleaved Reasoning: Alternates between visual prediction and language generation for incremental planning.
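The difference between the two paradigms can be sketched in Python under deliberately simplified assumptions. `predict_view` and `generate_text` are hypothetical stand-ins for the MLLM's image and text decoding, and views are plain string labels rather than images. This is not the repository's implementation; it only illustrates how the conditioning context grows in each case. In one-pass reasoning, all intermediate views are predicted before the instruction is written. In interleaved reasoning, each predicted view is immediately followed by a sentence, and that sentence then conditions the next view.

```python
from typing import Callable, Dict, List, Tuple

# A context item is (modality, content). Real views would be images or image
# tokens; here they are string labels so the sketch runs without a model.
Item = Tuple[str, str]
# Hypothetical model interfaces: (context so far, goal view) -> next output.
PredictView = Callable[[List[Item], str], str]
GenerateText = Callable[[List[Item], str], str]


def one_pass(initial: str, goal: str, steps: int,
             predict_view: PredictView,
             generate_text: GenerateText) -> Dict[str, object]:
    """Predict every intermediate view first, then write the instruction once."""
    context: List[Item] = [("image", initial)]
    for _ in range(steps):
        context.append(("image", predict_view(context, goal)))
    instruction = generate_text(context, goal)  # one text pass over all views
    views = [content for _, content in context[1:]]
    return {"views": views, "instruction": instruction}


def interleaved(initial: str, goal: str, steps: int,
                predict_view: PredictView,
                generate_text: GenerateText) -> Dict[str, object]:
    """Alternate: predict one view, then emit the sentence for that step."""
    context: List[Item] = [("image", initial)]
    views: List[str] = []
    sentences: List[str] = []
    for _ in range(steps):
        view = predict_view(context, goal)
        context.append(("image", view))
        sentence = generate_text(context, goal)
        context.append(("text", sentence))  # later views see earlier sentences
        views.append(view)
        sentences.append(sentence)
    return {"views": views, "instruction": " ".join(sentences)}


# Toy stand-ins so the sketch is runnable. They ignore the goal view,
# which a real model would condition on.
def toy_predict_view(context: List[Item], goal: str) -> str:
    n_images = sum(1 for modality, _ in context if modality == "image")
    return f"view_{n_images}"


def toy_generate_text(context: List[Item], goal: str) -> str:
    last_image = [c for m, c in context if m == "image"][-1]
    return f"Move toward {last_image}."


if __name__ == "__main__":
    print(one_pass("start", "goal", 2, toy_predict_view, toy_generate_text))
    print(interleaved("start", "goal", 2, toy_predict_view, toy_generate_text))
```

With these toy stubs and two steps, both functions predict the same views (`view_1`, `view_2`). However, `one_pass` writes a single sentence conditioned on all of them, while `interleaved` writes one sentence per step.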
📦 Dataset: R2R-Goal
To evaluate GoViG, we introduce R2R-Goal, a dataset combining synthetic and real-world trajectories.
Quick Start
```bash
conda create -n GoViG python=3.10
conda activate GoViG
pip install torch==2.4.0
pip install -r requirements.txt
```
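You can optionally confirm that the pinned PyTorch release is the one visible inside the activated `GoViG` environment. The script below is a minimal sketch and is not part of the repository. `installed_version`, `torch_matches`, and the file name `check_env.py` are hypothetical. The script uses only the standard-library `importlib.metadata` module.

```python
"""Sanity check for the GoViG environment (hypothetical helper, not part of the repo)."""
import importlib.metadata as metadata
from typing import Optional

EXPECTED_TORCH = "2.4.0"  # version pinned by `pip install torch==2.4.0`


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def torch_matches(version: Optional[str], expected: str = EXPECTED_TORCH) -> bool:
    """True if `version` is the pinned release; local tags such as '+cu121' are ignored."""
    return version is not None and version.split("+")[0] == expected


if __name__ == "__main__":
    found = installed_version("torch")
    status = "OK" if torch_matches(found) else "MISMATCH"
    print(f"torch: {found} (expected {EXPECTED_TORCH}) -> {status}")
```

Run it with `python check_env.py` after `conda activate GoViG`. A `MISMATCH` with a `None` version usually means PyTorch was installed into a different environment.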
Data
We release a partial dataset in `data_samples` for debugging and for demonstrating the data format. The full dataset can be accessed here; once it is downloaded, extract it with:

```bash
unzip R2R_Goal.zip
```
Training
```bash
bash train.sh
```
Evaluation
```bash
bash eval.sh
```

Detailed metric calculations can be found in `taskeval_vis.py`.
Acknowledgement
We thank Anole and MVOT for their publicly available codebases, which we referenced when implementing Anole training.
🧭 GoViG Gallery
| Initial View | Goal View | Trajectory (One-Pass) | Instructions (One-Pass) | Trajectory (Interleaved) | Instructions (Interleaved) |
|---|---|---|---|---|---|
| ![]() | ![]() | ![]() | Stop in the doorway. | ![]() | Stop in front of the last door on your right. Then take a slight left turn to go towards the bathroom. After you leave the kitchen and go through the double doors, keep going and go into the living room. Turn left at the first door past the oven and continue down the hallway. Go into the powder room that is straight ahead. Walk past the bathroom door. |
| ![]() | ![]() | ![]() | Walk into the bedroom. | ![]() | Walk out of the bedroom using the door on your right. Walk out of bedroom and turn right. Leave the bedroom. Turn to your right and go outside. Exit the room. Exit bedroom through doorway on the right. |
| ![]() | ![]() | ![]() | Across the kitchen. | ![]() | Exit the kitchen. Turn right at the counter. Walk past kitchen island. Turn past the sink, and in front of the oven to your left. Make a left immediately through the kitchenette, then turn right into the hallway. Walk past the sink. |
| ![]() | ![]() | ![]() | Go through the door. | ![]() | Straight through the bedroom with the lamp. Turn left and wait in the doorway. Stop in the bedroom doorway. Then turn right and wait in bedroom at the end of the hall. Stop in the doorway. Turn slight left, continue straight. Turn slight left, stop at bed. |
| ![]() | ![]() | ![]() | Walk out of the kitchen. | ![]() | Walk through the kitchen stop at the oven. Continue walking straight down the kitchen. Turn left, walk down the kitchen hallway. Turn left and enter kitchen. Walk and stop right before washing area. Turn right and continue down the hall until you get to a refrigerator. |
| ![]() | ![]() | ![]() | Walk past the room on the left. | ![]() | Walk past the door directly across from you. Continue straight and continue through a second set of double doors. Pass the wall on the right. Turn left and enter kitchen. Go down the hall into the office on the left. Walk to the end of the hall and through the open door. |
| ![]() | ![]() | ![]() | Walk up stairs, turn right, continue up stairs | ![]() | Walk up stairs. Go up the stairs. Walk straight ahead passed the stairs. Go up the stairs. Go up three steps then wait at the top. Go all of the way up the stairs. |
| ![]() | ![]() | ![]() | Walk past the room on the left. | ![]() | Stop in entryway of house. Stop at sliding barn door. Wait near the patio. Turn to the front row of couches is showing and walk over to the patio. Wait in the doorway to the patio. Walk straight besides the wooden tables. Stop when you reach the sliding glass doors. |
More examples of GoViG results on the Real-world Subset of our R2R-Goal dataset.
🌟 Citation
If you find this repository or our paper useful, please consider starring this repository and citing our paper:
```bibtex
@misc{wu2026goviggoalconditionedvisualnavigation,
  title={GoViG: Goal-Conditioned Visual Navigation Instruction Generation via Multimodal Reasoning},
  author={Fengyi Wu and Yifei Dong and Yilong Dai and Guangyu Chen and Qifeng Wu and Huiting Huang and Hang Wang and Qi Dai and Alexander G. Hauptmann and Zhi-Qi Cheng},
  year={2026},
  eprint={2508.09547},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.09547},
}
```