Awesome Spatial Reasoning with MVLMs

January 25, 2026


This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs).

Feel free to open a Pull Request to add new work!



Introduction

In this survey, we provide a comprehensive review of existing tasks in multimodal spatial reasoning with large models, categorizing and highlighting the frontiers of multimodal large language models (MLLMs) and introducing open benchmarks for evaluating these models. We start by reviewing general spatial reasoning, with a focus on post-training techniques, explainability, and architecture. Beyond classical 2D scenarios, we systematically review spatial relationship reasoning, scene and layout reasoning, and visual question answering and grounding in 3D space.

We further discuss recent advances in embodied AI tasks, such as vision-language navigation and action models. The survey also covers audio and egocentric video, emerging modalities that enable distinct forms of spatial understanding through novel sensors. We believe this survey establishes a solid foundation and offers valuable insights into the critical field of multimodal spatial reasoning.

Existing surveys on reasoning are listed in Reasoning_survey.md.


Papers

3D Vision

🔗 3D_Vision.md

Embodied AI

🔗 Embodied_AI.md

General MLLM

🔗 General_MLLM.md

Video / Audio / Egocentric

🔗 Video_Audio_Egocentric.md

Spatial Benchmark

🔗 Spatial_Benchmark.md


Resources

Workshops and Tutorials

TBD


Contributing

Contributions are welcome! To contribute:

  1. Fork this repository
  2. Add your paper/resource to the appropriate Markdown file, or create a new one
  3. Update the link list in README.md if needed
  4. Submit a Pull Request 🎉

Citation

If you find this project helpful, please cite:

@article{zheng2025multimodal,
  title={Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks},
  author={Zheng, Xu and Dongfang, Zihao and Jiang, Lutao and Zheng, Boyuan and Guo, Yulong and Zhang, Zhenquan and Albanese, Giuliano and Yang, Runyi and Ma, Mengjiao and Zhang, Zixin and others},
  journal={arXiv preprint arXiv:2510.25760},
  url={https://arxiv.org/abs/2510.25760},
  year={2025}
}


License

This project is licensed under the MIT License — see the LICENSE file for details.