May 9, 2025
Hypo3D: Exploring Hypothetical Reasoning in 3D
Ye Mao,
Weixun Luo,
Junpeng Jing,
Anlan Qiu,
Krystian Mikolajczyk
Imperial College London
Latest Updates
- [2025-05-01] Hypo3D has been accepted to ICML 2025!
- [2025-02-04] Hypo3D paper preprint is now available on arXiv.
- [2025-02-09] Hypo3D benchmark has been released.
- [2025-02-09] Evaluation scripts for multiple vision-language models are now publicly available.
Key Takeaways
- Hypo3D introduces a novel 3D reasoning benchmark.
  Task Definition: Given a past 3D scene (e.g., point cloud, top-view image, scene captions) and a context change description, the goal is to imagine the updated scene after the change and answer questions based on that hypothetical scene state.
- The benchmark includes 7,727 context changes and 14,885 QA pairs spanning 700 indoor scenes.
  These changes are categorized into five types:
  - Movement: Geometric transformations (e.g., translation, rotation)
  - Removal: Objects taken away from the scene
  - Attribute: Changes in object properties (e.g., color, open/closed state)
  - Addition: New objects introduced into the scene
  - Replacement: Existing objects substituted with different ones
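
To make the task definition concrete, here is a minimal sketch of how one benchmark item and a text-only prompt could be represented. The class and field names (`HypoSample`, `scene_id`, `context_change`, and so on), the `build_prompt` helper, and the example values are hypothetical illustrations; they are not the actual schema or prompt template used by the Hypo3D release.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    """The five context change categories listed above."""
    MOVEMENT = "movement"
    REMOVAL = "removal"
    ATTRIBUTE = "attribute"
    ADDITION = "addition"
    REPLACEMENT = "replacement"


@dataclass
class HypoSample:
    """One QA item (hypothetical field names, not the released schema)."""
    scene_id: str
    context_change: str
    change_type: ChangeType
    question: str
    answer: str


def build_prompt(scene_caption: str, sample: HypoSample) -> str:
    """Combine the past scene, the change, and the question into one prompt."""
    return (
        f"Scene description: {scene_caption}\n"
        f"Context change: {sample.context_change}\n"
        f"Question: {sample.question}\n"
        "Imagine the scene after the change and answer briefly."
    )


# Made-up example values, for illustration only.
sample = HypoSample(
    scene_id="scene_example",
    context_change="The chair has been moved next to the window.",
    change_type=ChangeType.MOVEMENT,
    question="Which object is now closest to the window?",
    answer="chair",
)
print(build_prompt("A room with a chair, a desk, and a window.", sample))
```

The model never sees the updated scene; it receives only the past scene and the change description, so the answer has to come from reasoning about the hypothetical state.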

About this code
The Hypo3D codebase is written in Python and provides simple modules for benchmarking 10 foundation models, including LLMs, 2D VLMs, and 3D VLMs. The core module structure is as follows:
Hypo3D/
├── LLM/                      # Scripts for LLMs that use scene captions as input for 3D scene processing.
│   ├── GPT4o-text/           # Folder for evaluating GPT4o in text-only mode.
│   └── llama/                # Folder for evaluating Llama 3.2 3B.
├── 2D-VLM/                   # Scripts for 2D VLMs that use top-view maps as input for 3D scene processing.
│   ├── Claude/               # Folder for evaluating Claude 3.5 Sonnet.
│   ├── GPT4o/                # Folder for evaluating GPT4o in vision-language mode.
│   ├── Qwen2-VL/             # Folder for evaluating Qwen2-VL 7B and 72B.
│   └── llava-ov/             # Folder for evaluating LLaVA-OV 7B and 72B.
├── 3D-VLM/                   # Scripts for 3D VLMs that use point clouds/multi-view images as input for 3D scene processing.
│   ├── LLaVA-3D/             # Folder for evaluating the LLaVA-3D 7B model.
│   └── LEO/ (coming soon)    # Folder for evaluating the LEO 7B model.
├── exp/                      # Experimental results for various models.
├── metric_compute.py         # Compute exact match/partial match for each context change category.
└── ...
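
As an illustration of what per-category exact match (EM) and partial match (PM) scoring can look like, here is a minimal sketch. It is not the code in `metric_compute.py`: the normalization step and the PM definition (the fraction of reference-answer tokens that also appear in the prediction) are assumptions made for this example, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def exact_match(pred: str, ref: str) -> float:
    """1.0 if the normalized prediction equals the normalized reference."""
    return float(normalize(pred) == normalize(ref))


def partial_match(pred: str, ref: str) -> float:
    """Assumed definition: share of reference tokens found in the prediction."""
    ref_tokens = normalize(ref)
    if not ref_tokens:
        return 0.0
    pred_tokens = set(normalize(pred))
    return sum(tok in pred_tokens for tok in ref_tokens) / len(ref_tokens)


def score_by_category(records):
    """Average EM/PM (in %) over (category, prediction, reference) tuples."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [EM sum, PM sum, count]
    for category, pred, ref in records:
        entry = sums[category]
        entry[0] += exact_match(pred, ref)
        entry[1] += partial_match(pred, ref)
        entry[2] += 1
    return {
        cat: {"EM": 100 * em / n, "PM": 100 * pm / n}
        for cat, (em, pm, n) in sums.items()
    }


# Made-up predictions, for illustration only.
records = [
    ("Movement", "left", "Left"),
    ("Movement", "right", "left"),
    ("Removal", "two chairs", "two"),
]
print(score_by_category(records))
```

Under these assumptions PM is never lower than EM for a given answer, which matches the pattern in the reported results, where PM is at least as high as EM for every model.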
Download the Hypo3D Benchmark
- Clone the repository recursively.
  git clone --recursive https://github.com/MatchLab-Imperial/Hypo3D.git
- Download 3D scene representations in Hypo3D dataset
  git clone https://huggingface.co/datasets/MatchLab/Hypo3D
  mv Hypo3D dataset # rename dataset folder
  cd dataset
  Expected data folder format:
  dataset/
  ├── LLM_data/                          # Scene captions for Large Language Models (e.g., Llama 3.2)
  ├── 2D_VLM_data/                       # Scene top-view maps for 2D Vision-Language Models (e.g., GPT4o)
  │   ├── top_view_no_label_rotated/     # Non-semantic top-view map.
  │   └── top_view_with_label_rotated/   # Semantic top-view map.
  └── 3D_VLM_data/                       # 3D scene data for 3D Vision-Language Models (e.g., LLaVA-3D)
- Complete the form to download Hypo3D dataset
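
After downloading, a quick check that the folder matches the expected layout can save a failed evaluation run. The sketch below is not part of the repository; the `missing_dirs` helper is a hypothetical name, and it assumes the data sits in a folder called `dataset` in the current working directory, with the subfolder names taken from the layout above.

```python
from pathlib import Path

# Subfolders taken from the expected data folder format above.
EXPECTED_DIRS = [
    "LLM_data",
    "2D_VLM_data/top_view_no_label_rotated",
    "2D_VLM_data/top_view_with_label_rotated",
    "3D_VLM_data",
]


def missing_dirs(root):
    """Return the expected subfolders that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("dataset")  # assumed location of the data
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```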
Hypo3D: EM (Exact Match) / PM (Partial Match) Accuracy of Foundation Models
| Model Family | Model | EM (%) | PM (%) |
|---|---|---|---|
| LLM (Scene Caption) | Llama-3.2 3B | 26.08 | 29.91 |
| LLM (Scene Caption) | GPT-4o API (Text) | 35.54 | 39.65 |
| 2D VLM (Non-Semantic Map) | Qwen2-VL 7B | 29.68 | 34.47 |
| 2D VLM (Non-Semantic Map) | Qwen2-VL 72B | 33.39 | 37.51 |
| 2D VLM (Non-Semantic Map) | LLaVA-OV 7B | 30.62 | 34.34 |
| 2D VLM (Non-Semantic Map) | LLaVA-OV 72B | 36.38 | 40.13 |
| 2D VLM (Non-Semantic Map) | Claude 3.5 Sonnet API | 20.70 | 30.12 |
| 2D VLM (Non-Semantic Map) | GPT-4o API | 33.58 | 36.75 |
| 2D VLM (Semantic Map) | Qwen2-VL 7B | 34.40 | 38.91 |
| 2D VLM (Semantic Map) | Qwen2-VL 72B | 42.45 | 48.25 |
| 2D VLM (Semantic Map) | LLaVA-OV 7B | 38.93 | 43.51 |
| 2D VLM (Semantic Map) | LLaVA-OV 72B | 43.81 | 46.83 |
| 2D VLM (Semantic Map) | Claude 3.5 Sonnet API | 41.36 | 51.59 |
| 2D VLM (Semantic Map) | GPT-4o API | 45.50 | 48.82 |
| 3D VLM (RGB-D Video/Point Cloud) | LEO 7B | 14.83 | 22.40 |
| 3D VLM (RGB-D Video/Point Cloud) | LLaVA-3D 7B | 31.56 | 35.23 |
| Human | | 91.00 | 92.50 |
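
One way to read the table is to compare each 2D VLM with and without semantic labels on the top-view map. The short sketch below does that arithmetic on the EM column; the numbers are copied from the table, while the variable names are only illustrative.

```python
# EM (%) per 2D VLM: (non-semantic map, semantic map), copied from the table.
em = {
    "Qwen2-VL 7B": (29.68, 34.40),
    "Qwen2-VL 72B": (33.39, 42.45),
    "LLaVA-OV 7B": (30.62, 38.93),
    "LLaVA-OV 72B": (36.38, 43.81),
    "Claude 3.5 Sonnet API": (20.70, 41.36),
    "GPT-4o API": (33.58, 45.50),
}

# EM gain in percentage points from adding semantic labels to the map.
gains = {model: round(sem - non, 2) for model, (non, sem) in em.items()}
for model, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: +{gain:.2f}")

# Gap between human EM and the best model EM in the table (GPT-4o, semantic map).
human_gap = round(91.00 - max(sem for _, sem in em.values()), 2)
print(f"Human vs. best model EM gap: {human_gap:.2f}")
```

Every model in the table improves with the semantic map, yet the best model EM remains well below the human score.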
Contact
- Ye Mao: ye.mao21@imperial.ac.uk
Please open an issue or submit a pull request to report problems or contribute.
License
Citation
If you find our benchmark helpful, please cite our paper:
@article{mao2025hypo3d,
title={Hypo3D: Exploring Hypothetical Reasoning in 3D},
author={Mao, Ye and Luo, Weixun and Jing, Junpeng and Qiu, Anlan and Mikolajczyk, Krystian},
journal={arXiv preprint arXiv:2502.00954},
year={2025}
}