June 5, 2026
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation (CVPR 2025)
This paper formulates 3D indoor scene generation as a planning problem subject to spatial and layout common-sense constraints. To solve this problem with a vision-language model (VLM), we propose a global-local tree search algorithm that decomposes generation into hierarchical planning stages and uses tree search to explore and optimize the layout at both the global room level and the local object level.
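To make the global/local decomposition concrete, here is a toy sketch under stated assumptions. Positions are discrete grid cells, the common-sense constraints are reduced to one hand-written rule, and candidates are enumerated exhaustively where the paper queries a VLM. tree_search, global_local_search, and against_wall are hypothetical names, not the repository's code.

```python
from typing import Callable, Optional

Cell = tuple[int, int]
# Hypothetical stand-in for common-sense rules: may `item` occupy `cell`?
Rule = Callable[[str, Cell], bool]


def tree_search(items: list[str], cells: list[Cell], ok: Rule,
                taken: frozenset[Cell] = frozenset()) -> Optional[dict[str, Cell]]:
    """Depth-first tree search: each tree level places one item; dead ends backtrack."""
    if not items:
        return {}
    head, rest = items[0], items[1:]
    for cell in cells:
        if cell in taken or not ok(head, cell):
            continue  # prune branches that collide or break the rule
        sub = tree_search(rest, cells, ok, taken | {cell})
        if sub is not None:
            return {head: cell, **sub}
    return None  # no valid cell for `head`: the caller tries its next option


def global_local_search(scene: dict[str, list[str]], room_cells: list[Cell],
                        surface_cells: list[Cell], ok: Rule):
    """Global stage places furniture in the room; local stage places objects on each piece."""
    room_plan = tree_search(list(scene), room_cells, ok)
    if room_plan is None:
        return None
    object_plans = {}
    for furniture, objects in scene.items():
        plan = tree_search(objects, surface_cells, ok)
        if plan is None:
            return None  # this sketch does not re-open the global stage
        object_plans[furniture] = plan
    return room_plan, object_plans


def against_wall(item: str, cell: Cell) -> bool:
    """Toy rule: the bed must stand against the wall at x == 0."""
    return item != "bed" or cell[0] == 0


room = [(0, 0), (1, 0)]
surface = [(0, 0), (0, 1)]
result = global_local_search({"desk": ["lamp", "laptop"], "bed": ["pillow"]},
                             room, surface, against_wall)
print(result)
```

Here the desk is first tried at (0, 0), which leaves no wall cell for the bed, so the search backtracks and moves the desk to (1, 0). Splitting the problem this way keeps each search tree small, since object placement only starts once the room-level plan exists.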
Global-Local Monte Carlo Tree Search in Vision-Language Models for Text-to-3D Indoor Scene Generation (arXiv 2026)
This paper extends our CVPR 2025 paper. We integrate Process Reward Model (PRM)-guided Monte Carlo Tree Search (MCTS) into the tree search, add a new re-texturing pipeline, and propose 3DTindo-Bench, a benchmark for text-to-3D scene generation.
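The extended version guides MCTS with a Process Reward Model, and --prm_threshold (see the Advanced Usage table) sets a threshold for it. The sketch below shows one common way to combine the two, not the paper's implementation. It uses standard UCT selection, discards children whose PRM score falls below the threshold during expansion (an assumption about what the threshold does), and scores a leaf with the PRM instead of a random rollout. toy_prm and the string actions are hypothetical stand-ins for the learned PRM and the VLM's layout proposals.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Optional

State = tuple[str, ...]


@dataclass(eq=False)
class Node:
    state: State
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list, repr=False)
    visits: int = 0
    total: float = 0.0  # sum of backed-up rewards
    expanded: bool = False


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    if child.visits == 0:
        return math.inf  # try every child at least once
    exploit = child.total / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def mcts(actions: Callable[[State], list[str]], prm: Callable[[State], float],
         threshold: float, depth: int, iterations: int = 200) -> State:
    root = Node(state=())
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCT through already-expanded nodes
        while node.expanded and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # 2. Expansion: keep only children whose PRM score clears the threshold
        if not node.expanded and len(node.state) < depth:
            node.expanded = True
            for action in actions(node.state):
                child_state = node.state + (action,)
                if prm(child_state) >= threshold:
                    node.children.append(Node(child_state, parent=node))
            if node.children:
                node = node.children[0]
        # 3. Evaluation: the PRM score replaces a random rollout
        reward = prm(node.state)
        # 4. Backpropagation up to the root
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read out the most-visited path
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.state


def toy_prm(state: State) -> float:
    """Hypothetical PRM: fraction of steps that chose "a"."""
    return state.count("a") / len(state) if state else 0.0


best = mcts(lambda state: ["a", "b", "c"], toy_prm, threshold=0.5, depth=3)
print(best)
```

With threshold=1.0 only all-"a" prefixes survive expansion, so the search returns ("a", "a", "a"). A higher threshold prunes more aggressively and shrinks the tree, at the risk of discarding a partial layout that would have led to a good final scene.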
🛠️ Setup
Requirements
- Python >= 3.12
- Blender 3.3+
- CUDA 11.8
- uv
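Before installing, a quick check of the requirements above can save a failed run. The sketch below is not part of the repository. It only verifies the Python version and that uv and blender resolve on PATH; Blender may instead be referenced solely through BLENDER_PATH, and the Blender and CUDA versions are not checked.

```python
import shutil
import sys


def check_requirements(min_python: tuple[int, int] = (3, 12),
                       tools: tuple[str, ...] = ("uv", "blender")) -> list[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


for problem in check_requirements():
    print("WARNING:", problem)
```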
Installation
This project uses uv for package management.
# Clone the repository
git clone https://github.com/dw-dengwei/TreeSearchGen.git
cd TreeSearchGen
# Create a virtual environment and install dependencies
uv sync
Database
Please follow the Holodeck instructions to download the 3D assets.
Environment Variables
Set these before running:
# Vision-Language Model API
export VL_MODEL_KEY='your_api_key'
export VL_MODEL_NAME='model_name'
export VL_MODEL_URL='api_endpoint'
# Language Model API
export LANGUAGE_MODEL_KEY='your_api_key'
export LANGUAGE_MODEL_NAME='model_name'
export LANGUAGE_MODEL_URL='api_endpoint'
# Optional: HuggingFace mirror (for users in China)
export HF_ENDPOINT="https://hf-mirror.com"
# Blender executable (the CUDA_VISIBLE_DEVICES prefix selects the GPU)
export BLENDER_PATH="CUDA_VISIBLE_DEVICES=0 /path/to/blender"
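A small pre-flight check can catch unset variables before a long run starts. The missing_env helper below is hypothetical and not part of the repository. It reports which of the six model variables listed above are unset or empty, treats HF_ENDPOINT as optional, and leaves BLENDER_PATH out.

```python
import os
from collections.abc import Mapping

# The six model variables listed above; HF_ENDPOINT is optional and not checked
REQUIRED_VARS = (
    "VL_MODEL_KEY", "VL_MODEL_NAME", "VL_MODEL_URL",
    "LANGUAGE_MODEL_KEY", "LANGUAGE_MODEL_NAME", "LANGUAGE_MODEL_URL",
)


def missing_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Names of required variables that are unset or set to an empty string."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```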
🚀 Usage
Quick Start
bash run.sh
The run.sh script runs the full pipeline:
- Solver: mcts (tree search)
- Instructions loaded from all_instructions.txt
- Output directory: ./output/
- Parallel execution configurable via --max_parallel
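The --max_parallel cap can be pictured as a bounded worker pool. This sketch rests on two assumptions: all_instructions.txt holds one instruction per non-empty line, and each instruction is handled by an independent worker. parse_instructions, load_instructions, run_all, and the worker function are hypothetical names, not the repository's code.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_instructions(text: str) -> list[str]:
    """Assumption: one scene instruction per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_instructions(path: str) -> list[str]:
    return parse_instructions(Path(path).read_text(encoding="utf-8"))


def run_all(instructions: list[str], worker: Callable[[str], T],
            max_parallel: int = 4) -> list[T]:
    """Run `worker` on every instruction, at most `max_parallel` at a time.

    Executor.map returns results in input order, not completion order.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(worker, instructions))
```

For example, run_all(load_instructions("all_instructions.txt"), worker, max_parallel=4) would process at most four instructions concurrently. Threads fit when each job mostly waits on API calls or a Blender subprocess; CPU-bound pure-Python work would call for a process pool instead.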
Advanced Usage
python generate_layout.py \
--benchmark_instructions "all_instructions.txt" \
--output_root "output" \
--furniture_resolution 0.3 \
--object_resolution 0.1 \
--tree_search_config "config/config.yaml" \
--start_step 0 \
--end_step 17 \
--use_solver "mcts" \
--blender_path "blender" \
--max_parallel 4 \
--process_id "0" \
--prm_threshold 0.3
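In the command above, --furniture_resolution 0.3 and --object_resolution 0.1 set grid resolutions in meters. Assuming each value is the side length of a square cell over the placement area (an interpretation of "grid resolution", not confirmed by the code), the number of candidate positions follows from simple arithmetic. grid_shape is a hypothetical helper.

```python
import math


def grid_shape(width_m: float, depth_m: float, resolution_m: float,
               eps: float = 1e-9) -> tuple[int, int]:
    """Cells along each axis when a rectangle is discretized at `resolution_m` meters per cell.

    `eps` absorbs floating-point error, e.g. 1.2 / 0.1 evaluates to slightly less than 12.
    """
    cols = math.ceil(width_m / resolution_m - eps)
    rows = math.ceil(depth_m / resolution_m - eps)
    return cols, rows


room = grid_shape(4.0, 3.0, 0.3)  # furniture grid for a 4 m x 3 m room
desk = grid_shape(1.2, 0.6, 0.1)  # object grid for a 1.2 m x 0.6 m desk top
print(room, room[0] * room[1])    # (14, 10) 140
print(desk, desk[0] * desk[1])    # (12, 6) 72
```

At 0.1 m the same 4 m × 3 m room would have 40 × 30 = 1,200 cells instead of 140, which is presumably why the coarser grid is used for furniture and the finer one for small objects.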
| Argument | Description |
|---|---|
| --start_step / --end_step | Run only a subset of the pipeline steps (e.g., --start_step 12 --end_step 13 for layout only) |
| --use_solver | Search algorithm: mcts (default) or dfs |
| --max_parallel | Maximum number of instructions processed in parallel |
| --process_id | Process range (e.g., "0-5" or "0,2,4") for distributed execution |
| --furniture_resolution | Grid resolution for furniture layout (meters) |
| --object_resolution | Grid resolution for small-object layout (meters) |
| --prm_threshold | Score threshold for the Process Reward Model (PRM) |
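The --process_id argument accepts ranges and lists such as "0-5" and "0,2,4". A parser for that syntax might look like the sketch below. It assumes ranges are inclusive at both ends and that mixed forms like "0-2,5" are allowed; neither point is confirmed by the table, and parse_process_ids is a hypothetical name.

```python
def parse_process_ids(spec: str) -> list[int]:
    """Expand a spec such as "0-5", "0,2,4", or "0-2,5" into sorted, unique IDs.

    Assumptions: ranges are inclusive, and ranges and single IDs may be mixed
    in one comma-separated list.
    """
    ids: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "0,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"empty range: {part!r}")
            ids.update(range(start, end + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


print(parse_process_ids("0-5"))    # [0, 1, 2, 3, 4, 5]
print(parse_process_ids("0,2,4"))  # [0, 2, 4]
```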
Re-texturing with Paint3D
We refine textures with Paint3D. Please download the Paint3D checkpoints before running the re-texturing step.
📊 Citation
If you find this work useful, please cite:
@inproceedings{deng2025global,
title={Global-Local Tree Search in VLMs for 3D Indoor Scene Generation},
author={Deng, Wei and Qi, Mengshi and Ma, Huadong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}
@article{qi2026global,
title={Global-Local Monte Carlo Tree Search in Vision-Language Models for Text-to-3D Indoor Scene Generation},
author={Qi, Mengshi and Deng, Wei and Zhang, Xianlin and Ma, Huadong},
journal={arXiv preprint arXiv:2606.06002},
year={2026}
}
🙏 Acknowledgement
We thank the authors of Objaverse for providing the 3D model database, Paint3D for the texture refinement pipeline, and Blender for the rendering engine.