June 5, 2026

Global-Local Tree Search in VLMs for 3D Indoor Scene Generation (CVPR 2025)

Wei Deng, Mengshi Qi*, Huadong Ma

CVPR 2025 · arXiv 2025 · Apache 2.0

This paper treats 3D indoor scene generation as a planning problem subject to spatial and layout common-sense constraints. To solve the problem with a VLM, we propose a new global-local tree search algorithm that decomposes the generation process into hierarchical planning stages and uses tree search to explore and optimize the layout at both the global (room-level) and local (object-level) granularity.

Global-Local Monte Carlo Tree Search in Vision-Language Models for Text-to-3D Indoor Scene Generation (arXiv 2026)

Mengshi Qi*, Wei Deng, Xianlin Zhang, Huadong Ma

arXiv 2026 · Apache 2.0

This paper extends our CVPR 2025 paper. We integrate Monte Carlo tree search (MCTS) guided by a Process Reward Model (PRM) into the tree search, add a new re-texturing pipeline, and propose 3DTindo-Bench for text-to-3D scene generation.

🛠️ Setup

Requirements

  • Python >= 3.12
  • Blender 3.3+
  • CUDA 11.8
  • uv

Installation

This project uses uv for package management.

# Clone the repository
git clone https://github.com/dw-dengwei/TreeSearchGen.git
cd TreeSearchGen

# Create a virtual environment and install dependencies
uv sync

Database

Please follow the Holodeck instructions to download the 3D assets.

Environment Variables

Set these before running:

# Vision-Language Model API
export VL_MODEL_KEY='your_api_key'
export VL_MODEL_NAME='model_name'
export VL_MODEL_URL='api_endpoint'

# Language Model API
export LANGUAGE_MODEL_KEY='your_api_key'
export LANGUAGE_MODEL_NAME='model_name'
export LANGUAGE_MODEL_URL='api_endpoint'

# Optional: HuggingFace mirror (for users in China)
export HF_ENDPOINT="https://hf-mirror.com"


# Blender executable (the CUDA_VISIBLE_DEVICES prefix selects the GPU Blender uses)
BLENDER_PATH="CUDA_VISIBLE_DEVICES=0 /path/to/blender"
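
A missing variable tends to surface only when the first API call fails, so it can help to check the environment before starting a long run. The snippet below is a minimal sketch and not part of the repository: the helper name `missing_env_vars` is hypothetical, and treating all six model variables as required is an assumption based on the list above.

```python
import os

# Variable names copied from the list above. Treating all six as
# required is an assumption of this sketch, not a documented rule.
REQUIRED_VARS = (
    "VL_MODEL_KEY",
    "VL_MODEL_NAME",
    "VL_MODEL_URL",
    "LANGUAGE_MODEL_KEY",
    "LANGUAGE_MODEL_NAME",
    "LANGUAGE_MODEL_URL",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the real process environment; passing a dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Running it in the same shell where the exports were made should list any variable that was forgotten or left empty.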

🚀 Usage

Quick Start

bash run.sh

run.sh runs the full pipeline with the following settings:

  • Solver: mcts (tree search)
  • Instructions loaded from all_instructions.txt
  • Output directory: ./output/
  • Parallel execution configurable via --max_parallel

Advanced Usage

python generate_layout.py \
    --benchmark_instructions "all_instructions.txt" \
    --output_root "output" \
    --furniture_resolution 0.3 \
    --object_resolution 0.1 \
    --tree_search_config "config/config.yaml" \
    --start_step 0 \
    --end_step 17 \
    --use_solver "mcts" \
    --blender_path "blender" \
    --max_parallel 4 \
    --process_id "0" \
    --prm_threshold 0.3

| Argument | Description |
| --- | --- |
| --start_step / --end_step | Run a subset of the pipeline (e.g., --start_step 12 --end_step 13 for layout only) |
| --use_solver | Search algorithm: mcts (default) or dfs |
| --max_parallel | Maximum number of instructions processed in parallel |
| --process_id | Process range (e.g., "0-5", "0,2,4") for distributed execution |
| --furniture_resolution | Grid resolution for furniture layout (meters) |
| --object_resolution | Grid resolution for small object layout (meters) |
| --prm_threshold | Threshold for the Process Reward Model |
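
To make the two --process_id forms concrete, the sketch below expands a specification such as "0-5" or "0,2,4" into a list of indices. It is illustrative only: `parse_process_id` is a hypothetical helper, and both the inclusive range end and the acceptance of mixed forms such as "0-2,5" are assumptions, so the repository's own parsing may differ.

```python
def parse_process_id(spec: str) -> list[int]:
    """Expand a spec like "0-5" or "0,2,4" into sorted, unique indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "0,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            # Assumption: the end of the range is inclusive.
            indices.update(range(start, end + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


if __name__ == "__main__":
    for spec in ("0", "0-5", "0,2,4"):
        print(spec, "->", parse_process_id(spec))
```

Under these assumptions, several machines could each be started with a disjoint specification (for example "0-5" on one and "6-11" on another) so that no instruction is processed twice.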

Re-texturing with Paint3D

We apply texture refinement using Paint3D. Please download the Paint3D checkpoints before running this step.

📊 Citation

If you find this work useful, please cite:

@inproceedings{deng2025global,
  title={Global-Local Tree Search in VLMs for 3D Indoor Scene Generation},
  author={Deng, Wei and Qi, Mengshi and Ma, Huadong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

@article{qi2026global,
  title={Global-Local Monte Carlo Tree Search in Vision-Language Models for Text-to-3D Indoor Scene Generation},
  author={Qi, Mengshi and Deng, Wei and Zhang, Xianlin and Ma, Huadong},
  journal={arXiv preprint arXiv:2606.06002},
  year={2026}
}

🙏 Acknowledgement

We thank the authors of Objaverse for providing the 3D model database, Paint3D for the texture refinement pipeline, and Blender for the rendering engine.