March 22, 2026
SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
ICLR 2026
SceneCOT Framework
News
- [2026-3] Evaluation code, model checkpoints, and detailed installation instructions have been released.
- [2026-3] We released the training code.
- [2026-1] SceneCOT was accepted to ICLR 2026.
- [2025-6] We released the SceneCOT project webpage.
Get Started
- Clone the repository.
git clone https://github.com/SceneCOT/scenecot
cd scenecot
- Create a Python environment and install dependencies.
conda create -n scenecot python=3.9
conda activate scenecot
# PyTorch (tested versions)
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia
# project dependencies
pip install -r requirements.txt
- Install point-cloud third-party modules.
pip install spconv-cu118
cd model/pointnetpp
python setup.py install
cd ../..
# sanity check
python -c 'from model.pointnetpp.pointnetpp import PointNetPP'
If the PointNext build or import fails, either disable PointNext or place the compiled file from LEO_data under model/pointnext/cpp/pointnet2_batch/.
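The one-line sanity check above only covers PointNet++. A minimal sketch that tries every compiled dependency from the installation steps at once is shown below; `check_imports` is a hypothetical helper, not part of the repository, and the module list is an assumption based on the steps above (spconv 2.x exposes its PyTorch API as `spconv.pytorch`). Run it from the repository root so that `model.pointnetpp` is importable.

```python
import importlib

# Assumed module list, taken from the install steps above. Run from the
# repository root so "model.pointnetpp.pointnetpp" can be found.
MODULES = ("torch", "spconv.pytorch", "model.pointnetpp.pointnetpp")


def check_imports(modules=MODULES):
    """Try to import each module and return {name: "ok" or an error summary}."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = "ok"
        except Exception as exc:  # ImportError, or load errors from compiled extensions
            status[name] = f"{type(exc).__name__}: {exc}"
    return status


if __name__ == "__main__":
    for name, result in check_imports().items():
        print(f"{name:32s} {result}")
```

Catching `Exception` rather than only `ImportError` is deliberate: a compiled extension built against the wrong CUDA or PyTorch version usually fails with a loader error, and the report should show it instead of crashing.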
Reproducibility configuration
The configs were updated to avoid machine-specific absolute paths. We recommend setting the following environment variables:
| Variable | Purpose | Default | Download / Source Link |
|---|---|---|---|
SCENECOT_EXP_ROOT | experiment output root (cfg.base_dir) | ./outputs | - |
SCENECOT_DATA_ROOT | root directory for dataset/assets used by configs/data/default.yaml | ./data_assets | SceneCOT dataset |
SCENECOT_COT_DATA_ROOT | root directory for released COT annotations (MSQA/, GQA3D/) | ${SCENECOT_DATA_ROOT}/scenecot_cot_data | SceneCOT dataset / scenecot_cot_data |
SCENECOT_MSR3D_ANNO_DIR | MSQA annotation directory (contains situated_qa_{train,val,test}_pure_txt.json) | ${SCENECOT_COT_DATA_ROOT}/MSQA | MSQA |
SCENECOT_GQA3D_ANNO_DIR | GQA3D annotation directory (contains gqa3d_{train,val,test}.json) | ${SCENECOT_COT_DATA_ROOT}/GQA3D | GQA3D |
HF_HOME | Hugging Face cache root (cfg.hf_home) | ./.cache/huggingface | Hugging Face Hub |
SCENECOT_MODEL_ROOT | unified root directory for default model/checkpoint paths | ./model_assets | SceneCOT models |
SCENECOT_LLM_PATH | LLaVA model path (override) | ${SCENECOT_MODEL_ROOT}/llava-v1.5-7b | LLaVA-1.5-7B |
SCENECOT_VISION_TOWER_PATH | CLIP vision tower path (override) | ${SCENECOT_MODEL_ROOT}/clip-vit-large-patch14-336 | CLIP ViT-L/14-336 |
SCENECOT_PQ3D_TOKENIZER_PATH | PQ3D text tokenizer path (override, data.pq3d_tokenizer_path) | ${SCENECOT_MODEL_ROOT}/clip-vit-large-patch14 | SceneCOT models |
SCENECOT_POINTNET_TOKENIZER_PATH | PQ3D PointNet++ tokenizer checkpoint (override) | ${SCENECOT_MODEL_ROOT}/pointnet_tokenizer.pth | SceneCOT models |
SCENECOT_QUERY3D_PRETRAIN_PATH | PQ3D/SceneVerse pretrain checkpoint (override) | ${SCENECOT_MODEL_ROOT}/query3d_pretrain.bin | SceneCOT models |
SCENECOT_EXPERT1_PATH | MOE expert-1 checkpoint directory (override) | ${SCENECOT_MODEL_ROOT}/expert1_checkpoint0 | SceneCOT model repo (checkpoint dirs) |
SCENECOT_EXPERT2_PATH | MOE expert-2 checkpoint directory (override) | ${SCENECOT_MODEL_ROOT}/expert2_best.pth | SceneCOT model repo (checkpoint dirs) |
Example:
export SCENECOT_EXP_ROOT=/path/to/experiments
export SCENECOT_DATA_ROOT=/path/to/data_assets
export SCENECOT_COT_DATA_ROOT=/path/to/data_assets/scenecot_cot_data
export SCENECOT_MSR3D_ANNO_DIR=/path/to/data_assets/scenecot_cot_data/MSQA
export SCENECOT_GQA3D_ANNO_DIR=/path/to/data_assets/scenecot_cot_data/GQA3D
export HF_HOME=/path/to/hf_cache
export SCENECOT_MODEL_ROOT=/path/to/model_assets
# Optional explicit overrides when using non-default file names/locations
# export SCENECOT_LLM_PATH=/path/to/model_assets/llava-v1.5-7b
# export SCENECOT_VISION_TOWER_PATH=/path/to/model_assets/clip-vit-large-patch14-336
# export SCENECOT_PQ3D_TOKENIZER_PATH=/path/to/model_assets/clip-vit-large-patch14
# export SCENECOT_EXPERT1_PATH=/path/to/model_assets/expert1_checkpoint0
# export SCENECOT_EXPERT2_PATH=/path/to/model_assets/expert2_best.pth
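Several defaults in the table refer to other variables: for example, SCENECOT_MSR3D_ANNO_DIR falls back to ${SCENECOT_COT_DATA_ROOT}/MSQA, which itself falls back to ${SCENECOT_DATA_ROOT}/scenecot_cot_data. The minimal sketch below reproduces that cascade from the table so you can print the effective paths before launching a run. `resolve_paths` is a hypothetical helper, not shipped with SceneCOT; the real resolution happens inside the configs and may differ in detail.

```python
import os
from string import Template

# Defaults copied from the table above. Order matters: a default may only
# reference variables that appear earlier in this list.
DEFAULTS = [
    ("SCENECOT_EXP_ROOT", "./outputs"),
    ("SCENECOT_DATA_ROOT", "./data_assets"),
    ("SCENECOT_COT_DATA_ROOT", "${SCENECOT_DATA_ROOT}/scenecot_cot_data"),
    ("SCENECOT_MSR3D_ANNO_DIR", "${SCENECOT_COT_DATA_ROOT}/MSQA"),
    ("SCENECOT_GQA3D_ANNO_DIR", "${SCENECOT_COT_DATA_ROOT}/GQA3D"),
    ("HF_HOME", "./.cache/huggingface"),
    ("SCENECOT_MODEL_ROOT", "./model_assets"),
    ("SCENECOT_LLM_PATH", "${SCENECOT_MODEL_ROOT}/llava-v1.5-7b"),
    ("SCENECOT_VISION_TOWER_PATH", "${SCENECOT_MODEL_ROOT}/clip-vit-large-patch14-336"),
    ("SCENECOT_PQ3D_TOKENIZER_PATH", "${SCENECOT_MODEL_ROOT}/clip-vit-large-patch14"),
    ("SCENECOT_POINTNET_TOKENIZER_PATH", "${SCENECOT_MODEL_ROOT}/pointnet_tokenizer.pth"),
    ("SCENECOT_QUERY3D_PRETRAIN_PATH", "${SCENECOT_MODEL_ROOT}/query3d_pretrain.bin"),
    ("SCENECOT_EXPERT1_PATH", "${SCENECOT_MODEL_ROOT}/expert1_checkpoint0"),
    ("SCENECOT_EXPERT2_PATH", "${SCENECOT_MODEL_ROOT}/expert2_best.pth"),
]


def resolve_paths(env=None):
    """Return {name: path}, preferring values set in `env` over table defaults."""
    env = dict(os.environ if env is None else env)
    resolved = {}
    for name, default in DEFAULTS:
        raw = env.get(name) or default
        # Expand ${VAR} references using the variables resolved so far.
        resolved[name] = Template(raw).safe_substitute(resolved)
    return resolved


if __name__ == "__main__":
    for name, path in resolve_paths().items():
        print(f"{name:34s} {path}")
```

Because overrides are substituted before later defaults are expanded, setting only SCENECOT_DATA_ROOT is enough to move both annotation directories with it.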
Pretrained weights
To reproduce the performance reported in the paper, the following checkpoints are required:
- SceneCOT experts (released): SceneCOT model repo
- PQ3D PointNet++ tokenizer (pointnet_tokenizer.pth) → set SCENECOT_POINTNET_TOKENIZER_PATH
- Query3D/SceneVerse pretrain (pytorch_model.bin) → set SCENECOT_QUERY3D_PRETRAIN_PATH
For MOE evaluation, expert checkpoints are expected as directories under SCENECOT_MODEL_ROOT:
${SCENECOT_MODEL_ROOT}/
├── expert1_checkpoint0/
│   └── pytorch_model.bin (or model.safetensors)
└── expert2_best.pth/
    └── pytorch_model.bin (or model.safetensors)
These map to:
- moe.expert1_path → ${SCENECOT_MODEL_ROOT}/expert1_checkpoint0 (or SCENECOT_EXPERT1_PATH)
- moe.expert2_path → ${SCENECOT_MODEL_ROOT}/expert2_best.pth (or SCENECOT_EXPERT2_PATH)
By default, the PointNet++ tokenizer and Query3D/SceneVerse pretrain checkpoints are also resolved under SCENECOT_MODEL_ROOT. If these files are missing, the corresponding modules are initialized without pretrained weights, which may significantly degrade the final metrics.
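Because missing weights fall back silently to random initialization, it is worth verifying the layout before a long evaluation. The sketch below, built around a hypothetical helper `check_checkpoints` that is not part of the repository, checks that each expert directory contains pytorch_model.bin or model.safetensors and that the two PQ3D files exist. It assumes the default file names from the configuration table (including query3d_pretrain.bin) and honours the same override variables.

```python
import os
from pathlib import Path

# Either file is accepted inside an expert checkpoint directory.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def check_checkpoints(env=None):
    """Return {checkpoint_name: True/False} for the expected weight files."""
    env = os.environ if env is None else env
    root = Path(env.get("SCENECOT_MODEL_ROOT") or "./model_assets")
    # Expert checkpoints are directories (even expert2_best.pth).
    experts = {
        "expert1": Path(env.get("SCENECOT_EXPERT1_PATH") or root / "expert1_checkpoint0"),
        "expert2": Path(env.get("SCENECOT_EXPERT2_PATH") or root / "expert2_best.pth"),
    }
    # PQ3D checkpoints are single files; names follow the table defaults.
    files = {
        "pointnet_tokenizer": Path(env.get("SCENECOT_POINTNET_TOKENIZER_PATH") or root / "pointnet_tokenizer.pth"),
        "query3d_pretrain": Path(env.get("SCENECOT_QUERY3D_PRETRAIN_PATH") or root / "query3d_pretrain.bin"),
    }
    report = {}
    for name, directory in experts.items():
        report[name] = directory.is_dir() and any((directory / f).is_file() for f in WEIGHT_FILES)
    for name, path in files.items():
        report[name] = path.is_file()
    return report


if __name__ == "__main__":
    for name, ok in check_checkpoints().items():
        print(f"{name:20s} {'found' if ok else 'MISSING'}")
```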
External services
Weights & Biases
Tracking is enabled by default. For evaluation-only or offline runs without a W&B login, disable it:
export WANDB_MODE=disabled
Hugging Face access
If direct access to huggingface.co is restricted, set a mirror endpoint and keep a local cache:
export HF_ENDPOINT=https://your-hf-mirror
export HF_HOME=/path/to/hf_cache
Data preparation
- Download the released dataset assets from the SceneCOT dataset.
- Place all downloaded data under one root directory, for example:
/path/to/data_assets
- Set:
export SCENECOT_DATA_ROOT=/path/to/data_assets
configs/data/default.yaml resolves paths from SCENECOT_DATA_ROOT as:
- ${SCENECOT_DATA_ROOT}/SceneVerse → data.sceneverse_base
- ${SCENECOT_DATA_ROOT}/leo2-cot → data.cot_annotation_base
- ${SCENECOT_DATA_ROOT}/scan_family → data.scan_family_base
- ${SCENECOT_DATA_ROOT}/LEO-2_feature/ScanNet → data.obj_feat_2d_base.ScanNet
- ${SCENECOT_DATA_ROOT}/scene-verse-pred-all/ScanNet → data.obj_feat_base.ScanNet
- ${SCENECOT_DATA_ROOT}/scenecot_imgs/imgs/scannet → data.obj_img_base.ScanNet
- COT annotation paths are resolved as:
  - ${SCENECOT_COT_DATA_ROOT}/MSQA (or SCENECOT_MSR3D_ANNO_DIR) → data.msr3d_anno_dir, data.cotqa.msr3d.anno_dir
  - ${SCENECOT_COT_DATA_ROOT}/GQA3D (or SCENECOT_GQA3D_ANNO_DIR) → data.gqa3d_anno_dir, data.cotqa.gqa3d.anno_dir
Expected folder layout:
${SCENECOT_COT_DATA_ROOT}/
├── MSQA/
│   ├── situated_qa_train_pure_txt.json
│   ├── situated_qa_val_pure_txt.json
│   └── situated_qa_test_pure_txt.json
└── GQA3D/
    ├── gqa3d_train.json
    ├── gqa3d_val.json
    └── gqa3d_test.json
- Download the released checkpoints from SceneCOT models and, if available, set the optional PQ3D checkpoint environment variables (SCENECOT_POINTNET_TOKENIZER_PATH and SCENECOT_QUERY3D_PRETRAIN_PATH).
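A quick way to catch a misplaced download is to list every expected path that does not exist. The minimal sketch below uses a hypothetical helper `missing_data` (not part of the repository) that mirrors the subdirectory mapping of configs/data/default.yaml and the COT folder layout described above. It only checks that directories and files exist, not that their contents are valid.

```python
import os
from pathlib import Path

# Subdirectories of SCENECOT_DATA_ROOT referenced by configs/data/default.yaml.
DATA_SUBDIRS = [
    "SceneVerse",
    "leo2-cot",
    "scan_family",
    "LEO-2_feature/ScanNet",
    "scene-verse-pred-all/ScanNet",
    "scenecot_imgs/imgs/scannet",
]
# Annotation files expected in each COT annotation directory.
EXPECTED = {
    "MSQA": [f"situated_qa_{split}_pure_txt.json" for split in ("train", "val", "test")],
    "GQA3D": [f"gqa3d_{split}.json" for split in ("train", "val", "test")],
}


def missing_data(env=None):
    """Return a list of expected data paths that do not exist."""
    env = os.environ if env is None else env
    data_root = Path(env.get("SCENECOT_DATA_ROOT") or "./data_assets")
    cot_root = Path(env.get("SCENECOT_COT_DATA_ROOT") or data_root / "scenecot_cot_data")
    anno_dirs = {
        "MSQA": Path(env.get("SCENECOT_MSR3D_ANNO_DIR") or cot_root / "MSQA"),
        "GQA3D": Path(env.get("SCENECOT_GQA3D_ANNO_DIR") or cot_root / "GQA3D"),
    }
    missing = [str(data_root / sub) for sub in DATA_SUBDIRS if not (data_root / sub).is_dir()]
    for key, names in EXPECTED.items():
        missing += [str(anno_dirs[key] / n) for n in names if not (anno_dirs[key] / n).is_file()]
    return missing


if __name__ == "__main__":
    for path in missing_data():
        print("missing:", path)
```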
Training and evaluation
Training:
sh scripts/train/full_training_msqa_gqa3d.sh
Evaluation (MOE test script):
sh scripts/test/full_training_msqa_beacon3d_test_moe.sh
Offline evaluation
- Download evaluation_assets from HF evaluation assets.
- Set optional variables:
export SCENECOT_EVAL_ASSETS=/path/to/evaluation_assets
export SCENECOT_EVAL_ROOT=/path/to/experiments
- Run:
python evaluator/msqa_evaluator_offline.py
Expected prediction files are read from:
{result_dir}/{model_name}/eval_results/{dataset_name}/results.json (or results.pt)
where result_dir defaults to SCENECOT_EVAL_ROOT.
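Before running the offline evaluator, you can confirm that predictions sit where it expects them. The sketch below builds the path pattern above with a hypothetical helper `find_results` (not part of the repository). Preferring results.json over results.pt is an assumption, and the sketch raises an error instead of guessing a location when neither result_dir nor SCENECOT_EVAL_ROOT is set.

```python
import os
from pathlib import Path


def find_results(model_name, dataset_name, result_dir=None):
    """Return the path of an existing results file, or None if there is none."""
    base = result_dir or os.environ.get("SCENECOT_EVAL_ROOT")
    if not base:
        raise ValueError("pass result_dir or set SCENECOT_EVAL_ROOT")
    eval_dir = Path(base) / model_name / "eval_results" / dataset_name
    # Assumed preference: JSON first, then the PyTorch-serialized variant.
    for fname in ("results.json", "results.pt"):
        candidate = eval_dir / fname
        if candidate.is_file():
            return candidate
    return None
```

For example, `find_results("my_model", "msqa")` (both names are placeholders) returns the path to use, or `None` if the run has not written predictions yet.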
TODO List
- arXiv paper
- Evaluation code
- Training code
- Model weights
- SceneCOT-185K dataset
BibTeX
If you find our work helpful, please consider citing us:
@inproceedings{linghu2026scenecot,
title={SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes},
author={Linghu, Xiongkun and Huang, Jiangyong and Zhu, Ziyu and Jia, Baoxiong and Huang, Siyuan},
booktitle={Proceedings of the International Conference on Learning Representations (ICLR)},
year={2026}
}