Training and Inference

December 30, 2024

Environment Setup

To install the environment requirements for SceneVerse, run:

$ conda create -n sceneverse python=3.9
$ conda activate sceneverse
$ pip install -r requirements.txt

SceneVerse also depends on an efficient implementation of PointNet2, which is located under modules. Install it with:

$ cd modules/third_party/pointnet2
$ python setup.py install
$ cd ../../..

We provide all experiment configurations in configs/final. The experiment setting is described in the comment at the top of each experiment file. To use the configuration files, change the following fields so that paths load correctly:

base_dir: save path for model checkpoints, configurations, and logs.
logger.entity: your W&B account; we use W&B for logging experiments.
data.{DATASET}_family_base: path to the {DATASET}-related data.
model.vision.args.path: path to the pre-trained object encoder (PointNet++).
model.vision.args.lang_path: deprecated; path to the text embeddings of the 607 classes in ScanNet.

You can walk through configs/final/all_pretrain.yaml and compare it with the other files to see how we control the data and objectives used in training.
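
Before launching, it can help to check that the fields listed above are actually set. The following is a minimal sketch and not part of the SceneVerse codebase. It assumes the configuration has already been loaded into a nested Python dict (for example with a YAML loader). The helper names `get_by_path` and `missing_fields` and the sample values are hypothetical.

```python
from typing import Any

# Dotted keys from the list above. The dataset-specific data.* keys are
# omitted because their names depend on the datasets you use.
REQUIRED_FIELDS = [
    "base_dir",
    "logger.entity",
    "model.vision.args.path",
]


def get_by_path(cfg: dict, dotted_key: str) -> Any:
    """Return the value at a dotted key such as 'logger.entity', or None."""
    node: Any = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_fields(cfg: dict, required=REQUIRED_FIELDS) -> list:
    """List the required dotted keys that are absent or empty."""
    return [key for key in required if get_by_path(cfg, key) in (None, "")]


# Hypothetical config contents, for illustration only.
example_cfg = {
    "base_dir": "/path/to/outputs",
    "logger": {"entity": ""},
    "model": {"vision": {"args": {"path": "/path/to/pointnet_ckpt"}}},
}
print(missing_fields(example_cfg))
```

With the sample dict, only logger.entity is reported because it is left empty.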

Experiments

1. Training and Inference

This codebase uses the Hugging Face Accelerate package and the Facebook Submitit package for efficient model training on multi-node clusters. We provide a launcher file, launch.py, which supports three ways of launching experiments; two of them are shown below:

# Launching using submitit on a SLURM cluster (e.g. 10 hour 1 node 4 GPU experiment with config file $CONFIG)
$ python launch.py --mode submitit --time 10 --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME
                   
# Launching using accelerate on a multi-GPU instance
$ python launch.py --mode accelerate --gpu_per_node 4 --num_nodes 1 --config $CONFIG note=$NOTE name=$EXP_NAME

In short, launch.py sets up the process(es) that run the main entry point run.py in a multi-GPU setting. You can override values in the configuration file $CONFIG by appending key=value pairs after all other command-line arguments (e.g., name=$EXP_NAME solver.epochs=400 dataloader.batchsize=4).
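
To make the override syntax concrete, here is a minimal sketch of how key=value arguments with dotted keys map onto a nested configuration. It is an illustration only: the real parsing is handled by the configuration library behind run.py (presumably Hydra, given the hydra.* options in the debugging command), and `apply_overrides` is a hypothetical helper, not a function from this codebase.

```python
import ast


def apply_overrides(cfg: dict, overrides: list) -> dict:
    """Apply 'a.b.c=value' strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        try:
            # Interpret numbers, booleans, etc.; fall back to a plain string.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for part in parents:
            # Create intermediate levels when they do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return cfg


# Hypothetical default values, for illustration only.
cfg = {"name": "default", "solver": {"epochs": 100}, "dataloader": {"batchsize": 32}}
apply_overrides(cfg, ["name=my_exp", "solver.epochs=400", "dataloader.batchsize=4"])
print(cfg)
```

Each dotted key walks one level deeper into the configuration, so solver.epochs=400 replaces only the epochs entry and leaves the rest of solver untouched.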

For testing and inference, make sure the testing data is set up correctly in each configuration file and set the mode field in the configuration to test (i.e., mode=test).

2. Debugging

If you want to debug your code without an additional job launcher, you can run run.py directly. For example:

# Single card direct run for debugging purposes
$ python run.py --config-path ${PROJ_PATH}/configs/final/ --config-name ${EXP_CONFIG_NAME}.yaml \
                num_gpu=1 hydra.run.dir=. hydra.output_subdir=null hydra/job_logging=disabled hydra/hydra_logging=disabled \
                debug.flag=True debug.debug_size=1 dataloader.batchsize=2 debug.hard_debug=True name=Debug_test

Checkpoints

We provide all available checkpoints under the same data directory, in a folder named Checkpoints. The table below describes each checkpoint in detail:

Setting	Description	Corresponding Experiment	Checkpoint based on experiment setting
`pre-trained`	GPS model pre-trained on SceneVerse	3D-VL grounding (Tab.2)	Model
`scratch`	GPS model trained on datasets from scratch	3D-VL grounding (Tab.2), SceneVerse-val (Tab.3)	ScanRefer, Sr3D, Nr3D, SceneVerse-val
`fine-tuned`	GPS model fine-tuned on datasets with grounding heads	3D-VL grounding (Tab.2)	ScanRefer, Sr3D, Nr3D
`zero-shot`	GPS model trained on SceneVerse without data from ScanNet and MultiScan	Zero-shot Transfer (Tab.3)	Model
`zero-shot text`	GPS	Zero-shot Transfer (Tab.3)	ScanNet, SceneVerse-val
`text-ablation`	Ablations on the type of language used during pre-training	Ablation on Text (Tab.7)	Template only, Template+LLM
`scene-ablation`	Ablations on the use of synthetic scenes during pre-training	Ablation on Scene (Tab.8)	Real only, S3D only, ProcTHOR only
`model-ablation`	Ablations on the use of losses during pre-training	Ablation on Model Design (Tab.9)	Refer only, Refer+Obj-lvl, w/o Scene-lvl
`3d-qa`	Results for QA fine-tuning on ScanQA and SQA3D	3D-QA Experiments (Tab.5)	ScanQA, SQA3D

To use the pre-trained checkpoints, set the pretrain_ckpt_path key in the configs:

# Directly testing the checkpoint
$ python launch.py --mode submitit --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME mode=test \
                   pretrain_ckpt_path=$PRETRAIN_CKPT

# Fine-tuning with pre-trained checkpoint
$ python launch.py --mode submitit --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME \
                   pretrain_ckpt_path=$PRETRAIN_CKPT

To fine-tune the pre-trained checkpoint on a dataset, use the fine-tuning config files provided under configs/final/finetune.
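
If you drive these runs from a script, note that the testing and fine-tuning commands above differ only in the mode=test override. The sketch below assembles the argument list in Python using only flags and keys that appear in those commands. `build_launch_cmd` and the example values are hypothetical, cluster-specific flags (--qos, --partition, --mem_per_gpu) are left out for brevity, and nothing is executed.

```python
import shlex


def build_launch_cmd(config, ckpt, name, test=False, gpu_per_node=4):
    """Build the launch.py argument list for testing or fine-tuning."""
    cmd = [
        "python", "launch.py",
        "--mode", "submitit",
        "--gpu_per_node", str(gpu_per_node),
        "--config", config,
        f"name={name}",
    ]
    if test:
        # Testing a checkpoint only adds the mode=test override.
        cmd.append("mode=test")
    cmd.append(f"pretrain_ckpt_path={ckpt}")
    return cmd


# Placeholder paths and experiment name, for illustration only.
cmd = build_launch_cmd("path/to/config.yaml", "path/to/ckpt", "my_exp", test=True)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once the cluster-specific flags have been added.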