Training and Inference

December 30, 2024

Environment Setup

To install the environment requirements needed for SceneVerse, run the following commands:

$ conda create -n sceneverse python=3.9
$ conda activate sceneverse
$ pip install -r requirements.txt

SceneVerse also depends on an efficient implementation of PointNet2, which is located in modules. Remember to install it with:

$ cd modules/third_party/pointnet2
$ python setup.py install
$ cd ../../..

Model Configurations

1. Experiment Setup

We provide all experiment configurations in configs/final. The experiment setting is described in the comment at the top of each experiment file. To use the configuration files correctly, change the following fields so that paths load correctly:

  • base_dir: save path for model checkpoints, configurations, and logs.
  • logger.entity: we use W&B for logging experiments; change this to your own account.
  • data.{DATASET}_family_base: path to the {DATASET}-related data.
  • model.vision.args.path: path to the pre-trained object encoder (PointNet++).
  • model.vision.args.lang_path: deprecated; path to the text embeddings of the 607 classes in ScanNet.
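
Before launching, it can help to check that the path fields listed above have actually been filled in. The following is a minimal sketch, assuming the configuration has already been loaded into a nested Python dict. The helper names (get_dotted, unset_fields), the dict layout, and the placeholder paths are hypothetical and not part of the SceneVerse codebase. The dataset key is left out because its name depends on the dataset.

```python
# Hypothetical helpers: report which of the fields listed above are still unset.
# The dotted key names come from the list above; everything else is illustrative.

def get_dotted(cfg, dotted_key):
    """Return cfg[a][b][c] for "a.b.c", or None if any level is missing."""
    node = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def unset_fields(cfg, required_keys):
    """List the dotted keys whose value is missing or an empty string."""
    return [key for key in required_keys if get_dotted(cfg, key) in (None, "")]


REQUIRED = ["base_dir", "logger.entity", "model.vision.args.path"]

# Placeholder values for illustration only.
example_cfg = {
    "base_dir": "/tmp/sceneverse_runs",
    "logger": {"entity": ""},  # not filled in yet
    "model": {"vision": {"args": {"path": "/tmp/pointnet.pth"}}},
}

print(unset_fields(example_cfg, REQUIRED))  # → ['logger.entity']
```

An empty list means every listed field has a value; the sketch does not check that the paths exist on disk.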

Walk through configs/final/all_pretrain.yaml and compare it with the other files to see how we control the data and objectives used in training.

Experiments

1. Training and Inference

This codebase uses the Hugging Face Accelerate package and the Facebook Submitit package for efficient model training on multi-node clusters. The launcher file launch.py provides three ways of launching experiments; two of them are shown below:

# Launching using submitit on a SLURM cluster (e.g. 10 hour 1 node 4 GPU experiment with config file $CONFIG)
$ python launch.py --mode submitit --time 10 --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME
                   
# Launching using accelerate on a multi-GPU instance
$ python launch.py --mode accelerate --gpu_per_node 4 --num_nodes 1 --config $CONFIG note=$NOTE name=$EXP_NAME

In short, launch.py sets up the process(es) that run the main entry point run.py under multi-GPU settings. You can override values from the configuration file $CONFIG by appending key=value fields after all command line arguments (e.g., name=$EXP_NAME solver.epochs=400 dataloader.batchsize=4).
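
To make the override syntax concrete, here is a minimal sketch of how dotted key=value fields map onto a nested configuration. The hydra.* keys in the debugging command later in this document suggest that the real parsing is done by Hydra; the code below is not Hydra's implementation, only an illustration of the semantics. The function names, the simplified value parsing, and the starting values in cfg are assumptions.

```python
# Illustrative only: apply "a.b.c=value" overrides to a nested dict.

def parse_value(text):
    """Best-effort conversion of an override value; falls back to the raw string."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(cfg, overrides):
    """Apply each "dotted.key=value" string to cfg in place and return cfg."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for part in parents:
            # Create intermediate levels that do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw_value)
    return cfg


# Placeholder starting values, not the defaults of any SceneVerse config.
cfg = {"name": "default", "solver": {"epochs": 100}, "dataloader": {"batchsize": 32}}
apply_overrides(cfg, ["name=my_exp", "solver.epochs=400", "dataloader.batchsize=4"])
print(cfg)
# → {'name': 'my_exp', 'solver': {'epochs': 400}, 'dataloader': {'batchsize': 4}}
```

Each override touches only the leaf it names, so the rest of the configuration file stays as written.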

For testing and inference, remember to set up the testing data correctly in each configuration file and switch the mode field in the configuration to test (i.e., mode=test).

2. Debugging

If you want to debug your code without an additional job launcher, you can run run.py directly. For example:

# Single card direct run for debugging purposes
$ python run.py --config-path ${PROJ_PATH}/configs/final/ --config-name ${EXP_CONFIG_NAME}.yaml \
                num_gpu=1 hydra.run.dir=. hydra.output_subdir=null hydra/job_logging=disabled hydra/hydra_logging=disabled \
                debug.flag=True debug.debug_size=1 dataloader.batchsize=2 debug.hard_debug=True name=Debug_test

Checkpoints

We provide all available checkpoints in the same data directory, in a folder named Checkpoints. The table below describes each checkpoint:

| Setting | Description | Corresponding Experiment | Checkpoint based on experiment setting |
|---|---|---|---|
| pre-trained | GPS model pre-trained on SceneVerse | 3D-VL grounding (Tab.2) | Model |
| scratch | GPS model trained on datasets from scratch | 3D-VL grounding (Tab.2), SceneVerse-val (Tab.3) | ScanRefer, Sr3D, Nr3D, SceneVerse-val |
| fine-tuned | GPS model fine-tuned on datasets with grounding heads | 3D-VL grounding (Tab.2) | ScanRefer, Sr3D, Nr3D |
| zero-shot | GPS model trained on SceneVerse without data from ScanNet and MultiScan | Zero-shot Transfer (Tab.3) | Model |
| zero-shot text | GPS | Zero-shot Transfer (Tab.3) | ScanNet, SceneVerse-val |
| text-ablation | Ablations on the type of language used during pre-training | Ablation on Text (Tab.7) | Template only, Template+LLM |
| scene-ablation | Ablations on the use of synthetic scenes during pre-training | Ablation on Scene (Tab.8) | Real only, S3D only, ProcTHOR only |
| model-ablation | Ablations on the use of losses during pre-training | Ablation on Model Design (Tab.9) | Refer only, Refer+Obj-lvl, w/o Scene-lvl |
| 3d-qa | Results for QA fine-tuning on ScanQA and SQA3D | 3D-QA Experiments (Tab.5) | ScanQA, SQA3D |

To load a pre-trained checkpoint, set the pretrain_ckpt_path key in the config:

# Directly testing the checkpoint
$ python launch.py --mode submitit --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME mode=test \
                   pretrain_ckpt_path=$PRETRAIN_CKPT

# Fine-tuning with pre-trained checkpoint
$ python launch.py --mode submitit --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME \
                   pretrain_ckpt_path=$PRETRAIN_CKPT

To fine-tune the pre-trained checkpoint on a dataset, use the fine-tuning config files provided under configs/final/finetune.
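
The testing and fine-tuning commands above differ only in the mode=test override. The sketch below assembles both argument lists from one function to make that difference explicit. It uses only the flags and keys that appear in the commands above; the function build_launch_cmd and all example values (QoS, partition, note, name, checkpoint path) are hypothetical placeholders.

```python
import shlex


def build_launch_cmd(config, ckpt_path, qos, partition, note, name,
                     test=False, gpu_per_node=4, mem_per_gpu=80):
    """Return the argv list for a submitit launch that loads a checkpoint."""
    cmd = [
        "python", "launch.py", "--mode", "submitit",
        "--qos", qos, "--partition", partition,
        "--mem_per_gpu", str(mem_per_gpu),
        "--gpu_per_node", str(gpu_per_node),
        "--config", config,
    ]
    # key=value overrides go after all command line arguments.
    overrides = [f"note={note}", f"name={name}"]
    if test:
        overrides.append("mode=test")  # the only difference from fine-tuning
    overrides.append(f"pretrain_ckpt_path={ckpt_path}")
    return cmd + overrides


# Placeholder values for illustration only.
common = dict(
    config="configs/final/all_pretrain.yaml",
    ckpt_path="/path/to/checkpoint",
    qos="my_qos", partition="my_partition",
    note="demo", name="demo_run",
)
test_cmd = build_launch_cmd(test=True, **common)
finetune_cmd = build_launch_cmd(test=False, **common)

print(shlex.join(test_cmd))
print(shlex.join(finetune_cmd))
```

The resulting lists can be passed to subprocess.run, or printed and pasted into a shell as shown.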