Prepare the training and test dataset of the SQA3D
March 26, 2025 ยท View on GitHub
Download the training / test annotations and videos
- You can download the training / test annotations and videos from SQA3D.
- Evenly extract the 32 RGB frames and depth frames from the raw files (TODO: sample scripts).
Convert the training annotation
You need to convert the raw training annotation into instruction tuning dataset. Please refer to file tools/3d/sqa3d/create_sqa3d_training_annotations.py. You can also download the processed annotations and video mapping files from here.
Convert the test annotations
You need to convert the raw test annotations into test format. Please refer to file tools/3d/sqa3d/create_sqa3d_eval_annotations.py. You can also download the processed annotations from here.
Extract the 3d feature from video frames and depth files
- Prepare the LLaVA-3D environment following the instruction here.
- Run the extraction script to extract the 3d feature (TODO: sample scripts).
Download the pre-extracted audio feature
If you don't want to extract the feature by yourself, you can also download the pre-extracted 3d feature from here.