March 23, 2026
GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions
(ICCV 2025)
Xiaomeng Chu, Jiajun Deng, Guoliang You, Wei Liu, Xingchen Li, Jianmin Ji, Yanyong Zhang
@article{chu2025graspcot,
title={GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions},
author={Chu, Xiaomeng and Deng, Jiajun and You, Guoliang and Liu, Wei and Li, Xingchen and Ji, Jianmin and Zhang, Yanyong},
journal={arXiv preprint arXiv:2503.16013},
year={2025}
}
Overview
This repository is the official implementation of GraspCoT, a 6-DoF grasp detection framework that integrates a Chain-of-Thought (CoT) reasoning mechanism oriented to physical properties, guided by auxiliary question-answering (QA) tasks. A video demonstration is included here.
Environment
We tested our code in the following environment:
- Python 3.10
- PyTorch 2.1.0
- CUDA 11.8
- Clone this repository.
git clone https://github.com/cxmomo/GraspCoT.git
cd GraspCoT
- Install packages.
conda create -n graspcot python=3.10 -y
conda activate graspcot
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.1.0+cu118.html
pip install flash-attn --no-build-isolation
pip install -e .
- Install other dependencies:
pip install openmim
mim install mmengine==0.10.5
mim install mmcv==2.1.0
mim install mmdet==3.2.0
pip install numpy==1.26.4
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d
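After installation, it can help to confirm that the pinned versions are the ones actually active in the conda environment. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the pins are copied from the install commands above.

```python
from importlib import metadata

# Version pins copied from the install commands above.
EXPECTED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "mmengine": "0.10.5",
    "mmcv": "2.1.0",
    "mmdet": "3.2.0",
    "numpy": "1.26.4",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package that deviates from its pin to (found, wanted)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Ignore local build tags such as "+cu118" when comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[package] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    for package, (found, wanted) in find_mismatches(EXPECTED).items():
        print(f"{package}: found {found}, expected {wanted}")
```

Running the script inside the `graspcot` environment prints one line per package that is missing or deviates from its pin, and prints nothing when everything matches.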
Prepare Dataset
- Make data folder.
cd GraspCoT
mkdir data
- Download the dataset from Grasp-Anything-6D, follow their usage instructions, and put it in data/grasp_anything.
- Download the revised scenario description file with bugs fixed from Google Drive, unzip it, and replace the original directory scene_description (downloaded in the previous step) in data/grasp_anything.
- Download the generated flexible instruction info files of the train and val sets, and put them in data/grasp_anything/dialogues.
- Generate the projected images, depth maps, and a series of grasping info files from the colored point cloud.
bash scripts/create_grasp_data.sh
- Folder structure:
data/grasp_anything
├── depth
├── dialogues
├── pc
├── rgb
├── scene_description
├── grasp_anything_infos_train_0.pkl
├── ...
├── grasp_anything_infos_train_7.pkl
├── grasp_anything_infos_val_0.pkl
├── grasp_anything_infos_val_1.pkl
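Before training, the layout above can be checked programmatically. This is a minimal sketch under stated assumptions: the helper `missing_entries` is hypothetical and not part of the repository, and it checks only the entries named explicitly in the tree, leaving the elided `...` files unchecked.

```python
from pathlib import Path

# Entries named explicitly in the folder structure above; the elided
# "..." files are deliberately not checked.
EXPECTED_DIRS = ["depth", "dialogues", "pc", "rgb", "scene_description"]
EXPECTED_FILES = [
    "grasp_anything_infos_train_0.pkl",
    "grasp_anything_infos_train_7.pkl",
    "grasp_anything_infos_val_0.pkl",
    "grasp_anything_infos_val_1.pkl",
]


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for name in missing_entries("data/grasp_anything"):
        print(f"missing: {name}")
```

Run from the repository root, the script lists whatever is still missing from `data/grasp_anything`.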
Training
- Make a folder for pretrained LLMs.
mkdir pretrained_llms
- Download the pretrained LLaVA-3D-7B and CLIP models, then put them in the directory pretrained_llms/:
pretrained_llms
├── llava-3d-7b
│   ├── config.json
│   ├── ...
│   ├── model-00001-of-00003.safetensors
│   ├── ...
├── clip-vit-large-patch14-336
│   ├── config.json
│   ├── ...
- Train GraspCoT with 8 NVIDIA RTX 3090 GPUs:
bash scripts/train/dist_train.sh
Evaluation
- Make a folder for trained model weights.
mkdir checkpoints
- Download the model weights and put them in the directory checkpoints/llava-graspcot.
- Generate the predicted 6-DoF grasp poses and perform evaluation.
cd checkpoints/llava-graspcot
ln -s ../../pretrained_llms/llava-3d-7b/* ./
cd ../..
python llava/eval/generate_scene_grasp.py --model-path checkpoints/llava-graspcot/ --model-base pretrained_llms/llava-3d-7b/ --data-path data/grasp_anything --output_dir checkpoints/llava-graspcot/
python scripts/eval/eval_scene_grasp.py --data checkpoints/llava-graspcot/scenegrasp_gen_data.pkl
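The `ln -s` step makes the base model's files visible inside the checkpoint directory. As an illustration of what that step does, here is a rough Python equivalent. It is a sketch only: the helper `link_base_files` is hypothetical and not part of the repository. Unlike the shell command, which reports an error for names that already exist, it silently keeps them.

```python
import os
from pathlib import Path


def link_base_files(base_dir, target_dir):
    """Symlink each entry of base_dir into target_dir, keeping existing files."""
    base_dir, target_dir = Path(base_dir), Path(target_dir)
    linked = []
    for source in sorted(base_dir.iterdir()):
        # The shell glob "*" does not match hidden files, so skip them too.
        if source.name.startswith("."):
            continue
        destination = target_dir / source.name
        # Keep files already present in the checkpoint directory.
        if destination.exists() or destination.is_symlink():
            continue
        # Use a relative link target, like the ../../ path in the shell command.
        os.symlink(os.path.relpath(source, target_dir), destination)
        linked.append(source.name)
    return linked
```

Called from the repository root as `link_base_files("pretrained_llms/llava-3d-7b", "checkpoints/llava-graspcot")`, it returns the names of the links it created. Files already present in the checkpoint directory take precedence over the base model's copies.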
Acknowledgements
Many thanks to these excellent open-source projects:
- Codebase: LLaVA-3D, MMDetection, MMCV
- Data: Grasp-Anything-6D