README.md

March 23, 2026

GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions
(ICCV 2025)

Xiaomeng Chu, Jiajun Deng, Guoliang You, Wei Liu, Xingchen Li, Jianmin Ji, Yanyong Zhang

@article{chu2025graspcot,
  title={GraspCoT: Integrating Physical Property Reasoning for 6-DoF Grasping under Flexible Language Instructions},
  author={Chu, Xiaomeng and Deng, Jiajun and You, Guoliang and Liu, Wei and Li, Xingchen and Ji, Jianmin and Zhang, Yanyong},
  journal={arXiv preprint arXiv:2503.16013},
  year={2025}
}

Overview

This repository is the official implementation of GraspCoT, a 6-DoF grasp detection framework that integrates a Chain-of-Thought (CoT) reasoning mechanism oriented to physical properties, guided by auxiliary question-answering (QA) tasks. The video demonstration is included here.

Environment

We tested our code in the following environment:

Python 3.10
PyTorch 2.1.0
CUDA Version 11.8

Clone this repository.

git clone https://github.com/cxmomo/GraspCoT.git
cd GraspCoT

Install packages.

conda create -n graspcot python=3.10 -y
conda activate graspcot 
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.1.0+cu118.html
pip install flash-attn --no-build-isolation
pip install -e .

Install other dependencies:

pip install openmim
mim install mmengine==0.10.5
mim install mmcv==2.1.0
mim install mmdet==3.2.0
pip install numpy==1.26.4
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d
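
Once the installation finishes, it can help to confirm that the pinned versions are the ones actually installed. The script below is a minimal sketch, not part of this repository: `check_versions` and `installed_version` are hypothetical helper names, and it assumes each package is registered under the same distribution name used in the install commands above. Local build tags such as `+cu118` are ignored in the comparison.

```python
from importlib import metadata

# Versions pinned in the install commands above (distribution name -> version).
EXPECTED = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "torchaudio": "2.1.0",
    "mmengine": "0.10.5",
    "mmcv": "2.1.0",
    "mmdet": "3.2.0",
    "numpy": "1.26.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu118" when comparing.
        base = found.split("+")[0] if found else None
        if base != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for name, (want, found) in problems.items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated `graspcot` environment; any line it prints names a package whose version differs from the one listed above.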

Prepare Dataset

Make data folder.
```
cd GraspCoT
mkdir data
```
Download the dataset from Grasp-Anything-6D, follow their usage instructions, and put it in data/grasp_anything.
Download the revised scenario description file with bugs fixed from Google Drive, unzip, and replace the original directory scene_description (downloaded from the previous step) in data/grasp_anything.
Download the generated flexible instruction info files of train and val sets, and put them in data/grasp_anything/dialogues.
Generate the projected images, depth maps, and a series of grasping info files from the colored point cloud.
```
bash scripts/create_grasp_data.sh
```
Folder structure:

data/grasp_anything
├── depth
├── dialogues
├── pc
├── rgb
├── scene_description
├── grasp_anything_infos_train_0.pkl
├── ...
├── grasp_anything_infos_train_7.pkl
├── grasp_anything_infos_val_0.pkl
├── grasp_anything_infos_val_1.pkl
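
Before training, the layout above can be checked with a short script. This is a sketch under stated assumptions: `missing_entries` is a hypothetical helper, not part of the repository, and it checks only the entries written out explicitly in the listing (the info files elided by `...` are not checked).

```python
from pathlib import Path

# Entries shown explicitly in the folder listing above.
EXPECTED_DIRS = ["depth", "dialogues", "pc", "rgb", "scene_description"]
EXPECTED_FILES = [
    "grasp_anything_infos_train_0.pkl",
    "grasp_anything_infos_train_7.pkl",
    "grasp_anything_infos_val_0.pkl",
    "grasp_anything_infos_val_1.pkl",
]


def missing_entries(root):
    """Return the listed directories and files that are absent under root."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    absent = missing_entries("data/grasp_anything")
    if absent:
        print("Missing entries:", ", ".join(absent))
    else:
        print("All listed entries are present.")
```

Run it from the repository root after `scripts/create_grasp_data.sh` has finished.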

Training

Make a folder for pretrained LLM models.
```
mkdir pretrained_llms
```
Download the pretrained LLaVA-3D-7B and CLIP models, then put them in the directory pretrained_llms/:

pretrained_llms
├── llava-3d-7b
│   ├── config.json
│   ├── ...
│   ├── model-00001-of-00003.safetensors
│   ├── ...
├── clip-vit-large-patch14-336
│   ├── config.json
│   ├── ...
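
An interrupted download can leave the sharded checkpoint incomplete. The sketch below checks for the two `config.json` files and for gaps in the shard sequence. It is not part of the repository: `missing_shards` and `check_pretrained` are hypothetical helpers, and the code assumes every shard follows the `model-XXXXX-of-YYYYY.safetensors` naming seen in the listing above.

```python
import re
from pathlib import Path

# Assumed shard naming, taken from the file shown in the listing above.
SHARD_PATTERN = re.compile(r"model-(\d+)-of-(\d+)\.safetensors$")


def missing_shards(model_dir):
    """List shard file names implied by the naming scheme but not present."""
    model_dir = Path(model_dir)
    present = {p.name for p in model_dir.glob("model-*-of-*.safetensors")}
    missing = set()
    for name in present:
        match = SHARD_PATTERN.match(name)
        if not match:
            continue
        width = len(match.group(1))   # zero-padding width of the shard index
        total_str = match.group(2)    # total shard count as written in the name
        for i in range(1, int(total_str) + 1):
            expected = f"model-{i:0{width}d}-of-{total_str}.safetensors"
            if expected not in present:
                missing.add(expected)
    return sorted(missing)


def check_pretrained(root="pretrained_llms"):
    """Return a list of human-readable problems found under root."""
    root = Path(root)
    problems = []
    for sub in ("llava-3d-7b", "clip-vit-large-patch14-336"):
        if not (root / sub / "config.json").is_file():
            problems.append(f"{sub}/config.json not found")
    llava = root / "llava-3d-7b"
    if not any(llava.glob("model-*-of-*.safetensors")):
        problems.append("llava-3d-7b: no model-*-of-*.safetensors shards found")
    problems += [f"llava-3d-7b/{name} not found" for name in missing_shards(llava)]
    return problems


if __name__ == "__main__":
    for line in check_pretrained() or ["Pretrained model folders look complete."]:
        print(line)
```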

Train GraspCoT with 8 NVIDIA RTX 3090 GPUs:

bash scripts/train/dist_train.sh

Evaluation

Make a folder for trained model weights.
```
mkdir checkpoints
```
Download the model weights and put them in the directory checkpoints/llava-graspcot.

Generate the predicted 6-DoF grasp poses and perform evaluation.

cd checkpoints/llava-graspcot
ln -s ../../pretrained_llms/llava-3d-7b/* ./
cd ../..
python llava/eval/generate_scene_grasp.py --model-path checkpoints/llava-graspcot/ --model-base pretrained_llms/llava-3d-7b/ --data-path data/grasp_anything --output_dir checkpoints/llava-graspcot/
python scripts/eval/eval_scene_grasp.py --data checkpoints/llava-graspcot/scenegrasp_gen_data.pkl
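
To take a quick look at the generated file before or after evaluation, a generic summary like the one below may be enough. This sketch makes no assumption about the contents of `scenegrasp_gen_data.pkl`, whose schema is not documented here: `summarize_pickle` is a hypothetical helper that reports only the top-level type, its length, and a few keys if the object is a dict. Unpickling may require the same packages that produced the file, so run it in the `graspcot` environment, and only load pickle files you generated yourself.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object."""
    with Path(path).open("rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        # Show at most five keys to keep the output short.
        summary["keys"] = [repr(k) for k in list(obj)[:5]]
    return summary


if __name__ == "__main__":
    print(summarize_pickle("checkpoints/llava-graspcot/scenegrasp_gen_data.pkl"))
```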

Acknowledgements

Many thanks to these excellent open-source projects:

Codebase: LLaVA-3D, MMDetection, MMCV
Data: Grasp-Anything-6D