[ICCV2025] CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

September 16, 2025

Overview

Official implementation of CleanPose, the first solution to mitigate the confounding effect in category-level pose estimation via causal learning and knowledge distillation.

Environment Settings

The code has been tested with

python 3.9
torch 1.12
cuda 12.4
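To compare a local setup against the versions listed above, a small check like the following can help. It is a minimal sketch, not part of the repository: the function name `environment_report` is hypothetical, and it only reads version strings without verifying compatibility.

```python
import importlib.util
import platform


def environment_report():
    """Collect the interpreter version and, if installed, PyTorch/CUDA versions."""
    report = {"python": platform.python_version()}
    # Import torch only when it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = None
        report["cuda"] = None
    else:
        import torch

        report["torch"] = torch.__version__
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
        report["cuda"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not found'}")
```

Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which may differ from the system toolkit version.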

Some dependencies:

pip install gorilla-core==0.2.5.3
pip install opencv-python

cd model/pointnet2
python setup.py install

Data Preparation

NOCS dataset

Download and preprocess the dataset following DPDN
Download and unzip the segmentation results here

Put them under PROJ_DIR/data. The final file structure is as follows:

data
├── camera
│   ├── train
│   ├── val
│   ├── train_list_all.txt
│   ├── train_list.txt
│   └── val_list_all.txt
├── real
│   ├── train
│   ├── test
│   ├── train_list.txt
│   ├── train_list_all.txt
│   └── test_list_all.txt
├── segmentation_results
│   ├── CAMERA25
│   └── REAL275
├── camera_full_depths
├── gts
└── obj_models
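Before training, it may be worth confirming that the layout matches the tree above. The following is a minimal sketch under the assumption that it is run from PROJ_DIR; the helper `missing_paths` and the constant `EXPECTED_NOCS_PATHS` are illustrative names, not part of the repository.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_NOCS_PATHS = [
    "camera/train",
    "camera/val",
    "camera/train_list_all.txt",
    "camera/train_list.txt",
    "camera/val_list_all.txt",
    "real/train",
    "real/test",
    "real/train_list.txt",
    "real/train_list_all.txt",
    "real/test_list_all.txt",
    "segmentation_results/CAMERA25",
    "segmentation_results/REAL275",
    "camera_full_depths",
    "gts",
    "obj_models",
]


def missing_paths(data_root, expected=EXPECTED_NOCS_PATHS):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # "data" assumes the current working directory is PROJ_DIR.
    missing = missing_paths("data")
    print("layout OK" if not missing else "missing: " + ", ".join(missing))
```

The check only tests for existence; it does not validate file contents.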

HouseCat6D

Download and unzip the dataset from HouseCat6D. The final file structure is as follows:

housecat6d
├── scene**
├── val_scene*
├── test_scene*
└── obj_models_small_size_final
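To see how many scene folders were unpacked, the directories can be grouped by the prefixes shown in the tree above. This is a sketch only: `group_housecat_scenes` is a hypothetical helper, and it makes no claim about how the training and test scripts actually consume each group.

```python
from pathlib import Path


def group_housecat_scenes(root):
    """Group sub-directories of root by name prefix: scene*, val_scene*, test_scene*."""
    groups = {"scene": [], "val_scene": [], "test_scene": []}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue
        for prefix in groups:
            if entry.name.startswith(prefix):
                groups[prefix].append(entry.name)
                break
    return groups


if __name__ == "__main__":
    # "housecat6d" is the dataset root from the tree above, relative to the working directory.
    for prefix, names in group_housecat_scenes("housecat6d").items():
        print(f"{prefix}*: {len(names)} folder(s)")
```

Folders that match none of the prefixes, such as obj_models_small_size_final, are ignored.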

Other Preparation

Confounder queue generation

You can generate the queue list with the following command, or use the pre-extracted pkl file from this link. Then set 'init_tensor_list_dir' in the config file to the location of the resulting file.

python queue_extraction.py
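Whether the queue is generated locally or downloaded, a quick look at the pkl file can confirm that it loads before it is referenced in the config. The sketch below is an assumption-laden helper: `summarize_pickle` and the file name `confounder_queue.pkl` are hypothetical, and the internal structure of the file is not documented here, so only the top-level type and length are reported.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and return (top-level type name, length or None)."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    try:
        length = len(obj)
    except TypeError:
        # Objects without a usable length are reported as None.
        length = None
    return type(obj).__name__, length


if __name__ == "__main__":
    # Hypothetical file name; replace it with the path of your own pkl file.
    type_name, length = summarize_pickle("confounder_queue.pkl")
    print(f"type: {type_name}, length: {length}")
```

If the file stores PyTorch tensors, torch must be importable for unpickling to succeed. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.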

Download the pretrained weights of PointBERT and Pointnet2_ssg from ULIP Hugging Face or this link. If you want to use ResNet as the visual encoder, you can also download the weights of ResNet18 (optional). The pretrained weights should be arranged as follows:

model
├── pointbert
│   └── pretrained_model
│       └── pretrained_models_ckpt_zero-sho_classification_pointbert_ULIP-2.pt
├── pointnet2
├── pointnet2_ulip
│   └── pretrained_model
│       └── pretrained_models_ckpt_zero-sho_classification_checkpoint_pointnet2_ssg.pt
└── resnet18-5c106cde.pth (optional)

Train

Training on NOCS

python train.py --config config/REAL/camera_real.yaml

Training on HouseCat6D

python train_housecat6d.py --config config/HouseCat6D/housecat6d.yaml

Evaluate

Evaluate on NOCS:

python test.py --config config/REAL/camera_real.yaml --test_epoch 30

Evaluate on HouseCat6D:

python test_housecat6d.py --config config/HouseCat6D/housecat6d.yaml --test_epoch 150

Results

You can download our training logs, detailed per-category metrics, and checkpoints here.

REAL275 test set:

IoU25	IoU50	IoU75	5 degree 2 cm	5 degree 5 cm	10 degree 2 cm	10 degree 5 cm
83.3	81.2	62.7	61.7	67.6	78.3	86.3

CAMERA25 test set:

IoU25	IoU50	IoU75	5 degree 2 cm	5 degree 5 cm	10 degree 2 cm	10 degree 5 cm
94.8	94.3	92.5	80.3	84.2	87.7	92.7

HouseCat6D test set:

IoU25	IoU50	IoU75	5 degree 2 cm	5 degree 5 cm	10 degree 2 cm	10 degree 5 cm
89.2	79.8	53.9	22.4	24.1	51.6	56.5
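Column names such as "5 degree 2 cm" conventionally denote a joint threshold: a predicted pose counts as correct when its rotation error is below 5 degrees and its translation error is below 2 cm. The sketch below illustrates that per-object check only. It assumes translations in metres, ignores the special handling that symmetric categories usually receive, and does not reproduce the aggregation over categories behind the reported numbers. All function names are illustrative and not taken from the repository.

```python
import math


def rot_z(deg):
    """3x3 rotation matrix (nested lists) for a rotation of deg degrees about the z-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    # trace(R_pred^T R_gt) equals the sum of element-wise products.
    trace = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(cos_angle))


def translation_error_cm(t_pred, t_gt):
    """Euclidean distance between two translations in metres, returned in centimetres."""
    return 100.0 * math.dist(t_pred, t_gt)


def within_threshold(R_pred, t_pred, R_gt, t_gt, max_deg, max_cm):
    """True when both the rotation and the translation error are below the thresholds."""
    return (rotation_error_deg(R_pred, R_gt) < max_deg
            and translation_error_cm(t_pred, t_gt) < max_cm)


if __name__ == "__main__":
    R_gt, t_gt = rot_z(0.0), (0.0, 0.0, 1.0)
    R_pred, t_pred = rot_z(3.0), (0.01, 0.0, 1.0)  # 3 degrees and 1 cm off
    print(within_threshold(R_pred, t_pred, R_gt, t_gt, max_deg=5.0, max_cm=2.0))
```

The IoU25, IoU50, and IoU75 columns measure 3D bounding-box overlap instead and are not covered by this sketch.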

Visualization

For visualization, please run

python visualize.py --config config/REAL/camera_real.yaml --test_epoch 30

Acknowledgements

Our implementation builds on the code from DPDN, AG-Pose and GOAT. We thank the authors for their excellent work!

Citation

If our work is useful to you, please consider citing our paper using the following BibTeX entry.

@inproceedings{lin2025cleanpose,
  title={Cleanpose: Category-level object pose estimation via causal learning and knowledge distillation},
  author={Lin, Xiao and Peng, Yun and Wang, Liuyi and Zhong, Xianyou and Zhu, Minghao and Yang, Jingwei and Feng, Yi and Liu, Chengju and Chen, Qijun},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}

License

Our code is released under the MIT License.