[ICCV2025] CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

September 16, 2025

Overview

Official implementation of CleanPose, the first solution to mitigate the confounding effect in category-level pose estimation via causal learning and knowledge distillation.

Conference Paper

Environment Settings

The code has been tested with

  • python 3.9
  • torch 1.12
  • cuda 12.4

Some dependencies:

pip install gorilla-core==0.2.5.3
pip install opencv-python

cd model/pointnet2
python setup.py install

Data Preparation

NOCS dataset

  • Download and preprocess the dataset following DPDN
  • Download and unzip the segmentation results here

Put them under PROJ_DIR/data. The final file structure is as follows:

data
├── camera
│   ├── train
│   ├── val
│   ├── train_list_all.txt
│   ├── train_list.txt
│   └── val_list_all.txt
├── real
│   ├── train
│   ├── test
│   ├── train_list.txt
│   ├── train_list_all.txt
│   └── test_list_all.txt
├── segmentation_results
│   ├── CAMERA25
│   └── REAL275
├── camera_full_depths
├── gts
└── obj_models
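
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: the helper name `find_missing` and the list `EXPECTED_NOCS_ENTRIES` are assumptions made for illustration, built only from the tree shown above.

```python
from pathlib import Path

# Entries copied from the tree above, relative to PROJ_DIR/data.
# This list is an illustrative assumption, not a file shipped with the code.
EXPECTED_NOCS_ENTRIES = [
    "camera/train",
    "camera/val",
    "camera/train_list_all.txt",
    "camera/train_list.txt",
    "camera/val_list_all.txt",
    "real/train",
    "real/test",
    "real/train_list.txt",
    "real/train_list_all.txt",
    "real/test_list_all.txt",
    "segmentation_results/CAMERA25",
    "segmentation_results/REAL275",
    "camera_full_depths",
    "gts",
    "obj_models",
]


def find_missing(data_root, expected=EXPECTED_NOCS_ENTRIES):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [entry for entry in expected if not (root / entry).exists()]
```

Calling `find_missing("data")` from PROJ_DIR returns an empty list when every entry is present; otherwise it lists the paths that still need to be downloaded or unzipped.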

HouseCat6D

Download and unzip the dataset from HouseCat6D. The final file structure is as follows:

housecat6d
├── scene**
├── val_scene*
├── test_scene*
└── obj_models_small_size_final
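
To sanity-check the unzipped HouseCat6D folder, one option is to count the scene directories that match each pattern in the tree above. This is a sketch under assumptions: the function name `count_scenes` is hypothetical, and treating directories named `scene*` as the training split is an assumption of this example rather than something stated here.

```python
from pathlib import Path

# Glob patterns taken from the tree above. The split labels ("train", "val",
# "test") are names chosen for this sketch, not identifiers from the codebase.
SPLIT_PATTERNS = {
    "train": "scene*",
    "val": "val_scene*",
    "test": "test_scene*",
}


def count_scenes(housecat_root):
    """Count the directories under housecat_root that match each pattern."""
    root = Path(housecat_root)
    counts = {}
    for split, pattern in SPLIT_PATTERNS.items():
        # glob matches whole names, so "scene*" does not match "val_scene1".
        counts[split] = sum(1 for p in root.glob(pattern) if p.is_dir())
    return counts


def has_object_models(housecat_root):
    """Check that the object model folder from the tree above is present."""
    return (Path(housecat_root) / "obj_models_small_size_final").is_dir()
```

A count of zero for any split, or `has_object_models` returning `False`, suggests the archive was unzipped into a different location than the one the config points to.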

Other Preparation

Confounder queue generation

You can generate the queue list with the following command, or use the pre-extracted pkl file from this link. Then specify 'init_tensor_list_dir' in the config file.

python queue_extraction.py
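
Whether the queue file is generated locally or downloaded, a quick look at it before pointing the config at it can catch a truncated or wrong file. The internal layout of the pkl file is not documented here, so the sketch below only reports the top-level type and length. The function name `describe_queue_file` is hypothetical. If the file contains torch tensors, torch must be importable when loading it. Because unpickling can execute arbitrary code, only load files from a source you trust.

```python
import pickle


def describe_queue_file(path):
    """Load a pickle file and report its top-level type name and length.

    Returns a (type_name, length) tuple; length is None when the object
    has no __len__.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```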

ULIP model weights

Download the pretrained weights of PointBERT and Pointnet2_ssg from ULIP Hugging Face or this link. To use ResNet as the visual encoder, also download the ResNet18 weights (optional). The pretrained weights should be laid out as follows:

model
├── pointbert
│   └── pretrained_model
│       └── pretrained_models_ckpt_zero-sho_classification_pointbert_ULIP-2.pt
├── pointnet2
├── pointnet2_ulip
│   └── pretrained_model
│       └── pretrained_models_ckpt_zero-sho_classification_checkpoint_pointnet2_ssg.pt
└── resnet18-5c106cde.pth (optional)

Train

Training on NOCS

python train.py --config config/REAL/camera_real.yaml

Training on HouseCat6D

python train_housecat6d.py --config config/HouseCat6D/housecat6d.yaml

Evaluate

  • Evaluate on NOCS:
python test.py --config config/REAL/camera_real.yaml --test_epoch 30
  • Evaluate on HouseCat6D:
python test_housecat6d.py --config config/HouseCat6D/housecat6d.yaml --test_epoch 150
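
To compare several checkpoints, the evaluation commands above can be wrapped in a small loop. The sketch below is an illustration only: the helper names `build_eval_command` and `evaluate_epochs` are hypothetical, it uses only the `--config` and `--test_epoch` flags shown above, and it assumes that checkpoints for the requested epochs were actually saved during training.

```python
import subprocess
import sys


def build_eval_command(script, config, epoch):
    """Build the argument list for one evaluation run."""
    return [sys.executable, script, "--config", config, "--test_epoch", str(epoch)]


def evaluate_epochs(script, config, epochs, dry_run=True):
    """Return the commands for each epoch; run them when dry_run is False."""
    commands = [build_eval_command(script, config, epoch) for epoch in epochs]
    if not dry_run:
        for cmd in commands:
            # check=True stops the loop at the first failing evaluation.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `evaluate_epochs("test.py", "config/REAL/camera_real.yaml", [30])` returns the single command from the NOCS bullet above without running it; pass `dry_run=False` to execute it.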

Results

You can download our training logs, detailed metrics for each category and checkpoints here.

REAL275 testset:

IoU25   IoU50   IoU75   5 degree 2 cm   5 degree 5 cm   10 degree 2 cm   10 degree 5 cm
83.3    81.2    62.7    61.7            67.6            78.3             86.3

CAMERA25 testset:

IoU25   IoU50   IoU75   5 degree 2 cm   5 degree 5 cm   10 degree 2 cm   10 degree 5 cm
94.8    94.3    92.5    80.3            84.2            87.7             92.7

HouseCat6D testset:

IoU25   IoU50   IoU75   5 degree 2 cm   5 degree 5 cm   10 degree 2 cm   10 degree 5 cm
89.2    79.8    53.9    22.4            24.1            51.6             56.5
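
For readers unfamiliar with the "n degree m cm" columns, the sketch below shows how such a threshold is commonly checked for a single prediction: the rotation error is the geodesic angle between the predicted and ground-truth rotations, and the translation error is the Euclidean distance. This is a simplified illustration and not the evaluation code of this repository; the function names are hypothetical, translations are assumed to be given in centimetres, and the special handling that evaluation protocols usually apply to symmetric objects is omitted.

```python
import math


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(R_pred^T @ R_gt) equals the sum of element-wise products.
    trace = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error_cm(t_pred, t_gt):
    """Euclidean distance between two translations, assumed to be in cm."""
    return math.dist(t_pred, t_gt)


def within_threshold(R_pred, t_pred, R_gt, t_gt, max_deg, max_cm):
    """True when both the rotation and translation errors are within the limits."""
    return (
        rotation_error_deg(R_pred, R_gt) <= max_deg
        and translation_error_cm(t_pred, t_gt) <= max_cm
    )
```

With `max_deg=5` and `max_cm=2`, `within_threshold` corresponds to the per-prediction test behind the "5 degree 2 cm" column; the numbers in the tables aggregate such tests over the whole test set.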

Visualization

For visualization, run:

python visualize.py --config config/REAL/camera_real.yaml --test_epoch 30

Acknowledgements

Our implementation builds on the code from DPDN, AG-Pose and GOAT. We thank the authors for their excellent work!

Citation

If our work is useful to you, please consider citing our paper using the following BibTeX entry.

@inproceedings{lin2025cleanpose,
  title={Cleanpose: Category-level object pose estimation via causal learning and knowledge distillation},
  author={Lin, Xiao and Peng, Yun and Wang, Liuyi and Zhong, Xianyou and Zhu, Minghao and Yang, Jingwei and Feng, Yi and Liu, Chengju and Chen, Qijun},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}

License

Our code is released under the MIT License.