PointSD
July 15, 2025
[ICCV 2025] Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning
By Yiyang Chen, Shanshan Zhao, Lunhao Duan, Changxing Ding and Dacheng Tao
This is the official implementation of "Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning" [arXiv].

Installation
# Quick Start
conda create -n pointsd python=3.9 -y
conda activate pointsd
# Install pytorch
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
# Install required packages
pip install -r requirements.txt
# Install fine-tune requirements
cd ./Point-MAE
pip install -r requirements.txt
# Install the extensions
# Chamfer Distance
cd ./extensions/chamfer_dist
python setup.py install --user
# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
# Go back to the project root directory
cd ../../../
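Before moving on, it can help to confirm that the environment resolved as expected. The following is a minimal sketch, not part of the repository: it only asks the active interpreter whether it can locate the modules that the steps above are meant to install. The module names `chamfer` and `pointnet2_ops` are assumptions about what the two extensions register; adjust them if your build uses different names.

```python
import importlib.util

# Module names are assumptions inferred from the install steps, not taken from the repo.
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "chamfer", "pointnet2_ops"]


def find_missing(modules):
    """Return the names in `modules` that the current interpreter cannot locate."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially imported modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("All modules found." if not missing else f"Missing: {', '.join(missing)}")
```

Run it inside the activated `pointsd` environment. A module that is found can still fail at import time (for example on a CUDA mismatch), so treat this as a first check only.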
Datasets
See DATASET.md for details.
Stable Diffusion Checkpoint
We use SD-v1.5 for our experiments; you can download the checkpoint here.
PointSD Models
| Task | Dataset | Config | Acc. | Download |
|---|---|---|---|---|
| Pre-training | ShapeNet | train_pointsd.sh | N.A. | Pre-train |
| Classification | ScanObjectNN | finetune_scan_objbg.yaml | 95.18% | OBJ_BG |
| Classification | ScanObjectNN | finetune_scan_objonly.yaml | 93.63% | OBJ_ONLY |
| Classification | ScanObjectNN | finetune_scan_hardest.yaml | 90.08% | PB_T50_RS |
| Classification | ModelNet40 | finetune_modelnet.yaml | 93.7% | ModelNet40 |
| Task | Dataset | Config | 5w10s | 5w20s | 10w10s | 10w20s | Download |
|---|---|---|---|---|---|---|---|
| Few-shot learning | ModelNet40 | fewshot.yaml | 97.7 ± 1.8% | 99.0 ± 0.9% | 93.8 ± 3.6% | 95.9 ± 2.6% | FewShot |
| Task | Dataset | Config | Cls. mIoU | Inst. mIoU | Download |
|---|---|---|---|---|---|
| Segmentation | ShapeNetPart | segmentation | 84.5% | 86.1% | Segmentation |
Pre-training
To pre-train PointSD, set task_name, model_dir, dataset_dir, and img_dir in train_pointsd.sh. For the first training stage, set run_stage to stage1, then run:
bash train_pointsd.sh
For the second training stage, set run_stage to stage2 and stage1_ckpt to the path of the first-stage checkpoint, then run:
bash train_pointsd.sh
If you train on more than one GPU, set num_processes to the number of GPUs. You can use either checkpoint-120000/ckpt-stage2.pt or checkpoint-best/ckpt-stage2.pt for subsequent fine-tuning.
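To pass a stage-2 checkpoint to the fine-tuning commands, a small helper can locate it. This is a minimal sketch under assumptions: the two directory names and the file name come from the text above, but where they sit on disk is not stated, so `output_dir` is a hypothetical argument you point at your own training output. Preferring `checkpoint-best` over `checkpoint-120000` is also a choice made here, not something the repository prescribes.

```python
from pathlib import Path

# Directory and file names are taken from the README text; their parent folder is an assumption.
CANDIDATE_DIRS = ("checkpoint-best", "checkpoint-120000")
CKPT_NAME = "ckpt-stage2.pt"


def pick_stage2_ckpt(output_dir):
    """Return the first existing stage-2 checkpoint under `output_dir`, or None."""
    root = Path(output_dir)
    for sub in CANDIDATE_DIRS:
        path = root / sub / CKPT_NAME
        if path.is_file():
            return path
    return None
```

The returned path can then be supplied as the `--ckpts` argument when fine-tuning.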
Fine-tuning
First, switch to the Point-MAE folder:
cd Point-MAE
To fine-tune on ScanObjectNN, run:
# Select one config from finetune_scan_objbg/objonly/hardest.yaml
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/finetune_scan_hardest.yaml \
--finetune_model --exp_name <output_file_name> --ckpts <path/to/pre-trained/model> --seed $RANDOM
# Test with fine-tuned ckpt
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --test --config cfgs/finetune_scan_hardest.yaml \
--exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model>
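Because `--seed $RANDOM` draws a fresh seed on every invocation, a run is hard to repeat unless the seed is written down. The sketch below is not part of the repository; it builds the same fine-tuning command for any of the three ScanObjectNN variants and returns the seed alongside it so it can be logged. The helper name `finetune_command` and the variant keys are hypothetical; the config file names and flags are the ones shown above.

```python
import random
import shlex

# Variant names follow the model table; config paths follow the commands above.
SCAN_CONFIGS = {
    "OBJ_BG": "cfgs/finetune_scan_objbg.yaml",
    "OBJ_ONLY": "cfgs/finetune_scan_objonly.yaml",
    "PB_T50_RS": "cfgs/finetune_scan_hardest.yaml",
}


def finetune_command(variant, ckpt, exp_name, seed=None):
    """Return (command string, seed) for fine-tuning on one ScanObjectNN variant."""
    if seed is None:
        # Same range as bash's $RANDOM (0 to 32767).
        seed = random.randint(0, 32767)
    args = [
        "python", "main.py",
        "--config", SCAN_CONFIGS[variant],
        "--finetune_model",
        "--exp_name", exp_name,
        "--ckpts", ckpt,
        "--seed", str(seed),
    ]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return shlex.join(args), seed
```

Prefix the returned string with `CUDA_VISIBLE_DEVICES=<GPUs>` as in the commands above, and pass the recorded seed back in to repeat a run.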
To fine-tune on ModelNet40, run:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/finetune_modelnet.yaml \
--finetune_model --exp_name <output_file_name> --ckpts <path/to/pre-trained/model> --seed $RANDOM
# Test with fine-tuned ckpt
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --test --config cfgs/finetune_modelnet.yaml \
--exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model>
For few-shot learning, run:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/fewshot.yaml --finetune_model \
--ckpts <path/to/pre-trained/model> --exp_name <output_file_name> --way <5 or 10> --shot <10 or 20> --fold <0-9> --seed $RANDOM
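The few-shot command takes one way/shot/fold combination at a time, so covering every setting means 2 ways × 2 shots × 10 folds = 40 runs. The sketch below enumerates that grid and aggregates per-fold accuracies into a mean and standard deviation. It is an illustration under assumptions: the helper names are hypothetical, and using the sample standard deviation is a choice made here, since the text does not say how the ± values in the table were computed.

```python
import itertools
import statistics

# Values follow the placeholders in the command above.
WAYS = (5, 10)
SHOTS = (10, 20)
FOLDS = range(10)


def fewshot_runs():
    """List every (way, shot, fold) combination as a dict of CLI argument values."""
    return [
        {"way": way, "shot": shot, "fold": fold}
        for way, shot, fold in itertools.product(WAYS, SHOTS, FOLDS)
    ]


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

Each dict maps directly onto the `--way`, `--shot`, and `--fold` flags. Collect the ten fold accuracies for one setting and pass them to `summarize` to get a single summary figure for it.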
For part segmentation on ShapeNetPart, run:
cd segmentation
python main.py --gpu <gpu_id> --ckpts <path/to/pre-trained/model> \
--log_dir <log_dir> --learning_rate 0.0002 --epoch 300 \
--root <path/to/data> \
--seed $RANDOM
Acknowledgements
This codebase is built upon Point-MAE, Pointnet2_PyTorch, VPD, IPAdapter, and ULIP.
Citation
If you find this repo useful, please cite:
@article{chen2025harnessing,
title={Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning},
author={Chen, Yiyang and Zhao, Shanshan and Duan, Lunhao and Ding, Changxing and Tao, Dacheng},
journal={arXiv preprint arXiv:2507.09102},
year={2025}
}