PointSD

July 15, 2025

[ICCV 2025] Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning

By Yiyang Chen, Shanshan Zhao, Lunhao Duan, Changxing Ding and Dacheng Tao

This is the official implementation of "Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning" (arXiv:2507.09102).

Installation

# Quick Start
conda create -n pointsd python=3.9 -y
conda activate pointsd

# Install pytorch
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

# Install required packages
pip install -r requirements.txt

# Install fine-tune requirements
cd ./Point-MAE
pip install -r requirements.txt

# Install the extensions
# Chamfer Distance
cd ./extensions/chamfer_dist
python setup.py install --user
# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

# Go back to the project root directory
cd ../../../

Stable Diffusion Checkpoint

We use Stable Diffusion v1.5 (SD-v1.5) in our experiments. You can download the checkpoint here.

PointSD Models

Task	Dataset	Config	Acc.	Download
Pre-training	ShapeNet	train_pointsd.sh	N.A.	Pre-train
Classification	ScanObjectNN	finetune_scan_objbg.yaml	95.18%	OBJ_BG
Classification	ScanObjectNN	finetune_scan_objonly.yaml	93.63%	OBJ_ONLY
Classification	ScanObjectNN	finetune_scan_hardest.yaml	90.08%	PB_T50_RS
Classification	ModelNet40	finetune_modelnet.yaml	93.7%	ModelNet40

Task	Dataset	Config	5-way 10-shot	5-way 20-shot	10-way 10-shot	10-way 20-shot	Download
Few-shot learning	ModelNet40	fewshot.yaml	97.7 ± 1.8%	99.0 ± 0.9%	93.8 ± 3.6%	95.9 ± 2.6%	FewShot
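
The few-shot numbers above are reported as mean ± standard deviation. A minimal sketch of that summary follows, assuming one accuracy per fold (the fine-tuning command takes --fold 0-9) and the sample standard deviation. The README does not state which standard deviation convention is used, and summarize_folds is a hypothetical helper, not part of this repository.

```python
import statistics
from typing import Sequence


def summarize_folds(accuracies: Sequence[float]) -> str:
    """Format per-fold accuracies (in percent) as 'mean ± std%'.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)
    return f"{mean:.1f} ± {std:.1f}%"


if __name__ == "__main__":
    # Placeholder values for illustration only, not results from the paper.
    print(summarize_folds([96.0, 98.0, 100.0]))
```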

Task	Dataset	Config	Cls. mIoU	Inst. mIoU	Download
Segmentation	ShapeNetPart	segmentation	84.5%	86.1%	Segmentation
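
The two segmentation columns differ only in how per-shape IoUs are averaged. The sketch below follows the common ShapeNetPart convention, which is assumed here rather than taken from this repository: instance mIoU averages over all shapes, while class mIoU first averages within each category and then across categories. shapenetpart_miou is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def shapenetpart_miou(shape_ious: Sequence[Tuple[str, float]]) -> Tuple[float, float]:
    """Return (class_mIoU, instance_mIoU).

    shape_ious holds one (category, IoU) pair per test shape, where IoU is
    already averaged over the parts of that shape.
    """
    per_category: Dict[str, List[float]] = defaultdict(list)
    for category, iou in shape_ious:
        per_category[category].append(iou)

    # Instance mIoU: every shape counts equally.
    instance_miou = mean(iou for _, iou in shape_ious)
    # Class mIoU: every category counts equally, regardless of its size.
    class_miou = mean(mean(values) for values in per_category.values())
    return class_miou, instance_miou
```

Categories with many shapes dominate the instance mIoU, which is why the two numbers can differ.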

Pre-training

To pre-train PointSD, set task_name, model_dir, dataset_dir and img_dir in train_pointsd.sh. For the first training stage, set run_stage to stage1, then run:

bash train_pointsd.sh

For the second training stage, set run_stage to stage2 and stage1_ckpt to the path of the first-stage checkpoint, then run:

bash train_pointsd.sh

If you train on more than one GPU, set num_processes to the number of GPUs. You can use either checkpoint-120000/ckpt-stage2.pt or checkpoint-best/ckpt-stage2.pt for subsequent fine-tuning.
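
A minimal sketch for locating one of these stage-2 checkpoints before fine-tuning. It assumes both paths are relative to the pre-training output directory and prefers checkpoint-best over checkpoint-120000. That preference order and the helper pick_stage2_checkpoint are assumptions of this sketch, not part of the repository.

```python
from pathlib import Path
from typing import Optional

# Checkpoint paths named in this README, relative to the pre-training output
# directory. The preference order (best first) is an assumption.
CANDIDATES = (
    "checkpoint-best/ckpt-stage2.pt",
    "checkpoint-120000/ckpt-stage2.pt",
)


def pick_stage2_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the first existing stage-2 checkpoint under output_dir, or None."""
    root = Path(output_dir)
    for relative in CANDIDATES:
        path = root / relative
        if path.is_file():
            return path
    return None
```

The returned path can be passed to --ckpts in the fine-tuning commands.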

Fine-tuning

First, switch to the Point-MAE folder:

cd Point-MAE

To fine-tune on ScanObjectNN, run:

# Select one config: finetune_scan_objbg.yaml, finetune_scan_objonly.yaml or finetune_scan_hardest.yaml
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/finetune_scan_hardest.yaml \
--finetune_model --exp_name <output_file_name> --ckpts <path/to/pre-trained/model> --seed $RANDOM


# Test with fine-tuned ckpt
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --test --config cfgs/finetune_scan_hardest.yaml \
--exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model>

To fine-tune on ModelNet40, run:

CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/finetune_modelnet.yaml \
--finetune_model --exp_name <output_file_name> --ckpts <path/to/pre-trained/model> --seed $RANDOM

# Test with fine-tuned ckpt
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --test --config cfgs/finetune_modelnet.yaml \
--exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model>
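
The fine-tune and test commands above differ only in the config file, so they can be assembled from one lookup table. The sketch below builds both argument lists for a given benchmark, using only the flags and config names shown in this README. The function finetune_and_test and the dictionary keys are hypothetical conveniences, and the seed is passed explicitly instead of using $RANDOM so that it can be logged.

```python
from typing import Dict, List, Tuple

# Benchmark name -> config path, as listed in the model table and commands.
CONFIGS: Dict[str, str] = {
    "OBJ_BG": "cfgs/finetune_scan_objbg.yaml",
    "OBJ_ONLY": "cfgs/finetune_scan_objonly.yaml",
    "PB_T50_RS": "cfgs/finetune_scan_hardest.yaml",
    "ModelNet40": "cfgs/finetune_modelnet.yaml",
}


def finetune_and_test(
    benchmark: str,
    exp_name: str,
    pretrained_ckpt: str,
    finetuned_ckpt: str,
    seed: int,
) -> Tuple[List[str], List[str]]:
    """Return (fine-tune argv, test argv) for main.py in the Point-MAE folder."""
    config = CONFIGS[benchmark]  # raises KeyError for an unknown benchmark
    train = [
        "python", "main.py", "--config", config,
        "--finetune_model", "--exp_name", exp_name,
        "--ckpts", pretrained_ckpt, "--seed", str(seed),
    ]
    test = [
        "python", "main.py", "--test", "--config", config,
        "--exp_name", exp_name, "--ckpts", finetuned_ckpt,
    ]
    return train, test
```

Each list can be run with subprocess.run(argv, cwd="Point-MAE", check=True), with CUDA_VISIBLE_DEVICES set in the environment as in the commands above.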

For few-shot learning, run:

CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/fewshot.yaml --finetune_model \
--ckpts <path/to/pre-trained/model> --exp_name <output_file_name> --way <5 or 10> --shot <10 or 20> --fold <0-9> --seed $RANDOM
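
Covering every setting means running the command above for each way, shot and fold combination (2 × 2 × 10 = 40 runs). The sketch below enumerates those argument lists. fewshot_commands and the experiment naming scheme are hypothetical, and seeds are drawn from a Python random.Random over the same 0 to 32767 range as bash $RANDOM.

```python
import itertools
import random
from typing import List


def fewshot_commands(ckpt: str, exp_prefix: str, rng: random.Random) -> List[List[str]]:
    """Build one main.py argv per (way, shot, fold) combination."""
    commands = []
    for way, shot, fold in itertools.product((5, 10), (10, 20), range(10)):
        commands.append([
            "python", "main.py",
            "--config", "cfgs/fewshot.yaml",
            "--finetune_model",
            "--ckpts", ckpt,
            # Hypothetical naming scheme so that runs do not overwrite each other.
            "--exp_name", f"{exp_prefix}_{way}w{shot}s_fold{fold}",
            "--way", str(way),
            "--shot", str(shot),
            "--fold", str(fold),
            # Same value range as bash $RANDOM (0..32767).
            "--seed", str(rng.randrange(32768)),
        ])
    return commands
```

Passing a seeded random.Random makes the whole sweep repeatable, which $RANDOM alone does not.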

For part segmentation on ShapeNetPart, run:

cd segmentation
python main.py --gpu <gpu_id> --ckpts <path/to/pre-trained/model> \
--log_dir <log_dir> --learning_rate 0.0002 --epoch 300 \
--root <path/to/data> \
--seed $RANDOM

Acknowledgements

This codebase is built upon Point-MAE, Pointnet2_PyTorch, VPD, IPAdapter and ULIP.

Citation

If you find this repo useful, please cite:

@article{chen2025harnessing,
title={Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning},
author={Chen, Yiyang and Zhao, Shanshan and Duan, Lunhao and Ding, Changxing and Tao, Dacheng},
journal={arXiv preprint arXiv:2507.09102},
year={2025}
}