VecSetX

October 3, 2025

Following the introduction of VecSet, many follow-up works have proposed enhancements to it. This project incorporates these designs and provides a unified framework for VecSet-based representations.

:fire: Updates

[2025-10-03] Released the inference script.
[2025-04-09] Released the pretrained models point_vec1024x32_dim1024_depth24_sdf_nb and learnable_vec1024_dim1024_depth24_sdf.
[2025-04-06] Released the training code and the pretrained model learnable_vec1024x32_dim1024_depth24_sdf_nb.

:hammer: Installation

conda create -y -n vecset python=3.11
conda activate vecset
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

conda install cuda-nvcc=12.4 -c nvidia -y
conda install libcusparse-dev -y
conda install libcublas-dev -y
conda install libcusolver-dev -y
conda install libcurand-dev -y # required to build torch_cluster

pip install flash-attn --no-build-isolation
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.6.0+cu124.html
pip install tensorboard
pip install einops
pip install trimesh
pip install tqdm
pip install PyMCubes

:train: Training Example

Equivalent of a 16-GPU run, executed on 4 GPUs with gradient accumulation (accum_iter 4):

cd vecset
torchrun \
    --nproc_per_node=4 \
    main_ae.py \
    --accum_iter=4  \
    --model learnable_vec1024x16_dim1024_depth24_nb \
    --output_dir output/ae/learnable_vec1024x16_dim1024_depth24_sdf_nb \
    --log_dir output/ae/learnable_vec1024x16_dim1024_depth24_sdf_nb \
    --num_workers 24 \
    --point_cloud_size 8192 \
    --batch_size 16 \
    --epochs 500 \
    --warmup_epochs 1 --blr 5e-5 --clip_grad 1
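A quick sanity check of what this command amounts to per optimizer step. This is a minimal sketch that assumes main_ae.py follows the MAE-style convention (per-GPU `--batch_size`, learning rate derived as `blr * eff_batch_size / 256`); the helper names `effective_batch_size` and `scaled_lr` are hypothetical.

```python
def effective_batch_size(batch_size_per_gpu: int, num_gpus: int, accum_iter: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return batch_size_per_gpu * num_gpus * accum_iter


def scaled_lr(blr: float, eff_batch_size: int, base_batch: int = 256) -> float:
    """Linear LR scaling rule (assumed MAE-style: lr = blr * eff_batch_size / 256)."""
    return blr * eff_batch_size / base_batch


# Values from the command: --batch_size 16, --nproc_per_node=4, --accum_iter=4, --blr 5e-5
eff = effective_batch_size(16, 4, 4)
lr = scaled_lr(5e-5, eff)
print(eff, lr)  # 256 5e-05

# 16 GPUs with the same per-GPU batch and no accumulation give the same 256 samples per step
print(effective_batch_size(16, 16, 1))  # 256
```

Under this assumption, 4 GPUs with `accum_iter 4` see the same 256 samples per step as 16 GPUs without accumulation, which is why the setup is described as a 16-GPU run.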

:pencil: Model Descriptions

The base model design comes from VecSet. I have incorporated the following features:

Faster training with Flash Attention
Normalized Bottleneck (NBAE) from LaGeM. No need to tune the KL weight anymore!
SDF regression instead of occupancy classification, as suggested by TripoSG. For now, only Eikonal regularization is used.
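A true signed distance function has unit gradient norm almost everywhere, so the Eikonal term penalizes (‖∇f(x)‖ - 1)² at query points x. Below is a minimal PyTorch sketch of that term, assuming the decoder can be wrapped as a callable `sdf_fn` from (N, 3) query points to distances; `eikonal_loss` and `sphere_sdf` are hypothetical names, and how this repository weights the term against the SDF regression loss is not stated here.

```python
import torch


def eikonal_loss(sdf_fn, points: torch.Tensor) -> torch.Tensor:
    """Mean of (||grad f(x)|| - 1)^2 over the query points.

    sdf_fn: callable mapping (N, 3) points to (N,) or (N, 1) signed distances.
    """
    points = points.detach().requires_grad_(True)
    sdf = sdf_fn(points)
    # Gradient of the SDF w.r.t. each query point; create_graph=True keeps it
    # differentiable so the loss can be backpropagated into the network
    (grad,) = torch.autograd.grad(
        outputs=sdf,
        inputs=points,
        grad_outputs=torch.ones_like(sdf),
        create_graph=True,
    )
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()


def sphere_sdf(p: torch.Tensor, radius: float = 0.5) -> torch.Tensor:
    # Exact sphere SDF: unit gradient norm everywhere except the center
    return p.norm(dim=-1) - radius


torch.manual_seed(0)
pts = torch.rand(1024, 3) * 2 - 1
print(float(eikonal_loss(sphere_sdf, pts)))  # close to 0
print(float(eikonal_loss(lambda p: 2 * sphere_sdf(p), pts)))  # close to 1 (gradient norm is 2)
```

Scaling a valid SDF by 2 keeps its zero level set but doubles the gradient norm, which is exactly what the Eikonal term detects and plain SDF regression at sparse points may not.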

I am planning to incorporate the following features:

Edge sampling from Dora-VAE
Multiresolution training from CLAY
Compact autoencoder from COD-VAE
Quantized bottleneck (VQ).
(Start an issue if you have any ideas!)

:floppy_disk: Checkpoints

The following models are released at this link:

(Other models are still training.)

| Model | Queries | Layers | Channels | Bottleneck (Size x Ch) | Regularization | Loss |
|---|---|---|---|---|---|---|
| `point_vec1024x32_dim1024_depth24_sdf_nb` | Point | 24 | 1024 | 1024x32 | NB | SDF+Eikonal |
| `learnable_vec1024x32_dim1024_depth24_sdf_nb` | Learnable | 24 | 1024 | 1024x32 | NB | SDF+Eikonal |
| `learnable_vec1024_dim1024_depth24_sdf` | Learnable | 24 | 1024 | 1024x1024 | | SDF+Eikonal |
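The model names appear to encode the table columns. The minimal sketch below decodes them and shows the bottleneck size per shape (1024x32 = 32,768 values versus 1,048,576 for 1024x1024). The naming pattern is inferred from the three rows above and is not an official spec; `parse_model_name` and `ModelSpec` are hypothetical helpers, and the actual definitions live in autoencoder.py.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the checkpoint names (an assumption, not an official spec):
#   <queries>_vec<num_latents>[x<latent_ch>]_dim<width>_depth<layers>[_sdf][_nb]
_NAME_RE = re.compile(
    r"^(?P<queries>point|learnable)_vec(?P<num_latents>\d+)(?:x(?P<latent_ch>\d+))?"
    r"_dim(?P<width>\d+)_depth(?P<depth>\d+)(?P<sdf>_sdf)?(?P<nb>_nb)?$"
)


@dataclass
class ModelSpec:
    queries: str
    num_latents: int
    latent_channels: int
    width: int
    depth: int
    sdf: bool
    normalized_bottleneck: bool

    @property
    def latent_floats(self) -> int:
        # Number of values in the bottleneck for one shape
        return self.num_latents * self.latent_channels


def parse_model_name(name: str) -> ModelSpec:
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized model name: {name}")
    width = int(m["width"])
    return ModelSpec(
        queries=m["queries"],
        num_latents=int(m["num_latents"]),
        # Without an "x<ch>" part the bottleneck keeps the full width (1024x1024 in the table)
        latent_channels=int(m["latent_ch"]) if m["latent_ch"] else width,
        width=width,
        depth=int(m["depth"]),
        sdf=m["sdf"] is not None,
        normalized_bottleneck=m["nb"] is not None,
    )


for name in [
    "point_vec1024x32_dim1024_depth24_sdf_nb",
    "learnable_vec1024x32_dim1024_depth24_sdf_nb",
    "learnable_vec1024_dim1024_depth24_sdf",
]:
    spec = parse_model_name(name)
    print(name, spec.num_latents, spec.latent_channels, spec.latent_floats)
```

Read this way, the NB models compress each shape into 32 times fewer values than the 1024x1024 model.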

:balloon: Inference

To test the autoencoder, make sure the input surface point cloud is normalized: center it at its bounding-box center and scale it so that the farthest point lies at distance 1.

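import numpy as np

# surface: (N, 3) array of surface points
shifts = (surface.max(axis=0) + surface.min(axis=0)) / 2  # bounding-box center
surface = surface - shifts
distances = np.linalg.norm(surface, axis=1)
scale = 1 / np.max(distances)  # farthest point ends up at distance 1
surface *= scale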
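The same normalization can be packaged as a reusable function that also returns `shifts` and `scale`, so a reconstructed mesh can be mapped back into the input's coordinate frame. This is a sketch with hypothetical helper names (`normalize_surface`, `denormalize`); whether infer.py performs this step itself is not stated here.

```python
import numpy as np


def normalize_surface(surface: np.ndarray):
    """Center at the bounding-box center and scale into the unit sphere.

    Returns the normalized (N, 3) points plus (shifts, scale) for undoing the transform.
    """
    surface = np.asarray(surface, dtype=np.float64)  # avoids in-place issues with integer input
    shifts = (surface.max(axis=0) + surface.min(axis=0)) / 2
    centered = surface - shifts
    scale = 1.0 / np.linalg.norm(centered, axis=1).max()
    return centered * scale, shifts, scale


def denormalize(vertices: np.ndarray, shifts: np.ndarray, scale: float) -> np.ndarray:
    # Inverse of normalize_surface: undo the scaling first, then the shift
    return vertices / scale + shifts


rng = np.random.default_rng(0)
points = rng.uniform(-3.0, 5.0, size=(2048, 3))
normed, shifts, scale = normalize_surface(points)
print(np.linalg.norm(normed, axis=1).max())  # 1.0 up to floating-point rounding
print(np.allclose(denormalize(normed, shifts, scale), points))  # True
```

Applying `denormalize` to the output mesh vertices would place the reconstruction back at the original position and size.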

Then run the inference script:

python infer.py --input input_point_cloud.ply --output output_mesh.obj

The available model definitions can be found in autoencoder.py. Note that the script assumes the input file is a point cloud instead of a mesh file.
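Since the script expects a point cloud rather than a mesh, a mesh has to be sampled first. Below is a minimal NumPy-only sketch of area-weighted uniform surface sampling plus a vertex-only ASCII PLY writer. `sample_surface` and `write_ascii_ply` are hypothetical helpers, the 8192 points only mirror `--point_cloud_size` from the training example, and it is an assumption that infer.py accepts ASCII PLY.

```python
import numpy as np


def sample_surface(vertices: np.ndarray, faces: np.ndarray, count: int, seed: int = 0) -> np.ndarray:
    """Uniformly sample `count` points on a triangle mesh (area-weighted)."""
    rng = np.random.default_rng(seed)
    tri = vertices[faces]  # (F, 3, 3): corners of every triangle
    areas = 0.5 * np.linalg.norm(np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    # Choose triangles with probability proportional to their area
    idx = rng.choice(len(faces), size=count, p=areas / areas.sum())
    # Uniform point inside each chosen triangle via the square-root barycentric trick
    r1 = np.sqrt(rng.random(count))[:, None]
    r2 = rng.random(count)[:, None]
    a, b, c = tri[idx, 0], tri[idx, 1], tri[idx, 2]
    return (1 - r1) * a + r1 * (1 - r2) * b + r1 * r2 * c


def write_ascii_ply(path: str, points: np.ndarray) -> None:
    """Write an (N, 3) array as a vertex-only ASCII PLY file."""
    header = (
        "ply\nformat ascii 1.0\n"
        f"element vertex {len(points)}\n"
        "property float x\nproperty float y\nproperty float z\nend_header\n"
    )
    with open(path, "w") as f:
        f.write(header)
        np.savetxt(f, points, fmt="%.6f")


# Example mesh: a unit cube centered at the origin; vertex index = 4*ix + 2*iy + iz
verts = np.array([[x, y, z] for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)])
faces = np.array([
    [0, 1, 3], [0, 3, 2],  # x = -0.5
    [4, 6, 7], [4, 7, 5],  # x = +0.5
    [0, 4, 5], [0, 5, 1],  # y = -0.5
    [2, 3, 7], [2, 7, 6],  # y = +0.5
    [0, 2, 6], [0, 6, 4],  # z = -0.5
    [1, 5, 7], [1, 7, 3],  # z = +0.5
])
pts = sample_surface(verts, faces, 8192)
write_ascii_ply("input_point_cloud.ply", pts)
```

Sampling proportionally to triangle area keeps the point density uniform over the surface regardless of how finely each region is tessellated.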

:bookmark_tabs: Other minor adjustments

Removed the LayerNorm on keys/values (KV), as suggested by Youkang Kong.
Added a LayerNorm before the final output layer.
Added zero initialization of the final output layer.
Added random rotations as data augmentation, as in LaGeM.
Updated the code for the latest version of PyTorch.
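The random-rotation augmentation can be sketched as below. This sketch assumes arbitrary rotations drawn uniformly from SO(3). Whether LaGeM or this repository uses unrestricted rotations or, for example, rotations about a single axis is not stated here. `random_rotation` and `augment` are hypothetical helpers.

```python
import numpy as np


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Draw a 3x3 rotation matrix uniformly from SO(3) via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    # Multiply column j by sign(r_jj) so q is Haar-distributed on O(3)
    q = q * np.sign(np.diag(r))
    # Negate one column if needed so det = +1 (a rotation, not a reflection)
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def augment(points: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Rotate an (N, 3) point cloud. Any SDF query points paired with it must get
    # the same rotation; the SDF values themselves are rotation-invariant.
    return points @ random_rotation(rng).T


rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(8192, 3))
rotated = augment(cloud, rng)
```

Because a rotation preserves distances, the normalized point cloud stays inside the unit sphere after augmentation.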