VecSetX
October 3, 2025
Since the introduction of VecSet, many enhancements to it have been proposed. This project aims to incorporate these designs and to provide a unified framework for VecSet-based representations.
:fire: Updates
- [2025-10-03] Released the inference script.
- [2025-04-09] Released the pretrained models point_vec1024x32_dim1024_depth24_sdf_nb and learnable_vec1024_dim1024_depth24_sdf.
- [2025-04-06] Released training code and a pretrained model, learnable_vec1024x32_dim1024_depth24_sdf_nb.
:hammer: Installation
conda create -n vecset python=3.11 -y
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
conda install cuda-nvcc=12.4 -c nvidia -y
conda install libcusparse-dev -y
conda install libcublas-dev -y
conda install libcusolver-dev -y
conda install libcurand-dev -y # torch_cluster
pip install flash-attn --no-build-isolation
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.6.0+cu124.html
pip install tensorboard
pip install einops
pip install trimesh
pip install tqdm
pip install PyMCubes
:train: Training Example
The command below emulates 16 GPUs by running on 4 GPUs with accum_iter 4 (gradient accumulation over 4 iterations):
cd vecset
torchrun \
--nproc_per_node=4 \
main_ae.py \
--accum_iter=4 \
--model learnable_vec1024x16_dim1024_depth24_nb \
--output_dir output/ae/learnable_vec1024x16_dim1024_depth24_sdf_nb \
--log_dir output/ae/learnable_vec1024x16_dim1024_depth24_sdf_nb \
--num_workers 24 \
--point_cloud_size 8192 \
--batch_size 16 \
--epochs 500 \
--warmup_epochs 1 --blr 5e-5 --clip_grad 1
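To see why 4 GPUs with `accum_iter=4` can stand in for 16 GPUs, it may help to work out the effective batch size. The sketch below assumes that `--batch_size` is the per-GPU batch size and that the learning rate is derived from `--blr` with the MAE-style rule `lr = blr * effective_batch / 256`. Both are assumptions about `main_ae.py`, not something this README states, and the helper names are hypothetical.

```python
def effective_batch_size(batch_size_per_gpu: int, num_gpus: int, accum_iter: int) -> int:
    """Number of samples contributing to one optimizer step."""
    return batch_size_per_gpu * num_gpus * accum_iter


def scaled_lr(blr: float, eff_batch: int, base: int = 256) -> float:
    """MAE-style linear scaling rule (assumed, not confirmed for this repo)."""
    return blr * eff_batch / base


# Values from the command above: 4 GPUs, accum_iter=4, batch_size=16.
accumulated = effective_batch_size(16, num_gpus=4, accum_iter=4)
# The same per-GPU batch size on 16 GPUs without accumulation.
sixteen_gpus = effective_batch_size(16, num_gpus=16, accum_iter=1)

print(accumulated, sixteen_gpus)     # 256 256
print(scaled_lr(5e-5, accumulated))  # 5e-05
```

Under these assumptions both setups take optimizer steps over 256 samples, so the same `--blr` should carry over between them.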
:pencil: Model Descriptions
The base model design is from VecSet. I have incorporated the following features:
- Faster training with Flash Attention
- Normalized Bottleneck (NBAE) from LaGeM. No need to tune the KL weight anymore!
- SDF regression instead of occupancy classification, as suggested by TripoSG. For now, I only use Eikonal regularization.
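For readers unfamiliar with the Eikonal term: a true signed distance function has a gradient of unit norm almost everywhere, so the regularizer penalizes deviations of the gradient norm from 1. The sketch below is a dependency-free illustration under stated assumptions, not this repository's loss. It uses the common squared penalty and central finite differences on an analytic sphere SDF, whereas training code would typically obtain gradients through autograd. All function names are hypothetical.

```python
import math
import random


def sphere_sdf(p, radius=0.5):
    """Exact SDF of a sphere centered at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius


def grad_fd(f, p, eps=1e-4):
    """Central finite-difference gradient of f at point p."""
    grad = []
    for i in range(len(p)):
        forward = list(p)
        backward = list(p)
        forward[i] += eps
        backward[i] -= eps
        grad.append((f(forward) - f(backward)) / (2 * eps))
    return grad


def eikonal_loss(f, points):
    """Mean of (||grad f|| - 1)^2 over the query points."""
    total = 0.0
    for p in points:
        norm = math.sqrt(sum(g * g for g in grad_fd(f, p)))
        total += (norm - 1.0) ** 2
    return total / len(points)


random.seed(0)
# Query points in [-1, 1]^3 (an assumed sampling range).
points = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(256)]

# A true SDF has near-zero Eikonal loss; a rescaled copy does not.
loss_true = eikonal_loss(sphere_sdf, points)
loss_scaled = eikonal_loss(lambda p: 2.0 * sphere_sdf(p), points)
```

Here `loss_true` is close to zero, while the rescaled field has gradient norm 2 everywhere and is penalized accordingly.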
I am planning to incorporate the following features:
- Edge sampling from Dora-VAE
- Multiresolution training from CLAY
- Compact autoencoder from COD-VAE
- Quantized bottleneck (VQ).
- (Start an issue if you have any ideas!)
:floppy_disk: Checkpoints
The following models will be released at this link:
- (Other models are training!)
| Model | Queries | Layers | Channels | Bottleneck (Size x Ch) | Regularization | Loss |
|---|---|---|---|---|---|---|
| point_vec1024x32_dim1024_depth24_sdf_nb | Point | 24 | 1024 | 1024x32 | NB | SDF+Eikonal |
| learnable_vec1024x32_dim1024_depth24_sdf_nb | Learnable | 24 | 1024 | 1024x32 | NB | SDF+Eikonal |
| learnable_vec1024_dim1024_depth24_sdf | Learnable | 24 | 1024 | 1024x1024 | - | SDF+Eikonal |
:balloon: Inference
If you want to test the autoencoder, make sure the input surface point cloud is normalized, i.e. centered on its bounding box and scaled to fit inside the unit sphere:
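import numpy as np

# surface: (N, 3) float array of surface points
# Move the bounding-box center to the origin.
shifts = (surface.max(axis=0) + surface.min(axis=0)) / 2
surface = surface - shifts
# Scale so that the farthest point lies on the unit sphere.
distances = np.linalg.norm(surface, axis=1)
scale = 1 / np.max(distances)
surface *= scale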
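The same normalization can be wrapped in a function that also returns the shift and scale, which is handy for mapping a reconstructed mesh back to the original coordinates. This is a minimal sketch: `normalize_surface` and `denormalize` are hypothetical helpers, not part of the repository, and the random array only stands in for a real point cloud.

```python
import numpy as np


def normalize_surface(surface):
    """Center the bounding box at the origin and scale into the unit sphere."""
    surface = np.asarray(surface, dtype=np.float64)
    shift = (surface.max(axis=0) + surface.min(axis=0)) / 2
    centered = surface - shift
    scale = 1.0 / np.linalg.norm(centered, axis=1).max()
    return centered * scale, shift, scale


def denormalize(points, shift, scale):
    """Inverse of normalize_surface, e.g. for output mesh vertices."""
    return np.asarray(points) / scale + shift


rng = np.random.default_rng(0)
raw = rng.uniform(-3.0, 7.0, size=(2048, 3))  # stand-in for a real point cloud
normalized, shift, scale = normalize_surface(raw)

# The farthest point lies on the unit sphere.
assert np.isclose(np.linalg.norm(normalized, axis=1).max(), 1.0)
# The round trip recovers the original coordinates.
assert np.allclose(denormalize(normalized, shift, scale), raw)
```

The sketch does not handle the degenerate case where all points coincide, since the scale would then be undefined.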
Run the inference script as follows:
python infer.py --input input_point_cloud.ply --output output_mesh.obj
The available model definitions can be found in autoencoder.py. Note that the script expects the input file to be a point cloud, not a mesh.
:bookmark_tabs: Other minor adjustments
- Removed LayerNorm on KV, as suggested by Youkang Kong.
- Added LayerNorm before the final output layer.
- Added zero initialization on the final output layer.
- Added random rotations as data augmentation, as in LaGeM.
- Adjusted the code for the latest version of PyTorch.
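As an illustration of the random-rotation augmentation, the sketch below draws a uniformly random 3D rotation via QR decomposition of a Gaussian matrix and applies it to a point cloud. This is one standard way to sample rotations and is an assumption here: the repository (and LaGeM) may sample rotations differently, for example around a single axis. The function names are hypothetical.

```python
import numpy as np


def random_rotation(rng):
    """Sample a rotation matrix uniformly from SO(3)."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    # Fix the sign ambiguity of QR so that q is uniformly distributed.
    q = q * np.sign(np.diag(r))
    # Flip one axis if needed so that det(q) = +1 (a proper rotation).
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def augment(points, rng):
    """Rotate an (N, 3) point cloud about the origin."""
    return points @ random_rotation(rng).T


rng = np.random.default_rng(0)
points = rng.standard_normal((1024, 3))  # stand-in for a normalized surface
rotated = augment(points, rng)
```

Because rotations preserve distances to the origin, a normalized point cloud stays inside the unit sphere after augmentation. For SDF supervision, the query points would need to be rotated with the same matrix as the surface so that their distance values remain valid.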