Style Customization of Text-to-Vector Generation with Image Diffusion Priors
May 16, 2025
Setup
Create a new conda environment:
conda create --name svg_diffusion python=3.10
conda activate svg_diffusion
Install PyTorch and the following libraries:
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
Install diffvg:
git clone https://github.com/BachiLi/diffvg.git
cd diffvg
git submodule update --init --recursive
conda install -y -c anaconda cmake
conda install -y -c conda-forge ffmpeg
pip install svgwrite svgpathtools cssutils torch-tools
python setup.py install
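After installation, a quick check that the key packages can be found may save a failed run later. The sketch below is not part of the repository. It assumes that diffvg installs a Python module named pydiffvg, and it only tests whether each module can be located, without actually importing it. The function name check_modules is hypothetical.

```python
import importlib.util

# Module names are assumptions: torch and torchvision come from the conda
# install above, and pydiffvg is the package built by diffvg's setup.py.
REQUIRED_MODULES = ("torch", "torchvision", "pydiffvg")


def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: bool} telling whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it inside the activated svg_diffusion environment; any module reported as MISSING points to the installation step that needs to be repeated.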
TODO
- Release Stage1 model weights.
- Release Stage2 model weights.
- Release the inference code.
- Release training scripts.
Model Weights
| Model name | Weight |
|---|---|
| Stage1 | link |
| Stage2 | TBA |
Quickstart
- Download the pretrained weights and put the files in the pretrained directory.
- Launch the Gradio demo:
  CUDA_VISIBLE_DEVICES=0 python -m svg_ldm.gradio_t2svg
  Use the Generation Count slider to sample multiple SVGs in one click.
- Or run the inference script:
  CUDA_VISIBLE_DEVICES=0 python -m svg_ldm.test_ddpm_tr_svgs
  Adjust test_num to control how many samples are produced per prompt.
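The inference command above can also be launched from Python, for example when choosing the GPU programmatically. This is a minimal sketch, not code from the repository: the helper names build_inference_command, build_env, and run_inference are hypothetical, and it assumes the script is started from the repository root so that the svg_ldm package is importable.

```python
import os
import subprocess
import sys


def build_inference_command(module="svg_ldm.test_ddpm_tr_svgs"):
    """Return the argv list equivalent to `python -m <module>`."""
    return [sys.executable, "-m", module]


def build_env(gpu_id=0):
    """Copy the current environment and pin the process to one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def run_inference(gpu_id=0):
    """Run the inference module; raises CalledProcessError on failure."""
    subprocess.run(build_inference_command(), env=build_env(gpu_id), check=True)
```

Calling run_inference(gpu_id=1) would correspond to the shell command with CUDA_VISIBLE_DEVICES=1.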
Tips
- Diffusion sampling is stochastic. Vary the random seed or adjust the sampling settings to explore different outputs.
- We recommend DDPM sampling for higher-quality results. It takes about 30 seconds to generate one SVG on an NVIDIA RTX 4090.
- The model was trained only on simple class labels (see dataset/label.csv), so it doesn’t understand complex text prompts. Fine-tuning the model on richer SVG datasets can support more detailed prompts.
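Because sampling is stochastic, fixing the seed is the usual way to make a run repeatable, and changing it is the usual way to explore variations. How this repository exposes its seed is not shown here, so the following is a generic sketch: seed_everything is a hypothetical helper, and it seeds NumPy and PyTorch only when they are installed.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
```

Calling seed_everything with a different integer before each sampling run should yield different outputs, while reusing the same integer should make the random number streams repeat.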
Contact
If you have any questions, contact us by email at zhangpeiying17@gmail.com.