GETTING_STARTED.md

June 6, 2025

Code Structure

Here’s an overview of the code layout and key modules:

module	path
Tokenizer train script	vq_train_accelerate.py
Sequential diffusion tokenizer	vq_model.py
Sequential diffusion decoder	diff_decoder.py
Tokenizer loss	vq_loss.py
---	---
AR train script	train_c2i_accelerate.py
AR Llama backbones	gpt.py (basically unchanged from LlamaGen)
AR generate logic	generate.py

Getting Started

Requirements

PyTorch ≥ 2.1
timm, accelerate, datasets
xformers (optional)
80GB A100 GPUs (use smaller batch sizes on GPUs with less VRAM)
tensorflow (for the ADM evaluation suite)
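
Before launching a run, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the Python standard library; the helper names `check_environment` and `parse_major_minor` are illustrative and not part of this repository, and treating tensorflow as optional (needed only for evaluation) is an assumption.

```python
import importlib.util
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

# Package names taken from the requirements list above.
REQUIRED = ["torch", "timm", "accelerate", "datasets"]
# Assumption: these are only needed for optional features or evaluation.
OPTIONAL = ["xformers", "tensorflow"]


def parse_major_minor(version: str) -> Optional[Tuple[int, int]]:
    """Turn a version string such as '2.1.0+cu121' into (2, 1)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_environment() -> Dict[str, str]:
    """Report, per package, the installed version or a 'missing' marker."""
    report = {}
    for name in REQUIRED + OPTIONAL:
        if importlib.util.find_spec(name) is None:
            report[name] = "missing (optional)" if name in OPTIONAL else "MISSING"
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but installed under a different distribution name.
            report[name] = "unknown version"

    # Flag a PyTorch build older than the stated minimum of 2.1.
    torch_version = parse_major_minor(report["torch"])
    if torch_version is not None and torch_version < (2, 1):
        report["torch"] += " (too old, need >= 2.1)"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package:12s} {status}")
```

The script only inspects what is installed; it does not check GPU memory or CUDA availability.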

Preparation

mkdir -p temp # For storing temporary checkpoints
python tokenizer/tokenizer_image/utils_repa.py # Download REPA DINO and perform sanity check

IMPORTANT: We use accelerate to train our sequential diffusion tokenizer and D-AR models on multi-GPU nodes. Please familiarize yourself with accelerate before proceeding. The provided demo training scripts are designed for single-node training, but they can be configured for multi-node setups by modifying the relevant parameters. You may also need to adjust batch_size or global_batch_size accordingly.
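
When changing the number of nodes or GPUs, the per-device batch size usually has to change so that the effective batch size stays the same. The sketch below shows that arithmetic under the common convention that the global batch size equals per-device batch size × number of processes × gradient accumulation steps. Whether the scripts in this repository interpret batch_size and global_batch_size exactly this way is an assumption, and `per_device_batch_size` is a hypothetical helper, so check the training scripts before relying on it.

```python
def per_device_batch_size(
    global_batch_size: int,
    num_nodes: int,
    gpus_per_node: int,
    grad_accum_steps: int = 1,
) -> int:
    """Split a global batch size evenly across all processes and accumulation steps."""
    world_size = num_nodes * gpus_per_node
    divisor = world_size * grad_accum_steps
    if divisor <= 0:
        raise ValueError("num_nodes, gpus_per_node and grad_accum_steps must be positive")
    if global_batch_size % divisor != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by "
            f"{world_size} processes x {grad_accum_steps} accumulation steps"
        )
    return global_batch_size // divisor


if __name__ == "__main__":
    # Single node with 8 GPUs versus two such nodes, same global batch size.
    print(per_device_batch_size(256, num_nodes=1, gpus_per_node=8))  # 32
    print(per_device_batch_size(256, num_nodes=2, gpus_per_node=8))  # 16
```

On GPUs with less VRAM, raising grad_accum_steps in this calculation lowers the per-device batch size without changing the global one.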

Train or Finetune a Sequential Diffusion Tokenizer

The training script, debug_train_tokenizer.sh, is built on accelerate.

bash debug_train_tokenizer.sh

To finetune a sequential diffusion tokenizer from a checkpoint, e.g., temp/tokenizer_v1.pt, append this argument to the above script:

--vq-ckpt temp/tokenizer_v1.pt

We support several dataset interfaces (webdataset, Hugging Face datasets, or a plain folder). You can finetune our tokenizers on your own dataset by changing the --data-path argument. The path can start with wds:// or datasets:// to stream data from a remote source (see dar_tool.py for details).
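
To illustrate how a --data-path value might be routed to one of the three interfaces, here is a minimal sketch of prefix-based dispatch. It is not the code in dar_tool.py: `DataSource`, `parse_data_path`, and the example paths are hypothetical, and the fallback to a local folder when no prefix matches is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    kind: str      # "webdataset", "hf_datasets", or "folder"
    location: str  # the path or identifier with the scheme prefix removed


# Prefixes named in the text above, mapped to illustrative backend labels.
PREFIXES = {
    "wds://": "webdataset",
    "datasets://": "hf_datasets",
}


def parse_data_path(data_path: str) -> DataSource:
    """Classify a --data-path value by its prefix."""
    for prefix, kind in PREFIXES.items():
        if data_path.startswith(prefix):
            location = data_path[len(prefix):]
            if not location:
                raise ValueError(f"nothing follows the prefix in {data_path!r}")
            return DataSource(kind, location)
    # Assumption: anything without a known prefix is a local image folder.
    return DataSource("folder", data_path)


if __name__ == "__main__":
    for example in ("wds://shards/train-0000.tar", "datasets://user/my-images", "/data/images"):
        print(example, "->", parse_data_path(example))
```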
