May 24, 2026

PrITTI: Primitive-based Generation of
Controllable and Editable 3D Semantic Urban Scenes

Christina Ourania Tze · Daniel Dauner · Yiyi Liao · Dzmitry Tsishkou · Andreas Geiger

Paper | Project Page

Teaser Figure

We introduce PrITTI, a latent diffusion-based framework that uses primitives as the foundational elements for generating compositional, controllable, and editable 3D semantic scene layouts. Our approach enables applications such as scene editing, inpainting, outpainting, and photo-realistic street view synthesis.

This is the official repository for PrITTI.

News

[May 2026] Pre-trained models, training, inference & evaluation code released.
[Feb 2026] PrITTI is accepted to CVPR 2026, see you in Denver!

🛠️ Installation

1. Environment Variables

Add the following to your ~/.bashrc:

export PRITTI_WORKSPACE="$HOME/pritti_workspace"
export KITTI360_DATASET="$HOME/pritti_workspace/dataset"
export PRITTI_EXP_ROOT="$HOME/pritti_workspace/exp"
export PRITTI_CACHE_ROOT="$HOME/pritti_workspace/cache"
export PRITTI_DEVKIT_ROOT="$HOME/pritti_workspace/pritti"

Then reload your shell:

source ~/.bashrc
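Before moving on, it can help to confirm that all five variables are visible in the current shell. The following is a minimal sketch; `pritti_check_env` is a hypothetical helper and is not part of the PrITTI repository.

```shell
# Hypothetical helper (not part of the PrITTI repository): report which of the
# five variables from the step above are unset or empty in the current shell.
pritti_check_env() {
    missing=0
    for name in PRITTI_WORKSPACE KITTI360_DATASET PRITTI_EXP_ROOT \
                PRITTI_CACHE_ROOT PRITTI_DEVKIT_ROOT; do
        # Read the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `pritti_check_env && echo "environment OK"` prints the confirmation only when every variable is set.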

2. Clone the Repository

mkdir -p "$PRITTI_WORKSPACE"
cd "$PRITTI_WORKSPACE"
git clone https://github.com/autonomousvision/pritti.git && cd pritti

3. Create the Conda Environment

conda env create --name pritti -f environment.yaml
conda activate pritti
pip install -r requirements.txt
pip install git+https://github.com/raniatze/kitti360Scripts.git
wget https://anaconda.org/pytorch3d/pytorch3d/0.7.8/download/linux-64/pytorch3d-0.7.8-py39_cu121_pyt241.tar.bz2 -O pytorch3d.tar.bz2
conda install pytorch3d.tar.bz2
rm pytorch3d.tar.bz2
pip install -e .

4. Download the Dataset

mkdir -p "$KITTI360_DATASET" && cd "$KITTI360_DATASET"
gdown "https://drive.google.com/uc?id=1_yIKHQZj1E1V2jogsDHAEsolKOylpA8U"
unzip dataset.zip && mv dataset/* . && rmdir dataset && rm dataset.zip

5. Process the Dataset

Run the following in order:

cd $PRITTI_DEVKIT_ROOT
bash scripts/preprocessing/preprocessing.sh
bash scripts/preprocessing/samples_labeling.sh
bash scripts/preprocessing/dataset_statistics.sh
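Because each script builds on the output of the previous one, it is safer to stop at the first failure than to let later steps run on incomplete data. A minimal sketch, assuming the scripts exit with a non-zero status on failure (not verified here); `run_in_order` is a hypothetical helper:

```shell
# Hypothetical helper (not part of the PrITTI repository): run the given
# scripts one after another and stop at the first one that fails.
run_in_order() {
    for script in "$@"; do
        echo "running: $script" >&2
        if ! bash "$script"; then
            echo "failed: $script (later scripts were not run)" >&2
            return 1
        fi
    done
}
```

For example, from `$PRITTI_DEVKIT_ROOT`: `run_in_order scripts/preprocessing/preprocessing.sh scripts/preprocessing/samples_labeling.sh scripts/preprocessing/dataset_statistics.sh`.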

6. (Optional) Visualize Dataset Samples

To visually verify the preprocessed dataset, you can render a random subset of ground truth samples:

bash scripts/preprocessing/samples_visualization.sh

Screenshots will be saved under $PRITTI_EXP_ROOT/exp/preprocessing/samples_visualization/<TIMESTAMP>/visualizations/, where <TIMESTAMP> is the script's launch time, formatted as YYYY.MM.DD.HH.MM.SS.
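Since the timestamp format is zero-padded and ordered from year to second, lexicographic order matches chronological order. The sketch below uses that to find the most recent run folder; `latest_run` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the PrITTI repository): print the name of
# the newest YYYY.MM.DD.HH.MM.SS subdirectory of the directory given as $1.
latest_run() {
    for d in "$1"/[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9].[0-9][0-9]; do
        # Skip the unexpanded pattern (no match) and anything that is not a directory.
        if [ -d "$d" ]; then basename "$d"; fi
    done | sort | tail -n 1
}
```

For example, `latest_run "$PRITTI_EXP_ROOT/exp/preprocessing/samples_visualization"` prints the timestamp of the latest visualization run, or nothing if no run exists yet.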

📦 Pre-trained Models

We release the pre-trained LVAE and LDM (DiT-B) checkpoints on Hugging Face: raniatze/pritti-checkpoints. If you want to skip training and jump straight to Inference, follow the steps below.

1. Set the Checkpoint Identifiers

Add the following to your ~/.bashrc (these match the released checkpoints) and reload your shell:

export LVAE_TIMESTAMP="2025.06.03.17.23.30"
export LVAE_EPOCH="299"
export LVAE_STEP="580200"

source ~/.bashrc

2. Download the Checkpoints

# LVAE checkpoint
LVAE_DIR="$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/$LVAE_TIMESTAMP/checkpoints"
mkdir -p "$LVAE_DIR"
huggingface-cli download raniatze/pritti-checkpoints lvae.ckpt --local-dir "$LVAE_DIR"
mv "$LVAE_DIR/lvae.ckpt" "$LVAE_DIR/epoch=${LVAE_EPOCH}-step=${LVAE_STEP}.ckpt"

# LDM (DiT-B) checkpoint
LDM_DIR="$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP"
mkdir -p "$LDM_DIR"
huggingface-cli download raniatze/pritti-checkpoints --include "ldm_b/*" --local-dir "$LDM_DIR"
mv "$LDM_DIR/ldm_b" "$LDM_DIR/checkpoint"
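To confirm that both checkpoints ended up where the commands above place them, a quick check along these lines may help. This is a sketch; the function names are hypothetical, and the paths simply mirror the download commands.

```shell
# Hypothetical helpers (not part of the PrITTI repository). The paths mirror
# the download commands above and rely on PRITTI_EXP_ROOT, LVAE_TIMESTAMP,
# LVAE_EPOCH and LVAE_STEP being set.
lvae_ckpt_path() {
    printf '%s\n' "$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/$LVAE_TIMESTAMP/checkpoints/epoch=${LVAE_EPOCH}-step=${LVAE_STEP}.ckpt"
}

ldm_ckpt_dir() {
    printf '%s\n' "$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP/checkpoint"
}

# Return 0 only if the LVAE checkpoint file and the LDM checkpoint directory exist.
check_checkpoints() {
    status=0
    if [ ! -f "$(lvae_ckpt_path)" ]; then
        echo "missing LVAE checkpoint: $(lvae_ckpt_path)" >&2
        status=1
    fi
    if [ ! -d "$(ldm_ckpt_dir)" ]; then
        echo "missing LDM checkpoint directory: $(ldm_ckpt_dir)" >&2
        status=1
    fi
    return "$status"
}
```

Running `check_checkpoints && echo "checkpoints found"` reports any missing path on stderr.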

You can now skip directly to the Inference section.

🏋️ Training

PrITTI is trained in two stages: a Layout Variational Autoencoder followed by a Latent Diffusion Model.

Stage 1: Layout Variational Autoencoder (LVAE)

Train the LVAE

Run:

bash scripts/training/autoencoder/training_autoencoder.sh

The trained checkpoint will be saved under:

$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/<TIMESTAMP>/checkpoints/

Once training is complete, update your ~/.bashrc to match your trained checkpoint and reload:

export LVAE_TIMESTAMP="<TIMESTAMP>"   # name of the training run folder (format: YYYY.MM.DD.HH.MM.SS)
export LVAE_EPOCH="<EPOCH>"           # from the checkpoint filename: epoch=<EPOCH>-step=<STEP>.ckpt
export LVAE_STEP="<STEP>"             # from the checkpoint filename: epoch=<EPOCH>-step=<STEP>.ckpt

source ~/.bashrc
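The epoch and step can be read off the checkpoint filename by hand, or extracted with shell parameter expansion. A minimal sketch, assuming the filename follows the `epoch=<EPOCH>-step=<STEP>.ckpt` pattern shown above; `parse_ckpt_name` is a hypothetical helper:

```shell
# Hypothetical helper (not part of the PrITTI repository): print the export
# lines for LVAE_EPOCH and LVAE_STEP given a checkpoint path whose filename
# has the form epoch=<EPOCH>-step=<STEP>.ckpt.
parse_ckpt_name() {
    base=$(basename "$1" .ckpt)   # e.g. epoch=299-step=580200
    epoch=${base#epoch=}          # drop the leading "epoch="
    epoch=${epoch%%-step=*}       # drop everything from "-step=" onward
    step=${base##*-step=}         # keep what follows "-step="
    echo "export LVAE_EPOCH=\"$epoch\""
    echo "export LVAE_STEP=\"$step\""
}
```

The two printed lines can be pasted into `~/.bashrc` next to the `LVAE_TIMESTAMP` export.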

Cache LVAE Latents

Once LVAE_TIMESTAMP, LVAE_EPOCH, and LVAE_STEP are set in your environment, run the following command to cache the latent representations used for training the diffusion model:

bash scripts/training/autoencoder/latent_caching.sh

Stage 2: Latent Diffusion Model (LDM)

Train the Diffusion Model

Run:

bash scripts/training/diffusion/training_diffusion.sh

By default this trains a DiT-B model. To select a different model size, set DIFFUSION_MODEL to one of dit_s_model, dit_b_model, dit_l_model, or dit_xl_model.
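How `DIFFUSION_MODEL` reaches the training script is not spelled out here. The sketch below only guards against typos in the model name before a long training run starts; `is_valid_diffusion_model` is a hypothetical helper.

```shell
# Hypothetical helper (not part of the PrITTI repository): succeed only for
# the four model names listed above.
is_valid_diffusion_model() {
    case "$1" in
        dit_s_model|dit_b_model|dit_l_model|dit_xl_model) return 0 ;;
        *) return 1 ;;
    esac
}
```

If the script reads the variable from the environment (an assumption), a guarded launch could look like `is_valid_diffusion_model "$DIFFUSION_MODEL" && DIFFUSION_MODEL="$DIFFUSION_MODEL" bash scripts/training/diffusion/training_diffusion.sh`. If the variable is assigned inside the script instead, edit it there.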

The checkpoint will be saved under:

$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP/

✨ Inference

Stage 1: LVAE Reconstruction

Reconstruct Samples

Run:

bash scripts/training/autoencoder/samples_caching.sh

Reconstructed samples will be saved under:

$PRITTI_EXP_ROOT/exp/training_lvae_model/samples_caching/$LVAE_TIMESTAMP/samples_cache/

(Optional) Visualize Reconstructed Samples

Run:

bash scripts/training/autoencoder/samples_visualization.sh

Visualizations will be saved under:

$PRITTI_EXP_ROOT/exp/training_lvae_model/samples_visualization/$LVAE_TIMESTAMP/visualizations/

Stage 2: LDM Generation

Generate Samples

Run:

bash scripts/training/diffusion/samples_caching.sh

Generated samples will be saved under:

$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/samples_caching/$LVAE_TIMESTAMP/samples_cache/

(Optional) Visualize Generated Samples

Run:

bash scripts/training/diffusion/samples_visualization.sh

Visualizations will be saved under:

$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/samples_visualization/$LVAE_TIMESTAMP/visualizations/

📊 Evaluation

We report two families of metrics: reconstruction metrics for the LVAE, and generation metrics for the LDM.

Reconstruction Metrics

Compute LVAE reconstruction metrics

Run:

bash scripts/metrics/reconstruction/pritti_metrics.sh

This loads the LVAE checkpoint, reconstructs the validation split, and reports separate metrics for the ground raster map and the primitives (the latter using Omni3D).

Outputs:

$PRITTI_EXP_ROOT/exp/training_lvae_model/reconstruction_metrics/$LVAE_TIMESTAMP/
├── raster_results.csv      # ground raster map reconstruction (per-sample + mean row)
└── vector_results.csv      # primitives reconstruction (per-sample + mean row)

Generation Metrics

PrITTI's generation metrics are computed on top-down 2D semantic maps rendered from both reference and generated 3D scenes. The pipeline runs in six steps:

1. Render reference semantic maps

Renders the reference scenes into top-down semantic maps:

bash scripts/preprocessing/semantic_maps_rendering.sh

The rendered reference maps are written under $PRITTI_CACHE_ROOT/semantic_cache/.

Note: Open3D's offscreen renderer occasionally yields all-black or all-white images for otherwise valid scenes; these are detected and skipped without writing the .gz file. You may therefore need to re-run this script several times to render the full reference set. Already-rendered samples are detected and skipped automatically.

2. Select reference scenes via farthest-point sampling

Farthest-point sampling selects NUM_SAMPLES reference scenes (default: 1000) that are maximally spaced apart. These scenes form the evaluation batch reported in the main paper:

bash scripts/preprocessing/reference_sampling_fps.sh

This writes the selected subset to $PRITTI_CACHE_ROOT/semantic_cache/reference_batch_fps_<NUM_SAMPLES>.txt.
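For readers unfamiliar with farthest-point sampling, the greedy procedure can be sketched as follows. This is an illustration only, not the repository's implementation. It assumes each scene is summarized by a 2D position listed as `<id> <x> <y>` in a text file (a hypothetical input format), and it starts from the first listed scene, which is an arbitrary choice.

```shell
# Illustrative sketch (not the repository's implementation): greedy
# farthest-point sampling over lines of the form "<id> <x> <y>".
# $1: input file (hypothetical format), $2: number of ids to select.
fps_select() {
    awk -v k="$2" '
        { id[NR] = $1; x[NR] = $2; y[NR] = $3; best[NR] = -1 }
        END {
            n = NR; k = k + 0
            if (k > n) k = n
            cur = 1                      # arbitrary starting point
            for (s = 1; s <= k; s++) {
                print id[cur]
                chosen[cur] = 1
                next_i = 0; next_d = -1
                for (i = 1; i <= n; i++) {
                    if (i in chosen) continue
                    # Squared distance to the point that was just selected.
                    d = (x[i] - x[cur])^2 + (y[i] - y[cur])^2
                    # best[i]: squared distance to the nearest selected point.
                    if (best[i] < 0 || d < best[i]) best[i] = d
                    # Next pick: the point farthest from everything selected.
                    if (best[i] > next_d) { next_d = best[i]; next_i = i }
                }
                cur = next_i
            }
        }' "$1"
}
```

Each round costs one pass over the remaining points, so selecting k of n scenes takes on the order of k·n distance evaluations.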

Supplementary: we additionally report an ablation using distance-based reference sampling, where reference scenes are filtered by a minimum spatial distance threshold DISTANCE_THRESHOLD (default 10m). Run bash scripts/preprocessing/reference_sampling_distance.sh to reproduce this alternative; it writes reference_batch_distance_<DISTANCE_THRESHOLD>m.txt to the same directory.

3. Generate 50K samples without classifier-free guidance

For generation metrics, we generate 50K samples with classifier-free guidance disabled (guidance_scale=1.0). Edit scripts/training/diffusion/samples_caching.sh to reflect this:

# In scripts/training/diffusion/samples_caching.sh
GENERATED_SAMPLES_CACHE_SIZE=50000
GUIDANCE_SCALE=1.0

Then run:

bash scripts/training/diffusion/samples_caching.sh

Generated samples are saved under $PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/samples_caching/$LVAE_TIMESTAMP/samples_cache/.
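As background on why a scale of 1.0 disables guidance: under the common classifier-free guidance formulation (an assumption here, since the exact form used in the PrITTI code is not stated in this README), the guided prediction combines a conditional and an unconditional model output. The symbols below are introduced for illustration only: $\epsilon_\theta(x_t, c)$ is the conditional prediction, $\epsilon_\theta(x_t, \emptyset)$ the unconditional one, and $s$ the guidance scale.

```latex
\hat{\epsilon}_\theta(x_t, c)
  = \epsilon_\theta(x_t, \emptyset)
  + s \,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \emptyset)\bigr),
\qquad
s = 1 \;\Rightarrow\; \hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, c).
```

With $s = 1$ the unconditional terms cancel, so sampling uses the plain conditional prediction.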

4. Render generated semantic maps

Renders the 50K generated scenes from step 3 into top-down semantic maps:

bash scripts/training/diffusion/semantic_maps_rendering.sh

The rendered maps are saved under $PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/semantic_maps_rendering/$LVAE_TIMESTAMP/semantic_cache/.

Note: Open3D's offscreen renderer occasionally yields all-black or all-white images for otherwise valid scenes; these are detected and skipped without writing the .gz file. You may therefore need to re-run this script several times to render the full set of generated samples. Already-rendered samples are detected and skipped automatically.
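The re-runs described in the note can be automated with a small loop. The sketch below assumes one `.gz` file per rendered scene somewhere under the output directory, which is an assumption about the cache layout. `rerun_until_count` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of the PrITTI repository): re-run a command
# until at least <target> .gz files exist under <dir>, or until
# <max_attempts> runs have been made.
# Usage: rerun_until_count <target> <dir> <max_attempts> <command...>
rerun_until_count() {
    target=$1; dir=$2; max_attempts=$3
    shift 3
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        count=$(find "$dir" -type f -name '*.gz' 2>/dev/null | wc -l | tr -d ' ')
        if [ "$count" -ge "$target" ]; then return 0; fi
        attempt=$((attempt + 1))
        echo "attempt $attempt: $count/$target maps rendered" >&2
        "$@" || echo "command exited with an error, continuing" >&2
    done
    # Final check after the last attempt.
    count=$(find "$dir" -type f -name '*.gz' 2>/dev/null | wc -l | tr -d ' ')
    [ "$count" -ge "$target" ]
}
```

For the 50K generated samples this could be invoked as `rerun_until_count 50000 "$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/semantic_maps_rendering/$LVAE_TIMESTAMP/semantic_cache" 10 bash scripts/training/diffusion/semantic_maps_rendering.sh`. The same loop applies to the reference rendering in step 1 once the expected number of reference scenes is known.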

5. Build paired evaluation batches

Pairs the FPS-selected reference subset (from step 2) with an equal number of randomly sampled generated maps, and saves them as .npz arrays for downstream metric computation:

bash scripts/metrics/generation/generation_metrics_fps.sh

Outputs:

$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/generation_metrics/$LVAE_TIMESTAMP/
├── ref_batch_fps_<NUM_SAMPLES>.npz
└── samples_batch_fps_<NUM_SAMPLES>.npz

Supplementary: to reproduce the distance-based ablation, run bash scripts/metrics/generation/generation_metrics_distance.sh. It adds ref_batch_distance_<DISTANCE_THRESHOLD>m.npz and samples_batch_distance_<DISTANCE_THRESHOLD>m.npz to the same output directory.

6. Compute generation metrics

The evaluator is adapted from OpenAI's guided-diffusion and runs in a separate conda environment to avoid TensorFlow/PyTorch dependency conflicts with the main pritti env.

Set up the metrics environment (one-time):

conda create -n metrics python=3.9
conda activate metrics
pip install 'tensorflow[and-cuda]'
pip install tqdm scipy requests

Run the evaluator:

conda activate metrics
bash scripts/metrics/generation/evaluator_fps.sh

Inception Score, FID, sFID, Precision, and Recall are printed to stdout.

Supplementary: to evaluate the distance-based ablation, run:
bash scripts/metrics/generation/evaluator_distance.sh

🌟 Citation

If you find PrITTI useful, please consider giving us a star and citing our paper:

@inproceedings{Tze2026PrITTI,
    author    = {Tze, Christina Ourania and Dauner, Daniel and Liao, Yiyi and Tsishkou, Dzmitry and Geiger, Andreas},
    title     = {PrITTI: Primitive-based Generation of Controllable and Editable 3D Semantic Scenes},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2026},
}
