May 24, 2026
PrITTI: Primitive-based Generation of
Controllable and Editable 3D Semantic Urban Scenes
Christina Ourania Tze · Daniel Dauner · Yiyi Liao · Dzmitry Tsishkou · Andreas Geiger
Paper | Project Page
We introduce PrITTI, a latent diffusion-based framework that uses primitives as the basic building blocks for generating compositional, controllable, and editable 3D semantic scene layouts. Our approach enables applications such as scene editing, inpainting, outpainting, and photo-realistic street view synthesis.
This is the official repository for PrITTI.
News
- [May 2026] Pre-trained models, training, inference & evaluation code released.
- [Feb 2026] PrITTI is accepted to CVPR 2026, see you in Denver!
🛠️ Installation
1. Environment Variables
Add the following to your ~/.bashrc:
export PRITTI_WORKSPACE="$HOME/pritti_workspace"
export KITTI360_DATASET="$HOME/pritti_workspace/dataset"
export PRITTI_EXP_ROOT="$HOME/pritti_workspace/exp"
export PRITTI_CACHE_ROOT="$HOME/pritti_workspace/cache"
export PRITTI_DEVKIT_ROOT="$HOME/pritti_workspace/pritti"
Then reload your shell:
source ~/.bashrc
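Before moving on, it can help to confirm that the five variables actually reached your shell. The following is a minimal sketch, not part of the repository; the helper name check_pritti_env is hypothetical.

```shell
# Report which PrITTI environment variables are exported and non-empty.
# Returns 0 when all five are set, 1 otherwise.
# check_pritti_env is a hypothetical helper, not a script shipped with PrITTI.
check_pritti_env() {
    missing=0
    for name in PRITTI_WORKSPACE KITTI360_DATASET PRITTI_EXP_ROOT \
                PRITTI_CACHE_ROOT PRITTI_DEVKIT_ROOT; do
        # printenv only sees exported variables, which is what child scripts inherit.
        value=$(printenv "$name" || true)
        if [ -n "$value" ]; then
            echo "ok:      $name=$value"
        else
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

check_pritti_env || echo "Add the missing exports to ~/.bashrc, then run: source ~/.bashrc"
```

Checking exported values rather than plain shell variables matters here because the later steps run through bash scripts, which only inherit exported variables.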
2. Clone the Repository
mkdir -p "$PRITTI_WORKSPACE"
cd "$PRITTI_WORKSPACE"
git clone https://github.com/autonomousvision/pritti.git && cd pritti
3. Create the Conda Environment
conda env create --name pritti -f environment.yaml
conda activate pritti
pip install -r requirements.txt
pip install git+https://github.com/raniatze/kitti360Scripts.git
wget https://anaconda.org/pytorch3d/pytorch3d/0.7.8/download/linux-64/pytorch3d-0.7.8-py39_cu121_pyt241.tar.bz2 -O pytorch3d.tar.bz2
conda install pytorch3d.tar.bz2
rm pytorch3d.tar.bz2
pip install -e .
4. Download the Dataset
mkdir -p "$KITTI360_DATASET" && cd "$KITTI360_DATASET"
gdown "https://drive.google.com/uc?id=1_yIKHQZj1E1V2jogsDHAEsolKOylpA8U"
unzip dataset.zip && mv dataset/* . && rmdir dataset && rm dataset.zip
5. Process the Dataset
Run the following in order:
cd $PRITTI_DEVKIT_ROOT
bash scripts/preprocessing/preprocessing.sh
bash scripts/preprocessing/samples_labeling.sh
bash scripts/preprocessing/dataset_statistics.sh
6. (Optional) Visualize Dataset Samples
To visually verify the preprocessed dataset, you can render a random subset of ground truth samples:
bash scripts/preprocessing/samples_visualization.sh
Screenshots will be saved under $PRITTI_EXP_ROOT/exp/preprocessing/samples_visualization/<TIMESTAMP>/visualizations/, where <TIMESTAMP> is the script's launch time, formatted as YYYY.MM.DD.HH.MM.SS.
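The YYYY.MM.DD.HH.MM.SS layout is zero-padded and ordered from year to second, so run folders sort chronologically by name. The sketch below reproduces the format with date(1) and picks the newest run folder. Whether the scripts themselves use date(1) is an assumption, and latest_run is a hypothetical helper.

```shell
# Produce a timestamp in the YYYY.MM.DD.HH.MM.SS layout described above.
timestamp=$(date +%Y.%m.%d.%H.%M.%S)
echo "$timestamp"

# Every field is zero-padded and ordered from year to second, so plain
# lexicographic sorting places the newest run last.
# latest_run is a hypothetical helper: $1 is a directory holding one folder per run.
latest_run() {
    ls -1 "$1" | sort | tail -n 1
}
```

For example, `latest_run "$PRITTI_EXP_ROOT/exp/preprocessing/samples_visualization"` would print the name of the most recent visualization run.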
📦 Pre-trained Models
We release the pre-trained LVAE and LDM (DiT-B) checkpoints on Hugging Face: raniatze/pritti-checkpoints. If you want to skip training and jump straight to Inference, follow the steps below.
1. Set the Checkpoint Identifiers
Add the following to your ~/.bashrc (these match the released checkpoints) and reload your shell:
export LVAE_TIMESTAMP="2025.06.03.17.23.30"
export LVAE_EPOCH="299"
export LVAE_STEP="580200"
source ~/.bashrc
2. Download the Checkpoints
# LVAE checkpoint
LVAE_DIR=$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/$LVAE_TIMESTAMP/checkpoints
mkdir -p $LVAE_DIR
huggingface-cli download raniatze/pritti-checkpoints lvae.ckpt --local-dir $LVAE_DIR
mv $LVAE_DIR/lvae.ckpt $LVAE_DIR/epoch=$LVAE_EPOCH-step=$LVAE_STEP.ckpt
# LDM (DiT-B) checkpoint
LDM_DIR=$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP
mkdir -p $LDM_DIR
huggingface-cli download raniatze/pritti-checkpoints --include "ldm_b/*" --local-dir $LDM_DIR
mv $LDM_DIR/ldm_b $LDM_DIR/checkpoint
You can now skip directly to the Inference section.
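As a sanity check on the download step, the sketch below tests that both checkpoints ended up at the locations the commands above write to. The helper name check_checkpoints is hypothetical; the paths are copied from the download commands.

```shell
# Check that the two downloaded checkpoints sit where the download commands put them.
# Arguments: <PRITTI_EXP_ROOT> <LVAE_TIMESTAMP> <LVAE_EPOCH> <LVAE_STEP>
# check_checkpoints is a hypothetical helper, not a script shipped with PrITTI.
check_checkpoints() {
    lvae_ckpt="$1/exp/training_lvae_model/training_lvae_model/$2/checkpoints/epoch=$3-step=$4.ckpt"
    ldm_dir="$1/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$2/checkpoint"
    status=0
    if [ -f "$lvae_ckpt" ]; then
        echo "found LVAE checkpoint: $lvae_ckpt"
    else
        echo "missing LVAE checkpoint: $lvae_ckpt"
        status=1
    fi
    if [ -d "$ldm_dir" ]; then
        echo "found LDM checkpoint directory: $ldm_dir"
    else
        echo "missing LDM checkpoint directory: $ldm_dir"
        status=1
    fi
    return "$status"
}
```

A typical call would be `check_checkpoints "$PRITTI_EXP_ROOT" "$LVAE_TIMESTAMP" "$LVAE_EPOCH" "$LVAE_STEP"`.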
🏋️ Training
PrITTI is trained in two stages: a Layout Variational Autoencoder followed by a Latent Diffusion Model.
Stage 1: Layout Variational Autoencoder (LVAE)
Train the LVAE
Run:
bash scripts/training/autoencoder/training_autoencoder.sh
The trained checkpoint will be saved under:
$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/<TIMESTAMP>/checkpoints/
Once training is complete, update your ~/.bashrc to match your trained checkpoint and reload:
export LVAE_TIMESTAMP="<TIMESTAMP>" # name of the training run folder (format: YYYY.MM.DD.HH.MM.SS)
export LVAE_EPOCH="<EPOCH>" # from the checkpoint filename: epoch=<EPOCH>-step=<STEP>.ckpt
export LVAE_STEP="<STEP>" # from the checkpoint filename: epoch=<EPOCH>-step=<STEP>.ckpt
source ~/.bashrc
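The epoch and step values can be read off the checkpoint filename by hand, or extracted with shell parameter expansion. This is a small sketch; parse_ckpt_name is a hypothetical helper, and the filename pattern is the one stated above.

```shell
# Extract <EPOCH> and <STEP> from a name of the form epoch=<EPOCH>-step=<STEP>.ckpt
# and print the matching export lines.
# parse_ckpt_name is a hypothetical helper, not a script shipped with PrITTI.
parse_ckpt_name() {
    name=$(basename "$1" .ckpt)   # e.g. epoch=299-step=580200
    epoch=${name#epoch=}          # drop the leading "epoch="
    epoch=${epoch%%-step=*}       # drop everything from "-step=" onwards
    step=${name##*-step=}         # keep only what follows "-step="
    echo "export LVAE_EPOCH=\"$epoch\""
    echo "export LVAE_STEP=\"$step\""
}

parse_ckpt_name "epoch=299-step=580200.ckpt"
```

The example call prints the same LVAE_EPOCH and LVAE_STEP exports that are listed for the released checkpoints.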
Cache LVAE Latents
Once LVAE_TIMESTAMP, LVAE_EPOCH, and LVAE_STEP are set in your environment, run the following command to cache the latent representations used for training the diffusion model:
bash scripts/training/autoencoder/latent_caching.sh
Stage 2: Latent Diffusion Model (LDM)
Train the Diffusion Model
Run:
bash scripts/training/diffusion/training_diffusion.sh
By default this trains a DiT-B model. To select a different model size, set DIFFUSION_MODEL to one of dit_s_model, dit_b_model, dit_l_model, or dit_xl_model.
The checkpoint will be saved under:
$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP/
✨ Inference
Stage 1: LVAE Reconstruction
Reconstruct Samples
Run:
bash scripts/training/autoencoder/samples_caching.sh
Reconstructed samples will be saved under:
$PRITTI_EXP_ROOT/exp/training_lvae_model/samples_caching/$LVAE_TIMESTAMP/samples_cache/
(Optional) Visualize Reconstructed Samples
Run:
bash scripts/training/autoencoder/samples_visualization.sh
Visualizations will be saved under:
$PRITTI_EXP_ROOT/exp/training_lvae_model/samples_visualization/$LVAE_TIMESTAMP/visualizations/
Stage 2: LDM Generation
Generate Samples
Run:
bash scripts/training/diffusion/samples_caching.sh
Generated samples will be saved under:
$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/samples_caching/$LVAE_TIMESTAMP/samples_cache/
(Optional) Visualize Generated Samples
Run:
bash scripts/training/diffusion/samples_visualization.sh
Visualizations will be saved under:
$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/samples_visualization/$LVAE_TIMESTAMP/visualizations/
📊 Evaluation
We report two families of metrics: reconstruction metrics for the LVAE, and generation metrics for the LDM.
Reconstruction Metrics
Compute LVAE reconstruction metrics
Run:
bash scripts/metrics/reconstruction/pritti_metrics.sh
This loads the LVAE checkpoint, reconstructs the validation split, and reports separate metrics for the ground raster map and the primitives (the latter using Omni3D).
Outputs:
$PRITTI_EXP_ROOT/exp/training_lvae_model/reconstruction_metrics/$LVAE_TIMESTAMP/
├── raster_results.csv # ground raster map reconstruction (per-sample + mean row)
└── vector_results.csv # primitives reconstruction (per-sample + mean row)
Generation Metrics
PrITTI's generation metrics are computed on top-down 2D semantic maps rendered from both reference and generated 3D scenes. The pipeline runs in six steps:
1. Render reference semantic maps
Renders the reference scenes into top-down semantic maps:
bash scripts/preprocessing/semantic_maps_rendering.sh
The rendered reference maps are written under $PRITTI_CACHE_ROOT/semantic_cache/.
Note: Open3D's offscreen renderer occasionally yields all-black or all-white images for otherwise valid scenes; these are detected and skipped without writing the .gz file. You may therefore need to re-run this script several times to render the full reference set. Already-rendered samples are detected and skipped automatically.
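Re-running by hand can be replaced with a small loop that repeats the script until a pass adds no new .gz files. This is a sketch under assumptions: it assumes the rendered maps are the .gz files under the given directory, and rerun_until_stable is a hypothetical helper. A stable count is only a heuristic, since a pass can also add nothing when every remaining scene fails again.

```shell
# Re-run a command until a pass adds no new .gz files to a directory,
# or until a maximum number of passes is reached.
# Arguments: <dir> <max_passes> <command...>
# rerun_until_stable is a hypothetical helper, not a script shipped with PrITTI.
rerun_until_stable() {
    dir=$1
    max_passes=$2
    shift 2
    pass=0
    before=$(find "$dir" -type f -name '*.gz' 2>/dev/null | wc -l | tr -d ' ')
    while [ "$pass" -lt "$max_passes" ]; do
        pass=$((pass + 1))
        "$@" || echo "pass $pass exited with a non-zero status" >&2
        after=$(find "$dir" -type f -name '*.gz' 2>/dev/null | wc -l | tr -d ' ')
        echo "pass $pass: $after rendered files"
        # Stop as soon as a full pass produced nothing new.
        if [ "$after" -eq "$before" ]; then
            break
        fi
        before=$after
    done
}
```

For the reference maps, a call could look like `rerun_until_stable "$PRITTI_CACHE_ROOT/semantic_cache" 5 bash scripts/preprocessing/semantic_maps_rendering.sh`, where the limit of 5 passes is an arbitrary choice.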
2. Select reference scenes via farthest-point sampling
Farthest-point sampling selects NUM_SAMPLES reference scenes (default 1000) that are maximally spaced apart. These scenes form the evaluation batch reported in the main paper:
bash scripts/preprocessing/reference_sampling_fps.sh
This writes the selected subset to $PRITTI_CACHE_ROOT/semantic_cache/reference_batch_fps_<NUM_SAMPLES>.txt.
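To illustrate the idea behind farthest-point sampling, here is a minimal greedy sketch over 2D points using awk. It is not the repository's implementation: the feature space, the distance measure, and the seed point used by reference_sampling_fps.sh are not stated here, so 2D positions, Euclidean distance, and "start from the first point" are assumptions. fps_select is a hypothetical helper.

```shell
# Greedy farthest-point sampling over 2D points.
# stdin: one "x y" pair per line. $1: number of points to select.
# stdout: 1-based line numbers of the selected points, in selection order.
# fps_select is a hypothetical helper, not the repository's implementation.
fps_select() {
    awk -v k="$1" '
        { x[NR] = $1; y[NR] = $2 }
        END {
            n = NR
            if (k > n) k = n
            if (k < 1) exit
            for (i = 1; i <= n; i++) mind[i] = -1   # -1 marks "not yet measured"
            cur = 1                                  # seed: the first point (arbitrary)
            for (s = 1; s <= k; s++) {
                print cur
                chosen[cur] = 1
                best = -1; next_idx = 0
                for (i = 1; i <= n; i++) {
                    if (i in chosen) continue
                    dx = x[i] - x[cur]; dy = y[i] - y[cur]
                    d = dx * dx + dy * dy
                    # Track each point squared distance to its nearest selected point.
                    if (mind[i] < 0 || d < mind[i]) mind[i] = d
                    # The next pick is the point farthest from the selected set.
                    if (mind[i] > best) { best = mind[i]; next_idx = i }
                }
                cur = next_idx
            }
        }'
}

printf '0 0\n1 0\n10 0\n5 0\n' | fps_select 3
```

On the four collinear example points, the sketch selects lines 1, 3, and 4: the seed at x=0, then the farthest point at x=10, then x=5, which is farther from both picks than x=1.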
Supplementary: we additionally report an ablation using distance-based reference sampling, where reference scenes are filtered by a minimum spatial distance threshold DISTANCE_THRESHOLD (default 10 m). Run bash scripts/preprocessing/reference_sampling_distance.sh to reproduce this alternative; it writes reference_batch_distance_<DISTANCE_THRESHOLD>m.txt to the same directory.
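One simple way to filter by a minimum spatial distance is a greedy pass that keeps a point only if it lies at least the threshold away from every point kept so far. The sketch below shows that rule on 2D points. It is an assumption about what such a filter can look like, not the logic of reference_sampling_distance.sh, and distance_filter is a hypothetical helper. Note that the result of this greedy rule depends on input order.

```shell
# Greedy distance-threshold filtering over 2D points.
# stdin: one "x y" pair per line. $1: minimum allowed distance between kept points.
# stdout: 1-based line numbers of the kept points.
# distance_filter is a hypothetical helper, not the repository's implementation.
distance_filter() {
    awk -v thr="$1" '
        {
            keep = 1
            for (j = 1; j <= m; j++) {
                dx = $1 - kx[j]; dy = $2 - ky[j]
                # Compare squared distances to avoid a square root.
                if (dx * dx + dy * dy < thr * thr) { keep = 0; break }
            }
            if (keep) { m++; kx[m] = $1; ky[m] = $2; print NR }
        }'
}

printf '0 0\n3 0\n12 0\n15 0\n25 0\n' | distance_filter 10
```

With a threshold of 10, the example keeps lines 1, 3, and 5: the points at x=3 and x=15 are dropped because each lies within 10 units of an already kept point.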
3. Generate 50K samples without classifier-free guidance
For generation metrics, we generate 50K samples with classifier-free guidance disabled (guidance_scale=1.0). Edit scripts/training/diffusion/samples_caching.sh to reflect this:
# In scripts/training/diffusion/samples_caching.sh
GENERATED_SAMPLES_CACHE_SIZE=50000
GUIDANCE_SCALE=1.0
Then run:
bash scripts/training/diffusion/samples_caching.sh
Generated samples are saved under $PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/samples_caching/$LVAE_TIMESTAMP/samples_cache/.
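Why a guidance scale of 1.0 amounts to disabling guidance can be seen from the common classifier-free guidance formulation. Assuming the sampler follows this convention (an assumption about this codebase, with s denoting GUIDANCE_SCALE, c the condition, and \varnothing the null condition), the guided noise prediction is:

```latex
\hat{\epsilon}_\theta(x_t, c)
  = \epsilon_\theta(x_t, \varnothing)
  + s \,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr),
\qquad
s = 1 \;\Rightarrow\; \hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, c).
```

At s = 1 the unconditional terms cancel and only the conditional prediction remains, so no extra guidance is applied.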
4. Render generated semantic maps
Renders the 50K generated scenes from step 3 into top-down semantic maps:
bash scripts/training/diffusion/semantic_maps_rendering.sh
The rendered maps are saved under $PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/semantic_maps_rendering/$LVAE_TIMESTAMP/semantic_cache/.
Note: Open3D's offscreen renderer occasionally yields all-black or all-white images for otherwise valid scenes; these are detected and skipped without writing the .gz file. You may therefore need to re-run this script several times to render the full set of generated samples. Already-rendered samples are detected and skipped automatically.
5. Build paired evaluation batches
Pairs the FPS-selected reference subset (from step 2) with an equal number of randomly sampled generated maps, and saves them as .npz arrays for downstream metric computation:
bash scripts/metrics/generation/generation_metrics_fps.sh
Outputs:
$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/generation_metrics/$LVAE_TIMESTAMP/
├── ref_batch_fps_<NUM_SAMPLES>.npz
└── samples_batch_fps_<NUM_SAMPLES>.npz
Supplementary: to reproduce the distance-based ablation, run bash scripts/metrics/generation/generation_metrics_distance.sh. It adds ref_batch_distance_<DISTANCE_THRESHOLD>m.npz and samples_batch_distance_<DISTANCE_THRESHOLD>m.npz to the same output directory.
6. Compute generation metrics
The evaluator is adapted from OpenAI's guided-diffusion and runs in a separate conda environment to avoid TensorFlow/PyTorch dependency conflicts with the main pritti env.
Set up the metrics environment (one-time):
conda create -n metrics python=3.9
conda activate metrics
pip install 'tensorflow[and-cuda]'
pip install tqdm scipy requests
Run the evaluator:
conda activate metrics
bash scripts/metrics/generation/evaluator_fps.sh
Inception Score, FID, sFID, Precision, and Recall are printed to stdout.
Supplementary: to evaluate the distance-based ablation, run:
bash scripts/metrics/generation/evaluator_distance.sh
🌟 Citation
If you find PrITTI useful, please consider giving us a star and citing our paper:
@inproceedings{Tze2026PrITTI,
author = {Tze, Christina Ourania and Dauner, Daniel and Liao, Yiyi and Tsishkou, Dzmitry and Geiger, Andreas},
title = {PrITTI: Primitive-based Generation of Controllable and Editable 3D Semantic Scenes},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026},
}