May 22, 2025
Preprocessing Guide
Dataset download
For ImageNet, we follow the preprocessing code in REPA, which is based on edm2. The input range is [-1, 1]. We currently support only 256x256 generation on ImageNet.
After downloading and unzipping ImageNet, run the following commands:
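As a rough illustration of the [-1, 1] input range, the sketch below maps 8-bit pixel values into that range and back. It assumes the source images are 8-bit (values 0 to 255) and uses the common `x / 127.5 - 1` convention. The README states only the target range, so the exact mapping used by the preprocessing code is an assumption, and the helper names `to_model_range` and `to_pixel` are hypothetical.

```python
def to_model_range(pixel: int) -> float:
    """Map an 8-bit pixel value in [0, 255] to a float in [-1, 1].

    Assumption: linear scaling via x / 127.5 - 1 (a common convention,
    not confirmed by this README).
    """
    if not 0 <= pixel <= 255:
        raise ValueError(f"expected an 8-bit value, got {pixel}")
    return pixel / 127.5 - 1.0


def to_pixel(value: float) -> int:
    """Inverse mapping: clamp to [-1, 1], then round back to [0, 255]."""
    clamped = max(-1.0, min(1.0, value))
    return int(round((clamped + 1.0) * 127.5))


if __name__ == "__main__":
    print(to_model_range(0), to_model_range(255))  # -1.0 1.0
    print(to_pixel(to_model_range(128)))           # 128
```

Under this convention the mapping is lossless for 8-bit data: every value in 0 to 255 survives a round trip through `to_model_range` and `to_pixel`.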
# Convert raw ImageNet data to a ZIP archive at 256x256 resolution
python dataset_tools.py convert --source=[YOUR_DOWNLOAD_PATH]/ILSVRC/Data/CLS-LOC/train \
    --dest=[TARGET_PATH]/images --resolution=256x256 --transform=center-crop-dhariwal
# Convert the pixel data to VAE latents
python dataset_tools.py encode --source=[TARGET_PATH]/images \
    --dest=[TARGET_PATH]/vae-sd
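To give a feel for what a center-crop transform to 256x256 does geometrically, here is a minimal sketch that computes the resize dimensions and crop box for a given image size. It assumes the simplest form of the recipe: scale the image so its shorter side equals the target size, then cut the centered square. The actual `center-crop-dhariwal` transform in `dataset_tools.py` may differ in details such as resampling filters or intermediate downsampling, and the function name `center_crop_box` is hypothetical.

```python
def center_crop_box(width: int, height: int, size: int):
    """Return ((resized_w, resized_h), (left, top, right, bottom)).

    Assumption: resize so the shorter side equals `size`, then take the
    centered size x size square. This is a simplified sketch, not the
    repository's implementation.
    """
    if width <= 0 or height <= 0 or size <= 0:
        raise ValueError("width, height and size must be positive")
    scale = size / min(width, height)
    # Guard against rounding below the target size on the shorter side.
    resized_w = max(size, round(width * scale))
    resized_h = max(size, round(height * scale))
    left = (resized_w - size) // 2
    top = (resized_h - size) // 2
    return (resized_w, resized_h), (left, top, left + size, top + size)


if __name__ == "__main__":
    # A 500x375 landscape image cropped to 256x256.
    print(center_crop_box(500, 375, 256))  # ((341, 256), (42, 0, 298, 256))
```

The returned box uses the (left, top, right, bottom) convention, so it can be passed to a typical image library's crop call after resizing.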
For MS-COCO, we use the preprocessing scripts from U-ViT.
First, download the MS-COCO dataset to /U-ViT/assets/datasets/coco/.
Then place our extract_mscoco_feature.py and extract_empty_feature.py in U-ViT/.
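Before running the feature extraction, it can help to confirm that the layout described above is in place. The sketch below is one possible pre-flight check. It assumes the dataset path is meant relative to the root of your U-ViT checkout (the README writes it as /U-ViT/assets/datasets/coco/), and the names `EXPECTED` and `missing_paths` are hypothetical helpers, not part of either repository.

```python
from pathlib import Path

# Paths the steps above expect, relative to the U-ViT checkout.
# Assumption: the dataset directory lives inside the checkout.
EXPECTED = (
    "assets/datasets/coco",
    "extract_mscoco_feature.py",
    "extract_empty_feature.py",
)


def missing_paths(uvit_root):
    """Return the expected paths that do not exist under uvit_root."""
    root = Path(uvit_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("U-ViT")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("MS-COCO layout looks complete.")
```

An empty list means all three locations exist. The check does not validate the dataset contents themselves.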
Acknowledgement
This code is built mainly upon the REPA, edm2, and U-ViT repositories.