README.md
March 14, 2026 Β· View on GitHub
πSRA
Self-Representation Alignment for Diffusion Transformers
[project page]β[paper]

π₯1.News
- [2026.02.23] SRA 2 was accepted by CVPR 2026! No external encoder like REPA, also no external foward pass like SRA. Achieving convergence acceleration at an extremely low cost. See more here.
- [2026.01.26] SRA was accepted by ICLR 2026!
- [2025.07.11] We updated the PCA visualization code in our paper!
- [2025.06.14] We updated the results and checkpoint of SiT+SRA on ImageNet 512x512!
- [2025.05.06] We have released the paper and code of SRA!
π2.Highlight
-
Diffusion transformer itself to provide representation guidance: We assume the unique discriminative process of diffusion transformer makes it possible to provide the guidance without introducing extraneous representation component.
-
Self-Representation Alignment (SRA): SRA aligns the output latent representation of the diffusion transformers in earlier layer with higher noise to that in later layer with lower noise to achieve self-representation alignment.
-
Improved Performance. SRA accelerates training and improves generation performance for both DiTs and SiTs.
π‘3.Environment Setup
conda create -n sra python=3.12 -y
conda activate sra
pip install -r requirements.txt
π4.Dataset Preparation
Currently, we provide experiments for ImageNet. You can place the data that you want and can specify it via --data-dir arguments in training scripts.
Note that we preprocess the data for faster training. Please refer to preprocessing guide for detailed guidance.
π₯5.Training
Here we provide the training code for SiTs and DiTs.
5.1.Training with SiT + SRA
cd SiT-SRA
accelerate launch --config_file configs/default.yaml train.py \
--mixed-precision="fp16" \
--seed=0 \
--path-type="linear" \
--prediction="v" \
--resolution=256 \
--batch-size=32 \
--weighting="uniform" \
--model="SiT-XL/2" \
--block-out-s=8 \
--block-out-t=20 \
--t-max=0.2 \
--output-dir="exps" \
--exp-name="sitxl-ab820-t0.2-res256" \
--data-dir=[YOUR_DATA_PATH]
Then this script will automatically create the folder in exps to save logs,samples, and checkpoints. You can adjust the following options:
--models: Choosing from [SiT-B/2, SiT-L/2, SiT-XL/2]--block-out-s: Student's output block layer for alignment--block-out-t: Teacher's output block layer for alignment--t-max: Maximum time interval for alignment (we only use dynamic interval here)--output-dir: Any directory that you want to save checkpoints, samples, and logs--exp-name: Any string name (the folder will be created underoutput-dir)--batch-size: The local batch size (by default we use 1 node of 8 GPUs), you need to adjust this value according to your GPU number to make total batch size of 256
5.2.Training with DiT + SRA
cd DiT-SRA
accelerate launch --config_file configs/default.yaml train.py \
--mixed-precision="fp16" \
--seed=0 \
--resolution=256 \
--batch-size=32 \
--model="DiT-XL/2" \
--block-out-s=8 \
--block-out-t=16 \
--t-max=0.2 \
--output-dir="exps" \
--exp-name="ditxl-ab816-t0.2-res256" \
--data-dir=[YOUR_DATA_PATH]
Then this script will automatically create the folder in exps to save logs and checkpoints. You can adjust the following options (others are same as above SiTs):
--models: Choosing from [DiT-B/2, DiT-L/2, DiT-XL/2]
π 6.Evaluation
Here we provide the generating code for SiTs and DiTs to get the samples for evaluation. (and the .npz file can be used for ADM evaluation suite) through the following script:
6.1.Sampling with SiT + SRA
You can download our pretrained model here:
| Model | Image Resolution | Epochs | FID-50K | Inception Score |
|---|---|---|---|---|
| SiT-XL/2 + SRA | 512x512 | 400 | 2.07 | 302.2 |
| SiT-XL/2 + SRA | 256x256 | 800 | 1.58 | 311.4 |
cd SiT-SRA
bash gen.sh
Note that there are several options in gen.sh file that you need to complete:
SAMPLE_DIR: Base directory to save the generated images and .npz fileCKPT: Checkpoint path (This can also be your downloaded local file of the ckpt file we provide above)
And for ImageNet 512x512 with CFG, we use the guidance scale of 2.5 with the guidance interval, which is a little bit different from hyperparameters used in ImageNet 256x256.
6.2.Sampling with DiT + SRA
cd DiT-SRA
bash gen.sh
π¬7.PCA Visualization
We provide the PCA vis code of SiTs (256x256) to help to get the similar visualization results as shown in our paper.
cd pca-vis
python main_pca.py \
--ckpt=[YOUR_CKPT_PATH] \
--baseline=False \
You need to complete the following options (others in main_pca.py can also be changed):
--ckpt: Checkpoint path (This can also be your downloaded local file of the ckpt file we provide above)--baseline: Whether to use baseline, set it to 'False' if you do not use the ckpt file provided in SiT repo
π£8.Note
It's possible that this code may not accurately replicate the results outlined in the paper due to potential human errors during the preparation and cleaning of the code for release as well as the difference of the hardware facility. If you encounter any difficulties in reproducing our findings, please don't hesitate to inform us.
π€π»9.Acknowledgement
This code is mainly built upon REPA, DiT, SiT repositories. Thanks for their solid work!
πΊ10.Citation
If you find SRA useful, please kindly cite our paper:
@article{jiang2025sra,
title={No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves},
author={Jiang, Dengyang and Wang, Mengmeng and Li, Liuzhuozheng and Zhang, Lei and Wang, Haoyu and Wei, Wei and Dai, Guang and Zhang, Yanning and Wang, Jingdong},
journal={arXiv preprint arXiv:2505.02831},
year={2025}
}