README.md

December 17, 2024

[NeurIPS24] FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution


[NEWS] [9.26] Our FlowDCN has been accepted to NeurIPS 2024!

[NEWS] [11.22] 🍺 Our FlowDCN models and code are now available in the official repo!

Pretrained Models

Our models consistently achieve state-of-the-art sFID compared to SiT/DiT.

Metrics

Our models consistently have fewer parameters and GFLOPs than their Transformer counterparts. Our code also supports LogNorm and VAR (Various Aspect Ratio) training.

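| Model-iters | Resolution | Solver | NFE-CFG | FID | sFID | Params | Link |
|---|---|---|---|---|---|---|---|
| FlowDCN-S-400k | 256x256 | EulerSDE-250 | 250x2 | 54.6 | 8.8 | 30.3M | HF |
| FlowDCN-B-400k | 256x256 | EulerSDE-250 | 250x2 | 28.5 | 6.09 | 120M | HF |
| VAR-FlowDCN-B-400k | 256x256 | EulerSDE-250 | 250x2 | 23.6 | 7.72 | 120M | HF |
| FlowDCN-L-400k | 256x256 | EulerSDE-250 | 250x2 | 13.8 | 4.69 | 421M | HF |
| FlowDCN-XL-2M | 256x256 | EulerODE-250 | 250x2 | 2.01 | 4.33 | 618M | HF |
| FlowDCN-XL-2M | 256x256 | EulerSDE-250 | 250x2 | 2.00 | 4.37 | 618M | HF |
| FlowDCN-XL-2M | 256x256 | NeuralSolver-10 | 10x2 | 2.35 | 5.07 | 618M | HF |
| FlowDCN-XL-100k | 512x512 | EulerODE-50 | 50x2 | 2.76 | 5.29 | 618M | HF |
| FlowDCN-XL-100k | 512x512 | EulerSDE-250 | 250x2 | 2.44 | 4.53 | 618M | HF |
| FlowDCN-XL-100k | 512x512 | NeuralSolver-10 | 10x2 | 2.77 | 4.68 | 618M | HF |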

Usage

Remember to download the models and change the VAE and pretrained model paths.

For training

python3 main.py fit -c configs/CONFIG

For sampling

python3 main.py predict -c configs/CONFIG

Visualizations

Images generated with CFG 1.375:

| Models | Resolution | Link |
|---|---|---|
| FlowDCN-XL-100k | 512x512 | HF |
| FlowDCN-XL-2M | 256x256 | HF |
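
For readers unfamiliar with the CFG value quoted above: classifier-free guidance usually combines a conditional and an unconditional prediction at every sampling step. The sketch below shows one common form of that combination. The function name `apply_cfg` and the sample numbers are illustrative assumptions, and the repository may implement guidance differently.

```python
from typing import List


def apply_cfg(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Blend two predictions as uncond + scale * (cond - uncond).

    Hypothetical helper for illustration only.
    scale = 1.0 returns the conditional prediction unchanged;
    larger values move further away from the unconditional one.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Made-up numbers, only to show the arithmetic at scale 1.375.
guided = apply_cfg(cond=[1.0, 2.0], uncond=[0.0, 0.0], scale=1.375)
```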

Selected images generated with CFG 4.0:


Various Resolution Extension

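| Models | 256x256 FID | 256x256 sFID | 256x256 IS | 320x320 FID | 320x320 sFID | 320x320 IS | 224x448 FID | 224x448 sFID | 224x448 IS | 160x480 FID | 160x480 sFID | 160x480 IS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DiT-B | 44.83 | 8.49 | 32.05 | 95.47 | 108.68 | 18.38 | 109.1 | 110.71 | 14.00 | 143.8 | 122.81 | 8.93 |
| with EI | 44.83 | 8.49 | 32.05 | 81.48 | 62.25 | 20.97 | 133.2 | 72.53 | 11.11 | 160.4 | 93.91 | 7.30 |
| with PI | 44.83 | 8.49 | 32.05 | 72.47 | 54.02 | 24.15 | 133.4 | 70.29 | 11.73 | 156.5 | 93.80 | 7.80 |
| FiT-B (+VAR) | 36.36 | 11.08 | 40.69 | 61.35 | 30.71 | 31.01 | 44.67 | 24.09 | 37.1 | 56.81 | 22.07 | 25.25 |
| with VisionYaRN | 36.36 | 11.08 | 40.69 | 44.76 | 38.04 | 44.70 | 41.92 | 42.79 | 45.87 | 62.84 | 44.82 | 27.84 |
| with VisionNTK | 36.36 | 11.08 | 40.69 | 57.31 | 31.31 | 33.97 | 43.84 | 26.25 | 39.22 | 56.76 | 24.18 | 26.40 |
| FlowDCN-B | 28.5 | 6.09 | 51 | 34.4 | 27.2 | 52.2 | 71.7 | 62.0 | 23.7 | 211 | 111 | 5.83 |
| FlowDCN-B (+VAR) | 23.6 | 7.72 | 62.8 | 29.1 | 15.8 | 69.5 | 31.4 | 17.0 | 62.4 | 44.7 | 17.8 | 35.8 |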

Linear-Multi-step Solvers

We also provide an Adams-like linear multistep solver for rectified flow sampling. The related configs are named with adam2 or adam4. The solver code is in ./src/diffusion/flow_matching/adam_sampling.py.
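
As a rough illustration of what an Adams-like linear multistep update looks like, here is a minimal two-step Adams-Bashforth sketch that integrates dx/dt = v(x, t) from t = 0 to t = 1. It is not the code in adam_sampling.py: the function name `adams2_sample` and the toy velocity field are assumptions made for illustration, and a real sampler would call a trained model instead.

```python
from typing import Callable, List, Optional

Vec = List[float]


def adams2_sample(v: Callable[[Vec, float], Vec], x0: Vec, steps: int) -> Vec:
    """Integrate dx/dt = v(x, t) on [0, 1] with two-step Adams-Bashforth.

    Hypothetical helper for illustration; it makes one velocity call per step.
    """
    dt = 1.0 / steps
    x = list(x0)
    prev_v: Optional[Vec] = None  # velocity kept from the previous step
    for i in range(steps):
        t = i * dt
        cur_v = v(x, t)
        if prev_v is None:
            # First step has no history yet, so fall back to a plain Euler update.
            x = [xi + dt * vi for xi, vi in zip(x, cur_v)]
        else:
            # AB2 update: x_{n+1} = x_n + dt * (3/2 * v_n - 1/2 * v_{n-1})
            x = [xi + dt * (1.5 * vi - 0.5 * pvi)
                 for xi, vi, pvi in zip(x, cur_v, prev_v)]
        prev_v = cur_v
    return x


def toy_velocity(x: Vec, t: float) -> Vec:
    """Toy field dx/dt = -x (an assumption, not a trained model)."""
    return [-xi for xi in x]


result = adams2_sample(toy_velocity, [1.0], steps=16)
```

On this toy problem the exact answer is exp(-1), so the result can be compared against it to sanity-check the update rule.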

Compared to Heun/RK4, the linear multistep solver is more stable and faster.
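
One way to see the cost difference: a Heun step evaluates the velocity field twice, while a linear multistep step reuses earlier evaluations and needs only one new call. The sketch below counts function evaluations (NFE) for a plain Heun integrator. `heun_sample` and the toy field are illustrative assumptions, not the repository's sampler.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def heun_sample(v: Callable[[Vec, float], Vec], x0: Vec, steps: int) -> Tuple[Vec, int]:
    """Integrate dx/dt = v(x, t) on [0, 1] with Heun's method.

    Hypothetical helper for illustration; returns the final state and the NFE.
    """
    dt = 1.0 / steps
    x = list(x0)
    nfe = 0
    for i in range(steps):
        t = i * dt
        k1 = v(x, t)  # first evaluation, at the current point
        x_pred = [xi + dt * ki for xi, ki in zip(x, k1)]
        k2 = v(x_pred, t + dt)  # second evaluation, at the Euler prediction
        nfe += 2
        # Average the two slopes for the corrected update.
        x = [xi + 0.5 * dt * (a + b) for xi, a, b in zip(x, k1, k2)]
    return x, nfe


def toy_velocity(x: Vec, t: float) -> Vec:
    """Toy field dx/dt = -x (an assumption, not a trained model)."""
    return [-xi for xi in x]


heun_x, heun_nfe = heun_sample(toy_velocity, [1.0], steps=8)
print(heun_nfe)  # 16
```

With 8 steps this gives 16 evaluations, in line with the Heun rows of the solver table in this section, whereas a two-step Adams update with 8 steps makes 8 calls.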

In some experiments, we were surprised to find that the linear multistep solver achieves results comparable even to FlowTurbo.

Since the two methods are distinct, we believe FlowTurbo could be even more powerful when armed with Adams.

We also provide some magic solvers for rectified flow sampling. These solvers are heavily inspired by linear multistep methods and consist of just a few magic numbers, yet they are powerful and interesting. The related code is in ./src/diffusion/flow_matching/ns_sampling.py.

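| SiT-XL-R256 | Steps | NFE-CFG | Extra-Parameters | FID | IS | PR | Recall |
|---|---|---|---|---|---|---|---|
| Heun | 8 | 16x2 | 0 | 3.68 | / | / | / |
| Heun | 11 | 22x2 | 0 | 2.79 | / | / | / |
| Heun | 15 | 30x2 | 0 | 2.42 | / | / | / |
| Adam2 | 6 | 6x2 | 0 | 6.35 | 190 | 0.75 | 0.55 |
| Adam2 | 8 | 8x2 | 0 | 4.16 | 212 | 0.78 | 0.56 |
| Adam2 | 16 | 16x2 | 0 | 2.42 | 237 | 0.80 | 0.60 |
| Adam4 | 16 | 16x2 | 0 | 2.27 | 243 | 0.80 | 0.60 |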

Citation

@inproceedings{
wang2024exploring,
title={Exploring {DCN}-like architecture for fast image generation with arbitrary resolution},
author={Shuai Wang and Zexian Li and Tianhui Song and Xubin Li and Tiezheng Ge and Bo Zheng and Limin Wang},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=e57B7BfA2B}
}