| MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model | 31 Aug 2022 | TPAMI'2024 |  |  |
| All are Worth Words: A ViT Backbone for Diffusion Models | 25 Sep 2022 | CVPR'2023 |  |  |
| Learning to Learn with Generative Models of Neural Network Checkpoints | 26 Sep 2022 | arXiv |  |  |
| Scalable Diffusion Models with Transformers | 19 Dec 2022 | ICCV'2023 |  |  |
| Exploring Vision Transformers as Diffusion Learners | 28 Dec 2022 | arXiv |  | |
| DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer | 07 Mar 2023 | ICCV'2023 |  |  |
| Masked Diffusion Transformer is a Strong Image Synthesizer | 25 Mar 2023 | ICCV'2023 |  |  |
| Diffusion Transformer for Adaptive Text-to-Speech | 03 May 2023 | Interspeech'2023 |  |  |
| VDT: General-purpose Video Diffusion Transformers via Mask Modeling | 22 May 2023 | ICLR'2024 |  |  |
| ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer | 22 May 2023 | EMNLP'2023 |  |  |
| U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech | 22 May 2023 | arXiv |  |  |
| Fast Training of Diffusion Models with Masked Transformers | 15 Jun 2023 | TMLR |  |  |
| DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation | 04 Jul 2023 | NeurIPS'2023 |  |  |
| Large-Vocabulary 3D Diffusion Model with Transformer | 14 Sep 2023 | ICLR'2024 |  |  |
| CartoonDiff: Training-free Cartoon Image Generation with Diffusion Transformer Models | 15 Sep 2023 | arXiv |  |  |
| PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | 30 Sep 2023 | ICLR'2024 |  |  |
| Dolfin: Diffusion Layout Transformers without Autoencoder | 25 Oct 2023 | arXiv |  | |
| Mapache: Masked parallel transformer for advanced speech editing and synthesis | 03 Dec 2023 | ICASSP'2024 |  | |
| DiffiT: Diffusion Vision Transformers for Image Generation | 04 Dec 2023 | arXiv |  |  |
| GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation | 07 Dec 2023 | CVPR'2024 |  |  |
| Photorealistic Video Generation with Diffusion Models | 11 Dec 2023 | arXiv |  |  |
| DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers | 11 Dec 2023 | arXiv |  | |
| Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation | 12 Dec 2023 | arXiv |  |  |
| NViST: In the Wild New View Synthesis from a Single Image with Transformers | 13 Dec 2023 | arXiv |  |  |
| TransDDPM: Transformer-Based Denoising Diffusion Probabilistic Model for Image Restoration | 28 Dec 2023 | PRCV'2023 |  | |
| Latte: Latent Diffusion Transformer for Video Generation | 05 Jan 2024 | arXiv |  |  |
| PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | 10 Jan 2024 | arXiv |  |  |
| SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | 16 Jan 2024 | arXiv |  |  |
| Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | 21 Jan 2024 | arXiv |  |  |
| Cross-view Masked Diffusion Transformers for Person Image Synthesis | 02 Feb 2024 | arXiv |  | |
| DiffsFormer: A Diffusion Transformer on Stock Factor Augmentation | 05 Feb 2024 | arXiv |  | |
| Sora | 15 Feb 2024 | OpenAI |  |  |
| SDiT: Spiking Diffusion Model with Transformer | 18 Feb 2024 | arXiv |  | |
| FiT: Flexible Vision Transformer for Diffusion Model | 19 Feb 2024 | arXiv |  |  |
| Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | 22 Feb 2024 | arXiv |  |  |
| OpenDiT | 26 Feb 2024 | GitHub |  |  |
| FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes | 28 Feb 2024 | arXiv |  |  |
| Open-Sora-Plan | 01 Mar 2024 | GitHub |  |  |
| Stable Diffusion 3: Research Paper | 05 Mar 2024 | Stability AI |  |  |