Curated Reading List
April 28, 2026
The full bibliography is in references/citation.bib. This page provides a lightweight entry point organized by the roadmap.
Foundations
- GANs: Goodfellow et al.; DCGAN; WGAN / WGAN-GP; StyleGAN; BigGAN.
- Diffusion and score models: DDPM; DDIM; Score-based SDE; Latent Diffusion / Stable Diffusion.
- Flow matching and rectified flow: Rectified Flow; Flow Matching; Stable Diffusion 3 / MM-DiT; Diff2Flow.
- Autoregressive visual generation: LlamaGen; VAR; Chameleon; Emu3; Janus / Janus-Pro.
- Hybrid systems: Transfusion; MonoFormer; Show-o / Show-o2; JanusFlow; BLIP3o-NEXT; MAR; NextStep-1.
Control and In-Context Generation
- Structural control: ControlNet and ControlNet++; OminiControl; EasyControl; RichControl; OmniRefiner.
- Layout and relational control: ReCon; CreatiLayout; HybridLayout; MIGLoRA; MOSAIC.
- Identity and personalization: Textual Inversion; DreamBooth; LoRA; IP-Adapter; InstantID; PuLID; PhotoMaker; StoryMaker.
- Multi-view / 3D consistency: MV-Adapter; FlexGen; PRISM; SpinMeRound; SynCD.
Unified Understanding and Generation
- Unified multimodal systems: Chameleon; Emu3; Janus-Pro; BAGEL; BLIP3o-NEXT; X-Omni; UAE; HunyuanImage 3.0.
- Architectural motifs: shared token streams, decoupled visual encoders, MMDiT-style fusion, AR planner plus diffusion/flow renderer.
Training, Data, and Alignment
- Industrial training recipes: Qwen-Image; Z-Image; Seedream 3.0/4.0; HunyuanImage 3.0; LongCat-Image / LongCat-Next; FireRed-Image-Edit; JoyAI-Image.
- Data construction: LAION; COYO; DataComp; UltraEdit; AnyEdit; ImgEdit; EditWorld; Pico-Banana-400K; ShareGPT-4o-Image; OpenGPT-4o-Image.
- Preference and reward: DDPO; DPOK; AlignProp; Diffusion-DPO; DanceGRPO; RewardDance; EditReward.
- Evaluation and VLM judges: VBench; EvalCrafter; VideoScore2; VideoEval-Pro; VLM-as-Judge and LLM-as-Judge style protocols.
Editing and Applications
- Instruction editing: InstructPix2Pix-style editing; Step1X-Edit; SEED-Data-Edit; REALEDIT; PSR.
- Reasoning-driven editing: ReasonEdit; LegoEdit; ImageEA; MIRA / MIRAMI; X-Planner.
- Typography and design: GlyphByT5; EasyText; UniGlyph; ReChar; PosterCraft; PosterVerse; TextAtlas.
- Data-centric visualization: NL2VIS; DataVisT5; ST-Raptor; MoDora; FDABench.
Agentic Generation and World Models
- Visual Chain-of-Thought: ReasonGen-R1; T2I-R1; ReCon; CreatiLayout.
- Tool-augmented generation: GEMS; Gen-Searcher; JarvisArt.
- World models and playable simulation: World Models; Dreamer; JEPA; Genie / Genie 2; GameNGen; DIAMOND; Oasis; GameGen-X.
- Embodied simulation: UniSim; Cosmos Predict; GAIA-1; VPP; CoT-VLA; UniPi; UWM; VideoVLA.
This list is intentionally selective. For complete metadata, use the BibTeX file.
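
For readers who want to go from a name on this page to its full entry, the lookup can be sketched as a small script that scans the BibTeX file for entry types and citation keys. This is a minimal sketch, not part of the repository: the helper names `list_entries` and `find_keys` are hypothetical, the regular expression only handles the common `@type{key,` form, and it assumes citation keys contain a recognizable fragment of the method name, which may not hold for every entry.

```python
import re
from pathlib import Path

# Matches the start of a BibTeX entry such as "@article{some_key,".
# Assumption: entries use the common "@type{key," form.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

# Non-bibliographic BibTeX blocks that should not be reported as entries.
SKIPPED_TYPES = {"comment", "string", "preamble"}


def list_entries(bibtex_text):
    """Return (entry_type, citation_key) pairs in file order."""
    entries = []
    for match in ENTRY_RE.finditer(bibtex_text):
        entry_type = match.group(1).lower()
        if entry_type in SKIPPED_TYPES:
            continue
        entries.append((entry_type, match.group(2)))
    return entries


def find_keys(bibtex_text, needle):
    """Return citation keys containing `needle`, case-insensitively."""
    needle = needle.lower()
    return [key for _, key in list_entries(bibtex_text) if needle in key.lower()]


if __name__ == "__main__":
    # Path taken from the note at the top of this page.
    bib_path = Path("references/citation.bib")
    if bib_path.exists():
        text = bib_path.read_text(encoding="utf-8")
        for entry_type, key in list_entries(text):
            print(f"{entry_type:15s} {key}")
```

Calling `find_keys(text, "controlnet")` would then return any keys containing that fragment. For anything beyond a quick lookup, a dedicated BibTeX parser is likely the more robust choice.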