Curated Reading List

April 28, 2026

The full bibliography is in references/citation.bib. This page provides a lightweight entry point organized by the roadmap.

Foundations

  • GANs: Goodfellow et al.; DCGAN; WGAN / WGAN-GP; StyleGAN; BigGAN.
  • Diffusion and score models: DDPM; DDIM; Score-based SDE; Latent Diffusion / Stable Diffusion.
  • Flow matching and rectified flow: Rectified Flow; Flow Matching; Stable Diffusion 3 / MM-DiT; Diff2Flow.
  • Autoregressive visual generation: LlamaGen; VAR; Chameleon; Emu3; Janus / Janus-Pro.
  • Hybrid systems: Transfusion; MonoFormer; Show-o / Show-o2; JanusFlow; BLIP3o-NEXT; MAR; NextStep-1.

Control and In-Context Generation

  • Structural control: ControlNet and ControlNet++; OminiControl; EasyControl; RichControl; OmniRefiner.
  • Layout and relational control: ReCon; CreatiLayout; HybridLayout; MIGLoRA; MOSAIC.
  • Identity and personalization: Textual Inversion; DreamBooth; LoRA; IP-Adapter; InstantID; PuLID; PhotoMaker; StoryMaker.
  • Multi-view / 3D consistency: MV-Adapter; FlexGen; PRISM; SpinMeRound; SynCD.

Unified Understanding and Generation

  • Unified multimodal systems: Chameleon; Emu3; Janus-Pro; BAGEL; BLIP3o-NEXT; X-Omni; UAE; HunyuanImage 3.0.
  • Architectural motifs: shared token streams, decoupled visual encoders, MM-DiT-style fusion, AR planner plus diffusion/flow renderer.

Training, Data, and Alignment

  • Industrial training recipes: Qwen-Image; Z-Image; Seedream 3.0/4.0; HunyuanImage 3.0; LongCat-Image / LongCat-Next; FireRed-Image-Edit; JoyAI-Image.
  • Data construction: LAION; COYO; DataComp; UltraEdit; AnyEdit; ImgEdit; EditWorld; Pico-Banana-400K; ShareGPT-4o-Image; OpenGPT-4o-Image.
  • Preference and reward: DDPO; DPOK; AlignProp; Diffusion-DPO; DanceGRPO; RewardDance; EditReward.
  • Evaluation and VLM judges: VBench; EvalCrafter; VideoScore2; VideoEval-Pro; VLM-as-Judge and LLM-as-Judge style protocols.

Editing and Applications

  • Instruction editing: InstructPix2Pix-style editing; Step1X-Edit; SEED-Data-Edit; REALEDIT; PSR.
  • Reasoning-driven editing: ReasonEdit; LegoEdit; ImageEA; MIRA / MIRAMI; X-Planner.
  • Typography and design: GlyphByT5; EasyText; UniGlyph; ReChar; PosterCraft; PosterVerse; TextAtlas.
  • Data-centric visualization: NL2VIS; DataVisT5; ST-Raptor; MoDora; FDABench.

Agentic Generation and World Models

  • Visual Chain-of-Thought: ReasonGen-R1; T2I-R1; ReCon; CreatiLayout.
  • Tool-augmented generation: GEMS; Gen-Searcher; JarvisArt.
  • World models and playable simulation: World Models; Dreamer; JEPA; Genie / Genie 2; GameNGen; DIAMOND; Oasis; GameGen-X.
  • Embodied simulation: UniSim; Cosmos Predict; GAIA-1; VPP; CoT-VLA; UniPi; UWM; VideoVLA.

This list is intentionally selective. For complete metadata, use the BibTeX file.