DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models 🎨
May 20, 2025 · View on GitHub
🔥 News
- 2025-03-17: Our paper DreamRenderer is now available on arXiv, and the Supplementary Material is released.
- 2025-03-20: We have released the code! 🎉
- 2025-05-20: We have released the code for integrating DreamRenderer with SD3.

📖 Introduction
DreamRenderer is a training-free method built on the FLUX model that lets users precisely control the content of each instance through bounding boxes or masks while preserving overall visual harmony.
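As a concrete illustration, per-instance control means pairing each instance prompt with a region of the image. The dict format and `validate_layout` helper below are hypothetical, shown only to make the idea tangible — see the demo scripts for the actual input schema:

```python
# Hypothetical layout: a global caption plus per-instance prompts,
# each tied to a normalized [x0, y0, x1, y1] bounding box.
layout = {
    "caption": "a red apple and a green pear on a wooden table",
    "instances": [
        {"prompt": "a red apple",  "box": [0.05, 0.40, 0.45, 0.90]},
        {"prompt": "a green pear", "box": [0.55, 0.35, 0.95, 0.90]},
    ],
}

def validate_layout(layout):
    """Return True if every instance box is normalized and well-ordered."""
    for inst in layout["instances"]:
        x0, y0, x1, y1 = inst["box"]
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            return False
    return True
```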
✅ To-Do List
- arXiv Paper & Supplementary Material
- Inference Code
- More Demos (coming soon, stay tuned! 🚀)
- ComfyUI support
- Huggingface Space support
🛠️ Installation
📁 Checkpoints
Download the checkpoint for SAM2, sam2_hiera_large.pt, and place it in the pretrained_weights directory as shown below:
```
├── pretrained_weights
│   └── sam2_hiera_large.pt
├── DreamRenderer
│   └── ...
└── scripts
    └── ...
```
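Before running the demos, it can help to verify that the checkpoint landed in the expected place. The snippet below is a hypothetical convenience check, not part of the repo:

```python
from pathlib import Path

def missing_weights(root=".", required=("pretrained_weights/sam2_hiera_large.pt",)):
    """Return the required weight files that are not yet present under root."""
    return [p for p in required if not (Path(root) / p).exists()]

# Example: report anything still missing before launching inference.
for path in missing_weights():
    print(f"missing checkpoint: {path}")
```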
💻 Environment Setup
```bash
# Create and activate the conda environment
conda create -n dreamrenderer python=3.10 -y
conda activate dreamrenderer

# Install dependencies
pip install -r requirements.txt
pip install -e .

# Install segment-anything-2
cd segment-anything-2
pip install -e . --no-deps
cd ..
```
🧩 Region/Instance Controllable Rendering
You can quickly use DreamRenderer for precise rendering with the following commands:
```bash
python scripts/inference_demo0.py --use_sam_enhance
python scripts/inference_demo1.py --use_sam_enhance
python scripts/inference_demo2.py --num_hard_control_steps=15
```
📌 Support for ControlNet (preliminary implementation)
In the original paper, we used FLUX-Depth and FLUX-Canny for image-conditioned generation. We now also provide a script that supports image-conditioned generation via ControlNet:
```bash
python scripts/inferenceCN_demo0.py --res=768
```
📌 Support for SD3 (preliminary implementation)
To further demonstrate the generalizability of our method, we integrated DreamRenderer with another DiT-based architecture, SD3. We use ControlNet to guide generation based on depth:
```bash
python scripts/inference_demo5.py --use_sam_enhance
```
🖼️ End-to-End Layout-to-Image Generation
DreamRenderer supports re-rendering outputs from state-of-the-art Layout-to-Image models, enhancing image quality and allowing for fine-grained control over each instance in the layout.
Here's how it works:
- A Layout-to-Image method first generates a coarse image based on the input layout.
- We extract a depth map from this image.
- DreamRenderer then re-renders the scene, guided by the original layout, to produce a higher-quality and more faithful result.
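Between steps 2 and 3, the raw depth map is typically normalized to an 8-bit grayscale image before it is used as a depth condition. A minimal sketch of that conversion (a hypothetical helper assuming a NumPy depth array — the demo scripts handle this internally):

```python
import numpy as np

def depth_to_uint8(depth):
    """Normalize a raw HxW depth map to the 0-255 range so it can be
    saved or used as a grayscale depth-control image."""
    d = np.asarray(depth, dtype=np.float32)
    rng = float(d.max() - d.min())
    if rng == 0.0:
        # Degenerate (constant) depth map: return all zeros.
        return np.zeros_like(d, dtype=np.uint8)
    d = (d - d.min()) / rng
    return np.round(d * 255.0).astype(np.uint8)
```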
📦 1. Install Depth Map Extraction (Depth-Anything V2)
We use Depth-Anything v2 for extracting depth maps. To enable this feature, follow these steps:
Step 1: Install the Depth-Anything package
```bash
cd Depth-Anything-V2
pip install -e .
cd ..
```
Step 2: Download Model Weights
Download the Depth-Anything v2 model (depth_anything_v2_vitl.pth) and place it in the pretrained_weights directory:
```
├── pretrained_weights
│   └── depth_anything_v2_vitl.pth
├── DreamRenderer
│   └── ...
└── scripts
    └── ...
```
🚀 2. Run End-to-End Generation
Once everything is set up, you can run the following commands to achieve end-to-end layout-to-image generation.
End-to-end layout-to-image generation with MIGC (download MIGC_SD14.ckpt and put it in pretrained_weights):
```bash
python scripts/inference_demo3.py --res=768 --use_sam_enhance --num_hard_control_steps=15
```
End-to-end layout-to-image generation with InstanceDiffusion (download instancediffusion_sd15.pth and put it in pretrained_weights):
```bash
python scripts/inference_demo4.py --use_sam_enhance --num_hard_control_steps=10 --res=768
```
We will soon integrate with more SOTA layout-to-image methods. Stay tuned!
📊 Comparison with Other Models
🙏 Acknowledgements
We would like to thank the developers of FLUX, Segment Anything Model, Depth-Anything, diffusers, CLIP, and other open-source projects that made this work possible. We appreciate their outstanding contributions.
📜 Citation
If you find this repository useful, please cite using the following BibTeX entry:
```bibtex
@misc{zhou2025dreamrenderer,
      title={DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models},
      author={Dewei Zhou and Mingwei Li and Zongxin Yang and Yi Yang},
      year={2025},
      eprint={2503.12885},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.12885},
}
```
💬 Contact
If you have any questions or suggestions, please feel free to contact us! 😊