DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models 🎨
May 20, 2025 · View on GitHub
🔥 News
- 2025-03-17: Our paper DreamRenderer is now available on arXiv, and the Supplementary Material is released.
- 2025-03-20: We have released the code! 🎉
- 2025-05-20: We have released the code for integrating DreamRenderer with SD3.

📖 Introduction
DreamRenderer is a training-free method built on the FLUX model that lets users precisely control the content of each instance through bounding boxes or masks while preserving overall visual harmony.
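As a concrete illustration, per-instance control means pairing each instance prompt with a region of the image. The dict format and `validate_layout` helper below are hypothetical, shown only to make the idea tangible — see the demo scripts for the actual input schema:

```python
# Hypothetical layout: a global caption plus per-instance prompts,
# each tied to a normalized [x0, y0, x1, y1] bounding box.
layout = {
    "caption": "a red apple and a green pear on a wooden table",
    "instances": [
        {"prompt": "a red apple",  "box": [0.05, 0.40, 0.45, 0.90]},
        {"prompt": "a green pear", "box": [0.55, 0.35, 0.95, 0.90]},
    ],
}

def validate_layout(layout):
    """Return True if every instance box is normalized and well-ordered."""
    for inst in layout["instances"]:
        x0, y0, x1, y1 = inst["box"]
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            return False
    return True
```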
✅ To-Do List
- arXiv Paper & Supplementary Material
- Inference Code
- More Demos (coming soon, stay tuned! 🚀)
- ComfyUI support
- Huggingface Space support
🛠️ Installation
📁 Checkpoints
Download the checkpoint for SAM2, sam2_hiera_large.pt, and place it in the pretrained_weights directory as shown below:
```
├── pretrained_weights
│   └── sam2_hiera_large.pt
├── DreamRenderer
│   └── ...
└── scripts
    └── ...
```
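Before running the demos, it can help to verify that the checkpoint landed in the expected place. The snippet below is a hypothetical convenience check, not part of the repo:

```python
from pathlib import Path

def missing_weights(root=".", required=("pretrained_weights/sam2_hiera_large.pt",)):
    """Return the required weight files that are not yet present under root."""
    return [p for p in required if not (Path(root) / p).exists()]

# Example: report anything still missing before launching inference.
for path in missing_weights():
    print(f"missing checkpoint: {path}")
```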
💻 Environment Setup
```bash
# Create and activate the conda environment
conda create -n dreamrenderer python=3.10 -y
conda activate dreamrenderer

# Install dependencies
pip install -r requirements.txt
pip install -e .

# Install segment-anything-2
cd segment-anything-2
pip install -e . --no-deps
cd ..
```
🧩 Region/Instance Controllable Rendering
You can quickly use DreamRenderer for precise rendering with the following commands:
```bash
python scripts/inference_demo0.py --use_sam_enhance
python scripts/inference_demo1.py --use_sam_enhance
python scripts/inference_demo2.py --num_hard_control_steps=15
```
📌 Support for ControlNet (preliminary implementation)
In the original paper, we used FLUX-Depth and FLUX-Canny for image-conditioned generation. We now also provide a script that supports image-conditioned generation via ControlNet:
```bash
python scripts/inferenceCN_demo0.py --res=768
```
📌 Support for SD3 (preliminary implementation)
To further demonstrate the generalizability of our method, we integrated DreamRenderer with another DiT-based architecture, SD3. We use ControlNet to guide generation based on depth:
```bash
python scripts/inference_demo5.py --use_sam_enhance
```
🖼️ End-to-End Layout-to-Image Generation
DreamRenderer supports re-rendering outputs from state-of-the-art Layout-to-Image models, enhancing image quality and allowing for fine-grained control over each instance in the layout.
Here's how it works:
- A Layout-to-Image method first generates a coarse image based on the input layout.
- We extract a depth map from this image.
- DreamRenderer then re-renders the scene, guided by the original layout, to produce a higher-quality and more faithful result.
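Between steps 2 and 3, the raw depth map is typically normalized to an 8-bit grayscale image before it is used as a depth condition. A minimal sketch of that conversion (a hypothetical helper assuming a NumPy depth array — the demo scripts handle this internally):

```python
import numpy as np

def depth_to_uint8(depth):
    """Normalize a raw HxW depth map to the 0-255 range so it can be
    saved or used as a grayscale depth-control image."""
    d = np.asarray(depth, dtype=np.float32)
    rng = float(d.max() - d.min())
    if rng == 0.0:
        # Degenerate (constant) depth map: return all zeros.
        return np.zeros_like(d, dtype=np.uint8)
    d = (d - d.min()) / rng
    return np.round(d * 255.0).astype(np.uint8)
```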
📦 1. Install Depth Map Extraction (Depth-Anything V2)
We use Depth-Anything v2 for extracting depth maps. To enable this feature, follow these steps:
Step 1: Install the Depth-Anything package
```bash
cd Depth-Anything-V2
pip install -e .
cd ..
```
Step 2: Download Model Weights
Download the Depth-Anything v2 model (depth_anything_v2_vitl.pth) and place it in the pretrained_weights directory:
```
├── pretrained_weights
│   └── depth_anything_v2_vitl.pth
├── DreamRenderer
│   └── ...
└── scripts
    └── ...
```
🚀 2. Run End-to-End Generation
Once everything is set up, you can run the following commands to achieve end-to-end layout-to-image generation.
End-to-end layout-to-image generation with MIGC (download MIGC_SD14.ckpt and put it in pretrained_weights):
```bash
python scripts/inference_demo3.py --res=768 --use_sam_enhance --num_hard_control_steps=15
```
End-to-end layout-to-image generation with InstanceDiffusion (download instancediffusion_sd15.pth and put it in pretrained_weights):
```bash
python scripts/inference_demo4.py --use_sam_enhance --num_hard_control_steps=10 --res=768
```
We will soon integrate with more SOTA layout-to-image methods. Stay tuned!
📊 Comparison with Other Models
🙏 Acknowledgements
We would like to thank the developers of FLUX, Segment Anything Model, Depth-Anything, diffusers, CLIP, and other open-source projects that made this work possible. We appreciate their outstanding contributions.
📜 Citation
If you find this repository useful, please cite using the following BibTeX entry:
```bibtex
@misc{zhou2025dreamrenderer,
      title={DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models},
      author={Dewei Zhou and Mingwei Li and Zongxin Yang and Yi Yang},
      year={2025},
      eprint={2503.12885},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.12885},
}
```
💬 Contact
If you have any questions or suggestions, please feel free to contact us! 😊