DreamFuse (ICCV 2025)
July 25, 2025 · View on GitHub
Official implementation of DreamFuse: Adaptive Image Fusion with Diffusion Transformer

TODO
- Release GitHub repo.
- Release inference code.
- Release training code.
- Release model checkpoints.
- Release arXiv paper.
- Release the dataset.
- Release Hugging Face Space demo.
- Release the LDPO code.
Introduction
Image fusion seeks to seamlessly integrate foreground objects with background scenes, producing realistic and harmonious fused images. Unlike existing methods that directly insert objects into the background, adaptive and interactive fusion remains a challenging yet appealing task. To address this, we propose an iterative human-in-the-loop data generation pipeline that leverages limited initial data with diverse textual prompts to generate fusion datasets across various scenarios and interactions, including placement, holding, wearing, and style transfer. Building on this, we introduce DreamFuse, a novel approach based on the Diffusion Transformer (DiT) model that generates consistent and harmonious fused images from both foreground and background information. DreamFuse employs a Positional Affine mechanism and uses Localized Direct Preference Optimization (LDPO) guided by human feedback to refine its results. Experimental results show that DreamFuse outperforms state-of-the-art methods across multiple metrics.
Dependencies and Installation
git clone https://github.com/LL3RD/DreamFuse-Code.git
cd DreamFuse-Code
conda create -n DreamFuse python=3.10
conda activate DreamFuse
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
Dataset
We propose an iterative Human-in-the-Loop data generation pipeline and construct a comprehensive fusion dataset containing 80k diverse fusion scenarios. Over half of the dataset features outdoor backgrounds, and approximately 23k images include hand-held scenarios.
Visualization of different fusion scenarios in the DreamFuse dataset.
Visualization of different foregrounds in the DreamFuse dataset.
Download the dataset from Hugging Face:
huggingface-cli download --repo-type dataset --resume-download LL3RD/DreamFuse --local-dir DreamFuse_80K --local-dir-use-symlinks False
Extract the images with:
cat DreamFuse80K.tar.part* > DreamFuse80K.tar
tar -xvf DreamFuse80K.tar
If you want to visualize the data, please refer to the extraction function in data_reader.py.
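For a quick look at the archive before wiring up data_reader.py, a minimal sketch like the following can iterate the extracted tar and load each image. This is an illustration only: it assumes the archive simply contains image files and is not the repo's own reader.

```python
import io
import tarfile

from PIL import Image

def iter_tar_images(tar_path):
    """Yield (member_name, PIL.Image) for every image file in a tar archive."""
    with tarfile.open(tar_path) as tar:
        for member in tar:
            # Skip directories and any non-image members.
            if member.isfile() and member.name.lower().endswith(
                (".png", ".jpg", ".jpeg", ".webp")
            ):
                data = tar.extractfile(member).read()
                yield member.name, Image.open(io.BytesIO(data))
```

Usage: `for name, img in iter_tar_images("DreamFuse80K.tar"): img.show()` (or save thumbnails for inspection).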
Gradio Demo
python inference/dreamfuse_gui.py
Inference
Run inference on a single GPU:
python inference/dreamfuse_inference.py
For multi-GPU support:
python inference/multi_gpu_starter.py
Training
To train DreamFuse from the T2I model (FLUX-dev):
bash dreamfuse_train.sh
Adjust hyperparameters directly in dreamfuse_train.sh and modify the file paths in configs/dreamfuse.yaml.
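The exact schema of configs/dreamfuse.yaml is defined by the repo; purely as an illustration, the path-related fields to check typically look something like the fragment below. All key names here are hypothetical and may differ from the shipped config.

```yaml
# Hypothetical sketch -- key names may not match the actual
# configs/dreamfuse.yaml in this repo.
pretrained_model_path: /path/to/flux-dev   # base T2I checkpoint
data_root: /path/to/DreamFuse_80K          # extracted dataset directory
output_dir: ./checkpoints/dreamfuse        # where training outputs are written
```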
Examples
Please visit our Project Gallery.
Citation
If you find this project useful for your research, please consider citing our paper.
License
DreamFuse is released under the FLUX-Dev license. See LICENSE for more information.