DreamFuse (ICCV 2025)
July 25, 2025 · View on GitHub
Official implementation of DreamFuse: Adaptive Image Fusion with Diffusion Transformer

TODO
- Release GitHub repo.
- Release inference code.
- Release training code.
- Release model checkpoints.
- Release arXiv paper.
- Release the dataset.
- Release Hugging Face Space demo.
- Release the LDPO code.
Introduction
Image fusion seeks to seamlessly integrate foreground objects with background scenes, producing realistic and harmonious fused images. Unlike existing methods that directly insert objects into the background, adaptive and interactive fusion remains a challenging yet appealing task. To address this, we propose an iterative human-in-the-loop data generation pipeline that leverages limited initial data with diverse textual prompts to generate fusion datasets across various scenarios and interactions, including placement, holding, wearing, and style transfer. Building on this, we introduce DreamFuse, a novel approach based on the Diffusion Transformer (DiT) model that generates consistent and harmonious fused images from both foreground and background information. DreamFuse employs a Positional Affine mechanism and uses Localized Direct Preference Optimization (LDPO) guided by human feedback to refine its results. Experimental results show that DreamFuse outperforms state-of-the-art methods across multiple metrics.
Dependencies and Installation
git clone https://github.com/LL3RD/DreamFuse-Code.git
cd DreamFuse-Code
conda create -n DreamFuse python=3.10
conda activate DreamFuse
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
Dataset
We propose an iterative Human-in-the-Loop data generation pipeline and construct a comprehensive fusion dataset containing 80k diverse fusion scenarios. Over half of the dataset features outdoor backgrounds, and approximately 23k images include hand-held scenarios.
Visualization of different fusion scenarios in the DreamFuse dataset.
Visualization of different foregrounds in the DreamFuse dataset.
Download the dataset from Hugging Face:
huggingface-cli download --repo-type dataset --resume-download LL3RD/DreamFuse --local-dir DreamFuse_80K --local-dir-use-symlinks False
Extract the images with:
cat DreamFuse80K.tar.part* > DreamFuse80K.tar
tar -xvf DreamFuse80K.tar
If you want to visualize the data, please refer to the extraction function in data_reader.py.
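For a quick look at the archive before wiring up data_reader.py, a minimal sketch like the following can iterate the extracted tar and load each image. This is an illustration only: it assumes the archive simply contains image files and is not the repo's own reader.

```python
import io
import tarfile

from PIL import Image

def iter_tar_images(tar_path):
    """Yield (member_name, PIL.Image) for every image file in a tar archive."""
    with tarfile.open(tar_path) as tar:
        for member in tar:
            # Skip directories and any non-image members.
            if member.isfile() and member.name.lower().endswith(
                (".png", ".jpg", ".jpeg", ".webp")
            ):
                data = tar.extractfile(member).read()
                yield member.name, Image.open(io.BytesIO(data))
```

Usage: `for name, img in iter_tar_images("DreamFuse80K.tar"): img.show()` (or save thumbnails for inspection).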
Gradio Demo
python inference/dreamfuse_gui.py
Inference
Run inference on a single GPU:
python inference/dreamfuse_inference.py
For multi-GPU support:
python inference/multi_gpu_starter.py
Training
To train DreamFuse from the T2I model (FLUX-dev):
bash dreamfuse_train.sh
Adjust hyperparameters directly in dreamfuse_train.sh and modify the file paths in configs/dreamfuse.yaml.
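The exact schema of configs/dreamfuse.yaml is defined by the repo; purely as an illustration, the path-related fields to check typically look something like the fragment below. All key names here are hypothetical and may differ from the shipped config.

```yaml
# Hypothetical sketch -- key names may not match the actual
# configs/dreamfuse.yaml in this repo.
pretrained_model_path: /path/to/flux-dev   # base T2I checkpoint
data_root: /path/to/DreamFuse_80K          # extracted dataset directory
output_dir: ./checkpoints/dreamfuse        # where training outputs are written
```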
Examples
Please visit our Project Gallery.
Citation
If you find this project useful for your research, please consider citing our paper.
License
DreamFuse is released under the FLUX-Dev license. See LICENSE for more information.