March 2, 2026
Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion
*Equal Contribution †Corresponding Author
✨ News:
- [2026-02-21] Our paper VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion has been officially accepted by The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)! [Paper] [Code]
- [2025-09-18] Our paper ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts has been officially accepted by Advances in Neural Information Processing Systems (NeurIPS 2025)! [Paper] [Code]
- [2025-09-10] Our paper Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion has been officially accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI)! [Paper] [Code]
- [2025-03-15] Our paper C2RF: Bridging Multi-modal Image Registration and Fusion via Commonality Mining and Contrastive Learning has been officially accepted by the International Journal of Computer Vision (IJCV)! [Paper] [Code]
- [2025-02-11] We released a large-scale dataset for infrared and visible video fusion: M3SVD: Multi-Modal Multi-Scene Video Dataset.
🔎 Method Overview
Various Scheme Comparison

Framework

Vanilla masking scheme vs. our dual masking scheme.

⚙️ Installation
# git clone this repository
git clone https://github.com/Linfeng-Tang/Mask-DiFuser.git
cd Mask-DiFuser
# create an environment with python >= 3.8
conda create -n mask-difuser python=3.8
conda activate mask-difuser
pip install -r requirements.txt
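After installation, a quick sanity check can confirm that the environment matches the steps above. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and checking for `torch` is an assumption based on the training command using `torchrun` (the contents of `requirements.txt` are not shown here).

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the installation steps


def check_environment(packages=("torch",)):
    """Report whether the interpreter and the listed packages look usable.

    Hypothetical helper; the default package list is an assumption.
    """
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated `mask-difuser` environment; any `MISSING` entry suggests that `pip install -r requirements.txt` did not complete.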
🚀 Inference
Step 1: Download the pretrained Mask-DiFuser model from Baidu Drive or Google Drive, and place the weights in checkpoint/.
Step 2: Run the inference command
python test.py --pretrained_path ./checkpoint/model.pt --task_type VIF --dirA ./dataset/MSRS/ir --dirB ./dataset/MSRS/vi --output_path ./Fusion/MSRS --gpu_ids 0
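Before running inference on your own data, it may help to confirm that the two input folders line up. The sketch below assumes that `test.py` pairs the images in `--dirA` and `--dirB` by identical file name, as in the MSRS layout; this is an assumption, not something this README states. The helper names and the extension list are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_images(folder):
    """Return the set of image file names (not paths) directly inside folder."""
    return {
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    }


def compare_pairs(dir_a, dir_b):
    """Split file names into matched pairs and names found on one side only."""
    a, b = list_images(dir_a), list_images(dir_b)
    return sorted(a & b), sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        paired, only_a, only_b = compare_pairs(sys.argv[1], sys.argv[2])
        print(f"{len(paired)} matched pairs")
        print(f"only in dirA: {only_a}")
        print(f"only in dirB: {only_b}")
```

For example, saving this as a script and calling it with `./dataset/MSRS/ir ./dataset/MSRS/vi` lists any image that is missing its counterpart.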
🔥 Train
Step 1: Pretrained models and training data
Download the DIV2K dataset from the official DIV2K website and organize it as follows:
/dataset/DIV2K/
├── train/
│ ├── 0001.png
│ ├── 0002.png
│ └── ...
├── val/
│ ├── 0001.png
│ ├── 0002.png
│ └── ...
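The layout above can be checked before launching a long training run. This is a minimal sketch under the assumption that only the `train/` and `val/` folders with `.png` files matter; the function name `count_div2k_images` is hypothetical and not part of the repository.

```python
from pathlib import Path


def count_div2k_images(root):
    """Count .png files in the train/ and val/ subfolders of root.

    Raises FileNotFoundError if either subfolder is missing.
    """
    counts = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {split_dir}")
        # Only files directly inside the split folder are counted
        counts[split] = sum(1 for _ in split_dir.glob("*.png"))
    return counts
```

Calling `count_div2k_images("./dataset/DIV2K")` returns a dictionary with the number of images per split, which makes an empty or misplaced folder easy to spot.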
Step 2: Run the training command
export OMP_NUM_THREADS=1
torchrun --nproc-per-node=4 train.py --dataset_path ./dataset/DIV2K --output_path ./result --gpu_ids 0,1,2,3
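In the command above, `--nproc-per-node=4` and `--gpu_ids 0,1,2,3` both describe four GPUs. When training on a different number of GPUs, the two values presumably need to change together. The sketch below derives both from one list; it assumes one process per listed GPU, which matches the example but is not confirmed here, and `build_train_command` is a hypothetical helper that uses only the flags shown in this README.

```python
import shlex


def build_train_command(gpu_ids, dataset_path="./dataset/DIV2K",
                        output_path="./result"):
    """Build the torchrun argument list with one process per GPU id."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    return [
        "torchrun",
        f"--nproc-per-node={len(gpu_ids)}",  # assumed: one process per GPU
        "train.py",
        "--dataset_path", dataset_path,
        "--output_path", output_path,
        "--gpu_ids", ",".join(str(i) for i in gpu_ids),
    ]


if __name__ == "__main__":
    # Prints the shell lines instead of running them
    print("export OMP_NUM_THREADS=1")
    print(shlex.join(build_train_command([0, 1, 2, 3])))
```

With `[0, 1, 2, 3]` the printed command is identical to the one above; passing `[0, 1]` yields `--nproc-per-node=2` and `--gpu_ids 0,1`.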
📷 Results
Visual comparison of infrared-visible image fusion results for night scenes on the MSRS dataset

Visual comparison of infrared-visible image fusion results on the RoadScene dataset

Visual comparison of multi-exposure image fusion results on the SICE dataset

Visual comparison of multi-exposure image fusion results on the MEFB dataset

Visual comparison of medical image fusion results on the Harvard dataset

Visual comparison of near-infrared and visible image fusion results on the Nirscene dataset

Visual comparison of multi-polarization fusion results on the Polarization dataset

Visual comparison of multi-focus image fusion results on the Lytro dataset

🕵️‍♂️ Detection

🎥 Segment

🎓 Citations
If our work is useful for your research, please consider citing it and giving us a star ⭐:
@article{Tang2026Mask-DiFuser,
author={Tang, Linfeng and Li, Chunyu and Ma, Jiayi},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion},
year={2026},
volume={48},
number={1},
pages={591--608},
}
🤝 Contact
Please feel free to contact us at linfeng0419@gmail.com or licy0089@gmail.com.
We are happy to hear from you and will maintain this repository in our free time.
❤️ Acknowledgments
Some code is borrowed from CLEDiffusion and Stable-Diffusion. We thank the authors for their excellent work.