README.md
March 2, 2026
ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts [NeurIPS 2025]
*Equal Contribution †Corresponding Author
✨ News:
- [2026-02-21] Our paper VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion has been officially accepted by The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)! [Paper] [Code]
- [2025-09-18] Our paper ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts has been officially accepted by Advances in Neural Information Processing Systems (NeurIPS 2025)! [Paper] [Code]
- [2025-09-10] Our paper Mask-DiFuser: A Masked Diffusion Model for Unified Unsupervised Image Fusion has been officially accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI)! [Paper] [Code]
- [2025-03-15] Our paper C2RF: Bridging Multi-modal Image Registration and Fusion via Commonality Mining and Contrastive Learning has been officially accepted by the International Journal of Computer Vision (IJCV)! [Paper] [Code]
- [2025-02-11] We released a large-scale dataset for infrared and visible video fusion: M3SVD: Multi-Modal Multi-Scene Video Dataset.
🔎 Method Overview
Motivation

Framework

Frequency Domain Comparison

🔧 Environment Setup
- Clone this repository:

      git clone https://github.com/Linfeng-Tang/ControlFusion.git
      cd ControlFusion

- Create a Conda environment (recommended):

      conda create -n controlfusion python=3.8 -y
      conda activate controlfusion

- Install dependency packages:

      pip install -r requirements.txt
📂 Dataset Construction
Please refer to genDateset.
📂 Dataset Download
📥 Pre-trained Weights
Download the pre-trained model Mask-DiFuser from Baidu Drive, and put the weights into pretrained_weights/.
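Before running inference, it may help to confirm that the weights landed where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `find_weight_files` and the set of file extensions are assumptions, since the checkpoint file name is not stated here.

```python
from pathlib import Path

# Assumed checkpoint extensions; the actual file name is not given in this README.
WEIGHT_SUFFIXES = {".pth", ".pt", ".ckpt"}


def find_weight_files(weights_dir="pretrained_weights"):
    """Return checkpoint-like files found directly inside weights_dir, sorted by path."""
    root = Path(weights_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in WEIGHT_SUFFIXES
    )


if __name__ == "__main__":
    found = find_weight_files()
    if found:
        print("Found weights:", ", ".join(p.name for p in found))
    else:
        print("No weights found in pretrained_weights/; download them first.")
```

Run it from the repository root so that the relative path `pretrained_weights/` resolves correctly.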
🧪 Inference
You can use the test.py script we provide to fuse pairs of images. Please make sure you have downloaded the pre-trained weights.
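You can modify ControlFusion.py to select text or auto control by keeping one of the following two assignments:

    # Text control: use the features encoded from the text prompt
    text_features = self.get_text_feature(text.expand(b, -1)).to(inp_img_A.dtype)
    # Auto control: use the image-derived feature (imgfeature) instead
    text_features = imgfeature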
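If editing the source for each run is inconvenient, the choice between the two assignments could be wrapped in a small switch. This is only a sketch of one possible design: the function `select_prompt_features` and its `mode` flag are hypothetical and do not exist in ControlFusion.py.

```python
def select_prompt_features(mode, text_feature_fn, image_feature):
    """Pick the prompt features for the chosen control mode.

    mode            -- "text" or "auto" (hypothetical flag, not in the repository)
    text_feature_fn -- zero-argument callable that returns the text-prompt features
    image_feature   -- features already derived from the input images
    """
    if mode == "text":
        # Only encode the text prompt when text control is requested.
        return text_feature_fn()
    if mode == "auto":
        return image_feature
    raise ValueError(f"unknown control mode: {mode!r}")


# Hypothetical use inside the model's forward pass:
# text_features = select_prompt_features(
#     mode,
#     lambda: self.get_text_feature(text.expand(b, -1)).to(inp_img_A.dtype),
#     imgfeature,
# )
```

Passing the text encoder as a callable means the text branch is evaluated only when it is actually selected.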
🚂 Train
You can use the provided train.py script to train the model. Make sure your training dataset is organized correctly.
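Since fusion works on pairs of images, one quick sanity check before training is to look for files that lack a partner. The sketch below assumes that the two modalities live in separate directories and that paired images share a file name; this layout, the helper name `unpaired_files`, and the example paths are all assumptions, so adapt them to the layout train.py actually expects.

```python
from pathlib import Path


def unpaired_files(dir_a, dir_b):
    """Return sorted file names that appear in only one of the two directories."""
    names_a = {p.name for p in Path(dir_a).iterdir() if p.is_file()}
    names_b = {p.name for p in Path(dir_b).iterdir() if p.is_file()}
    # Symmetric difference: names missing from either side.
    return sorted(names_a ^ names_b)


# Hypothetical usage; the directory names are placeholders:
# missing = unpaired_files("path/to/infrared", "path/to/visible")
# if missing:
#     print("Images without a partner:", missing)
```

An empty result means every image in one directory has a same-named counterpart in the other.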
📷 Results
Visualization of fusion results in different degraded scenarios

Generalization results in the real world

🕵️‍♂️ Detection

🎓 Citations
If our work is useful for your research, please consider citing it and giving us a star ⭐:
@inproceedings{Tang2025ControlFusion,
author={Linfeng Tang and Yeda Wang and Zhanchuan Cai and Junjun Jiang and Jiayi Ma},
title={ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts},
booktitle={Advances in Neural Information Processing Systems},
year={2025},
}
🤝 Contact
Please feel free to contact: linfeng0419@gmail.com, wangyeda@whu.edu.cn.
We are very pleased to communicate with you and will maintain this repository during our free time.