CtrlFuse: Mask-Prompt Guided Controllable Infrared and Visible Image Fusion (Official PyTorch Implementation)

January 30, 2026


This repository contains the official PyTorch implementation of the paper "CtrlFuse: Mask-Prompt Guided Controllable Infrared and Visible Image Fusion" (accepted by AAAI 2026).

Authors: Yiming Sun, Yuan Ruan, Qinghua Hu, Pengfei Zhu

Affiliation: VisDrone Group

📢 News

  • [2026-01]: Code and pre-trained models are released!
  • [2025-11-08]: The paper is accepted by AAAI 2026.

📜 Abstract

Infrared and visible image fusion generates all-weather perception-capable images by combining complementary modalities, enhancing environmental awareness for intelligent unmanned systems. Existing methods either focus on pixel-level fusion while overlooking downstream task adaptability or implicitly learn rigid semantics through cascaded detection/segmentation models, unable to interactively address diverse semantic target perception needs. We propose CtrlFuse, a controllable image fusion framework that enables interactive dynamic fusion guided by mask prompts. The model integrates a multi-modal feature extractor, a reference prompt encoder (RPE), and a prompt-semantic fusion module (PSFM). The RPE dynamically encodes task-specific semantic prompts by fine-tuning pre-trained segmentation models with input mask guidance, while the PSFM explicitly injects these semantics into fusion features. Through synergistic optimization of parallel segmentation and fusion branches, our method achieves mutual enhancement between task performance and fusion quality. Experiments demonstrate state-of-the-art results in both fusion controllability and segmentation accuracy, with the adapted task branch even outperforming the original segmentation model.

Figure 1: The overall architecture of our proposed CtrlFuse.

🔨 Requirements

The code has been tested with Python 3.8 and PyTorch 2.0.0.

Checkpoints can be downloaded from the link below:

[Baidu Yun]

Additionally, you can download the ViT-H SAM model from the official Segment-anything website:

[Segment-anything]

# 1. Create a conda environment
conda create -n ctrlfuse python=3.8
conda activate ctrlfuse

# 2. Install dependencies
pip install -r requirements.txt

# 3. Install the Segment Anything Model (SAM) package in editable mode
cd ./segment-anything
pip install -v -e .
cd ..
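
Once the package is installed and the ViT-H checkpoint has been downloaded, loading the model might look like the sketch below. It uses `sam_model_registry` from the `segment_anything` package. The helper name `load_sam_vit_h` is hypothetical and the checkpoint location is an assumption, since this README does not say where the scripts in this repository expect the file.

```python
from pathlib import Path


def load_sam_vit_h(checkpoint_path, device="cpu"):
    """Load the ViT-H SAM model from a local checkpoint (hypothetical helper)."""
    checkpoint = Path(checkpoint_path)
    if not checkpoint.is_file():
        raise FileNotFoundError(f"SAM checkpoint not found: {checkpoint}")

    # Imported lazily so the path check above works even before
    # the segment-anything package has been installed.
    from segment_anything import sam_model_registry

    # "vit_h" selects the ViT-H variant of SAM.
    sam = sam_model_registry["vit_h"](checkpoint=str(checkpoint))
    return sam.to(device).eval()
```

For example, `load_sam_vit_h("path/to/sam_vit_h.pth", device="cuda")` would return the model on the GPU in evaluation mode; the path here is a placeholder to be replaced with the actual download location.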

📂 Data Preparation

Please organize your dataset as follows. Note: Ensure that the Visible and Infrared images are strictly aligned (registered) and have the same filenames.

Project_Root/
├── dataset/
│   ├── train/
│   │   ├── vi/             # Visible images (RGB)
│   │   │   ├── 1.jpg
│   │   │   └── ...
│   │   ├── ir/             # Infrared images (grayscale)
│   │   │   ├── 1.jpg
│   │   │   └── ...
│   │   └── mask/           # Masks (grayscale)
│   │       ├── 1.jpg
│   │       └── ...
│   └── test/
│       ├── vi/             # Visible images (RGB)
│       │   ├── 1.jpg
│       │   └── ...
│       ├── ir/             # Infrared images (grayscale)
│       │   ├── 1.jpg
│       │   └── ...
│       └── mask/           # Masks (grayscale)
│           ├── 1.jpg
│           └── ...
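
Because the filenames must match across modalities, it can help to check a split before training. The following is a minimal sketch, not part of the released code: `check_split` is a hypothetical helper, and it assumes that the `mask` folder follows the same naming rule as `vi` and `ir`, as the layout above suggests.

```python
from pathlib import Path


def check_split(split_dir, modalities=("vi", "ir", "mask")):
    """Return {modality: sorted filenames missing from that modality}.

    A filename counts as missing if it exists in at least one other
    modality folder of the same split. Raises FileNotFoundError if a
    modality folder itself does not exist.
    """
    split = Path(split_dir)
    names = {
        m: {p.name for p in (split / m).iterdir() if p.is_file()}
        for m in modalities
    }
    all_names = set().union(*names.values())
    return {m: sorted(all_names - found) for m, found in names.items()}
```

Calling `check_split("dataset/train")` returns an empty list for every modality when the split is consistent. This only compares filenames; it cannot verify that the image pairs are actually registered.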

🚀 Usage

📊 Results

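| FMB Dataset | MSE | PSNR | Qabf | Nabf | SSIM | SCD |
|---|---|---|---|---|---|---|
| LDFusion | 0.061 | 60.71 | 0.51 | 0.112 | 0.514 | 1.549 |
| SwinFuse | 0.042 | 62.334 | 0.577 | 0.029 | 0.905 | 1.900 |
| NestFuse | 0.046 | 61.96 | 0.483 | 0.042 | 0.787 | 1.594 |
| CDDFuse | 0.048 | 62.696 | 0.674 | 0.026 | 1.002 | 1.626 |
| DIDFuse | 0.047 | 61.565 | 0.528 | 0.042 | 0.765 | 1.824 |
| SeAFusion | 0.047 | 62.539 | 0.654 | 0.029 | 0.964 | 1.62 |
| PSFusion | 0.051 | 61.517 | 0.627 | 0.056 | 0.836 | 1.875 |
| SDCFusion | 0.048 | 62.456 | 0.693 | 0.031 | 0.906 | 1.657 |
| CtrlFuse (Ours) | 0.043 | 63.292 | 0.719 | 0.024 | 0.925 | 1.522 |

| Drone Vehicle Dataset | MSE | PSNR | Qabf | Nabf | SSIM | SCD |
|---|---|---|---|---|---|---|
| LDFusion | 0.076 | 59.573 | 0.376 | 0.054 | 0.568 | 1.38 |
| SwinFuse | 0.084 | 59.165 | 0.202 | 0.069 | 0.558 | 1.295 |
| NestFuse | 0.071 | 59.786 | 0.307 | 0.052 | 0.486 | 1.413 |
| CDDFuse | 0.065 | 60.199 | 0.469 | 0.021 | 0.845 | 1.359 |
| DIDFuse | 0.067 | 59.988 | 0.265 | 0.062 | 0.466 | 1.459 |
| SeAFusion | 0.094 | 58.649 | 0.492 | 0.044 | 0.879 | 1.472 |
| PSFusion | 0.067 | 60.065 | 0.454 | 0.095 | 0.717 | 1.534 |
| SDCFusion | 0.078 | 59.443 | 0.534 | 0.035 | 0.853 | 1.316 |
| CtrlFuse (Ours) | 0.063 | 60.317 | 0.496 | 0.035 | 0.779 | 1.552 |

| MSRS Dataset | MSE | PSNR | Qabf | Nabf | SSIM | SCD |
|---|---|---|---|---|---|---|
| LDFusion | 0.056 | 61.05 | 0.438 | 0.116 | 0.541 | 1.515 |
| SwinFuse | 0.038 | 63.69 | 0.178 | 0.026 | 0.343 | 1.033 |
| NestFuse | 0.033 | 64.128 | 0.242 | 0.025 | 0.217 | 1.138 |
| CDDFuse | 0.038 | 64.309 | 0.689 | 0.023 | 1.001 | 1.623 |
| DIDFuse | 0.035 | 63.94 | 0.204 | 0.025 | 0.223 | 1.121 |
| SeAFusion | 0.036 | 64.491 | 0.675 | 0.021 | 0.982 | 1.707 |
| PSFusion | 0.037 | 64.001 | 0.676 | 0.042 | 0.917 | 1.812 |
| SDCFusion | 0.039 | 64.003 | 0.712 | 0.023 | 0.957 | 1.739 |
| CtrlFuse (Ours) | 0.035 | 64.75 | 0.685 | 0.018 | 0.969 | 1.726 |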
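
For readers unfamiliar with the first two columns, the sketch below shows one common way MSE and PSNR are defined for image fusion: the error of the fused image is averaged over both source images, and PSNR is derived from that averaged error. The function names, the averaging convention, and the peak value of 255 are assumptions for illustration; the evaluation script that produced the tables above may use different conventions, so this code is not expected to reproduce the reported numbers.

```python
import numpy as np


def fusion_mse(fused, ir, vi):
    """Average of the MSEs between the fused image and each source image."""
    fused, ir, vi = (np.asarray(x, dtype=np.float64) for x in (fused, ir, vi))
    return (np.mean((fused - ir) ** 2) + np.mean((fused - vi) ** 2)) / 2.0


def fusion_psnr(fused, ir, vi, peak=255.0):
    """PSNR in dB from the averaged MSE; infinite when the error is zero."""
    mse = fusion_mse(fused, ir, vi)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Under these definitions a lower MSE and a higher PSNR both indicate that the fused image stays closer to its two sources.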

🤝 Citation

📧 Contact

If you have any other questions about the code, please email ruanyuan@seu.edu.cn.