PSFusion


This is the official PyTorch implementation of "Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity".

✨News:

[2025-03-15] Our paper "C2RF: Bridging Multi-modal Image Registration and Fusion via Commonality Mining and Contrastive Learning" has been accepted by the International Journal of Computer Vision (IJCV)! [Paper] [Code]

[2025-02-11] We have released a large-scale dataset for infrared and visible video fusion: M2VD: Multi-modal Multi-scene Video Dataset.

@article{TANG2023PSFusion,
  title={Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity},
  author={Tang, Linfeng and Zhang, Hao and Xu, Han and Ma, Jiayi},
  journal={Information Fusion},
  volume={99},
  pages={101870},
  year={2023},
}

Framework

The overall framework of the proposed PSFusion.

To Test

Download the pre-trained checkpoint from best_model.pth and put it in ./results/PSFusion/checkpoints.
Download the MSRS dataset from MSRS and put it in ./datasets.
python test_Fusion.py --dataroot=./datasets --dataset_name=MSRS --resume=./results/PSFusion/checkpoints/best_model.pth

To test on other datasets, arrange the dataset in the layout the dataloader expects and specify --dataroot and --dataset_name accordingly.
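
Before launching the test script, it can help to confirm that the checkpoint and the dataset folder are where the command expects them. The following is a minimal sketch and not part of the repository: the helper name `build_test_command` is hypothetical, it assumes the dataset lives at `<dataroot>/<dataset_name>`, and it uses only the three flags shown in the test command above. It does not check the file layout inside the dataset folder.

```python
import shlex
from pathlib import Path


def build_test_command(dataroot, dataset_name, checkpoint):
    """Return the test command as an argument list (hypothetical helper).

    Raises FileNotFoundError if the dataset folder or checkpoint is missing.
    """
    # Assumption: the dataset is stored in <dataroot>/<dataset_name>.
    dataset_dir = Path(dataroot) / dataset_name
    missing = [str(p) for p in (dataset_dir, Path(checkpoint)) if not p.exists()]
    if missing:
        raise FileNotFoundError("missing: " + ", ".join(missing))
    # Only flags that appear in this README are used.
    return [
        "python",
        "test_Fusion.py",
        f"--dataroot={dataroot}",
        f"--dataset_name={dataset_name}",
        f"--resume={checkpoint}",
    ]


if __name__ == "__main__":
    try:
        cmd = build_test_command(
            "./datasets",
            "MSRS",
            "./results/PSFusion/checkpoints/best_model.pth",
        )
        # Print the command so it can be copied into a shell.
        print(shlex.join(cmd))
    except FileNotFoundError as err:
        print(err)
```

Returning an argument list keeps the command easy to inspect, and it can also be passed directly to `subprocess.run` if you prefer to launch the test from Python.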

To Train

Before training PSFusion, download the pre-processed MSRS dataset from MSRS and put it in ./datasets.

Then run python train.py --dataroot=./datasets/MSRS --name=PSFusion

Motivation

Comparison of fusion and segmentation results between SeAFusion and our method under harsh conditions.

Comparison of the computational complexity between feature-level fusion and image-level fusion for the semantic segmentation task.

Network Architecture

The architecture of the superficial detail fusion module (SDFM) based on the channel-spatial attention mechanism.

The architecture of the profound semantic fusion module (PSFM) based on the cross-attention mechanism.

Experiments

Qualitative fusion results

Qualitative comparison of PSFusion with 9 state-of-the-art methods on the **MSRS** dataset.

Qualitative comparison of PSFusion with 9 state-of-the-art methods on the **M3FD** dataset.

Quantitative fusion results

Quantitative comparisons of the six metrics on 361 image pairs from the MSRS dataset. A point (x, y) on the curve denotes that (100*x)% of the image pairs have metric values no greater than y.
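
The curve described in this caption is a cumulative distribution of per-pair metric values. As a minimal sketch of how such points can be computed (the function name `cumulative_curve` and the sample scores are illustrative assumptions, not code or results from this repository):

```python
def cumulative_curve(values):
    """Return (x, y) points such that a fraction x of the values is <= y.

    The values are sorted in ascending order; the i-th smallest value
    (1-based) is paired with x = i / n. With tied values, the point with
    the largest x among the ties gives the exact fraction.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [((i + 1) / n, y) for i, y in enumerate(ordered)]


if __name__ == "__main__":
    # Made-up metric values for four image pairs, for illustration only.
    scores = [0.42, 0.77, 0.58, 0.91]
    for x, y in cumulative_curve(scores):
        print(f"{100 * x:.0f}% of pairs have a value <= {y}")
```

Under this reading, a method whose curve lies higher over most of the x range has larger metric values for most image pairs.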

Quantitative comparisons of the six metrics on 300 image pairs from the M3FD dataset.

Segmentation comparison

Segmentation results of various fusion algorithms on the MSRS dataset.

Per-class segmentation results on the MSRS dataset.

Potential of image-level fusion for high-level vision tasks

Segmentation results of feature-level fusion-based multi-modal segmentation algorithms and our image-level fusion-based solution on the MFNet dataset.

Per-class segmentation results of image-level fusion and feature-level fusion on the MFNet dataset.

If this work is helpful to you, please cite it as:

@article{TANG2023PSFusion,
  title={Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity},
  author={Tang, Linfeng and Zhang, Hao and Xu, Han and Ma, Jiayi},
  journal={Information Fusion},
  volume={99},
  pages={101870},
  year={2023},
  publisher={Elsevier}
}
