May 6, 2026

CMVF: Cross-modal Unregistered Video Fusion via Spatio-Temporal Consistency

Jianfeng Ding, Hao Zhang, Zhongyuan Wang, Jinsheng Xiao, Xin Tian, Zhen Han, Jiayi Ma

CMVF is a cross-modal unregistered video fusion method built around spatio-temporal consistency. It is designed for infrared and visible video inputs that may be spatially misaligned, and it performs coarse registration, temporally consistent fusion, and fine registration with cross-modal consistency.

📰 News

🚀 2026-05-06: We released the VidLLVIP dataset.
🎉 2026-02-05: Our multimodal video fusion paper CMVF was accepted by Information Fusion. The code is available in the CMVF GitHub repository.

Motivation

Figure 1. (a) Cross-modal data acquisition in practical applications. (b) Image fusion primarily utilizes static information from multiple sources, lacking integration of spatio-temporal elements from the raw data. (c) Video fusion effectively combines spatial, temporal, and cross-modal information in an end-to-end manner to produce high-quality, stable, and aligned videos.

Overview

Figure 2. The framework of CMVF comprises three main steps: coarse registration, temporal consistency fusion, and fine registration with cross-modal consistency.

Preparation

Clone the repository:

git clone https://github.com/jianfeng0369/CMVF.git
cd CMVF

Create a conda environment and install the required dependencies:

conda create -n cmvf python=3.10
conda activate cmvf
pip install -r requirements.txt

Fusion

Place images in data/images and videos in data/videos, and make sure paired infrared and visible inputs share the same file names.
Set the path_vi, path_ir, and path_op variables in test_images.py or test_videos.py (whichever script you run) to point to your data.
Run image fusion testing:

python test_images.py

Run video fusion testing:

python test_videos.py
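
Because both scripts expect paired infrared and visible inputs to share the same file names, a quick pairing check before running them may catch mismatches early. The following is a minimal sketch and not part of the CMVF code base: the helper find_unpaired and the data/images/vi and data/images/ir folder names are hypothetical, and it assumes "same file name" means an exact match including the extension.

```python
from pathlib import Path


def find_unpaired(dir_vi, dir_ir):
    """Return (only_in_vi, only_in_ir): sorted file names found in one folder but not the other."""
    names_vi = {p.name for p in Path(dir_vi).iterdir() if p.is_file()}
    names_ir = {p.name for p in Path(dir_ir).iterdir() if p.is_file()}
    return sorted(names_vi - names_ir), sorted(names_ir - names_vi)


if __name__ == "__main__":
    # Hypothetical layout: replace these with the folders that
    # path_vi and path_ir point to in your copy of the test script.
    path_vi = Path("data/images/vi")
    path_ir = Path("data/images/ir")
    if path_vi.is_dir() and path_ir.is_dir():
        only_vi, only_ir = find_unpaired(path_vi, path_ir)
        print("Only in visible folder:", only_vi)
        print("Only in infrared folder:", only_ir)
    else:
        print("Set path_vi and path_ir to existing folders first.")
```

Two empty lists indicate that every visible file has an infrared counterpart with the same name. The check compares names only, so it says nothing about resolution, frame count, or content.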

Citation

If you find our work useful in your research, please consider citing:

CMVF

@article{cmvf2026ding,
  title   = {CMVF: Cross-modal unregistered video fusion via spatio-temporal consistency},
  journal = {Information Fusion},
  volume  = {132},
  pages   = {104212},
  year    = {2026},
  issn    = {1566-2535},
  author  = {Jianfeng Ding and Hao Zhang and Zhongyuan Wang and Jinsheng Xiao and Xin Tian and Zhen Han and Jiayi Ma}
}

VidLLVIP

If you use the processed VidLLVIP dataset, registration matrices, or preprocessing pipeline, please also cite VidLLVIP:

@dataset{ding2026vidllvip,
  author  = {Ding, Jianfeng},
  title   = {VidLLVIP: A visible-infrared paired video dataset for low-light vision},
  year    = {2026},
  version = {v1.0.0},
  url     = {https://github.com/jianfeng0369/VidLLVIP}
}

License

This project is released under the MIT License.

Contact

If you have any questions, please contact jianfeng0369@gmail.com.