May 6, 2026
CMVF: Cross-modal Unregistered Video Fusion via Spatio-Temporal Consistency
Jianfeng Ding, Hao Zhang, Zhongyuan Wang, Jinsheng Xiao, Xin Tian, Zhen Han, Jiayi Ma
CMVF is a cross-modal unregistered video fusion method built around spatio-temporal consistency. It is designed for infrared and visible video inputs that may be spatially misaligned, and it performs coarse registration, temporally consistent fusion, and fine registration with cross-modal consistency.
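To make "coarse registration" of misaligned inputs concrete, here is a minimal, hypothetical sketch that is not taken from the CMVF code. It assumes the misalignment between a visible frame and an infrared frame is a single global integer translation and finds it by brute-force search. It compares gradient magnitudes rather than raw intensities, because the two modalities can have very different (even inverted) brightness. The names `gradient_magnitude`, `estimate_shift`, and `max_shift` are illustrative only; CMVF's own coarse and fine registration stages are described in the paper.

```python
from typing import List, Tuple

# A grayscale frame as a list of rows (illustrative representation, not CMVF's).
Frame = List[List[float]]


def gradient_magnitude(img: Frame) -> Frame:
    """L1 forward-difference gradient magnitude; the last row and column stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            out[y][x] = abs(gx) + abs(gy)
    return out


def estimate_shift(ref: Frame, mov: Frame, max_shift: int = 3) -> Tuple[int, int]:
    """Return the (dy, dx) for which mov[y + dy][x + dx] best matches ref[y][x].

    The cost is the mean absolute difference of gradient magnitudes over the
    overlapping region, so a global contrast inversion does not affect it.
    """
    g_ref, g_mov = gradient_magnitude(ref), gradient_magnitude(mov)
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += abs(g_ref[y][x] - g_mov[yy][xx])
                        count += 1
            if count and total / count < best_cost:
                best, best_cost = (dy, dx), total / count
    return best


if __name__ == "__main__":
    # Synthetic pair: a bright square in the "visible" reference, and an
    # inverted copy shifted by (2, 1) standing in for the "infrared" frame.
    ref = [[1.0 if 4 <= y <= 7 and 4 <= x <= 7 else 0.0 for x in range(12)] for y in range(12)]
    mov = [[1.0 - ref[y - 2][x - 1] if y >= 2 and x >= 1 else 1.0 for x in range(12)]
           for y in range(12)]
    print(estimate_shift(ref, mov))  # (2, 1)
```

Real unregistered pairs usually also differ in scale, rotation, or local deformation, which a pure translation search cannot capture; the sketch only shows the idea of estimating a warp before fusing.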
📰 News
- 🚀 2026-05-06: We released the VidLLVIP dataset.
- 🎉 2026-02-05: Our multimodal video fusion paper CMVF was accepted by Information Fusion. The code is available in the CMVF GitHub repository.
Motivation
Figure 1. (a) Cross-modal data acquisition in practical applications. (b) Image fusion primarily utilizes static information from multiple sources, lacking integration of spatio-temporal elements from the raw data. (c) Video fusion effectively combines spatial, temporal, and cross-modal information in an end-to-end manner to produce high-quality, stable, and aligned videos.
Overview
Figure 2. The framework of CMVF comprises three main steps: coarse registration, temporal consistency fusion, and fine registration with cross-modal consistency.
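The middle stage in Figure 2 is temporal consistency fusion. As a generic illustration of the problem such a stage addresses (and not CMVF's method), the sketch below fuses each frame independently with a toy per-pixel maximum rule, which lets flicker in one modality pass straight into the output, and then applies an exponential moving average (EMA) over time. Flicker is measured as the mean absolute change between consecutive frames. The function names `fuse_max`, `ema_smooth`, and `flicker`, the max rule, and the `alpha` parameter are assumptions made for illustration.

```python
from typing import List

Frame = List[List[float]]
Video = List[Frame]


def fuse_max(ir: Frame, vi: Frame) -> Frame:
    """Toy per-pixel maximum fusion rule, applied to one frame pair at a time."""
    return [[max(a, b) for a, b in zip(row_ir, row_vi)] for row_ir, row_vi in zip(ir, vi)]


def ema_smooth(video: Video, alpha: float = 0.7) -> Video:
    """EMA over time: out[t] = alpha * out[t-1] + (1 - alpha) * video[t]."""
    out: Video = []
    for frame in video:
        if not out:
            out.append([row[:] for row in frame])  # first frame passes through unchanged
        else:
            prev = out[-1]
            out.append([[alpha * p + (1 - alpha) * x for p, x in zip(row_p, row_x)]
                        for row_p, row_x in zip(prev, frame)])
    return out


def flicker(video: Video) -> float:
    """Mean absolute intensity change between consecutive frames."""
    diffs = [abs(a - b)
             for f0, f1 in zip(video, video[1:])
             for r0, r1 in zip(f0, f1)
             for a, b in zip(r0, r1)]
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    # Synthetic 2x2 clip: the infrared stream flickers, the visible stream is constant.
    ir_video = [[[v, v], [v, v]] for v in (0.2, 0.8, 0.2, 0.8)]
    vi_video = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(4)]
    fused = [fuse_max(ir, vi) for ir, vi in zip(ir_video, vi_video)]
    smoothed = ema_smooth(fused)
    print(round(flicker(fused), 3), round(flicker(smoothed), 3))  # 0.3 0.063
```

A larger `alpha` suppresses more flicker but also lags behind or blurs genuine motion, so a fixed filter like this is only a baseline that shows the trade-off a temporally consistent fusion stage has to balance.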
Preparation
- Clone the repository:
  ```shell
  git clone https://github.com/jianfeng0369/CMVF.git
  cd CMVF
  ```
- Create a conda environment and install the required dependencies:
  ```shell
  conda create -n cmvf python=3.10
  conda activate cmvf
  pip install -r requirements.txt
  ```
Fusion
- Place images in `data/images` and videos in `data/videos`, and make sure paired infrared and visible inputs share the same file names.
- Modify the `path_vi`, `path_ir`, and `path_op` variables in `test_*.py` to point to your data.
- Run image fusion testing:
  ```shell
  python test_images.py
  ```
- Run video fusion testing:
  ```shell
  python test_videos.py
  ```
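Because inputs are paired by identical file names, it can help to check the pairing before running `test_images.py` or `test_videos.py`. The sketch below is hypothetical: it assumes the visible and infrared inputs live in two separate directories (the `data/images/vi` and `data/images/ir` paths in it are guesses, not the repository's documented layout) and borrows the `path_vi` and `path_ir` names from `test_*.py` only as ordinary local variables. `match_pairs` is an illustrative helper, not part of CMVF.

```python
import os
from typing import List, Tuple


def match_pairs(path_vi: str, path_ir: str) -> Tuple[List[str], List[str], List[str]]:
    """Pair visible and infrared files by identical file name.

    Returns (paired, vi_only, ir_only), each sorted alphabetically.
    Subdirectories are ignored; only regular files are compared.
    """
    vi_names = {f for f in os.listdir(path_vi) if os.path.isfile(os.path.join(path_vi, f))}
    ir_names = {f for f in os.listdir(path_ir) if os.path.isfile(os.path.join(path_ir, f))}
    return sorted(vi_names & ir_names), sorted(vi_names - ir_names), sorted(ir_names - vi_names)


if __name__ == "__main__":
    # Hypothetical layout; adjust to match the path_vi / path_ir values in test_*.py.
    path_vi = os.path.join("data", "images", "vi")
    path_ir = os.path.join("data", "images", "ir")
    if os.path.isdir(path_vi) and os.path.isdir(path_ir):
        paired, vi_only, ir_only = match_pairs(path_vi, path_ir)
        print(f"{len(paired)} paired inputs")
        for name in vi_only:
            print(f"missing infrared counterpart: {name}")
        for name in ir_only:
            print(f"missing visible counterpart: {name}")
    else:
        print("Set path_vi and path_ir to existing directories first.")
```

Matching is exact, so `0001.jpg` and `0001.png` are not treated as a pair. The same check can be pointed at the video directories.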
Citation
If you find our work useful in your research, please consider citing:
CMVF
```bibtex
@article{cmvf2026ding,
  title = {CMVF: Cross-modal unregistered video fusion via spatio-temporal consistency},
  journal = {Information Fusion},
  volume = {132},
  pages = {104212},
  year = {2026},
  issn = {1566-2535},
  author = {Jianfeng Ding and Hao Zhang and Zhongyuan Wang and Jinsheng Xiao and Xin Tian and Zhen Han and Jiayi Ma}
}
```
VidLLVIP
If you use the processed VidLLVIP dataset, registration matrices, or preprocessing pipeline, please also cite VidLLVIP:
```bibtex
@dataset{ding2026vidllvip,
  author = {Ding, Jianfeng},
  title = {VidLLVIP: A visible-infrared paired video dataset for low-light vision},
  year = {2026},
  version = {v1.0.0},
  url = {https://github.com/jianfeng0369/VidLLVIP}
}
```
License
This project is released under the MIT License.
Contact
If you have any questions, please contact jianfeng0369@gmail.com.