March 17, 2026

Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification (ICLR 2026)

Siyi Du, Xinzhe Luo, Declan P. O'Regan, and Chen Qin

DyMo

(a-b) Evidence of the discarding-imputation dilemma: (a-1) vs. (a-2) recovery-free methods (e.g., ModDrop) learn less discriminative features because they ignore highly task-relevant missing modalities {M,T}; (b) recovery-based methods (e.g., MoPoE) generate unreliable reconstructions, such as low-fidelity (orange) or misaligned (yellow) ones. (c) Our DyMo addresses the dilemma by dynamically fusing task-relevant recovered modalities, improving accuracy by 1.61% on PolyMNIST, 1.68% on MST, and 3.88% on CelebA (Tab. 1).

This repository provides the official PyTorch implementation of Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification.

In addition to our DyMo, we include implementations of multiple baseline and comparison models, such as M3Care, MUSE, MTL, MAP, PDF, QMF, and DynMM. Please refer to the paper for detailed descriptions of these models.

Contact: s.du23@imperial.ac.uk (Siyi Du)

If you find this repository helpful, please consider giving it a :star:.

Updates

[2026-02-08] The arXiv paper and the code are released.

[2026-02-11] Model weights are uploaded to Hugging Face.

Our Multimodal Learning Research Line

This repository is part of our research line on multimodal learning.

  • TIP (ECCV 2024): An image-tabular pre-training framework for intra-modality missingness (siyi-wind/TIP)

  • STiL (CVPR 2025): A semi-supervised image-tabular framework for modality heterogeneity and limited labeled data (siyi-wind/STiL)

  • DyMo (ICLR 2026, this work): An inference-time dynamic modality selection framework for missing modalities (siyi-wind/DyMo)

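Contents

  • 1 Requirements
  • 2 Preparation
  • 3 Training & Testing
  • 4 Checkpoints
  • 5 Licence & Citation
  • 6 Acknowledgements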

1 Requirements

This codebase is implemented with Python 3.9.15, PyTorch 1.11.0, CUDA 11.3.1, and cuDNN 8.

cd DyMo/
conda env create --file environment.yaml
conda activate dymo
pip install numpy==1.23.5
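
After activating the environment, a quick check can confirm that the installed packages match the pinned versions. The following is a minimal sketch and not part of the repository: the helper name `find_version_mismatches` is hypothetical, the expected versions are copied from the requirements above, and it assumes PyTorch is installed under the distribution name `torch`.

```python
from importlib import metadata


def find_version_mismatches(expected):
    """Return {package: (expected, installed)} for every package whose
    installed version does not start with the expected version string.
    A package that is not installed is reported with installed=None."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # startswith tolerates local build suffixes such as "1.11.0+cu113".
        if installed is None or not installed.startswith(wanted):
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Versions taken from the requirements listed above.
    expected = {"torch": "1.11.0", "numpy": "1.23.5"}
    for name, (wanted, installed) in find_version_mismatches(expected).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

If the script prints nothing, both packages were found at the expected versions.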

2 Preparation

2.1 Data Downloading

We conduct experiments on five multimodal datasets with diverse modality combinations:

| Dataset | Classification Task | #Modality | Modality Type | #Train | #Val | #Test | #Class |
|---|---|---|---|---|---|---|---|
| PolyMNIST | Digit | 5 | RGB image | 60,000 | 3,000 | 7,000 | 10 |
| MST | Digit | 3 | RGB image, Text | 1,121,360 | 60,000 | 140,000 | 10 |
| CelebA | Face attribute | 2 | RGB image, Text | 162,770 | 19,962 | 19,867 | 2 |
| DVM | Car model | 2 | RGB image, Table | 70,565 | 17,642 | 88,207 | 283 |
| UKBB | Coronary artery disease (CAD) | 2 | MR image, Table | 3,482 | 6,510 | 3,617 | 2 |
| UKBB | Myocardial infarction | 2 | MR image, Table | 1,552 | 6,510 | 3,617 | 2 |

  • The preparation of PolyMNIST, MNIST-SVHN-TEXT (MST), and CelebA follows https://github.com/thomassutter/MoPoE.

  • Download the DVM dataset from here.

  • Apply for UKBB access here (Note that UKBB is semi-public and requires approval and access fees).

2.2 Data Pre-processing

For UKBB and DVM, we follow the same preprocessing pipeline as siyi-wind/TIP.

2.3 Modality Reconstructor Preparation

We evaluate DyMo with five modality recovery methods: MoPoE, MMVAE++, CMVAE, TIP, and Iterative Multivariate Imputer (IMI).

| Dataset | Modality Recovery Model | Download Weights |
|---|---|---|
| PolyMNIST | MoPoE, MMVAE++, CMVAE | Folder |
| MST | MoPoE | Folder |
| CelebA | MoPoE | Folder |
| DVM | TIP, IMI | TIP |
| UKBB | TIP, IMI | TIP |

TIP weights are from our ECCV 2024 paper (siyi-wind/TIP), while the other models were trained by us. For IMI, we directly provide the imputed DVM tables in datasets/IMI_imputed_data/DVM. Due to the UKBB data policy, imputed UKBB tables are not released, but they can be generated with datasets/tabular_imputation_UKBB.ipynb once you have access to the UKBB dataset.

3 Training & Testing

We record the hyper-parameters used for each experiment under configs/ in Hydra format, which makes it straightforward to reproduce models on different datasets. Below we provide some examples.

3.1 DyMo

Step 1: Train DynamicTransformer

DynamicTransformer is the backbone of DyMo (Fig. 2 in the paper). The example below trains DynamicTransformer on CelebA.

cd DyMo/job_scripts
env CUDA_VISIBLE_DEVICES=0 bash CelebA_DynamicTransformer.sh

Trained model weights are provided in 4 Checkpoints.

Step 2: Generate Gaussian Parameters for the ICS Score

After training DynamicTransformer, we need to generate and store class-wise means and variances for calculating the ICS score (Eq. 8 in the paper). Use gaussian_parameter_generation/gaussian_COS.ipynb and gaussian_parameter_generation/gaussian_EU.ipynb to generate Gaussian parameters for the cosine distance and the Euclidean distance, respectively.

Precomputed parameters are also available in 4 Checkpoints.
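
To illustrate the kind of statistics involved in this step, here is a minimal, dependency-free sketch of class-wise means and variances over feature vectors, together with the two distance functions. It is not the notebooks' code and does not compute the ICS score itself (see Eq. 8 in the paper for that). The function names, the use of per-dimension population variance, and the toy features are all assumptions made for illustration. In practice the features would be tensors extracted by the trained DynamicTransformer.

```python
import math
from collections import defaultdict


def classwise_gaussian_params(features, labels):
    """Return {label: (mean, variance)} with per-dimension statistics.
    Variance is the population variance (divides by n); an assumption."""
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)

    params = {}
    for label, vecs in grouped.items():
        n = len(vecs)
        columns = list(zip(*vecs))  # one tuple per feature dimension
        mean = [sum(col) / n for col in columns]
        var = [sum((x - m) ** 2 for x in col) / n
               for col, m in zip(columns, mean)]
        params[label] = (mean, var)
    return params


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine similarity; undefined for zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 2-D features for two classes (illustrative values only).
    feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
    labels = [0, 0, 1]
    for label, (mean, var) in classwise_gaussian_params(feats, labels).items():
        print(label, mean, var)
```

The two distance functions mirror the split between the two notebooks: one set of parameters per distance type.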

Step 3: Run DyMo on the Test Dataset

DyMo is an inference-time method and is computationally efficient:

cd DyMo/job_scripts
env CUDA_VISIBLE_DEVICES=0 bash PolyMNIST_DyMo.sh

3.2 Other Models

To train or evaluate other baseline models, modify the model name in the shell scripts under job_scripts/. Available model configurations are listed in configs/model/.
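
To see which model names can be substituted into the shell scripts, the configuration directory can be listed programmatically. This is a small sketch under stated assumptions: the helper `list_model_configs` is hypothetical, the config files are assumed to use the `.yaml` or `.yml` extension (the usual Hydra convention), and the script is assumed to run from the repository root.

```python
from pathlib import Path


def list_model_configs(config_dir="configs/model"):
    """Return the sorted config names (file stems) found in config_dir.
    Returns an empty list if the directory does not exist."""
    root = Path(config_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.stem for p in root.iterdir()
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )


if __name__ == "__main__":
    # Print one model configuration name per line.
    for name in list_model_configs():
        print(name)
```

Each printed name corresponds to one file in configs/model/; check the shell script you are editing for the exact variable that holds the model name.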

4 Checkpoints

| Dataset | Download Model Checkpoints and Gaussian Parameters |
|---|---|
| PolyMNIST | Download |
| MST | Download |
| CelebA | Download |
| DVM | Download |
| CAD | Download |
| Infarction | Download |

5 Licence & Citation

This repository is licensed under the Apache License, Version 2.0.

If you find this work useful, please cite:

@inproceedings{du2026dymo,
  title={Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification},
  author={Du, Siyi and Luo, Xinzhe and O'Regan, Declan P. and Qin, Chen},
  booktitle={International Conference on Learning Representations (ICLR) 2026},
  year={2026}
}

6 Acknowledgements

We would like to thank the following repositories for their valuable contributions: