[ECCV2024] Towards Multi-modal Transformers in Federated Learning

February 2, 2025 · View on GitHub

Official repository for Towards Multi-modal Transformers in Federated Learning (ECCV2024). Code will be released soon.

Citation

@inproceedings{sun2024towards,
  title={Towards Multi-modal Transformers in Federated Learning},
  author={Sun, Guangyu and Mendieta, Matias and Dutta, Aritra and Li, Xin and Chen, Chen},
  booktitle={European Conference on Computer Vision},
  pages={229--246},
  year={2024},
  organization={Springer}
}

@article{sun2024towards,
  title={Towards Multi-modal Transformers in Federated Learning},
  author={Sun, Guangyu and Mendieta, Matias and Dutta, Aritra and Li, Xin and Chen, Chen},
  journal={arXiv preprint arXiv:2404.12467},
  year={2024}
}

Get Started

Environment

Python version: 3.8.0

pip install -r requirements.txt

Prepare Data

Option 1: Directly download the entire data folder from google drive

Option 2:

Download Flickr30k dataset and put all images into data/flickr30k/flickr30k_images.

Download MS-COCO 2014 and put all images and annotations into data/coco/all_images and data/coco/annotations

Wandb for Logging

Set up wandb.init() with your own project name and entity.

Experiments

Please use scripts under scirpts to run experiments with the methods and settings to reproduce the results in our paper.

Model Explaination

We unify the img and text encoders into one model ModalityAgnosticTransformer for easier aggregation:

shared_param: Shared parameters between same modality in different type of client (i.e., img encoder in img client and img encoder in img-txt client)

share_scope: Shared scope during aggregation dataset: share parameters only to encoders with the same dataset modality: share parameters only to encoders with the same modality all: share parameters among all encoders

colearn_param: Shared parameters between img and txt encoders

Method Configurations

To correctly configurate each method, please follow this table:

Name	shared_param	share_scope	algorithm	Others
FedAVG	none	dataset	fedavg
FedIoT	blocks	modality_exact	fediot
FedProx	none	dataset	fedprox
CreamFL	none	dataset	creamfl
FedCola (ours)	attn	modality	fedavg	--aux --aux_trained

Acknowledgement

This codebase is based on Federated Learning in PyTorch. We extend it to our multi-modal federated learning setting.

For local complementary training, we adapted code from here to add aux weights from the other modality.