FedRepOpt (ACCV 2024)

September 27, 2024

This repository implements the model proposed in the ACCV 2024 paper:

Kin Wai Lau, Yasar Abbas Ur Rehman, Pedro Porto Buarque de Gusmão, Lai-Man Po, Lan Ma, Yuyang Xie, FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning [arXiv paper]

The implementation is based on the code for Re-parameterizing Your Optimizers rather than Architectures (ICLR 2023). For more information, please refer to the link.

Citing

When using this code, kindly reference:

@misc{lau2024fedrepoptgradientreparametrizedoptimizers,
      title={FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning}, 
      author={Kin Wai Lau and Yasar Abbas Ur Rehman and Pedro Porto Buarque de Gusmão and Lai-Man Po and Lan Ma and Yuyang Xie},
      year={2024},
      eprint={2409.15898},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2409.15898}, 
}

Pretrained models

You can download our hyperparameter-searched models on CIFAR100 as follows:

  • GhostNet-Tr 0.5x (arch: ghost-hs) link
  • RepVGG-B1 (arch: RepOpt-VGG-B1-hs) link

You can download our pretrained models on Tiny ImageNet as follows:

Pretrained with 1 local epoch and 240 rounds in the cross-silo NIID setting

  • Fed-RepGhost-Tr 0.5x (arch: ghost-rep) link

  • Fed-RepGhost-Inf 0.5x (arch: ghost-target-norepopt) link

  • Fed-CSLA-Ghost 0.5x (arch: ghost-csla) link

  • FedRepOpt-GhostNet 0.5x (arch: ghost-target) link

  • Fed-RepVGG-B1-Tr (arch: RepVGG-B1-repvgg) link

  • Fed-RepVGG-B1-Inf (arch: RepVGG-B1-target-norepopt) link

  • Fed-CSLA-VGG-B1 (arch: RepVGG-B1-csla) link

  • FedRepOpt-VGG-B1 (arch: RepVGG-B1-target) link

Pretrained with 5 local epochs and 1000 rounds in the cross-device NIID setting

  • Fed-RepGhost-Tr 0.5x (arch: ghost-rep) link

  • Fed-RepGhost-Inf 0.5x (arch: ghost-target-norepopt) link

  • Fed-CSLA-Ghost 0.5x (arch: ghost-csla) link

  • FedRepOpt-GhostNet 0.5x (arch: ghost-target) link

  • Fed-RepVGG-B1-Tr (arch: RepVGG-B1-repvgg) link

  • Fed-RepVGG-B1-Inf (arch: RepVGG-B1-target-norepopt) link

  • Fed-CSLA-VGG-B1 (arch: RepVGG-B1-csla) link

  • FedRepOpt-VGG-B1 (arch: RepVGG-B1-target) link

Data Preparation

You can download our NIID Tiny ImageNet annotation files as follows:

  • Cross-silo NIID (α=0.1 in the Dirichlet distribution, number of clients = 10) link
  • Cross-device NIID (α=0.1 in the Dirichlet distribution, number of clients = 100) link
  • The data_splitter/tiny-imagenet_json_splitter_direchlet.py script generates IID and NIID annotations for Tiny-ImageNet.
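
To give a feel for what a Dirichlet-based NIID split does, here is a minimal, standard-library-only sketch. It is not the repository's splitter script: the function name `dirichlet_partition`, its arguments, and the per-class splitting strategy are assumptions chosen for illustration. For each class it draws client proportions from a Dirichlet distribution with concentration α and hands out that class's samples accordingly, so a small α (such as 0.1) tends to concentrate each class on a few clients.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: split sample indices across clients, class by class,
    using Dirichlet(alpha) proportions. Not the repository's implementation."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)

        # A Dirichlet draw equals independent Gamma(alpha, 1) draws normalised to sum to 1.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights)
        if total == 0.0:  # guard against underflow for very small alpha
            weights, total = [1.0] * num_clients, float(num_clients)

        # Convert cumulative proportions into cut points over this class's samples.
        start, cumulative = 0, 0.0
        for client_id, weight in enumerate(weights):
            cumulative += weight / total
            if client_id == num_clients - 1:
                end = len(indices)  # last client takes whatever remains
            else:
                end = min(len(indices), round(cumulative * len(indices)))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


# Toy stand-in for real annotations: 2000 samples over 20 classes.
labels = [i % 20 for i in range(2000)]
parts = dirichlet_partition(labels, num_clients=10, alpha=0.1)
print([len(p) for p in parts])  # per-client sample counts, typically very uneven
```

Every sample is assigned to exactly one client, and fixing `seed` makes the split reproducible. Raising `alpha` in this sketch moves the split toward an IID-like partition.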

Preparation

  • Requirements:

    • Python 3.8.0
    • PyTorch 1.7.1
    • Flower 1.3.0
  • Install the required packages:

pip install -r requirements.txt

Hyper-Search on CIFAR100

You can run the following command to conduct a hyper-parameter search on CIFAR100 in a centralized setting.

python -m torch.distributed.launch --nproc_per_node NUM_GPUS --master_port PORT_NUM main_repopt_centralized.py \
--data-path /path/to/cifar100 \
--arch ghost-hs \
--batch-size 128 \
--tag search \
--opts TRAIN.EPOCHS 600 TRAIN.BASE_LR 0.6 TRAIN.WEIGHT_DECAY 1e-5 TRAIN.WARMUP_EPOCHS 10 MODEL.LABEL_SMOOTHING 0.1 DATA.DATASET cf100 TRAIN.CLIP_GRAD 5.0
  • hs.sh provides examples of commands for finding hyperparameters for RepOpt-VGG-B1 and GhostNet.

Train FedRepOpt on Tiny ImageNet

You can run the following command to run federated training on Tiny ImageNet.

python src_fl/main.py \
--data-path /path/to/tiny-imagenet-200 \
--arch ghost-target-tinyImageNet \
--batch-size 32 \
--tag experiment \
--num_clients_per_round 10 \
--pool_size 10 \
--rounds 240 \
--scales-path /path/to/hyper-parameter-search/model \
--opts TRAIN.EPOCHS 1 TRAIN.BASE_LR 0.01 TRAIN.LR_SCHEDULER.NAME step TRAIN.LR_SCHEDULER.DECAY_RATE 0.0 TRAIN.WEIGHT_DECAY 4e-5 TRAIN.WARMUP_EPOCHS 0 MODEL.LABEL_SMOOTHING 0.1 AUG.PRESET raug15 DATA.DATASET tiny_imagenet DATA.IMG_SIZE 64 LOGOUTPUT log TRAIN.OPTIMIZER.MOMENTUM 0.0 \
DATA.ANNOTATIONS_FED annotations_fed_alpha_0.1_clients_10 SEED 0
  • num_clients_per_round is the number of clients participating in training in each round, and pool_size is the number of dataset partitions (= the total number of clients). If num_clients_per_round is set to 10 and pool_size is 10, all clients participate in every round.
  • rounds is the total number of FL rounds, and TRAIN.EPOCHS is the number of local training epochs each client runs per round.
  • train_repopt_fl.sh provides training command examples for all the models.
  • The evaluation results will be stored in output/arch/server/log_rank0.txt.
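
The relationship between num_clients_per_round, pool_size, rounds and TRAIN.EPOCHS can be illustrated with a small sketch. It is only a model of the bookkeeping: the helper names `sample_clients` and `local_epochs_per_client` are hypothetical, and uniform sampling without replacement is an assumption, not necessarily what the training script or Flower does internally.

```python
import random


def sample_clients(pool_size, num_clients_per_round, rounds, seed=0):
    """Hypothetical helper: for each round, pick the client IDs that train.
    Assumes uniform sampling without replacement from the client pool."""
    if not 0 < num_clients_per_round <= pool_size:
        raise ValueError("num_clients_per_round must be between 1 and pool_size")
    rng = random.Random(seed)
    return [
        sorted(rng.sample(range(pool_size), num_clients_per_round))
        for _ in range(rounds)
    ]


def local_epochs_per_client(schedule, pool_size, epochs_per_round):
    """Count how many local epochs each client runs over the whole schedule."""
    counts = [0] * pool_size
    for selected in schedule:
        for client_id in selected:
            counts[client_id] += epochs_per_round
    return counts


# Values from the command above: 10 of 10 clients per round, 240 rounds, 1 local epoch.
schedule = sample_clients(pool_size=10, num_clients_per_round=10, rounds=240)
counts = local_epochs_per_client(schedule, pool_size=10, epochs_per_round=1)
print(schedule[0])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: every client is selected
print(counts[0])    # 240: each client trains one epoch in each of the 240 rounds
```

With pool_size larger than num_clients_per_round (for example 100 and 10), only a subset of clients trains in each round, so per-client epoch counts in this sketch vary with the sampling.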