README.md
March 20, 2025 ยท View on GitHub
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Bidirectional pipeline parallelism Chimera is pulished in SC'21, Best Paper Finalist. See the paper and the video talk for more details.
![]()
Data preparation
https://github.com/microsoft/AzureML-BERT/blob/master/docs/dataprep.md
Please store wikipedia.segmented.nltk.txt file under the bert_data/ directory.
Installation
pip install -r requirements.txt
For training, we use apex.optimizers.FusedLAMB of NVIDIA's Apex library. Please follow the instruction for installing apex.
For profiling, we use NVIDIA Nsight Systems. Please make sure you can execute nsys command.
Our scripts are intended to run through the SLURM workload manager on a GPU cluster with 1 GPU per node.
Profiling Chimera with 8 stages for BERT-Large on 8 GPUs
sbatch scripts/prof_steps.sh
sh scripts/plot_cuda_timeline.sh
output: bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1.pdf
Publication
To cite our work:
@inproceedings{li143,
author = {Li, Shigang and Hoefler, Torsten},
title = {Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines},
year = {2021},
isbn = {9781450384421},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3458817.3476145},
doi = {10.1145/3458817.3476145},
booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
articleno = {27},
numpages = {14},
location = {St. Louis, Missouri},
series = {SC '21}
}
License
See LICENSE.