Balancing Training for Multilingual Neural Machine Translation
April 21, 2020
Implementation of the paper
Balancing Training for Multilingual Neural Machine Translation
Xinyi Wang, Yulia Tsvetkov, Graham Neubig
Data:
The preprocessed and binarized data for fairseq can be downloaded here
To process the data from scratch, see the script
util_scripts/prepare_multilingual_data.sh
Training Scripts:
The training scripts for many-to-one translation of the related language group (Related M2O) are under the directory job_scripts/related_ted8_m2o/.
Our methods:
MultiDDS-S:
job_scripts/related_ted8_m2o/multidds_s.sh
MultiDDS:
job_scripts/related_ted8_m2o/multidds.sh
Baselines:
Proportional:
job_scripts/related_ted8_m2o/proportional.sh
Temperature:
job_scripts/related_ted8_m2o/temperature.sh
The scripts for Related O2M are under the directory job_scripts/related_ted8_o2m/
The scripts for Diverse M2O are under the directory job_scripts/diverse_ted8_m2o/
The scripts for Diverse O2M are under the directory job_scripts/diverse_ted8_o2m/
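As a rough sketch of how the four settings map to script paths, the loop below prints (but does not run) a launch command for the MultiDDS-S script in each directory. It assumes the commands are issued from the repository root, that the other three directories use the same script names as Related M2O, and that the scripts can be started with bash. None of this is stated above, and on a cluster the scripts may need to go through a job scheduler instead. The helper train_script is a hypothetical name introduced only for this example.

```shell
#!/usr/bin/env bash
# Sketch only: print a launch command for each experiment setting.
# Assumptions: run from the repository root; every setting directory
# holds a multidds_s.sh like related_ted8_m2o does; bash is the launcher.

# Hypothetical helper: build a training script path from setting and method.
train_script() {
  printf 'job_scripts/%s/%s.sh\n' "$1" "$2"
}

for setting in related_ted8_m2o related_ted8_o2m diverse_ted8_m2o diverse_ted8_o2m; do
  script="$(train_script "$setting" multidds_s)"
  if [ -f "$script" ]; then
    # Dry run: show the command instead of starting a training job.
    echo "bash $script"
  else
    echo "missing: $script" >&2
  fi
done
```

Replacing multidds_s with multidds, proportional, or temperature would print the commands for the other methods, under the same naming assumption.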
Inference Scripts:
Each experiment script directory contains a trans.sh file that translates the test set. To translate the test set with the Related M2O MultiDDS-S model, run:
job_scripts/related_ted8_m2o/trans.sh checkpoints/related_ted8_m2o/multidds_s/
To translate the test set for another experiment, replace the argument with that experiment's checkpoint directory.
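The same command can be repeated over several experiments with a small loop. The sketch below does this for the Related M2O setting, assuming it is run from the repository root. Only the multidds_s checkpoint directory appears in this README; the other three directory names are guesses that mirror the training script names, so the loop skips any directory that does not exist. The helper ckpt_dir is a hypothetical name used only here.

```shell
#!/usr/bin/env bash
# Sketch only: translate the test set for several Related M2O experiments.
# Assumptions: run from the repository root; checkpoint directories other
# than multidds_s are guessed from the training script names.

# Hypothetical helper: build a checkpoint directory path.
ckpt_dir() {
  printf 'checkpoints/%s/%s/\n' "$1" "$2"
}

setting=related_ted8_m2o
for method in multidds_s multidds proportional temperature; do
  ckpt="$(ckpt_dir "$setting" "$method")"
  if [ -d "$ckpt" ]; then
    # Same invocation as shown above, with the checkpoint directory swapped.
    "job_scripts/${setting}/trans.sh" "$ckpt"
  else
    echo "skipping ${method}: ${ckpt} not found" >&2
  fi
done
```

Changing the setting variable to related_ted8_o2m, diverse_ted8_m2o, or diverse_ted8_o2m would target the other script directories, provided the checkpoints follow the same layout.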
Citation
Please cite as:
@inproceedings{wang2020multiDDS,
title = {Balancing Training for Multilingual Neural Machine Translation},
author = {Xinyi Wang and Yulia Tsvetkov and Graham Neubig},
booktitle = {ACL},
year = {2020},
}