RLFW-NAT
June 28, 2021
Code for "Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation" (ACL 2021).
We propose LFR, a three-stage training recipe: pretrain on raw data, then train on bidirectional KD data (→KD + ←KD), then finetune on forward KD data. This exposes the NAT model to more low-frequency word links without changing the model itself.
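The "→KD + ←KD" step can be read as pooling the forward-KD and backward-KD corpora into one training set. The following is a minimal sketch under that reading, assuming plain line-aligned text files. The function name make_bidirectional_kd and its arguments are hypothetical, and the repository's own scripts may build this set differently.

```shell
# Hypothetical helper (not part of this repo): build a bidirectional KD corpus
# by appending the backward-KD pairs to the forward-KD pairs, one language
# side at a time, so sentence pairs stay aligned by line number.
make_bidirectional_kd() {
  local fwd_prefix=$1 bwd_prefix=$2 out_prefix=$3 src=$4 tgt=$5
  local lang n_src n_tgt
  for lang in "$src" "$tgt"; do
    cat "$fwd_prefix.$lang" "$bwd_prefix.$lang" > "$out_prefix.$lang"
  done
  # Sanity check: both sides must have the same number of lines.
  n_src=$(wc -l < "$out_prefix.$src")
  n_tgt=$(wc -l < "$out_prefix.$tgt")
  [ $((n_src)) -eq $((n_tgt)) ]
}
```

Called as make_bidirectional_kd FWD_PREFIX BWD_PREFIX OUT_PREFIX en de, it writes OUT_PREFIX.en and OUT_PREFIX.de and returns non-zero if the two sides end up with different line counts.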
Paper | Pretrained: En-De · Ro-En
Setup
git clone https://github.com/alphadl/RLFW-NAT.git
cd RLFW-NAT
pip install -e fairseq_mask/ -e fairseq_lev/
Data
Put the following BPE-encoded files under data/ende_data/: train_raw.{en,de}, train_kd.{en,de} (forward KD), train_bt.{en,de} (backward KD), valid.{en,de}, and test.{en,de}. Then:
SRC=en TGT=de DATA_DIR=./data/ende_data bash preprocess.sh
This creates databin/{raw_PT,forward_KD,reversed_KD,bidirectional_KD} under DATA_DIR.
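Since preprocessing expects ten input files, it can help to confirm they are all present first. The following is a minimal sketch; check_lfr_inputs is a hypothetical helper and is not shipped with this repository.

```shell
# Hypothetical helper (not part of this repo): report any input file that is
# missing from a data directory laid out as described in the Data section.
# Returns 0 when every file exists, 1 otherwise.
check_lfr_inputs() {
  local dir=$1 src=$2 tgt=$3 missing=0
  local prefix lang
  for prefix in train_raw train_kd train_bt valid test; do
    for lang in "$src" "$tgt"; do
      if [ ! -f "$dir/$prefix.$lang" ]; then
        echo "missing: $dir/$prefix.$lang"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

For example, check_lfr_inputs ./data/ende_data en de && SRC=en TGT=de DATA_DIR=./data/ende_data bash preprocess.sh runs preprocessing only when nothing is missing.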
Training (LFR)
Mask-Predict:
SRC=en TGT=de DATA_DIR=./data/ende_data SAVE_DIR=./checkpoint/ende/mask_lfr bash train_mask.sh
Levenshtein:
SRC=en TGT=de DATA_DIR=./data/ende_data SAVE_DIR=./checkpoint/ende/lev_lfr bash train_lev.sh
Best checkpoint: SAVE_DIR/fwd/checkpoint_best.pt.
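The three LFR stages line up with three of the binarized directories from the Data section. The sketch below only prints that stage order. It assumes each stage reads the directory shown and starts from the previous stage's checkpoint, which is an inference from the method description rather than something read out of train_mask.sh or train_lev.sh; lfr_plan is a hypothetical name.

```shell
# Hypothetical helper (not part of this repo): print the assumed LFR stage
# order and the binarized data directory each stage would train on.
# Nothing is trained here; this is only a plan printer.
lfr_plan() {
  local data_dir=$1
  local stage=1 name
  # raw pretraining -> bidirectional KD -> forward KD finetune
  for name in raw_PT bidirectional_KD forward_KD; do
    echo "stage $stage: $data_dir/databin/$name"
    stage=$((stage + 1))
  done
}
```

Running lfr_plan ./data/ende_data lists the three directories in training order, which is a quick way to confirm that preprocessing produced everything the schedule needs.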
Eval
DATA=./data/ende_data/databin/forward_KD CHECKPOINT=./checkpoint/ende/mask_lfr/fwd SUBSET=test bash eval_mask.sh
REF=./data/ende_data/test.de bash test_mask.sh
Citation
@inproceedings{ding2021rejuvenating,
title = {Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation},
author = {Ding, Liang and Wang, Longyue and Liu, Xuebo and Wong, Derek F. and Tao, Dacheng and Tu, Zhaopeng},
booktitle = {ACL},
year = {2021}
}
License
CC-BY-NC 4.0.