RLFW-NAT

June 28, 2021

Code for "Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation" (ACL 2021).

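We propose LFR, a three-stage training schedule: pretrain on raw data, continue training on bidirectional KD data (→KD + ←KD, i.e. forward KD plus backward KD), then finetune on forward KD data. This exposes the NAT model to more alignment links for low-frequency words without changing the model itself.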
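The bidirectional KD stage mixes the two distilled corpora. The following is a minimal sketch of one way to build that mix, assuming it is a plain concatenation of the forward KD and backward KD parallel files; the repository's own preprocessing may do this differently. The function name `concat_bidirectional` and the output prefix are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: build a bidirectional KD corpus by concatenating the
# forward-KD and backward-KD files, one language side at a time.
# concat_bidirectional and its arguments are hypothetical names.
concat_bidirectional() {
  local fwd_prefix="$1" bwd_prefix="$2" out_prefix="$3"
  local src="$4" tgt="$5"
  local lang
  for lang in "$src" "$tgt"; do
    # Using the same file order on both sides keeps source and
    # target lines aligned in the combined corpus.
    cat "${fwd_prefix}.${lang}" "${bwd_prefix}.${lang}" \
      > "${out_prefix}.${lang}" || return 1
  done
}
```

For example, `concat_bidirectional data/ende_data/train_kd data/ende_data/train_bt data/ende_data/train_bi en de` would write a hypothetical `train_bi.{en,de}` pair.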

Paper | Pretrained: En-De · Ro-En

Setup

git clone https://github.com/alphadl/RLFW-NAT.git
cd RLFW-NAT
pip install -e fairseq_mask/ -e fairseq_lev/

Data

Put the following BPE-encoded files under data/ende_data/: train_raw.{en,de} (raw parallel data), train_kd.{en,de} (forward KD), train_bt.{en,de} (backward KD), valid.{en,de}, and test.{en,de}. Then run:

SRC=en TGT=de DATA_DIR=./data/ende_data bash preprocess.sh

This creates databin/{raw_PT,forward_KD,reversed_KD,bidirectional_KD} under DATA_DIR.
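Before running preprocess.sh, it can help to confirm that every expected input file is present and non-empty. The following is a small sketch of such a check; `check_lfr_inputs` is a hypothetical helper, not part of the repository, and it only tests the file names listed above.

```shell
#!/usr/bin/env bash
# Sketch only: report which expected input files are missing or empty.
# check_lfr_inputs is a hypothetical helper, not part of the repository.
check_lfr_inputs() {
  local data_dir="$1" src="$2" tgt="$3"
  local missing=0 split lang
  for split in train_raw train_kd train_bt valid test; do
    for lang in "$src" "$tgt"; do
      # -s is true only if the file exists and has a size above zero.
      if [ ! -s "${data_dir}/${split}.${lang}" ]; then
        echo "missing or empty: ${data_dir}/${split}.${lang}" >&2
        missing=1
      fi
    done
  done
  # Exit status 0 if everything was found, 1 otherwise.
  return "$missing"
}
```

One possible use is to gate preprocessing on the check: `check_lfr_inputs ./data/ende_data en de && SRC=en TGT=de DATA_DIR=./data/ende_data bash preprocess.sh`.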

Training (LFR)

Mask-Predict:

SRC=en TGT=de DATA_DIR=./data/ende_data SAVE_DIR=./checkpoint/ende/mask_lfr bash train_mask.sh

Levenshtein:

SRC=en TGT=de DATA_DIR=./data/ende_data SAVE_DIR=./checkpoint/ende/lev_lfr bash train_lev.sh

The best checkpoint is written to SAVE_DIR/fwd/checkpoint_best.pt.
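To avoid starting evaluation on an unfinished run, a script can first check that this file exists. The following is a minimal sketch; `best_checkpoint` is a hypothetical helper, and it assumes only the SAVE_DIR/fwd/checkpoint_best.pt layout stated above.

```shell
#!/usr/bin/env bash
# Sketch only: print the best-checkpoint path for a finished run, or fail.
# best_checkpoint is a hypothetical helper, not part of the repository.
best_checkpoint() {
  local save_dir="$1"
  local ckpt="${save_dir}/fwd/checkpoint_best.pt"
  if [ -f "$ckpt" ]; then
    echo "$ckpt"
  else
    echo "no best checkpoint at ${ckpt}" >&2
    return 1
  fi
}
```

For example, `best_checkpoint ./checkpoint/ende/mask_lfr` prints the path when training has finished and returns a non-zero status otherwise.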

Eval

DATA=./data/ende_data/databin/forward_KD CHECKPOINT=./checkpoint/ende/mask_lfr/fwd SUBSET=test bash eval_mask.sh
REF=./data/ende_data/test.de bash test_mask.sh

Citation

@inproceedings{ding2021rejuvenating,
  title = {Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation},
  author = {Ding, Liang and Wang, Longyue and Liu, Xuebo and Wong, Derek F. and Tao, Dacheng and Tu, Zhaopeng},
  booktitle = {ACL},
  year = {2021}
}

License

CC-BY-NC 4.0.