
October 31, 2022

Implementation for the paper:

Multi-granularity Optimization for Non-autoregressive Translation (To appear in EMNLP 2022)

Paper Link

Setup

pip install -e . 
pip install tensorflow tensorboard sacremoses nltk Ninja omegaconf
pip install 'fuzzywuzzy[speedup]'
pip install hydra-core==1.0.6
pip install sacrebleu==1.5.1
pip install git+https://github.com/dugu9sword/lunanlp.git

Experimental Details

Hyperparameters

Pretrain stage

| Hyperparameter | EN<->RO | EN<->DE |
| --- | --- | --- |
| --validate-interval-updates | 300 | 300 |
| number of tokens per batch | 32K | 128K |
| --dropout | 0.3 | 0.1 |
| --max-update | 300k | 300k |

MgMO stage

| Hyperparameter | EN<->RO | EN<->DE |
| --- | --- | --- |
| --validate-interval-updates | 300 | 300 |
| number of tokens per batch | 256 | 1024 |
| --dropout | 0.1 | 0.1 |
| --lr-scheduler | fixed | fixed |
| --lr | 2e-6 | 2e-6 |
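
The "number of tokens per batch" rows can be read as the effective batch size. In Fairseq this is the product of `--max-tokens` (per GPU), the number of GPUs, and `--update-freq`. The sketch below assumes that reading, treats "K" as 1000, and uses a hypothetical hardware setup (4 GPUs with 4096 tokens each) that this README does not specify. It only shows how one might derive `--update-freq` for a target batch size.

```python
import math


def update_freq_for(target_tokens: int, max_tokens_per_gpu: int, num_gpus: int) -> int:
    """Smallest --update-freq whose effective batch reaches target_tokens."""
    tokens_per_step = max_tokens_per_gpu * num_gpus
    return max(1, math.ceil(target_tokens / tokens_per_step))


def effective_tokens(max_tokens_per_gpu: int, num_gpus: int, update_freq: int) -> int:
    """Tokens consumed per optimizer update."""
    return max_tokens_per_gpu * num_gpus * update_freq


# Hypothetical hardware, not taken from this repository.
MAX_TOKENS_PER_GPU = 4096
NUM_GPUS = 4

# Pretrain stage targets from the table above (assuming K = 1000).
freq_en_ro = update_freq_for(32_000, MAX_TOKENS_PER_GPU, NUM_GPUS)
freq_en_de = update_freq_for(128_000, MAX_TOKENS_PER_GPU, NUM_GPUS)

print(freq_en_ro, effective_tokens(MAX_TOKENS_PER_GPU, NUM_GPUS, freq_en_ro))
print(freq_en_de, effective_tokens(MAX_TOKENS_PER_GPU, NUM_GPUS, freq_en_de))
```

For the much smaller MgMO stage batches (256 and 1024 tokens), the same arithmetic suggests lowering the per-GPU token budget rather than raising `--update-freq`. The values actually used are in run.sh.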

Arguments

| Argument | Description |
| --- | --- |
| --n-sample | Number of samples in the search space |
| --reward-alpha | Coefficient for balancing the sentence probability |
| --max-length-bias | Maximum deviation of the predicted length from the golden length during training |
| --null-input | Set for N&P mode (default) |
| --rm-scale | The gamma for controlling the granularity size |
| --len-loss | Set to enable length loss during training |
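
As a rough illustration of how these options fit together, the following standalone `argparse` sketch registers the six arguments with the descriptions above as help text. The types are assumptions inferred from the descriptions, the defaults are left unset because the README does not state them, and `add_mgmo_args` is a hypothetical helper name. The repository registers these options through Fairseq, not through this function.

```python
import argparse


def add_mgmo_args(parser: argparse.ArgumentParser) -> None:
    """Register the MgMO options (types are assumptions, defaults unknown)."""
    parser.add_argument("--n-sample", type=int, default=None,
                        help="Number of samples in the search space")
    parser.add_argument("--reward-alpha", type=float, default=None,
                        help="Coefficient for balancing the sentence probability")
    parser.add_argument("--max-length-bias", type=int, default=None,
                        help="Maximum deviation of the predicted length from "
                             "the golden length during training")
    # Modeled here as an opt-in switch; the README marks N&P mode as the default.
    parser.add_argument("--null-input", action="store_true",
                        help="Set for N&P mode")
    parser.add_argument("--rm-scale", type=float, default=None,
                        help="The gamma for controlling the granularity size")
    parser.add_argument("--len-loss", action="store_true",
                        help="Set to enable length loss during training")


parser = argparse.ArgumentParser(description="MgMO argument sketch")
add_mgmo_args(parser)

# Placeholder values for illustration only, not recommended settings.
args = parser.parse_args(["--n-sample", "40", "--reward-alpha", "0.5", "--len-loss"])
print(args.n_sample, args.reward_alpha, args.len_loss, args.null_input)
```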

Training

We provide a script (run.sh) for replicating the results on the WMT'16 EN->RO task. For other directions, adjust the data path and the corresponding hyperparameters where necessary.

Evaluation

We select the best checkpoint for evaluation based on the validation BLEU scores. We set the length beam to 5 for inference. See `run.sh` for details.
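
The two steps described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names, checkpoint names, scores, and length probabilities are all made-up placeholders. It assumes that "length beam" follows the usual Mask-Predict convention, where the most probable target lengths from the length predictor are each decoded and the best-scoring candidate is kept.

```python
from typing import Dict, List


def best_checkpoint(valid_bleu: Dict[str, float]) -> str:
    """Return the checkpoint name with the highest validation BLEU."""
    return max(valid_bleu, key=valid_bleu.get)


def length_candidates(length_probs: List[float], beam: int = 5) -> List[int]:
    """Return the `beam` most probable target lengths, most likely first.

    length_probs[n] is the predicted probability that the target has n tokens.
    """
    ranked = sorted(range(len(length_probs)),
                    key=lambda n: length_probs[n], reverse=True)
    return ranked[:beam]


# Placeholder scores, not results from the paper.
scores = {"ckpt_a.pt": 1.0, "ckpt_b.pt": 2.0, "ckpt_c.pt": 1.5}
print(best_checkpoint(scores))  # ckpt_b.pt

# Placeholder length distribution over target lengths 0..7.
probs = [0.0, 0.05, 0.1, 0.3, 0.25, 0.2, 0.07, 0.03]
print(length_candidates(probs, beam=5))  # [3, 4, 5, 2, 6]
```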

Main Files

The implementation is based on Fairseq. We mainly add the following files.

fairseq
├── criterions
│   └── multi_granularity_optimizer.py  # multi-granularity loss
└── models
    └── nat
        └── cmlm_transformer.py         # implementation for sampling and granularity generation