No Language Left Behind Multi Domain
July 6, 2022
NLLB Multi Domain is a set of professionally translated sentences in three domains: news, unscripted informal speech, and health. It is designed to enable assessment of out-of-domain performance and to support the study of domain adaptation for machine translation. Each domain has approximately 3000 sentences.
Download
NLLB Multi Domain can be downloaded with the following commands:
wget --trust-server-names https://tinyurl.com/NLLBMDchat
wget --trust-server-names https://tinyurl.com/NLLBMDnews
wget --trust-server-names https://tinyurl.com/NLLBMDhealth
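The three commands above differ only in the domain suffix of the short link, so they can be wrapped in a loop. The following is a minimal sketch, not part of the official instructions: the function names `nllb_md_url` and `nllb_md_download_all` and the default directory `nllb_md` are hypothetical, and it assumes `wget` is installed.

```shell
# Domain suffixes taken from the three short links above.
domains="chat news health"

# Print the short link for one domain (hypothetical helper).
nllb_md_url() {
  printf 'https://tinyurl.com/NLLBMD%s\n' "$1"
}

# Download every domain archive into a target directory (hypothetical helper).
# The directory defaults to ./nllb_md, which is an assumption of this sketch.
nllb_md_download_all() {
  dest="${1:-nllb_md}"
  mkdir -p "$dest"
  for d in $domains; do
    # --trust-server-names names the file after the redirect target;
    # -P sets the directory the file is saved into.
    wget --trust-server-names -P "$dest" "$(nllb_md_url "$d")" || return 1
  done
}
```

Calling `nllb_md_download_all data/nllb_md` would fetch all three archives into `data/nllb_md` and stop at the first failed download.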
Languages in NLLB Multi Domain
| Language | FLORES-200 code |
|---|---|
| Central Aymara | ayr_Latn |
| Bhojpuri | bho_Deva |
| Dyula | dyu_Latn |
| Friulian | fur_Latn |
| Russian | rus_Cyrl |
| Wolof | wol_Latn |
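
Each FLORES-200 code in the table joins a three-letter language code and a four-letter script code with an underscore. When scripts need the two parts separately, a sketch along these lines may help. The helper names `flores_lang` and `flores_script` are hypothetical and do not come from the NLLB tooling.

```shell
# Print the language part of a FLORES-200 code, e.g. ayr_Latn -> ayr.
flores_lang() {
  printf '%s\n' "${1%%_*}"
}

# Print the script part of a FLORES-200 code, e.g. bho_Deva -> Deva.
flores_script() {
  printf '%s\n' "${1##*_}"
}
```

For example, `flores_script rus_Cyrl` prints `Cyrl`.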