No Language Left Behind Multi Domain

July 6, 2022

NLLB Multi Domain is a set of professionally translated sentences in three domains: news, unscripted informal speech, and health. It is designed to enable assessment of out-of-domain performance and to study domain adaptation for machine translation. Each domain has approximately 3,000 sentences.


Download

NLLB Multi Domain can be downloaded with the following commands, one per domain:

wget --trust-server-names https://tinyurl.com/NLLBMDchat
wget --trust-server-names https://tinyurl.com/NLLBMDnews
wget --trust-server-names https://tinyurl.com/NLLBMDhealth
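
The three commands differ only in the domain suffix of the short link, so they can be folded into one loop. The following is a minimal sketch, not part of the original instructions: the helper `nllb_md_url` and the `DRY_RUN` switch are hypothetical names introduced here, and the sketch assumes the three links keep the `NLLBMD<domain>` pattern shown above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: fetch all three NLLB Multi Domain archives in one loop.
# nllb_md_url and DRY_RUN are hypothetical names used only in this example.

nllb_md_url() {
    # Build the short link for one domain suffix (chat, news, or health).
    printf 'https://tinyurl.com/NLLBMD%s' "$1"
}

# Default to a dry run that prints the commands instead of downloading.
DRY_RUN="${DRY_RUN:-1}"

for domain in chat news health; do
    url="$(nllb_md_url "$domain")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "wget --trust-server-names $url"
    else
        # --trust-server-names keeps the file name from the redirect target
        # rather than naming the file after the short link.
        wget --trust-server-names "$url"
    fi
done
```

Running the script with `DRY_RUN=0` performs the actual downloads into the current directory.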

Languages in NLLB Multi Domain

| Language | FLORES-200 code |
|---|---|
| Central Aymara | ayr_Latn |
| Bhojpuri | bho_Deva |
| Dyula | dyu_Latn |
| Friulian | fur_Latn |
| Russian | rus_Cyrl |
| Wolof | wol_Latn |
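
Each FLORES-200 code listed above pairs a three-letter language code with a four-letter script code, joined by an underscore. As a rough illustration, the sketch below splits the six codes into those two parts using POSIX parameter expansion. The variable `NLLB_MD_CODES` and the helpers `flores_lang` and `flores_script` are hypothetical names introduced for this example and are not part of the dataset or its tooling.

```shell
#!/bin/sh
# Sketch: split FLORES-200 codes into their language and script parts.
# NLLB_MD_CODES, flores_lang, and flores_script are hypothetical names.

# Codes copied from the language list above.
NLLB_MD_CODES="ayr_Latn bho_Deva dyu_Latn fur_Latn rus_Cyrl wol_Latn"

flores_lang() {
    # Drop everything from the first underscore onward: ayr_Latn -> ayr
    printf '%s' "${1%%_*}"
}

flores_script() {
    # Drop everything up to the last underscore: ayr_Latn -> Latn
    printf '%s' "${1##*_}"
}

# Print one tab-separated row per code: full code, language, script.
for code in $NLLB_MD_CODES; do
    printf '%s\t%s\t%s\n' "$code" "$(flores_lang "$code")" "$(flores_script "$code")"
done
```

Splitting the codes this way can be handy when grouping files by script, for example to treat the Latin-script languages separately from Bhojpuri (Devanagari) and Russian (Cyrillic).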