nllb-api

January 15, 2026

A fast CPU-based API for Meta's No Language Left Behind distilled 1.3B 8-bit quantised variant, hosted on Hugging Face Spaces. To achieve faster executions, we are using CTranslate2 as our inference engine.

Important


NLLB was trained with input lengths not exceeding 512 tokens. Translating longer sequences might result in quality degradation. Consider splitting your input into smaller chunks if you begin observing artefacts.

Usage

Simply cURL the endpoint as shown in the examples that follow. The source and target languages must be specified using FLORES-200 codes.

List of FLORES-200 Codes

| Language | FLORES-200 Code |
| --- | --- |
| Acehnese (Arabic script) | ace_Arab |
| Acehnese (Latin script) | ace_Latn |
| Mesopotamian Arabic | acm_Arab |
| Ta’izzi-Adeni Arabic | acq_Arab |
| Tunisian Arabic | aeb_Arab |
| Afrikaans | afr_Latn |
| South Levantine Arabic | ajp_Arab |
| Akan | aka_Latn |
| Amharic | amh_Ethi |
| North Levantine Arabic | apc_Arab |
| Modern Standard Arabic | arb_Arab |
| Modern Standard Arabic (Romanized) | arb_Latn |
| Najdi Arabic | ars_Arab |
| Moroccan Arabic | ary_Arab |
| Egyptian Arabic | arz_Arab |
| Assamese | asm_Beng |
| Asturian | ast_Latn |
| Awadhi | awa_Deva |
| Central Aymara | ayr_Latn |
| South Azerbaijani | azb_Arab |
| North Azerbaijani | azj_Latn |
| Bashkir | bak_Cyrl |
| Bambara | bam_Latn |
| Balinese | ban_Latn |
| Belarusian | bel_Cyrl |
| Bemba | bem_Latn |
| Bengali | ben_Beng |
| Bhojpuri | bho_Deva |
| Banjar (Arabic script) | bjn_Arab |
| Banjar (Latin script) | bjn_Latn |
| Standard Tibetan | bod_Tibt |
| Bosnian | bos_Latn |
| Buginese | bug_Latn |
| Bulgarian | bul_Cyrl |
| Catalan | cat_Latn |
| Cebuano | ceb_Latn |
| Czech | ces_Latn |
| Chokwe | cjk_Latn |
| Central Kurdish | ckb_Arab |
| Crimean Tatar | crh_Latn |
| Welsh | cym_Latn |
| Danish | dan_Latn |
| German | deu_Latn |
| Southwestern Dinka | dik_Latn |
| Dyula | dyu_Latn |
| Dzongkha | dzo_Tibt |
| Greek | ell_Grek |
| English | eng_Latn |
| Esperanto | epo_Latn |
| Estonian | est_Latn |
| Basque | eus_Latn |
| Ewe | ewe_Latn |
| Faroese | fao_Latn |
| Fijian | fij_Latn |
| Finnish | fin_Latn |
| Fon | fon_Latn |
| French | fra_Latn |
| Friulian | fur_Latn |
| Nigerian Fulfulde | fuv_Latn |
| Scottish Gaelic | gla_Latn |
| Irish | gle_Latn |
| Galician | glg_Latn |
| Guarani | grn_Latn |
| Gujarati | guj_Gujr |
| Haitian Creole | hat_Latn |
| Hausa | hau_Latn |
| Hebrew | heb_Hebr |
| Hindi | hin_Deva |
| Chhattisgarhi | hne_Deva |
| Croatian | hrv_Latn |
| Hungarian | hun_Latn |
| Armenian | hye_Armn |
| Igbo | ibo_Latn |
| Ilocano | ilo_Latn |
| Indonesian | ind_Latn |
| Icelandic | isl_Latn |
| Italian | ita_Latn |
| Javanese | jav_Latn |
| Japanese | jpn_Jpan |
| Kabyle | kab_Latn |
| Jingpho | kac_Latn |
| Kamba | kam_Latn |
| Kannada | kan_Knda |
| Kashmiri (Arabic script) | kas_Arab |
| Kashmiri (Devanagari script) | kas_Deva |
| Georgian | kat_Geor |
| Central Kanuri (Arabic script) | knc_Arab |
| Central Kanuri (Latin script) | knc_Latn |
| Kazakh | kaz_Cyrl |
| Kabiyè | kbp_Latn |
| Kabuverdianu | kea_Latn |
| Khmer | khm_Khmr |
| Kikuyu | kik_Latn |
| Kinyarwanda | kin_Latn |
| Kyrgyz | kir_Cyrl |
| Kimbundu | kmb_Latn |
| Northern Kurdish | kmr_Latn |
| Kikongo | kon_Latn |
| Korean | kor_Hang |
| Lao | lao_Laoo |
| Ligurian | lij_Latn |
| Limburgish | lim_Latn |
| Lingala | lin_Latn |
| Lithuanian | lit_Latn |
| Lombard | lmo_Latn |
| Latgalian | ltg_Latn |
| Luxembourgish | ltz_Latn |
| Luba-Kasai | lua_Latn |
| Ganda | lug_Latn |
| Luo | luo_Latn |
| Mizo | lus_Latn |
| Standard Latvian | lvs_Latn |
| Magahi | mag_Deva |
| Maithili | mai_Deva |
| Malayalam | mal_Mlym |
| Marathi | mar_Deva |
| Minangkabau (Arabic script) | min_Arab |
| Minangkabau (Latin script) | min_Latn |
| Macedonian | mkd_Cyrl |
| Plateau Malagasy | plt_Latn |
| Maltese | mlt_Latn |
| Meitei (Bengali script) | mni_Beng |
| Halh Mongolian | khk_Cyrl |
| Mossi | mos_Latn |
| Maori | mri_Latn |
| Burmese | mya_Mymr |
| Dutch | nld_Latn |
| Norwegian Nynorsk | nno_Latn |
| Norwegian Bokmål | nob_Latn |
| Nepali | npi_Deva |
| Northern Sotho | nso_Latn |
| Nuer | nus_Latn |
| Nyanja | nya_Latn |
| Occitan | oci_Latn |
| West Central Oromo | gaz_Latn |
| Odia | ory_Orya |
| Pangasinan | pag_Latn |
| Eastern Panjabi | pan_Guru |
| Papiamento | pap_Latn |
| Western Persian | pes_Arab |
| Polish | pol_Latn |
| Portuguese | por_Latn |
| Dari | prs_Arab |
| Southern Pashto | pbt_Arab |
| Ayacucho Quechua | quy_Latn |
| Romanian | ron_Latn |
| Rundi | run_Latn |
| Russian | rus_Cyrl |
| Sango | sag_Latn |
| Sanskrit | san_Deva |
| Santali | sat_Olck |
| Sicilian | scn_Latn |
| Shan | shn_Mymr |
| Sinhala | sin_Sinh |
| Slovak | slk_Latn |
| Slovenian | slv_Latn |
| Samoan | smo_Latn |
| Shona | sna_Latn |
| Sindhi | snd_Arab |
| Somali | som_Latn |
| Southern Sotho | sot_Latn |
| Spanish | spa_Latn |
| Tosk Albanian | als_Latn |
| Sardinian | srd_Latn |
| Serbian | srp_Cyrl |
| Swati | ssw_Latn |
| Sundanese | sun_Latn |
| Swedish | swe_Latn |
| Swahili | swh_Latn |
| Silesian | szl_Latn |
| Tamil | tam_Taml |
| Tatar | tat_Cyrl |
| Telugu | tel_Telu |
| Tajik | tgk_Cyrl |
| Tagalog | tgl_Latn |
| Thai | tha_Thai |
| Tigrinya | tir_Ethi |
| Tamasheq (Latin script) | taq_Latn |
| Tamasheq (Tifinagh script) | taq_Tfng |
| Tok Pisin | tpi_Latn |
| Tswana | tsn_Latn |
| Tsonga | tso_Latn |
| Turkmen | tuk_Latn |
| Tumbuka | tum_Latn |
| Turkish | tur_Latn |
| Twi | twi_Latn |
| Central Atlas Tamazight | tzm_Tfng |
| Uyghur | uig_Arab |
| Ukrainian | ukr_Cyrl |
| Umbundu | umb_Latn |
| Urdu | urd_Arab |
| Northern Uzbek | uzn_Latn |
| Venetian | vec_Latn |
| Vietnamese | vie_Latn |
| Waray | war_Latn |
| Wolof | wol_Latn |
| Xhosa | xho_Latn |
| Eastern Yiddish | ydd_Hebr |
| Yoruba | yor_Latn |
| Yue Chinese | yue_Hant |
| Chinese (Simplified) | zho_Hans |
| Chinese (Traditional) | zho_Hant |
| Standard Malay | zsm_Latn |
| Zulu | zul_Latn |

cURL

curl 'https://winstxnhdw-nllb-api.hf.space/api/v4/translator?text=Hello&source=eng_Latn&target=spa_Latn'
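Text that contains spaces, punctuation or non-ASCII characters has to be percent-encoded before it goes into the query string. The following is a minimal Python sketch, using only the standard library, that builds the same kind of URL as the cURL command above. The helper name `build_translate_url` is hypothetical and not part of this project.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the cURL example above.
BASE_URL = "https://winstxnhdw-nllb-api.hf.space/api/v4"


def build_translate_url(text: str, source: str, target: str) -> str:
    """Return a percent-encoded translator URL (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode(
        {"text": text, "source": source, "target": target},
        quote_via=quote,
    )
    return f"{BASE_URL}/translator?{query}"


if __name__ == "__main__":
    print(build_translate_url("Hello, world!", "eng_Latn", "spa_Latn"))
```

The resulting string can be passed to cURL or to any HTTP client.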

To stream translations as Server-Sent Events, you may query the /translator/stream endpoint instead.

curl -N 'https://winstxnhdw-nllb-api.hf.space/api/v4/translator/stream?text=Hello&source=eng_Latn&target=spa_Latn'
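Server-Sent Events arrive as plain text lines, with each event terminated by a blank line and its payload carried in `data:` fields. The sketch below parses such lines into payload strings. It assumes the stream follows the standard SSE framing; the content of each payload is not specified here, and the function name `iter_sse_data` is hypothetical.

```python
from collections.abc import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event in `lines`."""
    buffer: list[str] = []

    for raw_line in lines:
        line = raw_line.rstrip("\r\n")

        if not line:
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer.clear()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and comments (":") are ignored.

    # Data without a terminating blank line is dropped, as the SSE
    # specification prescribes for an incomplete event at end of stream.


if __name__ == "__main__":
    sample = ["data: Hola\n", "\n", ": keep-alive\n", "data: mundo\n", "\n"]
    print(list(iter_sse_data(sample)))
```

In practice, the lines would come from a streaming HTTP response, decoded to text one line at a time.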

You can also determine the source language by querying the following API.

curl 'https://winstxnhdw-nllb-api.hf.space/api/v4/language?text=Hello'

Python

Install the nllb Rust client library.

pip install "nllb @ git+https://git@github.com/winstxnhdw/nllb-api.git#subdirectory=client"

Then, you can use the AsyncTranslatorClient to interact with the API.

import asyncio

from nllb import AsyncTranslatorClient

async def main():
    text = "Hello, world!"
    client = AsyncTranslatorClient("http://localhost:7860")
    # Detect the source language, then translate into Spanish.
    language_prediction = await client.detect_language(text)
    response = await client.translate(text, source=language_prediction.language, target="spa_Latn")
    print(response)

asyncio.run(main())

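Ideally, you would chunk your texts so that each request stays within 512 tokens. The following fragment assumes that it runs inside an async function and that words is a list of words. It drops trailing words until the text fits.

client = AsyncTranslatorClient("http://localhost:7860")
# Detect the language from the first ten words only.
language_prediction = await client.detect_language(' '.join(words[:10]))

# Drop trailing words until the text fits within 512 tokens.
while (await client.count_tokens(' '.join(words))) > 512:
    words.pop()

response = await client.translate(' '.join(words), source=language_prediction.language, target="spa_Latn")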
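The loop above discards the trailing words that do not fit. To translate a whole document, one option is to split it into consecutive chunks instead. The sketch below does this greedily with a pluggable token counter. The helper `chunk_words` is hypothetical, and the example uses a naive whitespace counter as a stand-in for the real tokeniser; with the client, an async variant that awaits `client.count_tokens` would be needed.

```python
from collections.abc import Callable, Iterator


def chunk_words(
    words: list[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = 512,
) -> Iterator[str]:
    """Greedily group words into chunks that stay within max_tokens."""
    current: list[str] = []

    for word in words:
        candidate = current + [word]

        if current and count_tokens(" ".join(candidate)) > max_tokens:
            # Adding this word would overflow the chunk, so emit it
            # and start a new chunk with the current word.
            yield " ".join(current)
            current = [word]
        else:
            current = candidate

    if current:
        yield " ".join(current)


def naive_count(text: str) -> int:
    """Stand-in counter: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


if __name__ == "__main__":
    print(list(chunk_words("a b c d e".split(), naive_count, max_tokens=2)))
```

A single word that exceeds the limit on its own is still emitted as its own chunk, so the caller may want to handle that case separately.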

Self-Hosting

You can self-host the API and access the Swagger UI at localhost:7860/api/schema/swagger with the following minimal configuration.

docker run --init --rm \
  -e SERVER_PORT=7860 \
  -p 7860:7860 \
  ghcr.io/winstxnhdw/nllb-api:main

Cross-Origin Resource Sharing

You can configure CORS by passing the following environment variables.

docker run --init --rm \
  -e SERVER_PORT=7860 \
  -e ACCESS_CONTROL_ALLOW_ORIGIN=localhost,example.com \
  -e ACCESS_CONTROL_ALLOW_CREDENTIALS=true \
  -e ACCESS_CONTROL_ALLOW_HEADERS=X-Custom-Header,Upgrade-Insecure-Requests \
  -e ACCESS_CONTROL_EXPOSE_HEADERS=Content-Encoding,Kuma-Revision \
  -e ACCESS_CONTROL_MAX_AGE=3600 \
  -e ACCESS_CONTROL_ALLOW_METHOD_GET=true \
  -e ACCESS_CONTROL_ALLOW_METHOD_POST=true \
  -e ACCESS_CONTROL_ALLOW_METHOD_OPTIONS=true \
  -e ACCESS_CONTROL_ALLOW_METHOD_PUT=true \
  -e ACCESS_CONTROL_ALLOW_METHOD_DELETE=true \
  -e ACCESS_CONTROL_ALLOW_METHOD_PATCH=true \
  -e ACCESS_CONTROL_ALLOW_METHOD_HEAD=true \
  -e ACCESS_CONTROL_ALLOW_METHOD_TRACE=true \
  -p 7860:7860 \
  ghcr.io/winstxnhdw/nllb-api:main

Optimisation

You can pass the following environment variables to optimise the API for your own use. OMP_NUM_THREADS sets the number of threads used to translate a given batch of inputs, while TRANSLATOR_THREADS sets the number of threads used to handle translation requests in parallel. We recommend leaving WORKER_COUNT unchanged, as spawning multiple workers can lead to increased memory usage and poorer performance.

Important


OMP_NUM_THREADS × TRANSLATOR_THREADS should not exceed the physical number of cores on your machine.

docker run --init --rm \
  -e SERVER_PORT=7860 \
  -e OMP_NUM_THREADS=6 \
  -e TRANSLATOR_THREADS=2 \
  -e WORKER_COUNT=1 \
  -p 7860:7860 \
  ghcr.io/winstxnhdw/nllb-api:main
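The constraint in the note above can be checked before launching the container. The following is a small sketch with a hypothetical helper, `fits_core_budget`. The physical core count is passed in explicitly because Python's `os.cpu_count()` reports logical cores, which can exceed the physical count on machines with simultaneous multithreading.

```python
def fits_core_budget(
    omp_num_threads: int,
    translator_threads: int,
    physical_cores: int,
) -> bool:
    """Return True when the total thread count fits the physical core count."""
    return omp_num_threads * translator_threads <= physical_cores


if __name__ == "__main__":
    # The values in the Docker command above need at least 12 physical cores.
    print(fits_core_budget(6, 2, physical_cores=12))  # True
    print(fits_core_budget(6, 2, physical_cores=8))   # False
```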

CUDA Support

You can accelerate your inference with CUDA by building with the USE_CUDA build argument.

docker build --build-arg USE_CUDA=1 -f Dockerfile.build -t nllb-api .

After building the image, you can run the image with the following.

Note


OMP_NUM_THREADS has no effect when CUDA is enabled.

docker run --init --rm --gpus all \
  -e SERVER_PORT=7860 \
  -e WORKER_COUNT=1 \
  -p 7860:7860 \
  nllb-api

Telemetry

You can enable OpenTelemetry support by passing the OTEL_EXPORTER_OTLP_ENDPOINT environment variable. This enables exporting of traces, metrics and logs to the specified OTLP endpoint.

docker run --init --rm \
  -e SERVER_PORT=7860 \
  -e OTEL_RESOURCE_ATTRIBUTES=service.namespace=huggingface,deployment.environment=production \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-ap-southeast-1.grafana.net/otlp \
  -e OTEL_EXPORTER_OTLP_HEADERS="Authorization: Basic $OTEL_AUTH_TOKEN" \
  -e OTEL_METRIC_EXPORT_INTERVAL=10000 \
  -p 7860:7860 \
  ghcr.io/winstxnhdw/nllb-api:main
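The command above expects OTEL_AUTH_TOKEN to hold a ready-made HTTP Basic credential. Under the Basic scheme (RFC 7617), that credential is the Base64 encoding of `username:password`. The sketch below shows one way to produce it; which values your OTLP provider expects as the username and password is an assumption to confirm against its documentation, and the helper name `basic_auth_token` is hypothetical.

```python
from base64 import b64encode


def basic_auth_token(username: str, password: str) -> str:
    """Encode credentials for an HTTP `Authorization: Basic` header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    print(basic_auth_token("user", "pass"))  # dXNlcjpwYXNz
```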

Development

First, install the required dependencies for your editor with the following.

uv sync

Now, you can access the Swagger UI at localhost:7860/api/schema/swagger after spinning the server up locally with the following.

uv run docker-cpu