Language Support

March 31, 2021

Supported languages

Native Support means that the tokenizer and stemmer are implemented in JavaScript inside NLP.js. BERT Support means that tokenizing and stemming are provided through a BERT API written in Python. You can see how to create this API here: https://github.com/axa-group/nlp.js/tree/master/examples/80-bert-server

Microsoft Builtins means that builtin entity extraction runs directly in JavaScript, while Duckling Builtins require deploying a Duckling instance.
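
To give a feel for what depending on a deployed Duckling instance involves, here is a minimal sketch of a direct call to Duckling's HTTP `/parse` endpoint. The URL, the helper names `buildDucklingRequest` and `extractWithDuckling`, and the default locale are assumptions made for illustration; this is not necessarily how NLP.js talks to Duckling internally.

```javascript
// Sketch only: the URL below is a placeholder for your own Duckling deployment.
const DUCKLING_URL = 'http://localhost:8000/parse'; // assumed host and port

// Build the form-encoded POST request that Duckling's HTTP server accepts.
// (Hypothetical helper, not part of NLP.js.)
function buildDucklingRequest(text, locale) {
  const body = new URLSearchParams({ text, locale }).toString();
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body,
  };
}

// Send the utterance to Duckling and return the parsed entities.
// Needs a runtime with a global fetch (Node.js 18+ or a browser).
async function extractWithDuckling(text, locale = 'en_US') {
  const response = await fetch(DUCKLING_URL, buildDucklingRequest(text, locale));
  if (!response.ok) {
    throw new Error(`Duckling returned HTTP ${response.status}`);
  }
  return response.json();
}

// Building the request does not need a running server.
const request = buildDucklingRequest('tomorrow at eight', 'en_GB');
console.log(request.body);
```

Calling `extractWithDuckling('tomorrow at eight')` only works once a Duckling server is reachable at the configured URL, which is the extra operational cost compared with the builtins that run in-process.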

Languages not included in this list can still be supported, but only with tokenizing, not stemming. That means lower precision, but most of the time it is good enough. For example, you can use NLP.js with fictional languages (the unit tests include cases in Klingon from Star Trek).
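
The following toy sketch illustrates why tokenizing without stemming costs some precision. The functions `tokenize`, `naiveStem` and `overlap` are hypothetical helpers written for this illustration; they are not the tokenizers or stemmers that NLP.js ships, and the overlap score is a simplification rather than the classifier's real scoring.

```javascript
// Toy illustration only: these helpers are NOT NLP.js internals.

// Split on anything that is not a letter, digit, or apostrophe (Unicode-aware).
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}']+/u)
    .filter(Boolean);
}

// Deliberately naive English stemmer: strips a few suffixes from longer words.
function naiveStem(token) {
  return token.length > 3 ? token.replace(/(ing|ed|s)$/, '') : token;
}

// Fraction of the utterance's features that also occur in the training sentence.
function overlap(trained, utterance, normalize = (t) => t) {
  const known = new Set(tokenize(trained).map(normalize));
  const features = tokenize(utterance).map(normalize);
  const hits = features.filter((f) => known.has(f)).length;
  return features.length === 0 ? 0 : hits / features.length;
}

const trained = 'where are my keys';
const utterance = 'where is my key';

// Tokens only: "key" and "keys" are treated as unrelated words.
const tokenOnly = overlap(trained, utterance);
// With stemming: "keys" is reduced to "key", so one more feature matches.
const withStemming = overlap(trained, utterance, naiveStem);

console.log({ tokenOnly, withStemming }); // { tokenOnly: 0.5, withStemming: 0.75 }
```

Without a stemmer, inflected forms only match when they appear literally in the training data, so languages outside the list usually benefit from a few more training utterances per intent.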

LocaleLanguageNative SupportBERT SupportMicrosoft BuiltinsDuckling BuiltinsSentiment
afAfrikaansXX
sqAlbanianX
arArabicXXXX
anAragoneseX
hyArmenianXXX
astAsturianX
azAzerbaijaniX
baBashkirX
euBasqueXXX
barBavarianX
beBelarusianX
bnBengaliXXXX
bpyBishnupriya ManipuriX
bsBosnianX
brBretonX
bgBulgarianXX
myBurmeseXX
caCatalanXXX
cebCebuanoX
ceChechenX
zhChinese (Simplified)XXXX
zhChinese (Traditional)XXXX
cvChuvashX
hrCroatianXX
csCzechXXX
daDanishXXXX
nlDutchXXXX
enEnglishXXXXX
etEstonianXX
fiFinnishXXXX
frFrenchXXXXX
glGalicianXXX
kaGeorgianXX
deGermanXXXX
elGreekXXXX
guGujaratiX
htHaitianX
heHebrewXX
hiHindiXXXX
huHungarianXXXX
isIcelandicXX
ioIdoX
idIndonesianXXXX
gaIrishXXXX
itItalianXXXX
jaJapaneseXXXX
jvJavaneseX
knKannadaXX
kkKazakhX
kyKirghizX
koKoreanXXXX
laLatinX
lvLatvianX
ltLithuanianXXX
lmoLombardX
ndsLow SaxonX
lbLuxembourgishX
mkMacedonianX
mgMalagasyX
msMalayXX
mlMalayalamXX
mrMarathiX
minMinangkabauX
mnMongolianXX
neNepaliXXXX
newNewarX
nbNorwegian (Bokmål)XXXX
nnNorwegian (Nynorsk)X
ocOccitanX
faPersian (Farsi)XXX
pmsPiedmonteseX
plPolishXXXX
ptPortugueseXXXXX
paPunjabiX
roRomanianXXXX
ruRussianXXXX
scoScotsX
srSerbianXXX
hbsSerbo-CroatianX
scnSicilianX
skSlovakXX
slSlovenianXXX
azSouth AzerbaijaniX
esSpanishXXXXX
suSundaneseX
swSwahiliXX
svSwedishXXXX
tlTagalogXXX
tgTajikX
taTamilXXXX
ttTatarX
teTeluguX
thThaiXXXX
trTurkishXXXX
ukUkrainianXXXX
urUrduX
uzUzbekX
viVietnameseXX
voVolapükX
warWaray-WarayX
cyWelshX
fyWest FrisianX
paWestern PunjabiX
yoYorubaX

Sentiment Analysis

LanguageAFINNSenticonPattern
Arabic (ar)X
Armenian (hy)X
Basque (eu)X
Bengali (bn)X
Catalan (ca)X
Czech (cs)X
Danish (da)X
Dutch (nl)X
English (en)XXX
Finnish (fi)X
French (fr)X
Galician (gl)X
German (de)X
Greek (el)X
Hindi (hi)X
Hungarian (hu)X
Indonesian (id)X
Irish (ga)X
Italian (it)X
Korean (ko)X
Lithuanian (lt)X
Nepali (ne)X
Norwegian (no)X
Persian (Farsi) (fa)X
Polish (pl)X
Portuguese (pt)X
Romanian (ro)X
Russian (ru)X
Serbian (sr)X
Slovenian (sl)X
Spanish (es)XX
Swedish (sv)X
Tagalog (tl)X
Tamil (ta)X
Thai (th)X
Turkish (tr)X
Ukrainian (uk)X

Comparison with other NLP products

LocaleLanguageMicrosoft LUISGoogle DialogflowSAP Conversational AIAmazon LEXIBM WatsonNLP.js
afAfrikaansX
sqAlbanianX
arArabicXXXX
anAragoneseX
hyArmenianX
astAsturianX
azAzerbaijaniX
baBashkirX
euBasqueX
barBavarianX
beBelarusianX
bnBengaliX
bpyBishnupriya ManipuriX
bsBosnianX
brBretonX
bgBulgarianX
myBurmeseX
caCatalanXX
cebCebuanoX
ceChechenX
zhChinese (Simplified)XXXXX
zhChinese (Traditional)XXXXX
cvChuvashX
hrCroatianX
csCzechXX
daDanishXXX
nlDutchXXXXX
enEnglishXXXXXX
etEstonianX
fiFinnishXX
frFrenchXXXXX
glGalicianX
kaGeorgianX
deGermanXXXXX
elGreekX
guGujaratiXX
htHaitianX
heHebrewX
hiHindiXXXX
huHungarianX
isIcelandicX
ioIdoX
idIndonesianXX
gaIrishX
itItalianXXXXX
jaJapaneseXXXXX
jvJavaneseX
knKannadaX
kkKazakhX
kyKirghizX
koKoreanXXXXX
laLatinX
lvLatvianX
ltLithuanianX
lmoLombardX
ndsLow SaxonX
lbLuxembourgishX
mkMacedonianX
mgMalagasyX
msMalayX
mlMalayalamX
mrMarathiXX
minMinangkabauX
mnMongolianX
neNepaliX
newNewarX
nbNorwegian (Bokmål)XXX
nnNorwegian (Nynorsk)X
ocOccitanX
faPersian (Farsi)X
pmsPiedmonteseX
plPolishXXX
ptPortugueseXXXXX
paPunjabiX
roRomanianX
ruRussianXXX
scoScotsX
srSerbianX
hbsSerbo-CroatianX
scnSicilianX
skSlovakX
slSlovenianX
azSouth AzerbaijaniX
esSpanishXXXXX
suSundaneseX
swSwahiliX
svSwedishXXX
tlTagalogX
tgTajikX
taTamilXX
ttTatarX
teTeluguXX
thThaiXX
trTurkishXXX
ukUkrainianXX
urUrduX
uzUzbekX
viVietnameseX
voVolapükX
warWaray-WarayX
cyWelshX
fyWest FrisianX
paWestern PunjabiX
yoYorubaX

Example with several languages

This example uses three languages, one of which is Klingon, to show that NLP.js works even for a language without native support: it still uses the tokenizer, just not a stemmer.

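// This path works inside the nlp.js repository;
// in your own project use require('node-nlp') instead.
const { NlpManager } = require('../packages/node-nlp/src');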

(async () => {
  const manager = new NlpManager({ languages: ['en', 'ko', 'kl'] });
  // Gives a name for the fantasy language
  manager.describeLanguage('kl', 'Klingon');
  // Train Klingon
  manager.addDocument('kl', 'nuqneH', 'hello');
  manager.addDocument('kl', 'maj po', 'hello');
  manager.addDocument('kl', 'maj choS', 'hello');
  manager.addDocument('kl', 'maj ram', 'hello');
  manager.addDocument('kl', `nuqDaq ghaH ngaQHa'moHwI'mey?`, 'keys');
  manager.addDocument('kl', `ngaQHa'moHwI'mey lujta' jIH`, 'keys');
  // Train Korean
  manager.addDocument('ko', '여보세요', 'greetings.hello');
  manager.addDocument('ko', '안녕하세요!', 'greetings.hello');
  manager.addDocument('ko', '여보!', 'greetings.hello');
  manager.addDocument('ko', '어이!', 'greetings.hello');
  manager.addDocument('ko', '좋은 아침', 'greetings.hello');
  manager.addDocument('ko', '안녕히 주무세요', 'greetings.hello');
  manager.addDocument('ko', '안녕', 'greetings.bye');
  manager.addDocument('ko', '친 공이 타자', 'greetings.bye');
  manager.addDocument('ko', '상대가 없어 남는 사람', 'greetings.bye');
  manager.addDocument('ko', '지엽적인 것', 'greetings.bye');
  manager.addDocument('en', 'goodbye for now', 'greetings.bye');
  manager.addDocument('en', 'bye bye take care', 'greetings.bye');
  manager.addDocument('en', 'okay see you later', 'greetings.bye');
  manager.addDocument('en', 'bye for now', 'greetings.bye');
  manager.addDocument('en', 'i must go', 'greetings.bye');
  manager.addDocument('en', 'hello', 'greetings.hello');
  manager.addDocument('en', 'hi', 'greetings.hello');
  manager.addDocument('en', 'howdy', 'greetings.hello');

  // Also add answers for the NLG (natural language generation)
  manager.addAnswer('en', 'greetings.bye', 'Till next time');
  manager.addAnswer('en', 'greetings.bye', 'see you soon!');
  manager.addAnswer('en', 'greetings.hello', 'Hey there!');
  manager.addAnswer('en', 'greetings.hello', 'Greetings!');

  // Train and save the model.
  await manager.train();
  manager.save();

  // English and Korean can be automatically detected
  console.log(await manager.process('I have to go'));
  console.log(await manager.process('상대가 없어 남는 편'));
  // Klingon cannot be detected automatically,
  // so you must provide the locale
  console.log(await manager.process('kl', `ngaQHa'moHwI'mey nIH vay'`));
})();