Lunr Languages

May 8, 2026

Used by 18k+ projects • ~300k weekly downloads

Lunr Languages is an extension for Lunr.js that enables fast, multilingual full-text search across dozens of languages — in the browser or Node.js.

Originally built for classic search, it is now widely used as a lightweight retrieval layer in AI systems, including:

  • Retrieval-Augmented Generation (RAG)
  • Hybrid search (keyword + vector)
  • Local-first / edge AI apps
  • Static site search and documentation search

⭐ If this project saves you time or powers something important, consider starring it or supporting its maintenance.


Supported Languages

  • German
  • French
  • Spanish [2]
  • Italian
  • Dutch
  • Danish
  • Portuguese
  • Polish
  • Finnish
  • Romanian
  • Hungarian
  • Russian
  • Norwegian
  • Swedish
  • Turkish
  • Japanese
  • Thai
  • Arabic
  • Chinese [1]
  • Vietnamese
  • Sanskrit
  • Kannada
  • Telugu
  • Hindi
  • Tamil
  • Korean
  • Armenian
  • Hebrew
  • Greek

Contribute a new language


[1] Chinese tokenization uses Intl.Segmenter with CJK bigrams by default, which works in modern browsers and Node.js without native dependencies. In Node.js, if @node-rs/jieba is installed, Lunr Languages uses it automatically for higher-quality Jieba segmentation. Browsers must support Intl.Segmenter; there is no frontend fallback.

[2] Spanish includes an opt-in lunr.es.accentFold pipeline function for Lunr 2 indexes, so that queries that omit accents still match — for example, respiracion matches Respiración — without replacing the default Spanish stemmer.


Why Lunr Languages in an AI world?

Modern AI systems don’t replace search — they depend on it.

Before an LLM can generate an answer, it needs relevant context. That’s where Lunr Languages fits:

🔎 Fast and consistent lexical retrieval

Filter thousands of documents down to a small candidate set before embedding or reranking.

🌍 Multilingual support out of the box

Tokenization, stemming, and stopwords for 30+ languages — still a hard problem in AI pipelines.

⚡ Zero infrastructure

Runs entirely in the browser or Node.js. No vector DB required.

🔒 Privacy-friendly / offline-ready

Perfect for:

  • in-browser AI assistants
  • local knowledge bases
  • on-device search

Example: Hybrid Search (Keyword + AI)

User query
→ Lunr (keyword search, multilingual)
→ top 100–500 documents
→ embeddings / reranker
→ LLM generates answer

Lunr Languages improves recall and precision, especially for:

  • non-English content
  • inflected languages
  • mixed-language datasets
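The pre-filter stage above can be sketched in plain JavaScript. This is an illustration of the idea only — `documents`, `keywordScore`, and `topK` are hypothetical names, and in a real pipeline the scoring would come from a Lunr index built with lunr-languages rather than this naive bag-of-words overlap:

```javascript
// Toy corpus standing in for an indexed document set.
const documents = [
  { id: 1, text: 'la respiración celular produce energía' },
  { id: 2, text: 'el clima de la región es templado' },
  { id: 3, text: 'ejercicios de respiración para relajarse' },
];

// Naive bag-of-words overlap between query terms and document terms.
function keywordScore(query, text) {
  const terms = new Set(query.toLowerCase().split(/\s+/));
  return text.toLowerCase().split(/\s+/).filter((t) => terms.has(t)).length;
}

// Keep only the top-k candidates before the (expensive) embedding stage.
function topK(query, docs, k) {
  return docs
    .map((d) => ({ ...d, score: keywordScore(query, d.text) }))
    .filter((d) => d.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const candidates = topK('respiración profunda', documents, 2);
console.log(candidates.map((d) => d.id)); // [ 1, 3 ]
```

Only the surviving candidates are handed to the embedding or reranking stage, which is where the cost savings come from.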

Installation

npm install lunr-languages

Usage

Basic example (German)

const lunr = require('lunr');
require('lunr-languages/lunr.stemmer.support')(lunr);
require('lunr-languages/lunr.de')(lunr);

const idx = lunr(function () {
  this.use(lunr.de);

  this.ref('id');
  this.field('title', { boost: 10 });
  this.field('body');

  this.add({ id: 1, title: 'Dokument', body: 'Beispieltext' });
});

For Spanish indexes on Lunr 2, you can opt into accent-insensitive matching by expanding accented tokens before stemming:

require('lunr-languages/lunr.stemmer.support')(lunr);
require('lunr-languages/lunr.es')(lunr);

const idx = lunr(function () {
  this.use(lunr.es);
  this.pipeline.before(lunr.es.stemmer, lunr.es.accentFold);
  this.searchPipeline.before(lunr.es.stemmer, lunr.es.accentFold);

  this.field('body');
  this.add({ id: 1, body: 'Respiración' });
});
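Accent folding itself is commonly implemented with Unicode NFD decomposition. The sketch below illustrates the technique; it is not necessarily how lunr.es.accentFold is implemented internally:

```javascript
// Fold accented characters to their base letters.
// NFD splits 'ó' into 'o' + U+0301 (combining acute accent);
// the regex then strips all combining diacritical marks.
function accentFold(token) {
  return token.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(accentFold('Respiración'.toLowerCase())); // respiracion
console.log(accentFold('camión')); // camion
```

Because the folded form is added alongside (not instead of) the original token, accented queries keep matching too.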

Multi-language indexing

require('lunr-languages/lunr.stemmer.support')(lunr);
require('lunr-languages/lunr.ru')(lunr);
require('lunr-languages/lunr.de')(lunr);
require('lunr-languages/lunr.multi')(lunr);

const idx = lunr(function () {
  this.use(lunr.multiLanguage('en', 'ru', 'de'));

  this.field('title');
  this.field('body');
});

Chinese Tokenization

Chinese support is designed to work without mandatory native binaries:

  • In browsers, lunr.zh uses Intl.Segmenter plus CJK bigrams. If Intl.Segmenter is unavailable, it logs an error and throws because there is no bundled browser fallback.
  • In Node.js, lunr.zh first tries to load @node-rs/jieba. If it is installed, it is used for better Chinese segmentation. If it is not installed, Lunr Languages logs an informational message and falls back to Intl.Segmenter plus CJK bigrams.
  • If neither @node-rs/jieba nor Intl.Segmenter is available in Node.js, Chinese tokenization logs an error and throws.

The Intl.Segmenter fallback avoids native package supply-chain risk and works well for lightweight search, but it is not identical to Jieba. Bigrams improve recall for common two-character search terms such as 车主 and 学姐, while Jieba generally provides better precision and ranking for serious Chinese search.
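The bigram idea can be shown in a few lines: emit every adjacent pair of CJK characters so that a two-character query term matches anywhere inside a longer run. This is an illustration of the concept, not the exact tokenizer lunr.zh ships:

```javascript
// Emit all adjacent character pairs (bigrams) from a CJK string.
function cjkBigrams(text) {
  const chars = Array.from(text); // code-point-safe split
  const out = [];
  for (let i = 0; i < chars.length - 1; i++) {
    out.push(chars[i] + chars[i + 1]);
  }
  return out;
}

console.log(cjkBigrams('车主人')); // [ '车主', '主人' ]

// Where available, Intl.Segmenter adds dictionary-based word boundaries.
if (typeof Intl.Segmenter === 'function') {
  const seg = new Intl.Segmenter('zh', { granularity: 'word' });
  const words = Array.from(seg.segment('我喜欢北京'), (s) => s.segment);
  console.log(words.join('') === '我喜欢北京'); // segments recompose the input
}
```

Note how the bigram '车主' is produced even though the character run is longer, which is exactly why bigrams help recall for short query terms.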

To opt into Jieba tokenization in Node.js:

npm install @node-rs/jieba

Where this fits in modern architectures

Lunr Languages is commonly used as:

  • Pre-filter for vector search
  • Fallback when embeddings fail
  • Client-side retrieval for AI apps
  • Static / documentation search

👉 In practice, hybrid search (keyword + vector) performs best.


How it works

To provide high-quality search across languages:

  • Tokenization — language-aware splitting (including Japanese, Chinese, etc.)
  • Stemming — matches different word forms
  • Stopword filtering — removes noise
  • Trimming — normalizes tokens

These steps improve both classic search and AI retrieval pipelines.
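The four steps above compose into a pipeline. The sketch below shows that composition with toy stand-ins — the stopword list and "stemmer" here are illustrative only, not lunr's actual pipeline functions:

```javascript
// Toy stand-ins for each pipeline stage.
const STOPWORDS = new Set(['the', 'a', 'of']);

const tokenize = (text) => text.split(/\s+/);                   // Tokenization
const trim = (t) => t.replace(/^\W+|\W+$/g, '').toLowerCase();  // Trimming
const isNotStopword = (t) => !STOPWORDS.has(t);                 // Stopword filtering
const stem = (t) => t.replace(/(ing|s)$/, '');                  // (toy) Stemming

// Run a string through all four stages in order.
function pipeline(text) {
  return tokenize(text).map(trim).filter(isNotStopword).map(stem);
}

console.log(pipeline('The searching of Documents!')); // [ 'search', 'document' ]
```

Both the indexed documents and incoming queries pass through the same pipeline, which is what makes different word forms land on the same terms.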


Use Lunr Languages when you need:

  • fast, deterministic keyword matching
  • multilingual normalization
  • offline / browser-based search
  • low-cost retrieval

Combine with embeddings for:

  • semantic similarity
  • fuzzy concept matching

Contributing

Want to add a new language?

See CONTRIBUTING.md


Support / Sponsorship

Maintained as an open-source project for over a decade.

If your company relies on this in production:

  • consider sponsoring
  • or contributing improvements

It helps keep the ecosystem stable.


Final note

Even in an AI-first world, retrieval is the bottleneck.

Lunr Languages ensures the right content reaches your models — fast, locally, and across languages.