README
May 29, 2013 ยท View on GitHub
langid-java
A somewhat changed Java port of langid.py (https://github.com/saffsd/langid.py)
Usage
Fetch a JAR from Maven repositories (or compile it yourself). Then check out the javadocs of ILangIdClassifier and LangIdV3.
Memory and Speed
Don't get fooled by the size of the JAR archive. The data model is LZMA compressed and will take about ~10MB of RAM. Speed wise this implementation should be faster than anything else out there; if you have very large texts you can sub-sample, append those fragments and classify without processing the entire content.
Quality
At the moment the detection/ performance quality is identical to langid.py (the model and the math code is identical, even if written in a slightly different way to speed up computations in Java).