Stanford CoreNLP Stopword Plugin
February 3, 2017
This is an extension to the Stanford CoreNLP analytics pipeline that checks whether a token's word and lemma values are stopwords.
Obtaining
See dependencies
Fork
This is a fork of John Conwell's coreNlp extensions library. I've updated the dependencies in the POM, kept only the stopword plugin, and changed the Maven coordinates so I can deploy it to Maven Central (under my account).
Identifying Stopwords in CoreNlp
By default, the StopwordAnnotator uses the built-in Lucene stopword list, but you have the option to pass in a custom list of stopwords for it to use instead. You can also specify whether the StopwordAnnotator should check the lemma of the token against the stopword list.
For examples of how to use the StopwordAnnotator, see StopwordAnnotatorTest.java.
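To make the behavior described above concrete, here is a minimal, self-contained sketch of the check itself: a word and its lemma are looked up in either a default or a custom stopword set, with a flag that controls whether the lemma is checked. This is not the plugin's API. The names `StopwordSketch`, `Result`, `check`, and `DEFAULT_STOPWORDS` are hypothetical, the default set is a small illustrative list rather than the actual Lucene list, and the case-insensitive lookup is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: every name here is hypothetical, not the plugin's API.
class StopwordSketch {
    // Small stand-in default list (an assumption, NOT the actual Lucene list).
    static final Set<String> DEFAULT_STOPWORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    /** Pair of flags: is the word a stopword, and is the lemma a stopword. */
    static final class Result {
        final boolean wordIsStopword;
        final boolean lemmaIsStopword;

        Result(boolean wordIsStopword, boolean lemmaIsStopword) {
            this.wordIsStopword = wordIsStopword;
            this.lemmaIsStopword = lemmaIsStopword;
        }
    }

    /**
     * Checks a token's word (and optionally its lemma) against a stopword set.
     * A null customStopwords falls back to the default list. Lookups lowercase
     * the input, so a custom set is assumed to hold lowercase entries.
     */
    static Result check(String word, String lemma,
                        Set<String> customStopwords, boolean checkLemma) {
        Set<String> stopwords =
            (customStopwords != null) ? customStopwords : DEFAULT_STOPWORDS;
        boolean wordHit = stopwords.contains(word.toLowerCase(Locale.ROOT));
        // The lemma is only consulted when the caller asks for it.
        boolean lemmaHit = checkLemma && lemma != null
            && stopwords.contains(lemma.toLowerCase(Locale.ROOT));
        return new Result(wordHit, lemmaHit);
    }

    public static void main(String[] args) {
        // Default list, lemma checking enabled.
        Result r1 = check("The", "the", null, true);
        System.out.println(r1.wordIsStopword + " " + r1.lemmaIsStopword);

        // Custom list: only the lemma "be" is a stopword.
        Set<String> custom = new HashSet<>(Arrays.asList("be"));
        Result r2 = check("is", "be", custom, true);
        System.out.println(r2.wordIsStopword + " " + r2.lemmaIsStopword);
    }
}
```

Returning both flags together keeps the word and lemma results separate, so a caller can filter on either one. With lemma checking disabled, the lemma flag is always false.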
Documentation
More documentation:
Changelog
An extensive changelog is available here.
Authors
John Conwell (original author)
Paul Landes (maintainer)
License
Copyright © 2016 - 2017 Paul Landes
Apache License Version 2.0