compound-word-splitter
======================

March 29, 2018

.. image:: https://travis-ci.org/TimKam/compound-word-splitter.svg?branch=master
   :target: https://travis-ci.org/TimKam/compound-word-splitter

Splits words that are not recognized by pyenchant (a spell checker) into the largest possible compounds.
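To illustrate what "largest possible compounds" can mean, here is a minimal sketch of a longest-prefix-first split. It is not the package's implementation: ``split_compound``, ``TOY_VOCABULARY`` and ``min_length`` are hypothetical names, a plain Python set stands in for the pyenchant dictionary, and the sketch returns an empty list when no split exists.

```python
# Illustrative only: a toy vocabulary replaces the pyenchant dictionary.
TOY_VOCABULARY = {"art", "factory"}


def split_compound(word, vocabulary, min_length=2):
    """Split `word` into known parts, preferring the longest leading part.

    Returns a list of parts, or an empty list if no complete split exists.
    """
    if not word:
        return []
    # Try the longest prefix first (the whole word), then shorter ones.
    for end in range(len(word), min_length - 1, -1):
        prefix = word[:end]
        if prefix.lower() not in vocabulary:
            continue
        rest = word[end:]
        if not rest:
            # The prefix covers the whole remaining word.
            return [prefix]
        tail = split_compound(rest, vocabulary, min_length)
        if tail:
            return [prefix] + tail
        # Otherwise backtrack and try a shorter prefix.
    return []


print(split_compound("artfactory", TOY_VOCABULARY))  # ['art', 'factory']
```

Trying the longest prefix first means a word that is already in the vocabulary comes back as a single part.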

Installation
------------

Make sure you have `enchant <https://www.abisource.com/projects/enchant/>`_ installed before proceeding.

Then run::

    pip install compound-word-splitter

Note that the languages available by default depend on your operating system's configuration and could be, for example::

    ['en', 'en_CA', 'en_GB', 'en_US']

If you would like to use a different language, such as ``de_de`` in the example below, you will have to install the `myspell <http://www.openoffice.org/lingucomponent/dictionary.html/>`_ dictionary for it (``myspell-de-de``).
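To check which dictionaries are visible on your system, you can ask pyenchant directly. The sketch below uses ``enchant.list_languages()``. The helper name ``available_languages`` is made up for this example, and it returns an empty list when pyenchant (or the underlying enchant library) cannot be imported.

```python
def available_languages():
    """Return the language tags pyenchant can see, or [] if it is unavailable."""
    try:
        # pyenchant raises ImportError when the enchant C library is missing.
        import enchant
    except ImportError:
        return []
    return sorted(enchant.list_languages())


print(available_languages())
```

If the tag you need (for example ``de_DE``) is missing from the printed list, the corresponding dictionary is most likely not installed.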

Usage
-----

.. code:: python

    import splitter

    splitter.split('artfactory')

returns

.. code:: python

    ['art', 'factory']

With the ``de_de`` dictionary installed,

.. code:: python

    splitter.split('Glossarelement', 'de_de')

returns

.. code:: python

    ['Glossar', 'Element']

If the word cannot be split into compounds that pyenchant recognizes as words, the splitter returns an empty string (``''``).
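Because a failed split yields an empty string rather than a list, callers may want to normalize the result. The following is a minimal sketch of one way to do that. ``split_or_keep`` is a hypothetical helper, not part of the package, and it takes the split function as an argument so it can be tried without pyenchant installed.

```python
def split_or_keep(word, split_func, language=None):
    """Return the compounds of `word`, or [word] if the splitter finds none.

    `split_func` is expected to behave like `splitter.split`: it returns a
    list of compounds on success and an empty string on failure.
    """
    if language is None:
        compounds = split_func(word)
    else:
        compounds = split_func(word, language)
    if not compounds:
        # Empty string (or empty list): keep the original word as one token.
        return [word]
    if isinstance(compounds, str):
        # Defensive: wrap a bare string instead of iterating over its characters.
        return [compounds]
    return list(compounds)


# Usage with the real package (requires pyenchant and a dictionary):
#   import splitter
#   split_or_keep('artfactory', splitter.split)
#   split_or_keep('Glossarelement', splitter.split, 'de_de')
```

This way the caller always receives a list, which simplifies code that iterates over the compounds.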