===============================
Tokenize UK
===============================

May 29, 2016


.. image:: https://img.shields.io/pypi/v/tokenize_uk.svg
        :target: https://pypi.python.org/pypi/tokenize_uk

.. image:: https://img.shields.io/travis/lang-uk/tokenize-uk.svg
        :target: https://travis-ci.org/lang-uk/tokenize-uk

.. image:: http://readthedocs.org/projects/tokenize-uk/badge/?version=latest
        :target: http://tokenize-uk.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

A simple Python library for splitting texts into sentences and sentences into words. It is small, fast and robust, and it is built with Ukrainian in mind.
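
The two-level split (text into sentences, sentences into words) can be sketched in plain Python. This is a minimal illustration, not tokenize_uk's implementation: the regular expressions and the helper names ``split_sentences``, ``split_words`` and ``split_text`` are hypothetical. The sentence rule (break after ``.``, ``!``, ``?`` or ``…`` when whitespace and an uppercase letter, a digit or an opening quote follow) is a deliberately naive assumption, so it will wrongly split after initials such as "Т. Шевченко".

.. code-block:: python

    import re

    # Hypothetical rule: a sentence ends at ., !, ? or … when whitespace and
    # an uppercase letter (Latin or Ukrainian Cyrillic), a digit or an
    # opening quote follow. Lowercase continuations ("т. зв. приклад") do not split.
    SENT_BOUNDARY = re.compile(r'(?<=[.!?…])\s+(?=[A-ZА-ЯЄІЇҐ0-9«"])')

    # A token is a run of word characters or a single punctuation mark.
    TOKEN = re.compile(r"\w+|[^\w\s]")


    def split_sentences(text: str) -> list[str]:
        """Split text into sentences using the boundary rule above."""
        return [s for s in SENT_BOUNDARY.split(text.strip()) if s]


    def split_words(sentence: str) -> list[str]:
        """Split one sentence into word and punctuation tokens."""
        return TOKEN.findall(sentence)


    def split_text(text: str) -> list[list[str]]:
        """Text -> sentences -> words."""
        return [split_words(s) for s in split_sentences(text)]


    print(split_text("Привіт, світ! Як справи? Добре."))
    # [['Привіт', ',', 'світ', '!'], ['Як', 'справи', '?'], ['Добре', '.']]

The bare ``\w+`` word rule in this sketch breaks a word such as п'ять at the apostrophe, which is exactly the kind of case a dedicated tokenizer has to handle.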

Features
--------

* Splits a given text into sentences
* Splits a given sentence into words
* Handles accented characters (such as stress marks) and apostrophes inside words
* Also suitable for other languages
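
Apostrophes and stress marks need special care because a generic ``\w+`` rule splits words that contain them. In Python's ``re`` module the combining acute accent U+0301, commonly used to mark stress, is not a word character, so ві́льно falls apart at the stress mark, and п'ять falls apart at the apostrophe. The sketch below illustrates one way to keep such words whole; it is not tokenize_uk's code. The pattern names ``NAIVE_TOKEN`` and ``WORD_TOKEN`` are hypothetical, stress is assumed to be written as U+0301 after the vowel, and only the ASCII apostrophe, U+2019 and U+02BC are assumed to be in-word apostrophes.

.. code-block:: python

    import re

    # Generic rule: runs of word characters, or single punctuation marks.
    NAIVE_TOKEN = re.compile(r"\w+|[^\w\s]")

    # A word starts with a word character and may then contain more word
    # characters or the combining acute accent U+0301 (stress mark), which
    # \w does not match on its own.
    WORD = r"\w[\w\u0301]*"

    # Apostrophes (ASCII ', U+2019, U+02BC) are kept only between two word
    # runs, as in п'ять or ім'я. U+02BC is itself a Unicode letter, so \w
    # already matches it; listing it here is harmless.
    WORD_TOKEN = re.compile(rf"{WORD}(?:['\u2019\u02bc]{WORD})*|[^\w\s]")

    stressed = "ві\u0301льно"  # "вільно" with a stress mark on і

    print(NAIVE_TOKEN.findall("п'ять"))  # ['п', "'", 'ять']
    print(WORD_TOKEN.findall("п'ять"))   # ["п'ять"]
    print(len(NAIVE_TOKEN.findall(stressed)))  # 3: "ві", bare stress mark, "льно"
    print(WORD_TOKEN.findall(stressed) == [stressed])  # True

Allowing an apostrophe only between two word runs keeps quotation marks separate: in ``'слово'`` neither quote sits between letters, so both become punctuation tokens.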