The Natural Language Toolkit
============================

This is some local documentation for NLTK 3.3. The `official documentation `__ is pretty extensive, so I'm initially only going to document the parts of NLTK I encounter and maybe add some notes to help me remember things about them.

Data
----

This is the ``nltk.data`` module.

Load
~~~~

The :func:`nltk.data.load` function loads various NLTK items for you.

.. module:: nltk.data

.. autosummary::
   :toctree: autogenerated

   load

Tokenizers
----------

Punkt
~~~~~

The ``PunktSentenceTokenizer`` tokenizes a text into sentences using an unsupervised model. Because it needs to be trained before it can be used, you normally load it with the :func:`nltk.data.load` function, which gives you a pre-trained model for the language you specify.

.. module:: nltk.tokenize.punkt

.. autosummary::
   :toctree: autogenerated

   PunktSentenceTokenizer
   PunktSentenceTokenizer.tokenize

To load an English-language tokenizer with the :func:`nltk.data.load` function, you pass it the path to the pickle file.

.. code::

   tokenizer = nltk.data.load("tokenizers/punkt/PY3/english.pickle")

Then to use it you call its :meth:`nltk.tokenize.punkt.PunktSentenceTokenizer.tokenize` method.

.. code::

   sentences = tokenizer.tokenize(source)
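
Putting the two steps together, here is a minimal end-to-end sketch. It assumes the Punkt model data has already been downloaded (e.g. with ``nltk.download("punkt")``), and the ``source`` string below is just made-up sample text.

.. code::

   import nltk

   # Assumes the "punkt" model data is already installed,
   # e.g. via nltk.download("punkt").
   tokenizer = nltk.data.load("tokenizers/punkt/PY3/english.pickle")

   # Made-up sample text standing in for "source".
   source = (
       "NLTK ships a pre-trained Punkt model for English. "
       "It splits running text into sentences. "
       "Abbreviations such as Dr. Smith are usually handled correctly."
   )

   sentences = tokenizer.tokenize(source)
   for sentence in sentences:
       print(sentence)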