The Natural Language Toolkit
============================

This is some local documentation for NLTK 3.3. The `official documentation `__ is pretty extensive, so I'm initially only going to document the parts of NLTK I encounter and maybe add some notes to help me remember things about them.

Data
----

This is the ``nltk.data`` module.

Load
~~~~

The :func:`nltk.data.load` function loads various NLTK items for you.

.. module:: nltk.data

.. autosummary::
   :toctree: autogenerated

   load

Tokenizers
----------

Punkt
~~~~~

The ``PunktSentenceTokenizer`` tokenizes a text into sentences using an unsupervised model. Because it needs to be trained before it can be used, you normally load it with the :func:`nltk.data.load` function, which gives you a pre-trained model for the language you specify.

.. module:: nltk.tokenize.punkt

.. autosummary::
   :toctree: autogenerated

   PunktSentenceTokenizer
   PunktSentenceTokenizer.tokenize

To load an English-language tokenizer with the :func:`nltk.data.load` function, you pass it the path to the pickle file.

.. code::

   tokenizer = nltk.data.load("tokenizers/punkt/PY3/english.pickle")

Then to use it you call its :meth:`nltk.tokenize.punkt.PunktSentenceTokenizer.tokenize` method.

.. code::

   sentences = tokenizer.tokenize(source)
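
Putting the two steps together, here is a minimal end-to-end sketch. It assumes the Punkt model data has already been downloaded (e.g. with ``nltk.download("punkt")``), and the ``source`` string below is just made-up sample text.

.. code::

   import nltk

   # Assumes the "punkt" model data is already installed,
   # e.g. via nltk.download("punkt").
   tokenizer = nltk.data.load("tokenizers/punkt/PY3/english.pickle")

   # Made-up sample text standing in for "source".
   source = (
       "NLTK ships a pre-trained Punkt model for English. "
       "It splits running text into sentences. "
       "Abbreviations such as Dr. Smith are usually handled correctly."
   )

   sentences = tokenizer.tokenize(source)
   for sentence in sentences:
       print(sentence)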