Named Entity Recognition

Named Entity Recognition (NER)

We'll start with the question - "What is Named Entity Recognition (NER)?". NER is a subtask of information extraction that locates and classifies named entities in a text. The named entities could be organizations, persons, locations, times, etc.

We'll train a named entity recognition system that can be trained in a few seconds (on a GPU) and gets around 75% accuracy. Then we'll load in the exact same model, but a version that was trained for a longer period of time, and evaluate it to get 96% accuracy! Finally, we'll test the named entity recognition system on new sentences.

The Posts In Order

NER: Pre-Processing the Data

Preprocessing The Data

We will be using a dataset from Kaggle which appears to have originally come from the Groningen Meaning Bank (a bank of texts, not money). The original data consists of four columns: the sentence number, the word, the word's part of speech, and the tag. A few tags you might expect to see are:

  • geo: geographical entity
  • org: organization
  • per: person
  • gpe: geopolitical entity
  • tim: time indicator
  • art: artifact
  • eve: event
  • nat: natural phenomenon
  • O: filler word

Imports

# python
from collections import namedtuple
from pathlib import Path

import os

# pypi
from dotenv import load_dotenv
from expects import equal, expect
from sklearn.model_selection import train_test_split
from tabulate import tabulate

import pandas

Set Up

The Dataset

Note: to get the encoding for the file use file:

file -bi ner_dataset.csv

In this case we get:

application/csv; charset=iso-8859-1

Since it isn't ASCII or UTF-8 we'll have to tell pandas what the encoding is.

load_dotenv("posts/nlp/.env", override=True)
path = Path(os.environ["NER_DATASET"]).expanduser()
data = pandas.read_csv(path, encoding="ISO-8859-1")

Middle

The Kaggle Data

print(tabulate(data.iloc[:5], tablefmt="orgtbl", headers="keys"))
|    | Sentence #   | Word          | POS   | Tag   |
|----+--------------+---------------+-------+-------|
|  0 | Sentence: 1  | Thousands     | NNS   | O     |
|  1 | nan          | of            | IN    | O     |
|  2 | nan          | demonstrators | NNS   | O     |
|  3 | nan          | have          | VBP   | O     |
|  4 | nan          | marched       | VBN   | O     |

As you can (kind of) tell, the sentences are broken up so that each row has one word in it.

To make it easier to work with I'm going to rename the columns.

data = data.rename(columns={"Sentence #":"sentence", "Word": "word", "Tag": "tag"})

Words and Tags

The first thing we're going to do is separate out the words to build our vocabulary. The vocabulary will be a mapping of each word to an index so that we can convert our text to numbers for our model. In addition we're going to add a <PAD> token so that if our input is too short we can pad it out to the right size, and an UNK token for any word we haven't seen before.

token = namedtuple("Token", "pad unknown".split())
Token = token(pad="<PAD>", unknown="UNK")
vocabulary = {word: index for index, word in enumerate(data.word.unique())}
vocabulary[Token.pad] = len(vocabulary)
vocabulary[Token.unknown] = len(vocabulary)
print(f"{len(vocabulary):,}")
35,180

We're going to do the same with the Tag column.

tags = {tag: index for index, tag in enumerate(data.tag.unique())}
print(tags)
{'O': 0, 'B-geo': 1, 'B-gpe': 2, 'B-per': 3, 'I-geo': 4, 'B-org': 5, 'I-org': 6, 'B-tim': 7, 'B-art': 8, 'I-art': 9, 'I-per': 10, 'I-gpe': 11, 'I-tim': 12, 'B-nat': 13, 'B-eve': 14, 'I-eve': 15, 'I-nat': 16}

The B- and I- prefixes on the tags mark the beginning and the inside (continuation) of multi-token entities (IOB tagging). Note: building these mappings from the whole dataset is actually cheating. Later on make sure to only use the training data.

Sentences and Labels

We're also going to need to put the words back together into sentences. There's probably a clever pandas way to do this, but I'll just brute-force it. We'll also gather the labels for each sentence into matching lists.

sentences = []
labels = []
sentence = None
for row in data.itertuples():
    if not pandas.isna(row.sentence):
        if sentence:
            sentences.append(sentence)
            labels.append(label)
        sentence = [row.word]
        label = [row.tag]
    else:
        sentence.append(row.word)
        label.append(row.tag)
print(f"{len(sentences):,}")
print(f"{len(labels):,}")
print(sentences[0])
print(labels[0])
47,958
47,958
['Thousands', 'of', 'demonstrators', 'have', 'marched', 'through', 'London', 'to', 'protest', 'the', 'war', 'in', 'Iraq', 'and', 'demand', 'the', 'withdrawal', 'of', 'British', 'troops', 'from', 'that', 'country', '.']
['O', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O', 'O', 'O', 'O', 'O', 'B-geo', 'O', 'O', 'O', 'O', 'O', 'B-gpe', 'O', 'O', 'O', 'O', 'O']

Since we're going to convert them to numbers next, I didn't bother joining the tokens back into strings.

To Numbers

sentence_vectors = [
    [vocabulary.get(word, vocabulary[Token.unknown]) for word in sentence]
    for sentence in sentences
]

assert len(sentence_vectors) == len(sentences)
print(sentence_vectors[0])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 9, 15, 1, 16, 17, 18, 19, 20, 21]
label_vectors = [
    [tags[label] for label in sentence_labels] for sentence_labels in labels
]
assert len(label_vectors) == len(labels)
print(label_vectors[0])
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0]

Here we're assuming that there are no unknown tags: the tags are only used for training and testing, so we wouldn't expect to see one that isn't already in our dataset. The sentences, on the other hand, are going to be used with new data and so might contain tokens we haven't seen before.

We could add the padding here, but instead we're going to do it in the batch generator.
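
As a quick illustration (this isn't part of the pipeline, and the example sentence and maximum length here are made up), here's how a new sentence would be vectorized and padded: unknown words fall back to the UNK index and the line gets padded out with the <PAD> index.

new_sentence = "Protesters marched through London yesterday".split()
vector = [vocabulary.get(word, vocabulary[Token.unknown]) for word in new_sentence]
maximum_length = 10
padded = vector + [vocabulary[Token.pad]] * (maximum_length - len(vector))
print(padded)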

The Train-Test Split

This time we're going to do a real train-validation-test split.

splits = namedtuple("Split", "train validation test".split())
Split = splits(train=33570, validation=7194, test=7194)
x_train, x_leftovers, y_train, y_leftovers = train_test_split(sentences, labels, train_size=Split.train)
x_validation, x_test, y_validation, y_test = train_test_split(x_leftovers, y_leftovers, test_size=Split.test)

assert len(x_train) == Split.train
assert len(y_train) == Split.train
assert len(x_validation) == Split.validation
assert len(y_validation) == Split.validation
assert len(x_test) == Split.test
assert len(y_test) == Split.test

Bundling This Up

Imports

# python
from collections import namedtuple
from functools import partial
from pathlib import Path

import os

# pypi
from dotenv import load_dotenv
from sklearn.model_selection import train_test_split

import attr
import pandas

Some Constants

Read = namedtuple("Read", "dotenv key encoding".split())
READ = Read(dotenv="posts/nlp/.env", key="NER_DATASET",
            encoding="ISO-8859-1")

COLUMNS={"Sentence #":"sentence",
         "Word": "word",
         "Tag": "tag"}

Token = namedtuple("Token", "pad unknown".split())
TOKEN = Token(pad="<PAD>", unknown="UNK")

Splits = namedtuple("Split", "train validation test".split())
SPLIT = Splits(train=33570, validation=7194, test=7194)

DataSets = namedtuple("DataSets", [
    "x_train",
    "y_train",
    "x_validate",
    "y_validate",
    "x_test",
    "y_test"
])

TheData = namedtuple("TheData", [
    "vocabulary",
    "tags",
    "data_sets",
    "raw_data_sets",
])

The Data Processor

Each of the three data sets needs to be rebuilt into sentences and labels and then vectorized, since I'm not saving the flattened sentences beforehand. This first class handles converting the rows back into sentences and labels.

@attr.s(auto_attribs=True)
class DataFlattener:
    """Converts the kaggle data to sentences and labels

    Args:
     data: the data to convert
    """
    data: pandas.DataFrame
    _sentences: list=None
    _labels: list=None
  • Sentences
    @property
    def sentences(self) -> list:
        """List of sentences from the data"""
        if self._sentences is None:
            self.set_sentences_and_labels()
        return self._sentences
    
  • Labels
    @property
    def labels(self) -> list:
        """List of labels from the data"""
        if self._labels is None:
            self.set_sentences_and_labels()
        return self._labels
    
  • Sentences and Labels maker
    def set_sentences_and_labels(self) -> None:
        """Converts the data to lists
        of sentence token lists and also sets the labels
        """
        self._sentences = []
        self._labels = []
        sentence = None
        for row in self.data.itertuples():
            if not pandas.isna(row.sentence):
                if sentence:
                    self._sentences.append(sentence)
                    self._labels.append(labels)
                sentence = [row.word]
                labels = [row.tag]
            else:
                sentence.append(row.word)
                labels.append(row.tag)
        return
    

Data Vectorizer

@attr.s(auto_attribs=True)
class DataVectorizer:
    """Converts the data-set strings to vectors

    Args:
     data_sets: the split up data sets
     vocabulary: map from token to index
     tags: map from tag to index
    """
    data_sets: namedtuple
    vocabulary: dict
    tags: dict
    _vectorized_datasets: namedtuple=None
  • Vectorized Data Sets
    @property
    def vectorized_datasets(self) -> namedtuple:
        """the original data sets converted to indices"""
        if self._vectorized_datasets is None:
            sentence_vectors = partial(self.to_vectors,
                                       to_index=self.vocabulary)
            label_vectors = partial(self.to_vectors,
                                    to_index=self.tags)
            self._vectorized_datasets = DataSets(
                x_train = sentence_vectors(self.data_sets.x_train),
                y_train = label_vectors(self.data_sets.y_train),
                x_validate = sentence_vectors(self.data_sets.x_validate),
                y_validate = label_vectors(self.data_sets.y_validate),
                x_test = sentence_vectors(self.data_sets.x_test),
                y_test = label_vectors(self.data_sets.y_test),
            )
        return self._vectorized_datasets
    
  • Sentence Vectors
    def to_vectors(self, source: list, to_index: dict) -> list:
        """Sentences converted to Integers
    
        Args:
         source: iterator of tokenized strings to convert
         to_index: map to convert the tokens to indices
    
        Returns:
         tokens in source converted to indices
        """
        vectors = [
                [to_index.get(token, to_index[TOKEN.unknown])
                 for token in line]
                for line in source
            ]
        assert len(vectors) == len(source)
        return vectors
    

The Splitter

@attr.s(auto_attribs=True)
class DataSplitter:
    """Splits up the training, testing, etc.

    Args:
     split: constants with the train, test counts
     sentences: input data to split
     labels: y-data to split
     random_state: seed for the splitting
    """
    split: namedtuple
    sentences: list
    labels: list    
    random_state: int=None
    _data_sets: namedtuple=None
  • Data Sets
    @property
    def data_sets(self) -> namedtuple:
        """The Split data sets"""
        if self._data_sets is None:
            x_train, x_leftovers, y_train, y_leftovers = train_test_split(
                self.sentences, self.labels,
                train_size=self.split.train,
                random_state=self.random_state)
            x_validate, x_test, y_validate, y_test = train_test_split(
                x_leftovers,
                y_leftovers,
                test_size=self.split.test,
                random_state=self.random_state)
            self._data_sets = DataSets(x_train=x_train,
                                       y_train=y_train,
                                       x_validate=x_validate,
                                       y_validate=y_validate,
                                       x_test=x_test,
                                       y_test=y_test,
                                       )
            assert len(x_train) + len(x_validate) + len(x_test) == len(self.sentences)
        return self._data_sets
    

The Loader

@attr.s(auto_attribs=True)
class DataLoader:
    """Loads and converts the kaggle data

    Args:
      read: the stuff to download the data
    """
    read: namedtuple=READ    
    _data: pandas.DataFrame=None
    _vocabulary: dict=None
    _tags: dict=None
  • The Kaggle Data
    @property
    def data(self) -> pandas.DataFrame:
        """The original kaggle dataset"""
        if self._data is None:
            load_dotenv(self.read.dotenv)
            path = Path(os.environ[self.read.key]).expanduser()
            self._data = pandas.read_csv(path, encoding=self.read.encoding)
            self._data = self._data.rename(columns=COLUMNS)
        return self._data
    
  • The Vocabulary
    @property
    def vocabulary(self) -> dict:
        """map of word to index
    
        Note:
          This is creating a transformation of the entire data-set
        so it comes before the train-test-split so it uses the whole
        dataset, not just training
        """
        if self._vocabulary is None:
            self._vocabulary = {
                word: index
                for index, word in enumerate(self.data.word.unique())}
            self._vocabulary[TOKEN.pad] = len(self._vocabulary)
            self._vocabulary[TOKEN.unknown] = len(self._vocabulary)
        return self._vocabulary
    
  • The Tags
    @property
    def tags(self) -> dict:
        """map of tag to index"""
        if self._tags is None:
            self._tags = {tag: index for index, tag in enumerate(
                self.data.tag.unique())}
            self._tags[TOKEN.unknown] = len(self._tags)
        return self._tags
    

The Processor

@attr.s(auto_attribs=True)
class NERData:
    """Master NER Data preparer

    Args:
     read_constants: stuff to help load the dataset
     split_constants: stuff to help split the dataset
     random_state: seed for the splitting
    """
    read_constants: namedtuple=READ
    split_constants: namedtuple=SPLIT
    random_state: int=33
    _data: namedtuple=None
    _loader: DataLoader=None
    _flattener: DataFlattener=None
    _splitter: DataSplitter=None
    _vectorizer: DataVectorizer=None
  • The Data
    @property
    def data(self) -> namedtuple:
        """The split up data sets"""
        if self._data is None:
            self._data = TheData(
                vocabulary=self.loader.vocabulary,
                tags=self.loader.tags,
                raw_data_sets=self.splitter.data_sets,
                data_sets=self.vectorizer.vectorized_datasets,
            )
        return self._data
    
  • The Loader
    @property
    def loader(self) -> DataLoader:
        """The loader of the data"""
        if self._loader is None:
            self._loader = DataLoader(
                read=self.read_constants,            
            )
        return self._loader
    
  • The Flattener
    @property
    def flattener(self) -> DataFlattener:
        """The sentence and label builder"""
        if self._flattener is None:
            self._flattener = DataFlattener(
                data=self.loader.data,
            )
        return self._flattener
    
  • The Splitter
    @property
    def splitter(self) -> DataSplitter:
        """The splitter upper for the data"""
        if self._splitter is None:
            self._splitter = DataSplitter(
                split=self.split_constants,
                sentences = self.flattener.sentences,
                labels = self.flattener.labels,
                random_state=self.random_state
            )
        return self._splitter
    
  • The Vectorizer
    @property
    def vectorizer(self) -> DataVectorizer:
        """Vectorizes the raw-data sets"""
        if self._vectorizer is None:
            self._vectorizer = DataVectorizer(
                data_sets=self.splitter.data_sets,
                tags=self.loader.tags,
                vocabulary=self.loader.vocabulary
            )
        return self._vectorizer
    

Testing It Out

from neurotic.nlp.named_entity_recognition import NERData

ner = NERData()

expect(len(ner.data.data_sets.x_train)).to(equal(Split.train))
expect(len(ner.data.data_sets.x_validate)).to(equal(Split.validation))
expect(len(ner.data.data_sets.x_test)).to(equal(Split.test))
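
As a small extra check (not part of the original tests), we can invert the vocabulary and map the first vectorized training sentence back into words to confirm the round trip.

index_to_word = {index: word for word, index in ner.data.vocabulary.items()}
first_sentence = ner.data.data_sets.x_train[0]
print(" ".join(index_to_word[index] for index in first_sentence))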

RNNS and Vanishing Gradients

Vanishing Gradients

This will be a look at the problem of vanishing gradients from an intuitive standpoint.

Background

Adding layers to a neural network introduces multiplicative effects in both forward and backward propagation. The back-prop in particular presents a problem as the gradient of activation functions can be very small. Multiplied together across many layers, their product can be vanishingly small. This results in weights not being updated in the front layers and training not progressing.

Gradients of the sigmoid function, for example, are in the range 0 to 0.25. To calculate gradients for the front layers of a neural network the chain rule is used. This means that these tiny values are multiplied starting at the last layer, working backwards to the first layer, with the gradients shrinking exponentially at each step.
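
To put a rough number on it: even in the best case, where every layer contributes the maximum sigmoid gradient of 0.25, ten layers already multiply out to \(0.25^{10} \approx 9.5 \times 10^{-7}\).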

Imports

# python
from collections import namedtuple
from functools import partial

# pypi
import holoviews
import hvplot.pandas
import numpy
import pandas

# another project
from graeae import EmbedHoloviews

Set Up

SLUG = "rnns-and-vanishing-gradients"
Embed = partial(EmbedHoloviews,
                folder_path=f"files/posts/nlp/{SLUG}")
Plot = namedtuple("Plot", ["width", "height", "fontscale", "tan", "blue", "red"])
PLOT = Plot(
    width=900,
    height=750,
    fontscale=2,
    tan="#ddb377",
    blue="#4687b7",
    red="#ce7b6d",
 )

Middle

The Data

This will be an evenly spaced set of points over an interval (see numpy.linspace).

STOP, STEPS = 10, 100
x = numpy.linspace(-STOP, STOP, STEPS)

The Sigmoid

Our activation function will be the sigmoid (well, the logistic function).

def sigmoid(x: numpy.ndarray) -> numpy.ndarray:
    return 1 / (1 + numpy.exp(-x))

Now we'll calculate the activations for our input data.

activations = sigmoid(x)

The Gradient

Our gradient is the derivative of the sigmoid. Note that it's written in terms of the sigmoid's output, \(\sigma'(x) = \sigma(x)\left(1 - \sigma(x)\right)\), so the function below expects the activation (the sigmoid value) as its argument, not the raw input.

def gradient(x: numpy.ndarray) -> numpy.ndarray:
    return (x) * (1 - x)
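
As a quick numeric check (this wasn't in the original post), we can confirm that gradient(sigmoid(x)) matches a finite-difference approximation of the sigmoid's derivative.

h = 1e-6
point = 0.5
numeric_derivative = (sigmoid(point + h) - sigmoid(point - h)) / (2 * h)
print(numpy.isclose(gradient(sigmoid(point)), numeric_derivative))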

Now we can get the gradients for our activations.

gradients = gradient(activations)

Plotting the Sigmoid

tangent_x = 0
tangent_y = sigmoid(tangent_x)
span = 2

gradient_tangent = gradient(sigmoid(tangent_x))

tangent_plot_x = numpy.linspace(tangent_x - span, tangent_x + span, STEPS)
tangent_plot_y = tangent_y + gradient_tangent * (tangent_plot_x - tangent_x)

frame = pandas.DataFrame.from_dict(
    {"X": x,
     "Sigmoid": activations,
     "X-Tangent": tangent_plot_x,
     "Y-Tangent": tangent_plot_y,
     "Gradient": gradients})
plot = (frame.hvplot(x="X", y="Sigmoid").opts(color=PLOT.blue)
        * frame.hvplot(x="X", y="Gradient").opts(color=PLOT.red)
        * frame.hvplot(x="X-Tangent",
                       y="Y-Tangent").opts(color=PLOT.tan)).opts(
            title="Sigmoid and Tangent",
            width=PLOT.width,
            height=PLOT.height,
            fontscale=PLOT.fontscale)
output = Embed(plot=plot, file_name="sigmoid_tangent")()
print(output)

Figure Missing

The thing to notice is that as the input data moves away from the center (at 0) the gradients get smaller in either direction, rapidly approaching zero.

The Numerical Impact

Multiplication & Decay

Repeatedly multiplying numbers smaller than 1 results in smaller and smaller numbers. Below is an example that finds the gradient for an input x = 0 and multiplies it over n steps. Look how quickly it 'vanishes' to almost zero. And yet \(\sigma(0) = 0.5\), whose sigmoid gradient of 0.25 happens to be the largest sigmoid gradient possible.

A Decay Simulation

Input data

n = 6
x = 0

gradients = gradient(sigmoid(x))
steps = numpy.arange(1, n + 1)
print("-- Inputs --")
print("steps :", n)
print("x value :", x)
print("sigmoid :", "{:.5f}".format(sigmoid(x)))
print("gradient :", "{:.5f}".format(gradients), "\n")
-- Inputs --
steps : 6
x value : 0
sigmoid : 0.50000
gradient : 0.25000 

Plot The Decay

decaying_values = (numpy.ones(len(steps)) * gradients).cumprod()
data = pandas.DataFrame.from_dict(dict(Step=steps, Gradient=decaying_values))
plot = data.hvplot(x="Step", y="Gradient").opts(
    title="Cumulative Gradient",
    width=PLOT.width,
    height=PLOT.height,
    fontscale=PLOT.fontscale
)
output = Embed(plot=plot, file_name="cumulative_gradient")()
print(output)

Figure Missing

The point being that the gradients very quickly approach zero.

So, How Do You Fix This?

One solution is to use activation functions that don't have tiny gradients. Other solutions involve more sophisticated model design. But they're both discussions for another time.
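
To make the first point concrete, here's a small sketch (not in the original post) comparing the ReLU gradient, which is 1 for any positive input, with the sigmoid gradient computed above, which never exceeds 0.25.

def relu_gradient(x: numpy.ndarray) -> numpy.ndarray:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise"""
    return (x > 0).astype(float)

inputs = numpy.array([-2.0, 0.5, 3.0])
print(relu_gradient(inputs))        # [0. 1. 1.]
print(gradient(sigmoid(inputs)))    # roughly [0.105 0.235 0.045]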

Deep N-Grams: Batch Generation

Generating Batches of Data

Most of the time in Natural Language Processing, and in AI in general, we use batches of data when training. Here, you will build a data generator that takes in a text and returns a batch of text lines (lines are sentences).

  • The generator converts text lines (sentences) into numpy arrays of integers, padded by zeros so that all arrays have the same length (the max_length parameter).

This generator returns the data in a format that you can use directly in your model when computing the feed-forward pass of your algorithm. It returns a batch of lines and a per-token mask. The batch is a tuple of three parts: inputs, targets, and mask. The inputs and targets are identical; the targets will be used to evaluate your predictions. The mask is 1 for non-padding tokens and 0 for padding.

Imports

# python
from itertools import cycle
import random

# from pypi
from expects import be_true, expect
import trax.fastmath.numpy as numpy

# this project
from neurotic.nlp.deep_rnn.data_loader import DataLoader

Set Up

The DataLoader

data_loader = DataLoader()

Middle

The Data Generator

  • While True loop: this will yield one batch at a time.
  • if index >= num_lines, set index to 0.
  • The generator should return shuffled batches of data. To achieve this without modifying the actual lines, a list containing the indexes of data_lines is created. This list can be shuffled and used to get random batches every time the index is reset.
  • if len(line) < max_length append line to cur_batch.
    • Note that a line that has length equal to max_length should not be appended to the batch.
    • This is because when converting the characters into a tensor of integers, an additional end of sentence token id will be added.
    • So if max_length is 5 and a line has 4 characters, the tensor representing those 4 characters plus the end of sentence character will be of length 5, which is the max length.
  • if len(cur_batch) == batch_size, go over every line, convert it to an int and store it.

Remember that when calling numpy here you are really calling trax.fastmath.numpy, which is trax's version of numpy that is compatible with JAX. As a result, where you used to encounter the type numpy.ndarray you will now find the type jax.interpreters.xla.DeviceArray.
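
A tiny check you can run (not in the original post) to see this for yourself; the exact class name you get back depends on the trax and jax versions installed.

example = numpy.array([1, 2, 3])
print(type(example))  # a JAX device array type, not numpy.ndarray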

Hints:

  • Use the line_to_tensor function above inside a list comprehension in order to pad lines with zeros.
  • Keep in mind that the length of the tensor is always 1 + the length of the original line of characters; account for this when setting the amount of zero-padding.

To get it to pass you'll have to pass in the to-tensor method of the DataLoader so we'll need to alias it to match their definition.

line_to_tensor = data_loader.to_tensor

Implementing the Generator

def data_generator(batch_size: int, max_length: int, data_lines: list,
                   line_to_tensor=line_to_tensor, shuffle: bool=True):
    """Generator function that yields batches of data

    Args:
       batch_size (int): number of examples (in this case, sentences) per batch.
       max_length (int): maximum length of the output tensor.
       NOTE: max_length includes the end-of-sentence character that will be added
               to the tensor.  
               Keep in mind that the length of the tensor is always 1 + the length
               of the original line of characters.
       data_lines (list): list of the sentences to group into batches.
       line_to_tensor (function, optional): function that converts line to tensor. Defaults to line_to_tensor.
       shuffle (bool, optional): True if the generator should generate random batches of data. Defaults to True.

    Yields:
       tuple: two copies of the batch (jax.interpreters.xla.DeviceArray) and mask (jax.interpreters.xla.DeviceArray).
       NOTE: jax.interpreters.xla.DeviceArray is trax's version of numpy.ndarray
    """
    # initialize the index that points to the current position in the lines index array
    index = 0

    # initialize the list that will contain the current batch
    cur_batch = []

    # count the number of lines in data_lines
    num_lines = len(data_lines)

    # create an array with the indexes of data_lines that can be shuffled
    lines_index = [*range(num_lines)]

    # shuffle line indexes if shuffle is set to True
    if shuffle:
        random.shuffle(lines_index)

    while True:

        # if the index is greater or equal than to the number of lines in data_lines
        if index >= num_lines:
            # then reset the index to 0
            index = 0
            # shuffle line indexes if shuffle is set to True
            if shuffle:
                random.shuffle(lines_index)

        # get a line at the `lines_index[index]` position in data_lines
        line = data_lines[lines_index[index]]

        # if the length of the line is less than max_length
        if len(line) < max_length:
            # append the line to the current batch
            cur_batch.append(line)

        # increment the index by one
        index += 1

        # if the current batch is now equal to the desired batch size
        if len(cur_batch) == batch_size:

            batch = []
            mask = []

            # go through each line (li) in cur_batch
            for li in cur_batch:
                # convert the line (li) to a tensor of integers
                tensor = line_to_tensor(li)

                # Create a list of zeros to represent the padding
                # so that the tensor plus padding will have length `max_length`
                pad = [0] * (max_length - len(tensor))

                # combine the tensor plus pad
                tensor_pad = tensor + pad

                # append the padded tensor to the batch
                batch.append(tensor_pad)

                # A mask for  tensor_pad is 1 wherever tensor_pad is not
                # 0 and 0 wherever tensor_pad is 0, i.e. if tensor_pad is
                # [1, 2, 3, 0, 0, 0] then example_mask should be
                # [1, 1, 1, 0, 0, 0]
                # Hint: Use a list comprehension for this
                example_mask = [int(item != 0) for item in tensor_pad]
                mask.append(example_mask)

            # convert the batch (data type list) to a trax's numpy array
            batch_np_arr = numpy.array(batch)
            mask_np_arr = numpy.array(mask)


            # Yield two copies of the batch and mask.
            yield batch_np_arr, batch_np_arr, mask_np_arr

            # reset the current batch to an empty list
            cur_batch = []

Try out the data generator.

tmp_lines = ['12345678901',
             '123456789',
             '234567890',
             '345678901']

Create a generator with a batch size of 2 and a maximum length of 10.

tmp_data_gen = data_generator(batch_size=2, 
                              max_length=10, 
                              data_lines=tmp_lines,
                              shuffle=False)

Get one batch.

tmp_batch = next(tmp_data_gen)

View the batch.

print(tmp_batch)

expected = (numpy.array([[49, 50, 51, 52, 53, 54, 55, 56, 57,  1],
                         [50, 51, 52, 53, 54, 55, 56, 57, 48,  1]]),
            numpy.array([[49, 50, 51, 52, 53, 54, 55, 56, 57,  1],
                         [50, 51, 52, 53, 54, 55, 56, 57, 48,  1]]),
            numpy.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]))
for index, batch in enumerate(tmp_batch):
    expect(bool((batch==expected[index]).all())).to(be_true)
(DeviceArray([[49, 50, 51, 52, 53, 54, 55, 56, 57,  1],
             [50, 51, 52, 53, 54, 55, 56, 57, 48,  1]], dtype=int32), DeviceArray([[49, 50, 51, 52, 53, 54, 55, 56, 57,  1],
             [50, 51, 52, 53, 54, 55, 56, 57, 48,  1]], dtype=int32), DeviceArray([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
             [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32))

Now that you have your generator, you can just call it and it will return tensors which correspond to your lines in Shakespeare. The first and second elements of the tuple are identical. Now you can go ahead and start building your neural network.

Repeating Batch generator

The way the iterator is currently defined, it will keep providing batches forever.

Although it is not needed, we want to show you the itertools.cycle function which is really useful when you have a generator that eventually stops.

Usually we want to cycle over the dataset multiple times during training (i.e. train for multiple epochs).

For small datasets we can use itertools.cycle to achieve this easily.

infinite_data_generator = cycle(
    data_generator(batch_size=2, max_length=10, data_lines=tmp_lines))
ten_lines = [next(infinite_data_generator) for _ in range(10)]
print(len(ten_lines))
10

Bundle It Up

As always, since this is going to be needed further down the road, I'll bundle it up.

Imports

# python
import random

# pypi
import attr
import trax.fastmath.numpy as numpy

# this project
from neurotic.nlp.deep_rnn.data_loader import DataLoader

Data Generator

@attr.s(auto_attribs=True)
class DataGenerator:
    """Generates batches

    Args:
     data: lines of data
     data_loader: something with to-tensor method
     batch_size: size of the batches
     max_length: the maximum length for a line (longer lines will be ignored)
     shuffle: whether to shuffle the data
    """
    data: list
    data_loader: DataLoader
    batch_size: int
    max_length: int
    shuffle: bool=True
    _line_count: int= None
    _line_indices: list=None
    _generator: object=None

Line Count

@property
def line_count(self) -> int:
    """Number of lines in the data"""
    if self._line_count is None:
        self._line_count = len(self.data)
    return self._line_count

Line Indices

@property
def line_indices(self) -> list:
    """Indices of the lines in the data"""
    if self._line_indices is None:
        self._line_indices = list(range(self.line_count))
    return self._line_indices

The Iterator Method

def __iter__(self):
    """A pass-through for this method"""
    return self

The Batch Generator

def data_generator(self):
    """Generator method that yields batches of data

    Yields:
     (batch, batch, mask)
    """
    index = 0
    current_batch = []
    if self.shuffle:
        random.shuffle(self.line_indices)

    while True:
        if index >= self.line_count:
            index = 0
            if self.shuffle:
                random.shuffle(self._line_indices)

        line = self.data[self.line_indices[index]]
        if len(line) < self.max_length:
            current_batch.append(line)
        index += 1

        if len(current_batch) == self.batch_size:
            batch = []
            mask = []
            for line in current_batch:
                tensor = self.data_loader.to_tensor(line)
                tensor += [0] * (self.max_length - len(tensor))
                batch.append(tensor)
                mask.append([int(item != 0) for item in tensor])

            batch = numpy.array(batch)
            yield batch, batch, numpy.array(mask)
            current_batch = []
    return

The Generator

@property
def generator(self):
    """Infinite generator of batches"""
    if self._generator is None:
        self._generator = self.data_generator()
    return self._generator

The Next Method

def __next__(self):
    """make this an iterator"""
    return next(self.generator)

Try It Out

from neurotic.nlp.deep_rnn import DataGenerator, DataLoader

loader = DataLoader()
test_lines = ['12345678901',
              '123456789',
              '234567890',
              '345678901']

generator = DataGenerator(data=test_lines,
                          data_loader=loader,
                          batch_size=2,
                          max_length=10,
                          shuffle=False)

actual = next(generator)

expected = (numpy.array([[49, 50, 51, 52, 53, 54, 55, 56, 57,  1],
                         [50, 51, 52, 53, 54, 55, 56, 57, 48,  1]]),
            numpy.array([[49, 50, 51, 52, 53, 54, 55, 56, 57,  1],
                         [50, 51, 52, 53, 54, 55, 56, 57, 48,  1]]),
            numpy.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]))
for index, batch in enumerate(actual):
    try:
        expect(bool((batch==expected[index]).all())).to(be_true)
    except AssertionError:
        print(batch)
        print(expected[index])
        break

Deep N-Grams: Generating Sentences

Generating New Sentences

Now we'll use the language model to generate new sentences. For that we need to make draws from a Gumbel distribution.

The Gumbel Probability Density Function (PDF) is defined as: \[ f(z) = \frac{1}{\beta} e^{-\left(z + e^{-z}\right)} \]

Where: \[ z = {(x - \mu)\over{\beta}} \]

In the last step of the Recurrent Neural Network (RNN) we are using for text generation, we choose the index with the maximum value as the prediction. The maximum of a growing sample of independent random variables (from an exponential-type distribution) asymptotically follows a Gumbel distribution. For that reason, the Gumbel distribution is used to sample from a categorical distribution: add Gumbel noise to the log-probabilities and take the argmax (the 'Gumbel-max trick').

Imports

# python
from pathlib import Path

# from pypi
import numpy

# this project
from neurotic.nlp.deep_rnn import GRUModel

Set Up

gru = GRUModel()
model = gru.model
ours = Path("~/models/gru-shakespeare-model/model.pkl.gz").expanduser()
model.init_from_file(ours)

Middle

The Gumbel Sample

def gumbel_sample(log_probabilities: numpy.array,
                  temperature: float=1.0) -> float:
    """Gumbel sampling from a categorical distribution

    Args:
     log_probabilities: model predictions for a given input
     temperature: scale for the Gumbel noise (1.0 gives a plain Gumbel-max sample)

    Returns:
     the index of the sampled category
    """
    u = numpy.random.uniform(low=1e-6, high=1.0 - 1e-6,
                             size=log_probabilities.shape)
    g = -numpy.log(-numpy.log(u))
    return numpy.argmax(log_probabilities + g * temperature, axis=-1)
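
As a quick sanity check (not in the original post), here's a small sketch confirming that this Gumbel-max sampling really does draw from the underlying categorical distribution: the empirical frequencies of the sampled indices should roughly match the probabilities.

probabilities = numpy.array([0.1, 0.2, 0.3, 0.4])
draws = numpy.array([gumbel_sample(numpy.log(probabilities)) for _ in range(10000)])
print(numpy.bincount(draws) / len(draws))  # roughly [0.1, 0.2, 0.3, 0.4]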

A Predictor

END_OF_SENTENCE = 1

def predict(number_of_characters: int, prefix: str,
            break_on: int=END_OF_SENTENCE) -> str:
    """Predicts characters

    Args:
     number_of_characters: how many characters to predict
     prefix: character to prompt the predictions
     break_on: identifier for character to prematurely stop on

    Returns:
     prefix followed by predicted characters
    """
    inputs = [ord(character) for character in prefix]
    result = list(prefix)
    maximum_length = len(prefix) + number_of_characters
    for _ in range(number_of_characters):
        current_inputs = numpy.array(inputs + [0] * (maximum_length - len(inputs)))
        output = model(current_inputs[None, :])  # Add batch dim.
        next_character = gumbel_sample(output[0, len(inputs)])
        inputs += [int(next_character)]

        if inputs[-1] == break_on:
            break  # EOS
        result.append(chr(int(next_character)))

    return "".join(result)

Some Predictions

print(predict(32, ""))
you would not live at essenomed 

Yes, but I don't know anyone who would. Note that we are using a random sample, so repeatedly making predictions won't necessarily get you the same result.

print(predict(32, ""))
print(predict(32, ""))
print(predict(32, ""))
[exeunt]
katharine       yes, you are like the 
le beau where's some of my prett
print(predict(64, "falstaff"))
falstaff        yea, marry, lady, she hath bianced three months.

bianced?

print(predict(64, "beast"))
beastly, and god forbid, sir! our revenue's cannon,
start = "finger"
for word in range(5):
    start = predict(10, start)
    print(start)
finger, iago, an
finger, iago, and ask.
finger, iago, and ask.
finger, iago, and ask.
finger, iago, and ask.

So, if you feed it enough text, it becomes more deterministic.

SPACE = ord(" ")
start = "iago"
output = start
for word in range(10):
    tokens = predict(32, start).split()
    start = tokens[1] if len(tokens) > 1 else tokens[0]
    output = f"{output} {start}"
print(output)    
iago your husband if there never for you need no never

In the generated text above, you can see that the model generates text that makes sense, capturing dependencies between words, even with little or no input. A simple n-gram model would not have been able to capture all of that in one sentence.

On statistical methods

Using a statistical method will not give you results that are as good. The model would not be able to encode information seen previously in the data set and, as a result, the perplexity will increase (the higher the perplexity, the worse the model). Furthermore, statistical n-gram models take up too much space and memory, so they would be inefficient and too slow. Conversely, with deep neural networks you can get a better perplexity. Note, though, that learning about n-gram language models is still important and leads to a better understanding of deep neural networks.

Deep N-Grams: Evaluating the Model

Evaluating the Model

Now that you have learned how to train a model, you will learn how to evaluate it. To evaluate language models, we usually use perplexity which is a measure of how well a probability model predicts a sample. Note that perplexity is defined as:

\[ P(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i| w_1,\ldots,w_{i-1})}} \]

As an implementation hack, you would usually take the log of that formula (to enable us to use the log probabilities we get as output of our RNN, convert exponents to products, and products into sums which makes computations less complicated and computationally more efficient). You should also take care of the padding, since you do not want to include the padding when calculating the perplexity (because we do not want to have a perplexity measure that is artificially good).

\begin{align} \log P(W) &= \log\left(\sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i| w_1,\ldots,w_{i-1})}}\right) \\ &= \log\left({\prod_{i=1}^{N} \frac{1}{P(w_i| w_1,\ldots,w_{i-1})}}\right)^{\frac{1}{N}}\\ &= \log\left({\prod_{i=1}^{N}{P(w_i| w_1,\ldots,w_{i-1})}}\right)^{-\frac{1}{N}} \\ &= -\frac{1}{N}\log\left({\prod_{i=1}^{N}{P(w_i| w_1,\ldots,w_{i-1})}}\right) \\ &= -\frac{1}{N}{\sum_{i=1}^{N}{\log P(w_i| w_1,\ldots,w_{i-1})}} \end{align}

Instructions: Write a program that will help evaluate your model. Implementation hack: your program takes in preds and target. Preds is a tensor of log probabilities. You can use tl.one_hot to transform the target into the same dimension. You then multiply them and sum.

You also have to create a mask to only get the non-padded probabilities. Good luck!

Hints

  • To convert the target into the same dimension as the predictions tensor use tl.one_hot with target and preds.shape[-1].
  • You will also need the np.equal function in order to unpad the data and properly compute perplexity.
  • Keep in mind while implementing the formula above that \(w_i\) represents a letter from our 256 letter alphabet.

Imports

# python
from collections import namedtuple
from pathlib import Path

import os

# pypi
from dotenv import load_dotenv
from trax import layers

import trax.fastmath.numpy as numpy
import jax
# this project
from neurotic.nlp.deep_rnn import DataLoader, DataGenerator, GRUModel

Set Up

load_dotenv("posts/nlp/.env")
DataSettings = namedtuple(
    "DataSettings",
    "batch_size max_length output".split())
SETTINGS = DataSettings(batch_size=32,
                        max_length=64,
                        output="~/models/gru-shakespeare-model/")
loader = DataLoader()
training_generator = DataGenerator(data=loader.training, data_loader=loader,
                                   batch_size=SETTINGS.batch_size,
                                   max_length=SETTINGS.max_length,
                                   shuffle=False)

Middle

def test_model(preds: jax.interpreters.xla.DeviceArray,
               target: jax.interpreters.xla.DeviceArray) -> float:
    """Function to test the model.

    Args:
       preds: Predictions of a list of batches of tensors corresponding to lines of text.
       target: Actual list of batches of tensors corresponding to lines of text.

    Returns:
       float: log_perplexity of the model.
    """
    total_log_ppx = numpy.sum(layers.one_hot(x=target, n_categories=preds.shape[-1]) * preds, axis=-1) # log-probability of the target token at each position

    non_pad = 1.0 - numpy.equal(target, 0)          # 1.0 for real tokens, 0.0 for padding
    ppx = total_log_ppx * non_pad                             # Get rid of the padding

    log_ppx = numpy.sum(ppx) / numpy.sum(non_pad)

    return -log_ppx
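
As a small hand-made check (not from the original post; the numbers are made up), consider two real tokens followed by one padding token, with the model putting probability 0.9 on the correct answer at every position. The padding position should be ignored by the mask and the result should be \(-\log 0.9 \approx 0.105\).

toy_targets = numpy.array([[2, 1, 0]])  # 0 is the padding id
toy_predictions = numpy.log(numpy.array(
    [[[0.05, 0.05, 0.9],     # predicts token 2
      [0.05, 0.9, 0.05],     # predicts token 1
      [0.9, 0.05, 0.05]]]))  # the padding position, masked out
print(test_model(toy_predictions, toy_targets))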

Testing

Pre-Built Model

We're going to start with a pre-built file and see how it does relative to our model.

gru = GRUModel()
model = gru.model
pre_built = Path(os.environ["PRE_BUILT_MODEL"]).expanduser()
model.init_from_file(pre_built)
print(model)
Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]
batch = next(training_generator)
preds = model(batch[0])
log_ppx = test_model(preds, batch[1])
print('The log perplexity and perplexity of your model are respectively', log_ppx, numpy.exp(log_ppx))
The log perplexity and perplexity of your model are respectively 2.0370717 7.6681223

Our Model

gru = GRUModel()
model = gru.model
ours = Path("~/models/gru-shakespeare-model/model.pkl.gz").expanduser()
model.init_from_file(ours)
print(model)
Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]
batch = next(training_generator)
preds = model(batch[0])
log_ppx = test_model(preds, batch[1])
print('The log perplexity and perplexity of your model are respectively', log_ppx, numpy.exp(log_ppx))
The log perplexity and perplexity of your model are respectively 0.93021315 2.5350494

On the one hand I over-trained my model, on the other hand… why such a big difference?

Deep N-Grams: Training the Model

Training The Model

Now we are going to train the model. We have to define:

  • the cost function
  • the optimizer

To train a model on a task, Trax defines an abstraction called trax.supervised.training.TrainTask which packages the training data, loss, and optimizer (among other things) together into an object.

Similarly, to evaluate a model Trax defines an abstraction trax.supervised.training.EvalTask which packages the eval data and metrics (among other things) into another object (and which doesn't seem to have any documentation yet).

The final piece tying things together is the trax.supervised.training.Loop abstraction that is a very simple and flexible way to put everything together and train the model, all the while evaluating it and saving checkpoints.

Using training.Loop will save you a lot of code compared to always writing the training loop by hand, like you did in courses 1 and 2. More importantly, you are less likely to have a bug in that code that would ruin your training.

Imports

# python
from collections import namedtuple
from datetime import datetime
from functools import partial

# pypi
from expects import equal, expect
from holoviews import opts
from trax.supervised import training as trax_training
from trax import layers

import holoviews
import hvplot.pandas
import pandas
import trax

# this project
from neurotic.nlp.deep_rnn import GRUModel, DataGenerator, DataLoader

# another project
from graeae import EmbedHoloviews, Timer

Set Up

Some Constants

DataSettings = namedtuple(
    "DataSettings",
    "batch_size max_length learning_rate output".split())
SETTINGS = DataSettings(batch_size=32,
                        max_length=64,
                        learning_rate=0.0005,
                        output="~/models/gru-shakespeare-model/")

Previous Code From this Series

loader = DataLoader()

# the name "training" was getting confusing (since trax's module is also called
# training) so this is training_generator and theirs is trax_training
training_generator = DataGenerator(data=loader.training, data_loader=loader,
                  batch_size=SETTINGS.batch_size,
                  max_length=SETTINGS.max_length)

evaluation = DataGenerator(data=loader.validation, data_loader=loader,
                  batch_size=SETTINGS.batch_size,
                  max_length=SETTINGS.max_length)
gru = GRUModel()

Plotting

slug = "deep-n-grams-training-the-model"
Embed = partial(EmbedHoloviews, folder_path=f"files/posts/nlp/{slug}")

Plot = namedtuple("Plot", ["width", "height", "fontscale", "tan", "blue", "red"])
PLOT = Plot(
    width=900,
    height=750,
    fontscale=2,
    tan="#ddb377",
    blue="#4687b7",
    red="#ce7b6d",
 )

Middle

Some Jargon

An epoch is traditionally defined as one pass through the dataset.

Since the dataset is divided into batches, you need several steps (gradient updates) in order to complete an epoch, so the number of examples seen in one epoch is the batch size times the number of steps. In short, in each epoch you go over all of the data.

The max_length variable defines the maximum length of lines to be used in training; lines longer than that are discarded.

Below is a function, and its results, showing how many lines in the dataset meet our maximum-length criterion, and how many steps are required to cover the entire dataset, which in turn corresponds to one epoch.

def lines_used(lines: list, max_length: int) -> int:
    """Counts the number of lines of max_length or shorter

    Args: 
     lines: all lines of text as an array of lines
     max_length: maximum length of a line to use

    Returns:
     number of usable examples
    """
    return sum(1 for line in lines if len(line) <= max_length)

Let's see what we get.

useable = lines_used(loader.training, 32)
print(f"Number of used lines from the dataset: {useable:,}")
print(f"Batch size (a power of 2): {SETTINGS.batch_size}")
steps_per_epoch = int(useable/SETTINGS.batch_size)
print(f"Number of steps to cover one epoch: {steps_per_epoch}")

# our training sets aren't exactly the same for some reason.
# expect(useable).to(equal(25881))
# expect(steps_per_epoch).to(equal(808))
Number of used lines from the dataset: 25,781
Batch size (a power of 2): 32
Number of steps to cover one epoch: 805

It looks like the original notebook used os.listdir while I'm using Path.glob. Neither of them loads the files in alphabetical order, but they also don't load them in the same order as each other, so while our data sets are the same length, the training and validation split created slightly different sets. Oh, well.

Training the Model

We'll implement the train_model program below to train the neural network we created in the previous post. Here is a list of things to do:

  • Create a trax.supervised.training.TrainTask object with:
    • labeled_data = the (cycled) training data generator.
    • loss_layer = CrossEntropyLoss()
    • optimizer = Adam with a learning rate.
  • Create a trax.supervised.training.EvalTask object with:
    • labeled_data = the labeled data that we want to evaluate on.
    • metrics = CrossEntropyLoss() and Accuracy()
    • How frequently we want to evaluate and checkpoint the model.
  • Create a trax.supervised.training.Loop object; this encapsulates the following:
    • The previously created TrainTask and EvalTask objects.
    • the training model
    • optionally the evaluation model, if different from the training model. NOTE: in presence of Dropout, etc. we usually want the evaluation model to behave slightly differently than the training model.

We will be using a cross entropy loss, with the Adam optimizer. See the trax documentation to get a better understanding. Make sure you use the number of steps provided as a parameter to train for the desired number of steps.

NOTE: Don't forget to wrap the data generator in itertools.cycle to iterate on it for multiple epochs.

def train_model(model: layers.Serial, data_generator: DataGenerator,
                batch_size: int=SETTINGS.batch_size,
                max_length: int=SETTINGS.max_length,
                lines: list=loader.training,
                eval_lines: list=loader.validation,
                n_steps: int=1, output_dir='model/') -> training.Loop: 
    """Function that trains the model

    Args:
      model: GRU model.
      data_generator: Data generator function.
      batch_size: Number of lines per batch.
      max_length: Maximum length allowed for a line to be processed. 
      lines: List of lines to use for training. Defaults to lines.
      eval_lines: List of lines to use for evaluation.
      n_steps: Number of steps to train.
      output_dir: Relative path of directory to save model.

    Returns:
      Training loop for the model.
    """
    # this is the broken version for submission, I'll make a separate one for local running.

    bare_train_generator = data_generator(batch_size, max_length, lines,
     line_to_tensor)
    infinite_train_generator = itertools.cycle(bare_train_generator)

    bare_eval_generator = data_generator(batch_size, max_length,
                                         eval_lines,
                                         line_to_tensor)

    infinite_eval_generator = itertools.cycle(bare_eval_generator)

    # the notebook code is out of date so we need to have one for them and one for us... damnit
    # this first one is theirs
    train_task = training.TrainTask(
        labeled_data=infinite_train_generator,
        loss_layer=tl.CrossEntropyLoss(),   # Don't forget to instantiate this object
        optimizer=trax.optimizers.Adam(learning_rate=0.0005)     # Don't forget to add the learning rate parameter
    )

    eval_task = training.EvalTask(
        labeled_data=infinite_eval_generator,
        metrics=[tl.CrossEntropyLoss(), tl.Accuracy()], # Don't forget to instantiate these objects
        n_eval_batches=3      # For better evaluation accuracy in reasonable time
    )

    training_loop = training.Loop(model,
                                  train_task,
                                  eval_task=eval_task,
                                  output_dir=output_dir)

    training_loop.run(n_steps=n_steps)


    # We return this because it contains a handle to the model, which has the weights etc.
    return training_loop
training_loop = train_model(GRULM(), data_generator)

The model was only trained for 1 step due to the constraints of this environment. Even on a GPU accelerated environment it will take many hours for it to achieve a good level of accuracy. For the rest of the assignment you will be using a pretrained model but now you should understand how the training can be done using Trax.

Take Two

def take_two(model: layers.Serial,
             training: DataGenerator,
             evaluation: DataGenerator,
             learning_rate: float=SETTINGS.learning_rate,
             batches: int=1,
             evaluation_batches: int=3,
             steps_per_checkpoint: int=1000,
             output_dir=SETTINGS.output) -> trax_training.Loop: 
    """Function that trains the model

    Args:
      model: GRU model.
      training: cycling data generator for training
      evaluation: cycling data generator for evaluation
      learning_rate: alpha for the optimizer
      batches: Number of batches to train.
      evaluation_batches: number of evaluation batches to run
      steps_per_checkpoint: how often to stop and evaluate the model
      output_dir: Relative path of directory to save model.

    Returns:
      Training loop for the model.
    """
    train_task = trax_training.TrainTask(
        labeled_data=training,
        loss_layer=layers.WeightedCategoryCrossEntropy(),
        optimizer=trax.optimizers.Adam(learning_rate=learning_rate),
        n_steps_per_checkpoint=steps_per_checkpoint
    )

    eval_task = trax_training.EvalTask(
        labeled_data=evaluation,
        metrics=[layers.WeightedCategoryCrossEntropy(),
                 layers.Accuracy()],
        n_eval_batches=evaluation_batches
    )

    training_loop = trax_training.Loop(model,
                                  train_task,
                                  eval_tasks=[eval_task],
                                  output_dir=output_dir)
    start = datetime.now()
    training_loop.run(n_steps=batches)
    print(f"Elapsed: {datetime.now() - start}")
    return training_loop
loop = take_two(gru.model, training_generator, evaluation, batches=1000)

Step      1: Total number of trainable weights: 3411200
Step      1: Ran 1 train steps in 2.64 secs
Step      1: train WeightedCategoryCrossEntropy |  5.54519987
Step      1: eval  WeightedCategoryCrossEntropy |  5.54099703
Step      1: eval                      Accuracy |  0.15382584

Step   1000: Ran 999 train steps in 38.68 secs
Step   1000: train WeightedCategoryCrossEntropy |  2.28923297
Step   1000: eval  WeightedCategoryCrossEntropy |  1.82684219
Step   1000: eval                      Accuracy |  0.45511819
Elapsed: 0:00:41.796167

Now let's see what the history tells us.

Note: As of January 9, 2021 the version of trax on pypi (1.3.7) doesn't have a History object (and it isn't documented) so to use this I had to install trax from the master branch of the GitHub Repository.

print(loop.history.modes)
print(f"Evaluation metrics: {loop.history.metrics_for_mode('eval')}")
print(f"Training Metrics: {loop.history.metrics_for_mode('train')}")

print(f"Evaluation Accuracy: {loop.history.get('eval', 'metrics/Accuracy')}")
['eval', 'train']
Evaluation metrics: ['metrics/Accuracy', 'metrics/WeightedCategoryCrossEntropy']
Training Metrics: ['metrics/WeightedCategoryCrossEntropy', 'training/gradients_l2', 'training/learning_rate', 'training/loss', 'training/steps per second', 'training/weights_l2']
Evaluation Accuracy: [(1, 0.15382583936055502), (1000, 0.45511818925539654)]

It made a pretty remarkable improvement after a thousand batches, especially considering it only took forty seconds or so. Let's up the number of batches.

loop = take_two(gru.model, training_generator, evaluation, batches=1000)

Step   2000: Ran 1000 train steps in 39.75 secs
Step   2000: train WeightedCategoryCrossEntropy |  1.66551745
Step   2000: eval  WeightedCategoryCrossEntropy |  1.65215000
Step   2000: eval                      Accuracy |  0.49342343
Elapsed: 0:00:40.189560

Well, I forgot to up the number of batches. This time though…

loop = take_two(gru.model, training_generator, evaluation, batches=10000)

Step   3000: Ran 1000 train steps in 39.81 secs
Step   3000: train WeightedCategoryCrossEntropy |  1.49474919
Step   3000: eval  WeightedCategoryCrossEntropy |  1.50722202
Step   3000: eval                      Accuracy |  0.53727521

Step   4000: Ran 1000 train steps in 38.82 secs
Step   4000: train WeightedCategoryCrossEntropy |  1.40773308
Step   4000: eval  WeightedCategoryCrossEntropy |  1.44813490
Step   4000: eval                      Accuracy |  0.54536728

Step   5000: Ran 1000 train steps in 38.90 secs
Step   5000: train WeightedCategoryCrossEntropy |  1.35936761
Step   5000: eval  WeightedCategoryCrossEntropy |  1.40560397
Step   5000: eval                      Accuracy |  0.55885768

Step   6000: Ran 1000 train steps in 38.88 secs
Step   6000: train WeightedCategoryCrossEntropy |  1.33801484
Step   6000: eval  WeightedCategoryCrossEntropy |  1.36113369
Step   6000: eval                      Accuracy |  0.57642752

Step   7000: Ran 1000 train steps in 38.86 secs
Step   7000: train WeightedCategoryCrossEntropy |  1.32240558
Step   7000: eval  WeightedCategoryCrossEntropy |  1.38307476
Step   7000: eval                      Accuracy |  0.56590829

Step   8000: Ran 1000 train steps in 38.90 secs
Step   8000: train WeightedCategoryCrossEntropy |  1.30228114
Step   8000: eval  WeightedCategoryCrossEntropy |  1.38889817
Step   8000: eval                      Accuracy |  0.56193008

Step   9000: Ran 1000 train steps in 38.88 secs
Step   9000: train WeightedCategoryCrossEntropy |  1.28101051
Step   9000: eval  WeightedCategoryCrossEntropy |  1.36015956
Step   9000: eval                      Accuracy |  0.56561601

Step  10000: Ran 1000 train steps in 38.86 secs
Step  10000: train WeightedCategoryCrossEntropy |  1.27505744
Step  10000: eval  WeightedCategoryCrossEntropy |  1.36137756
Step  10000: eval                      Accuracy |  0.57053447

Step  11000: Ran 1000 train steps in 38.85 secs
Step  11000: train WeightedCategoryCrossEntropy |  1.27052534
Step  11000: eval  WeightedCategoryCrossEntropy |  1.34181790
Step  11000: eval                      Accuracy |  0.57359161

Step  12000: Ran 1000 train steps in 38.85 secs
Step  12000: train WeightedCategoryCrossEntropy |  1.25399101
Step  12000: eval  WeightedCategoryCrossEntropy |  1.34485857
Step  12000: eval                      Accuracy |  0.57139154
Elapsed: 0:06:30.471829

It seems to be plateauing.
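
We can put a rough number on that using the same loop.history.get call shown above. This is just a quick sanity check, not part of the training code, and it assumes the history still holds the evaluation accuracy as (step, value) pairs:

# quick plateau check: how much has the evaluation accuracy moved recently?
history = loop.history.get("eval", "metrics/Accuracy")
recent = [accuracy for step, accuracy in history[-5:]]
print(f"Last five evaluation accuracies: {[round(value, 4) for value in recent]}")
print(f"Spread: {max(recent) - min(recent):.4f}")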

loop = take_two(gru.model, training_generator, evaluation, batches=50000)

Step  13000: Ran 1000 train steps in 39.74 secs
Step  13000: train WeightedCategoryCrossEntropy |  1.28382349
Step  13000: eval  WeightedCategoryCrossEntropy |  1.34152850
Step  13000: eval                      Accuracy |  0.56759004

Step  14000: Ran 1000 train steps in 38.70 secs
Step  14000: train WeightedCategoryCrossEntropy |  1.24999321
Step  14000: eval  WeightedCategoryCrossEntropy |  1.31848574
Step  14000: eval                      Accuracy |  0.58393063

Step  15000: Ran 1000 train steps in 38.64 secs
Step  15000: train WeightedCategoryCrossEntropy |  1.23975933
Step  15000: eval  WeightedCategoryCrossEntropy |  1.31624317
Step  15000: eval                      Accuracy |  0.58447830

Step  16000: Ran 1000 train steps in 38.64 secs
Step  16000: train WeightedCategoryCrossEntropy |  1.21947169
Step  16000: eval  WeightedCategoryCrossEntropy |  1.28875721
Step  16000: eval                      Accuracy |  0.57887546

Step  17000: Ran 1000 train steps in 38.62 secs
Step  17000: train WeightedCategoryCrossEntropy |  1.21219873
Step  17000: eval  WeightedCategoryCrossEntropy |  1.33571080
Step  17000: eval                      Accuracy |  0.57712994

Step  18000: Ran 1000 train steps in 38.66 secs
Step  18000: train WeightedCategoryCrossEntropy |  1.21026635
Step  18000: eval  WeightedCategoryCrossEntropy |  1.32456430
Step  18000: eval                      Accuracy |  0.58517017

Step  19000: Ran 1000 train steps in 38.64 secs
Step  19000: train WeightedCategoryCrossEntropy |  1.21169627
Step  19000: eval  WeightedCategoryCrossEntropy |  1.32556013
Step  19000: eval                      Accuracy |  0.58419540

Step  20000: Ran 1000 train steps in 38.71 secs
Step  20000: train WeightedCategoryCrossEntropy |  1.18635964
Step  20000: eval  WeightedCategoryCrossEntropy |  1.29579870
Step  20000: eval                      Accuracy |  0.58305796

Step  21000: Ran 1000 train steps in 38.64 secs
Step  21000: train WeightedCategoryCrossEntropy |  1.18904626
Step  21000: eval  WeightedCategoryCrossEntropy |  1.30543160
Step  21000: eval                      Accuracy |  0.58511112

Step  22000: Ran 1000 train steps in 38.66 secs
Step  22000: train WeightedCategoryCrossEntropy |  1.19396818
Step  22000: eval  WeightedCategoryCrossEntropy |  1.29183892
Step  22000: eval                      Accuracy |  0.58100422

Step  23000: Ran 1000 train steps in 38.71 secs
Step  23000: train WeightedCategoryCrossEntropy |  1.19577324
Step  23000: eval  WeightedCategoryCrossEntropy |  1.31765648
Step  23000: eval                      Accuracy |  0.57812850

Step  24000: Ran 1000 train steps in 38.77 secs
Step  24000: train WeightedCategoryCrossEntropy |  1.16455758
Step  24000: eval  WeightedCategoryCrossEntropy |  1.30760705
Step  24000: eval                      Accuracy |  0.58308929

Step  25000: Ran 1000 train steps in 38.68 secs
Step  25000: train WeightedCategoryCrossEntropy |  1.17373812
Step  25000: eval  WeightedCategoryCrossEntropy |  1.33733491
Step  25000: eval                      Accuracy |  0.58254947

Step  26000: Ran 1000 train steps in 38.73 secs
Step  26000: train WeightedCategoryCrossEntropy |  1.17703664
Step  26000: eval  WeightedCategoryCrossEntropy |  1.30382776
Step  26000: eval                      Accuracy |  0.59271948

Step  27000: Ran 1000 train steps in 38.77 secs
Step  27000: train WeightedCategoryCrossEntropy |  1.17249799
Step  27000: eval  WeightedCategoryCrossEntropy |  1.29767748
Step  27000: eval                      Accuracy |  0.59217713

Step  28000: Ran 1000 train steps in 38.70 secs
Step  28000: train WeightedCategoryCrossEntropy |  1.15188992
Step  28000: eval  WeightedCategoryCrossEntropy |  1.27955910
Step  28000: eval                      Accuracy |  0.60145231

Step  29000: Ran 1000 train steps in 38.71 secs
Step  29000: train WeightedCategoryCrossEntropy |  1.15883470
Step  29000: eval  WeightedCategoryCrossEntropy |  1.32158053
Step  29000: eval                      Accuracy |  0.58393308

Step  30000: Ran 1000 train steps in 38.69 secs
Step  30000: train WeightedCategoryCrossEntropy |  1.16402268
Step  30000: eval  WeightedCategoryCrossEntropy |  1.28583026
Step  30000: eval                      Accuracy |  0.59060840

Step  31000: Ran 1000 train steps in 38.76 secs
Step  31000: train WeightedCategoryCrossEntropy |  1.15244710
Step  31000: eval  WeightedCategoryCrossEntropy |  1.31478047
Step  31000: eval                      Accuracy |  0.58421228

Step  32000: Ran 1000 train steps in 38.74 secs
Step  32000: train WeightedCategoryCrossEntropy |  1.13865745
Step  32000: eval  WeightedCategoryCrossEntropy |  1.30897808
Step  32000: eval                      Accuracy |  0.58211388

Step  33000: Ran 1000 train steps in 38.70 secs
Step  33000: train WeightedCategoryCrossEntropy |  1.14797425
Step  33000: eval  WeightedCategoryCrossEntropy |  1.28837899
Step  33000: eval                      Accuracy |  0.59355628

Step  34000: Ran 1000 train steps in 38.71 secs
Step  34000: train WeightedCategoryCrossEntropy |  1.15177202
Step  34000: eval  WeightedCategoryCrossEntropy |  1.26875858
Step  34000: eval                      Accuracy |  0.59396426

Step  35000: Ran 1000 train steps in 38.74 secs
Step  35000: train WeightedCategoryCrossEntropy |  1.13462234
Step  35000: eval  WeightedCategoryCrossEntropy |  1.33155421
Step  35000: eval                      Accuracy |  0.58831197

Step  36000: Ran 1000 train steps in 38.76 secs
Step  36000: train WeightedCategoryCrossEntropy |  1.12743652
Step  36000: eval  WeightedCategoryCrossEntropy |  1.31895538
Step  36000: eval                      Accuracy |  0.57935937

Step  37000: Ran 1000 train steps in 38.76 secs
Step  37000: train WeightedCategoryCrossEntropy |  1.13511860
Step  37000: eval  WeightedCategoryCrossEntropy |  1.34238366
Step  37000: eval                      Accuracy |  0.58156353

Step  38000: Ran 1000 train steps in 38.72 secs
Step  38000: train WeightedCategoryCrossEntropy |  1.14187491
Step  38000: eval  WeightedCategoryCrossEntropy |  1.30659600
Step  38000: eval                      Accuracy |  0.58288614

Step  39000: Ran 1000 train steps in 38.76 secs
Step  39000: train WeightedCategoryCrossEntropy |  1.12084019
Step  39000: eval  WeightedCategoryCrossEntropy |  1.28768833
Step  39000: eval                      Accuracy |  0.60021923

Step  40000: Ran 1000 train steps in 38.71 secs
Step  40000: train WeightedCategoryCrossEntropy |  1.11764979
Step  40000: eval  WeightedCategoryCrossEntropy |  1.33905506
Step  40000: eval                      Accuracy |  0.57679999

Step  41000: Ran 1000 train steps in 38.74 secs
Step  41000: train WeightedCategoryCrossEntropy |  1.12686217
Step  41000: eval  WeightedCategoryCrossEntropy |  1.32088705
Step  41000: eval                      Accuracy |  0.58238810

Step  42000: Ran 1000 train steps in 38.75 secs
Step  42000: train WeightedCategoryCrossEntropy |  1.13109481
Step  42000: eval  WeightedCategoryCrossEntropy |  1.31838973
Step  42000: eval                      Accuracy |  0.58213743

Step  43000: Ran 1000 train steps in 38.79 secs
Step  43000: train WeightedCategoryCrossEntropy |  1.10290754
Step  43000: eval  WeightedCategoryCrossEntropy |  1.31488041
Step  43000: eval                      Accuracy |  0.59099247

Step  44000: Ran 1000 train steps in 38.75 secs
Step  44000: train WeightedCategoryCrossEntropy |  1.11154807
Step  44000: eval  WeightedCategoryCrossEntropy |  1.32115630
Step  44000: eval                      Accuracy |  0.58481665

Step  45000: Ran 1000 train steps in 38.74 secs
Step  45000: train WeightedCategoryCrossEntropy |  1.11626506
Step  45000: eval  WeightedCategoryCrossEntropy |  1.32583074
Step  45000: eval                      Accuracy |  0.58425963

Step  46000: Ran 1000 train steps in 38.75 secs
Step  46000: train WeightedCategoryCrossEntropy |  1.12253380
Step  46000: eval  WeightedCategoryCrossEntropy |  1.28128795
Step  46000: eval                      Accuracy |  0.59816724

Step  47000: Ran 1000 train steps in 38.78 secs
Step  47000: train WeightedCategoryCrossEntropy |  1.08949089
Step  47000: eval  WeightedCategoryCrossEntropy |  1.31317608
Step  47000: eval                      Accuracy |  0.58273973

Step  48000: Ran 1000 train steps in 38.75 secs
Step  48000: train WeightedCategoryCrossEntropy |  1.10382092
Step  48000: eval  WeightedCategoryCrossEntropy |  1.35037680
Step  48000: eval                      Accuracy |  0.58653913

Step  49000: Ran 1000 train steps in 38.74 secs
Step  49000: train WeightedCategoryCrossEntropy |  1.10920715
Step  49000: eval  WeightedCategoryCrossEntropy |  1.34068878
Step  49000: eval                      Accuracy |  0.57137036

Step  50000: Ran 1000 train steps in 38.78 secs
Step  50000: train WeightedCategoryCrossEntropy |  1.10644996
Step  50000: eval  WeightedCategoryCrossEntropy |  1.32040668
Step  50000: eval                      Accuracy |  0.58469077

Step  51000: Ran 1000 train steps in 38.73 secs
Step  51000: train WeightedCategoryCrossEntropy |  1.08133543
Step  51000: eval  WeightedCategoryCrossEntropy |  1.31978738
Step  51000: eval                      Accuracy |  0.58491902

Step  52000: Ran 1000 train steps in 38.73 secs
Step  52000: train WeightedCategoryCrossEntropy |  1.09691930
Step  52000: eval  WeightedCategoryCrossEntropy |  1.32925705
Step  52000: eval                      Accuracy |  0.58861417

Step  53000: Ran 1000 train steps in 38.68 secs
Step  53000: train WeightedCategoryCrossEntropy |  1.10452163
Step  53000: eval  WeightedCategoryCrossEntropy |  1.29868329
Step  53000: eval                      Accuracy |  0.60251764

Step  54000: Ran 1000 train steps in 38.74 secs
Step  54000: train WeightedCategoryCrossEntropy |  1.09207809
Step  54000: eval  WeightedCategoryCrossEntropy |  1.35772077
Step  54000: eval                      Accuracy |  0.57129671

Step  55000: Ran 1000 train steps in 38.72 secs
Step  55000: train WeightedCategoryCrossEntropy |  1.07641542
Step  55000: eval  WeightedCategoryCrossEntropy |  1.36485183
Step  55000: eval                      Accuracy |  0.58672802

Step  56000: Ran 1000 train steps in 38.72 secs
Step  56000: train WeightedCategoryCrossEntropy |  1.08802187
Step  56000: eval  WeightedCategoryCrossEntropy |  1.30784667
Step  56000: eval                      Accuracy |  0.59716912

Step  57000: Ran 1000 train steps in 38.71 secs
Step  57000: train WeightedCategoryCrossEntropy |  1.09764445
Step  57000: eval  WeightedCategoryCrossEntropy |  1.35429418
Step  57000: eval                      Accuracy |  0.57975992

Step  58000: Ran 1000 train steps in 38.74 secs
Step  58000: train WeightedCategoryCrossEntropy |  1.07809854
Step  58000: eval  WeightedCategoryCrossEntropy |  1.32458742
Step  58000: eval                      Accuracy |  0.57735123

Step  59000: Ran 1000 train steps in 38.72 secs
Step  59000: train WeightedCategoryCrossEntropy |  1.07255101
Step  59000: eval  WeightedCategoryCrossEntropy |  1.28845433
Step  59000: eval                      Accuracy |  0.59338196

Step  60000: Ran 1000 train steps in 38.73 secs
Step  60000: train WeightedCategoryCrossEntropy |  1.08358848
Step  60000: eval  WeightedCategoryCrossEntropy |  1.31605566
Step  60000: eval                      Accuracy |  0.58012034

Step  61000: Ran 1000 train steps in 38.70 secs
Step  61000: train WeightedCategoryCrossEntropy |  1.08817053
Step  61000: eval  WeightedCategoryCrossEntropy |  1.32721674
Step  61000: eval                      Accuracy |  0.58768902

Step  62000: Ran 1000 train steps in 38.73 secs
Step  62000: train WeightedCategoryCrossEntropy |  1.06626439
Step  62000: eval  WeightedCategoryCrossEntropy |  1.33657344
Step  62000: eval                      Accuracy |  0.58727795
Elapsed: 0:32:19.629778

loop = take_two(gru.model, training_generator, evaluation, batches=100000)

Step  63000: Ran 1000 train steps in 39.93 secs
Step  63000: train WeightedCategoryCrossEntropy |  1.16796327
Step  63000: eval  WeightedCategoryCrossEntropy |  1.36395303
Step  63000: eval                      Accuracy |  0.57032533

Step  64000: Ran 1000 train steps in 38.89 secs
Step  64000: train WeightedCategoryCrossEntropy |  1.11666918
Step  64000: eval  WeightedCategoryCrossEntropy |  1.32780838
Step  64000: eval                      Accuracy |  0.57505075

Step  65000: Ran 1000 train steps in 38.90 secs
Step  65000: train WeightedCategoryCrossEntropy |  1.10621011
Step  65000: eval  WeightedCategoryCrossEntropy |  1.33678579
Step  65000: eval                      Accuracy |  0.57886046

Step  66000: Ran 1000 train steps in 38.93 secs
Step  66000: train WeightedCategoryCrossEntropy |  1.06902885
Step  66000: eval  WeightedCategoryCrossEntropy |  1.33837553
Step  66000: eval                      Accuracy |  0.58116663

Step  67000: Ran 1000 train steps in 38.86 secs
Step  67000: train WeightedCategoryCrossEntropy |  1.07529819
Step  67000: eval  WeightedCategoryCrossEntropy |  1.34368738
Step  67000: eval                      Accuracy |  0.58368655

Step  68000: Ran 1000 train steps in 38.88 secs
Step  68000: train WeightedCategoryCrossEntropy |  1.08158481
Step  68000: eval  WeightedCategoryCrossEntropy |  1.31722498
Step  68000: eval                      Accuracy |  0.58705380

Step  69000: Ran 1000 train steps in 38.95 secs
Step  69000: train WeightedCategoryCrossEntropy |  1.08769965
Step  69000: eval  WeightedCategoryCrossEntropy |  1.31406136
Step  69000: eval                      Accuracy |  0.58490791

Step  70000: Ran 1000 train steps in 38.88 secs
Step  70000: train WeightedCategoryCrossEntropy |  1.04882610
Step  70000: eval  WeightedCategoryCrossEntropy |  1.38410521
Step  70000: eval                      Accuracy |  0.56796430

Step  71000: Ran 1000 train steps in 38.90 secs
Step  71000: train WeightedCategoryCrossEntropy |  1.06316447
Step  71000: eval  WeightedCategoryCrossEntropy |  1.30895372
Step  71000: eval                      Accuracy |  0.58984526

Step  72000: Ran 1000 train steps in 38.91 secs
Step  72000: train WeightedCategoryCrossEntropy |  1.07383156
Step  72000: eval  WeightedCategoryCrossEntropy |  1.38230101
Step  72000: eval                      Accuracy |  0.56828884

Step  73000: Ran 1000 train steps in 38.94 secs
Step  73000: train WeightedCategoryCrossEntropy |  1.07366288
Step  73000: eval  WeightedCategoryCrossEntropy |  1.29979046
Step  73000: eval                      Accuracy |  0.59334222

Step  74000: Ran 1000 train steps in 38.89 secs
Step  74000: train WeightedCategoryCrossEntropy |  1.04150283
Step  74000: eval  WeightedCategoryCrossEntropy |  1.39114801
Step  74000: eval                      Accuracy |  0.56706931

Step  75000: Ran 1000 train steps in 38.89 secs
Step  75000: train WeightedCategoryCrossEntropy |  1.06011724
Step  75000: eval  WeightedCategoryCrossEntropy |  1.31870242
Step  75000: eval                      Accuracy |  0.58975877

Step  76000: Ran 1000 train steps in 38.93 secs
Step  76000: train WeightedCategoryCrossEntropy |  1.06862414
Step  76000: eval  WeightedCategoryCrossEntropy |  1.33027065
Step  76000: eval                      Accuracy |  0.58500228

Step  77000: Ran 1000 train steps in 38.92 secs
Step  77000: train WeightedCategoryCrossEntropy |  1.05721939
Step  77000: eval  WeightedCategoryCrossEntropy |  1.36938119
Step  77000: eval                      Accuracy |  0.57774687

Step  78000: Ran 1000 train steps in 38.86 secs
Step  78000: train WeightedCategoryCrossEntropy |  1.04032123
Step  78000: eval  WeightedCategoryCrossEntropy |  1.35787050
Step  78000: eval                      Accuracy |  0.58307936

Step  79000: Ran 1000 train steps in 38.89 secs
Step  79000: train WeightedCategoryCrossEntropy |  1.05514109
Step  79000: eval  WeightedCategoryCrossEntropy |  1.34510783
Step  79000: eval                      Accuracy |  0.59036636

Step  80000: Ran 1000 train steps in 38.91 secs
Step  80000: train WeightedCategoryCrossEntropy |  1.06119215
Step  80000: eval  WeightedCategoryCrossEntropy |  1.35925500
Step  80000: eval                      Accuracy |  0.58475639

Step  81000: Ran 1000 train steps in 38.93 secs
Step  81000: train WeightedCategoryCrossEntropy |  1.04676783
Step  81000: eval  WeightedCategoryCrossEntropy |  1.36667589
Step  81000: eval                      Accuracy |  0.57690132

Step  82000: Ran 1000 train steps in 38.88 secs
Step  82000: train WeightedCategoryCrossEntropy |  1.03751075
Step  82000: eval  WeightedCategoryCrossEntropy |  1.34715915
Step  82000: eval                      Accuracy |  0.58315720

Step  83000: Ran 1000 train steps in 38.88 secs
Step  83000: train WeightedCategoryCrossEntropy |  1.05128062
Step  83000: eval  WeightedCategoryCrossEntropy |  1.39356836
Step  83000: eval                      Accuracy |  0.57512679

Step  84000: Ran 1000 train steps in 38.89 secs
Step  84000: train WeightedCategoryCrossEntropy |  1.05902994
Step  84000: eval  WeightedCategoryCrossEntropy |  1.33182939
Step  84000: eval                      Accuracy |  0.57415217

Step  85000: Ran 1000 train steps in 38.93 secs
Step  85000: train WeightedCategoryCrossEntropy |  1.03327870
Step  85000: eval  WeightedCategoryCrossEntropy |  1.35110184
Step  85000: eval                      Accuracy |  0.57771309

Step  86000: Ran 1000 train steps in 38.81 secs
Step  86000: train WeightedCategoryCrossEntropy |  1.03494859
Step  86000: eval  WeightedCategoryCrossEntropy |  1.38251416
Step  86000: eval                      Accuracy |  0.57844079

Step  87000: Ran 1000 train steps in 38.95 secs
Step  87000: train WeightedCategoryCrossEntropy |  1.04720616
Step  87000: eval  WeightedCategoryCrossEntropy |  1.39008860
Step  87000: eval                      Accuracy |  0.57346765

Step  88000: Ran 1000 train steps in 38.92 secs
Step  88000: train WeightedCategoryCrossEntropy |  1.05683839
Step  88000: eval  WeightedCategoryCrossEntropy |  1.34061221
Step  88000: eval                      Accuracy |  0.57800055

Step  89000: Ran 1000 train steps in 38.96 secs
Step  89000: train WeightedCategoryCrossEntropy |  1.02072740
Step  89000: eval  WeightedCategoryCrossEntropy |  1.36288555
Step  89000: eval                      Accuracy |  0.57487903

Step  90000: Ran 1000 train steps in 38.94 secs
Step  90000: train WeightedCategoryCrossEntropy |  1.03256643
Step  90000: eval  WeightedCategoryCrossEntropy |  1.33989787
Step  90000: eval                      Accuracy |  0.58749672

Step  91000: Ran 1000 train steps in 38.90 secs
Step  91000: train WeightedCategoryCrossEntropy |  1.04493618
Step  91000: eval  WeightedCategoryCrossEntropy |  1.33348036
Step  91000: eval                      Accuracy |  0.58970133

Step  92000: Ran 1000 train steps in 38.88 secs
Step  92000: train WeightedCategoryCrossEntropy |  1.05325651
Step  92000: eval  WeightedCategoryCrossEntropy |  1.37317479
Step  92000: eval                      Accuracy |  0.57510771

Step  93000: Ran 1000 train steps in 38.92 secs
Step  93000: train WeightedCategoryCrossEntropy |  1.01199973
Step  93000: eval  WeightedCategoryCrossEntropy |  1.34816321
Step  93000: eval                      Accuracy |  0.58193330

Step  94000: Ran 1000 train steps in 38.84 secs
Step  94000: train WeightedCategoryCrossEntropy |  1.03259039
Step  94000: eval  WeightedCategoryCrossEntropy |  1.40019397
Step  94000: eval                      Accuracy |  0.57431702

Step  95000: Ran 1000 train steps in 38.88 secs
Step  95000: train WeightedCategoryCrossEntropy |  1.04201376
Step  95000: eval  WeightedCategoryCrossEntropy |  1.39143252
Step  95000: eval                      Accuracy |  0.57650570

Step  96000: Ran 1000 train steps in 39.04 secs
Step  96000: train WeightedCategoryCrossEntropy |  1.04046071
Step  96000: eval  WeightedCategoryCrossEntropy |  1.39077107
Step  96000: eval                      Accuracy |  0.56915913

Step  97000: Ran 1000 train steps in 38.87 secs
Step  97000: train WeightedCategoryCrossEntropy |  1.01071739
Step  97000: eval  WeightedCategoryCrossEntropy |  1.36615340
Step  97000: eval                      Accuracy |  0.58579030

Step  98000: Ran 1000 train steps in 38.88 secs
Step  98000: train WeightedCategoryCrossEntropy |  1.02754629
Step  98000: eval  WeightedCategoryCrossEntropy |  1.37784847
Step  98000: eval                      Accuracy |  0.56786172

Step  99000: Ran 1000 train steps in 38.86 secs
Step  99000: train WeightedCategoryCrossEntropy |  1.04122782
Step  99000: eval  WeightedCategoryCrossEntropy |  1.35543263
Step  99000: eval                      Accuracy |  0.57437052

Step  100000: Ran 1000 train steps in 38.91 secs
Step  100000: train WeightedCategoryCrossEntropy |  1.02983260
Step  100000: eval  WeightedCategoryCrossEntropy |  1.37780102
Step  100000: eval                      Accuracy |  0.57324133

Step  101000: Ran 1000 train steps in 38.87 secs
Step  101000: train WeightedCategoryCrossEntropy |  1.01030552
Step  101000: eval  WeightedCategoryCrossEntropy |  1.36497653
Step  101000: eval                      Accuracy |  0.58740668

Step  102000: Ran 1000 train steps in 38.90 secs
Step  102000: train WeightedCategoryCrossEntropy |  1.02731681
Step  102000: eval  WeightedCategoryCrossEntropy |  1.35321331
Step  102000: eval                      Accuracy |  0.57775164

Step  103000: Ran 1000 train steps in 38.91 secs
Step  103000: train WeightedCategoryCrossEntropy |  1.03641915
Step  103000: eval  WeightedCategoryCrossEntropy |  1.34763209
Step  103000: eval                      Accuracy |  0.58446699

Step  104000: Ran 1000 train steps in 38.94 secs
Step  104000: train WeightedCategoryCrossEntropy |  1.01956904
Step  104000: eval  WeightedCategoryCrossEntropy |  1.36184053
Step  104000: eval                      Accuracy |  0.57803359

Step  105000: Ran 1000 train steps in 38.89 secs
Step  105000: train WeightedCategoryCrossEntropy |  1.01011324
Step  105000: eval  WeightedCategoryCrossEntropy |  1.38106732
Step  105000: eval                      Accuracy |  0.57777325

Step  106000: Ran 1000 train steps in 38.89 secs
Step  106000: train WeightedCategoryCrossEntropy |  1.02553248
Step  106000: eval  WeightedCategoryCrossEntropy |  1.35610406
Step  106000: eval                      Accuracy |  0.57794044

Step  107000: Ran 1000 train steps in 38.82 secs
Step  107000: train WeightedCategoryCrossEntropy |  1.03704548
Step  107000: eval  WeightedCategoryCrossEntropy |  1.42385058
Step  107000: eval                      Accuracy |  0.56722079

Step  108000: Ran 1000 train steps in 38.95 secs
Step  108000: train WeightedCategoryCrossEntropy |  1.00718296
Step  108000: eval  WeightedCategoryCrossEntropy |  1.31863145
Step  108000: eval                      Accuracy |  0.58128174

Step  109000: Ran 1000 train steps in 38.88 secs
Step  109000: train WeightedCategoryCrossEntropy |  1.01074588
Step  109000: eval  WeightedCategoryCrossEntropy |  1.38885832
Step  109000: eval                      Accuracy |  0.57076645

Step  110000: Ran 1000 train steps in 38.89 secs
Step  110000: train WeightedCategoryCrossEntropy |  1.02346790
Step  110000: eval  WeightedCategoryCrossEntropy |  1.38532333
Step  110000: eval                      Accuracy |  0.56799785

Step  111000: Ran 1000 train steps in 38.91 secs
Step  111000: train WeightedCategoryCrossEntropy |  1.03170466
Step  111000: eval  WeightedCategoryCrossEntropy |  1.43979116
Step  111000: eval                      Accuracy |  0.55651154

Step  112000: Ran 1000 train steps in 38.91 secs
Step  112000: train WeightedCategoryCrossEntropy |  0.99752879
Step  112000: eval  WeightedCategoryCrossEntropy |  1.40813621
Step  112000: eval                      Accuracy |  0.57297881

Step  113000: Ran 1000 train steps in 38.86 secs
Step  113000: train WeightedCategoryCrossEntropy |  1.00867105
Step  113000: eval  WeightedCategoryCrossEntropy |  1.40307196
Step  113000: eval                      Accuracy |  0.57566841

Step  114000: Ran 1000 train steps in 38.90 secs
Step  114000: train WeightedCategoryCrossEntropy |  1.02337575
Step  114000: eval  WeightedCategoryCrossEntropy |  1.44530074
Step  114000: eval                      Accuracy |  0.55467153

Step  115000: Ran 1000 train steps in 38.87 secs
Step  115000: train WeightedCategoryCrossEntropy |  1.03222477
Step  115000: eval  WeightedCategoryCrossEntropy |  1.41283929
Step  115000: eval                      Accuracy |  0.57396744

Step  116000: Ran 1000 train steps in 38.91 secs
Step  116000: train WeightedCategoryCrossEntropy |  0.98707652
Step  116000: eval  WeightedCategoryCrossEntropy |  1.38734619
Step  116000: eval                      Accuracy |  0.57764675

Step  117000: Ran 1000 train steps in 38.88 secs
Step  117000: train WeightedCategoryCrossEntropy |  1.00943744
Step  117000: eval  WeightedCategoryCrossEntropy |  1.35685408
Step  117000: eval                      Accuracy |  0.58032387

Step  118000: Ran 1000 train steps in 38.91 secs
Step  118000: train WeightedCategoryCrossEntropy |  1.02165031
Step  118000: eval  WeightedCategoryCrossEntropy |  1.41391091
Step  118000: eval                      Accuracy |  0.55870849

Step  119000: Ran 1000 train steps in 38.94 secs
Step  119000: train WeightedCategoryCrossEntropy |  1.02332592
Step  119000: eval  WeightedCategoryCrossEntropy |  1.37008909
Step  119000: eval                      Accuracy |  0.58312436

Step  120000: Ran 1000 train steps in 38.87 secs
Step  120000: train WeightedCategoryCrossEntropy |  0.99027425
Step  120000: eval  WeightedCategoryCrossEntropy |  1.39020562
Step  120000: eval                      Accuracy |  0.56893224

Step  121000: Ran 1000 train steps in 38.91 secs
Step  121000: train WeightedCategoryCrossEntropy |  1.01001906
Step  121000: eval  WeightedCategoryCrossEntropy |  1.34898885
Step  121000: eval                      Accuracy |  0.58765940

Step  122000: Ran 1000 train steps in 38.91 secs
Step  122000: train WeightedCategoryCrossEntropy |  1.01810360
Step  122000: eval  WeightedCategoryCrossEntropy |  1.31699550
Step  122000: eval                      Accuracy |  0.59351979

Step  123000: Ran 1000 train steps in 38.94 secs
Step  123000: train WeightedCategoryCrossEntropy |  1.00846207
Step  123000: eval  WeightedCategoryCrossEntropy |  1.36349829
Step  123000: eval                      Accuracy |  0.58220035

Step  124000: Ran 1000 train steps in 38.90 secs
Step  124000: train WeightedCategoryCrossEntropy |  0.99121541
Step  124000: eval  WeightedCategoryCrossEntropy |  1.36115118
Step  124000: eval                      Accuracy |  0.58584205

Step  125000: Ran 1000 train steps in 38.95 secs
Step  125000: train WeightedCategoryCrossEntropy |  1.00830889
Step  125000: eval  WeightedCategoryCrossEntropy |  1.40724500
Step  125000: eval                      Accuracy |  0.56920058

Step  126000: Ran 1000 train steps in 38.90 secs
Step  126000: train WeightedCategoryCrossEntropy |  1.01781940
Step  126000: eval  WeightedCategoryCrossEntropy |  1.36977708
Step  126000: eval                      Accuracy |  0.58009328

Step  127000: Ran 1000 train steps in 38.96 secs
Step  127000: train WeightedCategoryCrossEntropy |  1.00031054
Step  127000: eval  WeightedCategoryCrossEntropy |  1.41326904
Step  127000: eval                      Accuracy |  0.57243240

Step  128000: Ran 1000 train steps in 38.92 secs
Step  128000: train WeightedCategoryCrossEntropy |  0.99219322
Step  128000: eval  WeightedCategoryCrossEntropy |  1.44404384
Step  128000: eval                      Accuracy |  0.57395190

Step  129000: Ran 1000 train steps in 38.99 secs
Step  129000: train WeightedCategoryCrossEntropy |  1.00709093
Step  129000: eval  WeightedCategoryCrossEntropy |  1.41958042
Step  129000: eval                      Accuracy |  0.57267843

Step  130000: Ran 1000 train steps in 38.99 secs
Step  130000: train WeightedCategoryCrossEntropy |  1.01912773
Step  130000: eval  WeightedCategoryCrossEntropy |  1.33912981
Step  130000: eval                      Accuracy |  0.59197128

Step  131000: Ran 1000 train steps in 39.00 secs
Step  131000: train WeightedCategoryCrossEntropy |  0.98723483
Step  131000: eval  WeightedCategoryCrossEntropy |  1.41522125
Step  131000: eval                      Accuracy |  0.57427963

Step  132000: Ran 1000 train steps in 38.94 secs
Step  132000: train WeightedCategoryCrossEntropy |  0.99342090
Step  132000: eval  WeightedCategoryCrossEntropy |  1.41465898
Step  132000: eval                      Accuracy |  0.57029406

Step  133000: Ran 1000 train steps in 38.88 secs
Step  133000: train WeightedCategoryCrossEntropy |  1.00727808
Step  133000: eval  WeightedCategoryCrossEntropy |  1.38130502
Step  133000: eval                      Accuracy |  0.57192655

Step  134000: Ran 1000 train steps in 38.91 secs
Step  134000: train WeightedCategoryCrossEntropy |  1.01677108
Step  134000: eval  WeightedCategoryCrossEntropy |  1.37716194
Step  134000: eval                      Accuracy |  0.57707018

Step  135000: Ran 1000 train steps in 38.98 secs
Step  135000: train WeightedCategoryCrossEntropy |  0.98251414
Step  135000: eval  WeightedCategoryCrossEntropy |  1.43346206
Step  135000: eval                      Accuracy |  0.56802229

Step  136000: Ran 1000 train steps in 38.94 secs
Step  136000: train WeightedCategoryCrossEntropy |  0.99259746
Step  136000: eval  WeightedCategoryCrossEntropy |  1.40438286
Step  136000: eval                      Accuracy |  0.56927029

Step  137000: Ran 1000 train steps in 38.95 secs
Step  137000: train WeightedCategoryCrossEntropy |  1.00365269
Step  137000: eval  WeightedCategoryCrossEntropy |  1.39464525
Step  137000: eval                      Accuracy |  0.56577289

Step  138000: Ran 1000 train steps in 38.94 secs
Step  138000: train WeightedCategoryCrossEntropy |  1.01699519
Step  138000: eval  WeightedCategoryCrossEntropy |  1.38829728
Step  138000: eval                      Accuracy |  0.56793642

Step  139000: Ran 1000 train steps in 38.95 secs
Step  139000: train WeightedCategoryCrossEntropy |  0.97175646
Step  139000: eval  WeightedCategoryCrossEntropy |  1.41113611
Step  139000: eval                      Accuracy |  0.57514930

Step  140000: Ran 1000 train steps in 38.90 secs
Step  140000: train WeightedCategoryCrossEntropy |  0.99368864
Step  140000: eval  WeightedCategoryCrossEntropy |  1.37815968
Step  140000: eval                      Accuracy |  0.57881431

Step  141000: Ran 1000 train steps in 38.89 secs
Step  141000: train WeightedCategoryCrossEntropy |  1.00594318
Step  141000: eval  WeightedCategoryCrossEntropy |  1.37036717
Step  141000: eval                      Accuracy |  0.58198376

Step  142000: Ran 1000 train steps in 38.90 secs
Step  142000: train WeightedCategoryCrossEntropy |  1.00673234
Step  142000: eval  WeightedCategoryCrossEntropy |  1.40482660
Step  142000: eval                      Accuracy |  0.58230907

Step  143000: Ran 1000 train steps in 38.90 secs
Step  143000: train WeightedCategoryCrossEntropy |  0.97389799
Step  143000: eval  WeightedCategoryCrossEntropy |  1.39242669
Step  143000: eval                      Accuracy |  0.58056428

Step  144000: Ran 1000 train steps in 38.92 secs
Step  144000: train WeightedCategoryCrossEntropy |  0.99413979
Step  144000: eval  WeightedCategoryCrossEntropy |  1.41043913
Step  144000: eval                      Accuracy |  0.56678424

Step  145000: Ran 1000 train steps in 38.95 secs
Step  145000: train WeightedCategoryCrossEntropy |  1.00447440
Step  145000: eval  WeightedCategoryCrossEntropy |  1.36656562
Step  145000: eval                      Accuracy |  0.57477281

Step  146000: Ran 1000 train steps in 38.99 secs
Step  146000: train WeightedCategoryCrossEntropy |  0.99580330
Step  146000: eval  WeightedCategoryCrossEntropy |  1.48764821
Step  146000: eval                      Accuracy |  0.55135592

Step  147000: Ran 1000 train steps in 38.92 secs
Step  147000: train WeightedCategoryCrossEntropy |  0.97624487
Step  147000: eval  WeightedCategoryCrossEntropy |  1.40377279
Step  147000: eval                      Accuracy |  0.58196793

Step  148000: Ran 1000 train steps in 38.91 secs
Step  148000: train WeightedCategoryCrossEntropy |  0.99337947
Step  148000: eval  WeightedCategoryCrossEntropy |  1.38602730
Step  148000: eval                      Accuracy |  0.56986465

Step  149000: Ran 1000 train steps in 38.88 secs
Step  149000: train WeightedCategoryCrossEntropy |  1.00641680
Step  149000: eval  WeightedCategoryCrossEntropy |  1.39816805
Step  149000: eval                      Accuracy |  0.57870026

Step  150000: Ran 1000 train steps in 38.92 secs
Step  150000: train WeightedCategoryCrossEntropy |  0.98345733
Step  150000: eval  WeightedCategoryCrossEntropy |  1.42259351
Step  150000: eval                      Accuracy |  0.56833545

Step  151000: Ran 1000 train steps in 38.91 secs
Step  151000: train WeightedCategoryCrossEntropy |  0.97820592
Step  151000: eval  WeightedCategoryCrossEntropy |  1.38016677
Step  151000: eval                      Accuracy |  0.57927004

Step  152000: Ran 1000 train steps in 38.92 secs
Step  152000: train WeightedCategoryCrossEntropy |  0.99465126
Step  152000: eval  WeightedCategoryCrossEntropy |  1.40752935
Step  152000: eval                      Accuracy |  0.57599767

Step  153000: Ran 1000 train steps in 38.91 secs
Step  153000: train WeightedCategoryCrossEntropy |  1.00440490
Step  153000: eval  WeightedCategoryCrossEntropy |  1.38850121
Step  153000: eval                      Accuracy |  0.57887087

Step  154000: Ran 1000 train steps in 38.98 secs
Step  154000: train WeightedCategoryCrossEntropy |  0.97649008
Step  154000: eval  WeightedCategoryCrossEntropy |  1.40402273
Step  154000: eval                      Accuracy |  0.57060033

Step  155000: Ran 1000 train steps in 38.91 secs
Step  155000: train WeightedCategoryCrossEntropy |  0.97934151
Step  155000: eval  WeightedCategoryCrossEntropy |  1.48141162
Step  155000: eval                      Accuracy |  0.56002742

Step  156000: Ran 1000 train steps in 38.92 secs
Step  156000: train WeightedCategoryCrossEntropy |  0.99469137
Step  156000: eval  WeightedCategoryCrossEntropy |  1.36240538
Step  156000: eval                      Accuracy |  0.57810269

Step  157000: Ran 1000 train steps in 38.91 secs
Step  157000: train WeightedCategoryCrossEntropy |  1.00433600
Step  157000: eval  WeightedCategoryCrossEntropy |  1.39899556
Step  157000: eval                      Accuracy |  0.57247500

Step  158000: Ran 1000 train steps in 38.93 secs
Step  158000: train WeightedCategoryCrossEntropy |  0.96986669
Step  158000: eval  WeightedCategoryCrossEntropy |  1.40644030
Step  158000: eval                      Accuracy |  0.57322383

Step  159000: Ran 1000 train steps in 38.92 secs
Step  159000: train WeightedCategoryCrossEntropy |  0.98071331
Step  159000: eval  WeightedCategoryCrossEntropy |  1.44401983
Step  159000: eval                      Accuracy |  0.57154638

Step  160000: Ran 1000 train steps in 38.93 secs
Step  160000: train WeightedCategoryCrossEntropy |  0.99308157
Step  160000: eval  WeightedCategoryCrossEntropy |  1.41375522
Step  160000: eval                      Accuracy |  0.57750905

Step  161000: Ran 1000 train steps in 38.97 secs
Step  161000: train WeightedCategoryCrossEntropy |  1.00366378
Step  161000: eval  WeightedCategoryCrossEntropy |  1.40615169
Step  161000: eval                      Accuracy |  0.57685037

Step  162000: Ran 1000 train steps in 39.03 secs
Step  162000: train WeightedCategoryCrossEntropy |  0.96036094
Step  162000: eval  WeightedCategoryCrossEntropy |  1.40110429
Step  162000: eval                      Accuracy |  0.57392023
Elapsed: 1:04:57.283108

loop = take_two(gru.model, training_generator, evaluation, epochs=10000)

Step   7200: Ran 100 train steps in 49.91 secs
Step   7200: train WeightedCategoryCrossEntropy |  1.40845227
Step   7200: eval  WeightedCategoryCrossEntropy |  1.53364094
Step   7200: eval                      Accuracy |  0.53398244

Step   7300: Ran 100 train steps in 46.69 secs
Step   7300: train WeightedCategoryCrossEntropy |  1.37220216
Step   7300: eval  WeightedCategoryCrossEntropy |  1.42109434
Step   7300: eval                      Accuracy |  0.55498699

Step   7400: Ran 100 train steps in 46.79 secs
Step   7400: train WeightedCategoryCrossEntropy |  1.34160054
Step   7400: eval  WeightedCategoryCrossEntropy |  1.42887247
Step   7400: eval                      Accuracy |  0.54843716

Step   7500: Ran 100 train steps in 46.75 secs
Step   7500: train WeightedCategoryCrossEntropy |  1.33687389
Step   7500: eval  WeightedCategoryCrossEntropy |  1.39091337
Step   7500: eval                      Accuracy |  0.56296345

Step   7600: Ran 100 train steps in 46.73 secs
Step   7600: train WeightedCategoryCrossEntropy |  1.32682574
Step   7600: eval  WeightedCategoryCrossEntropy |  1.36574340
Step   7600: eval                      Accuracy |  0.56962399

Step   7700: Ran 100 train steps in 47.18 secs
Step   7700: train WeightedCategoryCrossEntropy |  1.31113505
Step   7700: eval  WeightedCategoryCrossEntropy |  1.37930723
Step   7700: eval                      Accuracy |  0.56413543

Step   7800: Ran 100 train steps in 46.63 secs
Step   7800: train WeightedCategoryCrossEntropy |  1.30171084
Step   7800: eval  WeightedCategoryCrossEntropy |  1.40999524
Step   7800: eval                      Accuracy |  0.56547354

Step   7900: Ran 100 train steps in 46.62 secs
Step   7900: train WeightedCategoryCrossEntropy |  1.29436350
Step   7900: eval  WeightedCategoryCrossEntropy |  1.33792806
Step   7900: eval                      Accuracy |  0.58449248

Step   8000: Ran 100 train steps in 46.63 secs
Step   8000: train WeightedCategoryCrossEntropy |  1.29799175
Step   8000: eval  WeightedCategoryCrossEntropy |  1.33296335
Step   8000: eval                      Accuracy |  0.57597931

Step   8100: Ran 100 train steps in 46.70 secs
Step   8100: train WeightedCategoryCrossEntropy |  1.28517950
Step   8100: eval  WeightedCategoryCrossEntropy |  1.40022814
Step   8100: eval                      Accuracy |  0.55829932

Step   8200: Ran 100 train steps in 46.64 secs
Step   8200: train WeightedCategoryCrossEntropy |  1.28536940
Step   8200: eval  WeightedCategoryCrossEntropy |  1.37004666
Step   8200: eval                      Accuracy |  0.56932286

Step   8300: Ran 100 train steps in 46.59 secs
Step   8300: train WeightedCategoryCrossEntropy |  1.28937984
Step   8300: eval  WeightedCategoryCrossEntropy |  1.39467760
Step   8300: eval                      Accuracy |  0.55672725

Step   8400: Ran 100 train steps in 46.59 secs
Step   8400: train WeightedCategoryCrossEntropy |  1.28266370
Step   8400: eval  WeightedCategoryCrossEntropy |  1.40646402
Step   8400: eval                      Accuracy |  0.56549414

Step   8500: Ran 100 train steps in 46.58 secs
Step   8500: train WeightedCategoryCrossEntropy |  1.28980207
Step   8500: eval  WeightedCategoryCrossEntropy |  1.35758976
Step   8500: eval                      Accuracy |  0.57382486

Step   8600: Ran 100 train steps in 46.59 secs
Step   8600: train WeightedCategoryCrossEntropy |  1.28626430
Step   8600: eval  WeightedCategoryCrossEntropy |  1.39424094
Step   8600: eval                      Accuracy |  0.55458832

Step   8700: Ran 100 train steps in 46.55 secs
Step   8700: train WeightedCategoryCrossEntropy |  1.27769840
Step   8700: eval  WeightedCategoryCrossEntropy |  1.34323144
Step   8700: eval                      Accuracy |  0.57333910

Step   8800: Ran 100 train steps in 46.56 secs
Step   8800: train WeightedCategoryCrossEntropy |  1.27631617
Step   8800: eval  WeightedCategoryCrossEntropy |  1.36277807
Step   8800: eval                      Accuracy |  0.57450738

Step   8900: Ran 100 train steps in 46.63 secs
Step   8900: train WeightedCategoryCrossEntropy |  1.27718043
Step   8900: eval  WeightedCategoryCrossEntropy |  1.37657404
Step   8900: eval                      Accuracy |  0.56594115

Step   9000: Ran 100 train steps in 46.56 secs
Step   9000: train WeightedCategoryCrossEntropy |  1.27473545
Step   9000: eval  WeightedCategoryCrossEntropy |  1.33857087
Step   9000: eval                      Accuracy |  0.57156471

Step   9100: Ran 100 train steps in 46.60 secs
Step   9100: train WeightedCategoryCrossEntropy |  1.27636838
Step   9100: eval  WeightedCategoryCrossEntropy |  1.32985719
Step   9100: eval                      Accuracy |  0.58792001

Step   9200: Ran 100 train steps in 46.57 secs
Step   9200: train WeightedCategoryCrossEntropy |  1.27704740
Step   9200: eval  WeightedCategoryCrossEntropy |  1.33943196
Step   9200: eval                      Accuracy |  0.57151316

Step   9300: Ran 100 train steps in 46.60 secs
Step   9300: train WeightedCategoryCrossEntropy |  1.27908921
Step   9300: eval  WeightedCategoryCrossEntropy |  1.35788206
Step   9300: eval                      Accuracy |  0.56833035

Step   9400: Ran 100 train steps in 46.59 secs
Step   9400: train WeightedCategoryCrossEntropy |  1.27476656
Step   9400: eval  WeightedCategoryCrossEntropy |  1.37336095
Step   9400: eval                      Accuracy |  0.57279189

Step   9500: Ran 100 train steps in 46.64 secs
Step   9500: train WeightedCategoryCrossEntropy |  1.27277946
Step   9500: eval  WeightedCategoryCrossEntropy |  1.38834250
Step   9500: eval                      Accuracy |  0.55810201

Step   9600: Ran 100 train steps in 46.67 secs
Step   9600: train WeightedCategoryCrossEntropy |  1.26448727
Step   9600: eval  WeightedCategoryCrossEntropy |  1.39491995
Step   9600: eval                      Accuracy |  0.55545733

Step   9700: Ran 100 train steps in 46.71 secs
Step   9700: train WeightedCategoryCrossEntropy |  1.26453817
Step   9700: eval  WeightedCategoryCrossEntropy |  1.31964866
Step   9700: eval                      Accuracy |  0.58797077

Step   9800: Ran 100 train steps in 46.63 secs
Step   9800: train WeightedCategoryCrossEntropy |  1.26623130
Step   9800: eval  WeightedCategoryCrossEntropy |  1.33691669
Step   9800: eval                      Accuracy |  0.58117094

Step   9900: Ran 100 train steps in 46.61 secs
Step   9900: train WeightedCategoryCrossEntropy |  1.26877284
Step   9900: eval  WeightedCategoryCrossEntropy |  1.35668564
Step   9900: eval                      Accuracy |  0.56906497

Step  10000: Ran 100 train steps in 46.91 secs
Step  10000: train WeightedCategoryCrossEntropy |  1.27724636
Step  10000: eval  WeightedCategoryCrossEntropy |  1.37475316
Step  10000: eval                      Accuracy |  0.57083255

Step  10100: Ran 100 train steps in 46.64 secs
Step  10100: train WeightedCategoryCrossEntropy |  1.27599573
Step  10100: eval  WeightedCategoryCrossEntropy |  1.39496668
Step  10100: eval                      Accuracy |  0.55946493

Step  10200: Ran 100 train steps in 46.66 secs
Step  10200: train WeightedCategoryCrossEntropy |  1.26500976
Step  10200: eval  WeightedCategoryCrossEntropy |  1.30219173
Step  10200: eval                      Accuracy |  0.58777571

Step  10300: Ran 100 train steps in 46.64 secs
Step  10300: train WeightedCategoryCrossEntropy |  1.26295793
Step  10300: eval  WeightedCategoryCrossEntropy |  1.34939114
Step  10300: eval                      Accuracy |  0.58265235

Step  10400: Ran 100 train steps in 46.71 secs
Step  10400: train WeightedCategoryCrossEntropy |  1.26094663
Step  10400: eval  WeightedCategoryCrossEntropy |  1.34398154
Step  10400: eval                      Accuracy |  0.58220708

Step  10500: Ran 100 train steps in 46.64 secs
Step  10500: train WeightedCategoryCrossEntropy |  1.26208460
Step  10500: eval  WeightedCategoryCrossEntropy |  1.33290792
Step  10500: eval                      Accuracy |  0.57700493

Step  10600: Ran 100 train steps in 46.64 secs
Step  10600: train WeightedCategoryCrossEntropy |  1.26667988
Step  10600: eval  WeightedCategoryCrossEntropy |  1.35851014
Step  10600: eval                      Accuracy |  0.56506201

Step  10700: Ran 100 train steps in 46.68 secs
Step  10700: train WeightedCategoryCrossEntropy |  1.26337409
Step  10700: eval  WeightedCategoryCrossEntropy |  1.33711513
Step  10700: eval                      Accuracy |  0.56967231

Step  10800: Ran 100 train steps in 46.71 secs
Step  10800: train WeightedCategoryCrossEntropy |  1.26840901
Step  10800: eval  WeightedCategoryCrossEntropy |  1.34306133
Step  10800: eval                      Accuracy |  0.57760129

Step  10900: Ran 100 train steps in 46.68 secs
Step  10900: train WeightedCategoryCrossEntropy |  1.26851952
Step  10900: eval  WeightedCategoryCrossEntropy |  1.36890825
Step  10900: eval                      Accuracy |  0.56626668

Step  11000: Ran 100 train steps in 46.60 secs
Step  11000: train WeightedCategoryCrossEntropy |  1.26771557
Step  11000: eval  WeightedCategoryCrossEntropy |  1.33610710
Step  11000: eval                      Accuracy |  0.58137830

Step  11100: Ran 100 train steps in 46.61 secs
Step  11100: train WeightedCategoryCrossEntropy |  1.26955628
Step  11100: eval  WeightedCategoryCrossEntropy |  1.31183930
Step  11100: eval                      Accuracy |  0.58702825

Step  11200: Ran 100 train steps in 46.51 secs
Step  11200: train WeightedCategoryCrossEntropy |  1.25960994
Step  11200: eval  WeightedCategoryCrossEntropy |  1.35415089
Step  11200: eval                      Accuracy |  0.57303894

Step  11300: Ran 100 train steps in 46.57 secs
Step  11300: train WeightedCategoryCrossEntropy |  1.26471293
Step  11300: eval  WeightedCategoryCrossEntropy |  1.35277263
Step  11300: eval                      Accuracy |  0.57152595

Step  11400: Ran 100 train steps in 46.53 secs
Step  11400: train WeightedCategoryCrossEntropy |  1.25756633
Step  11400: eval  WeightedCategoryCrossEntropy |  1.30689363
Step  11400: eval                      Accuracy |  0.58587994

Step  11500: Ran 100 train steps in 46.72 secs
Step  11500: train WeightedCategoryCrossEntropy |  1.26152885
Step  11500: eval  WeightedCategoryCrossEntropy |  1.35160565
Step  11500: eval                      Accuracy |  0.57004086

Step  11600: Ran 100 train steps in 46.56 secs
Step  11600: train WeightedCategoryCrossEntropy |  1.23939836
Step  11600: eval  WeightedCategoryCrossEntropy |  1.31620030
Step  11600: eval                      Accuracy |  0.57880658

Step  11700: Ran 100 train steps in 46.58 secs
Step  11700: train WeightedCategoryCrossEntropy |  1.23543918
Step  11700: eval  WeightedCategoryCrossEntropy |  1.36910570
Step  11700: eval                      Accuracy |  0.56298707

Step  11800: Ran 100 train steps in 46.54 secs
Step  11800: train WeightedCategoryCrossEntropy |  1.24286366
Step  11800: eval  WeightedCategoryCrossEntropy |  1.36233894
Step  11800: eval                      Accuracy |  0.57290844

Step  11900: Ran 100 train steps in 46.57 secs
Step  11900: train WeightedCategoryCrossEntropy |  1.23808372
Step  11900: eval  WeightedCategoryCrossEntropy |  1.35872213
Step  11900: eval                      Accuracy |  0.57846189

Step  12000: Ran 100 train steps in 46.53 secs
Step  12000: train WeightedCategoryCrossEntropy |  1.23670936
Step  12000: eval  WeightedCategoryCrossEntropy |  1.32247432
Step  12000: eval                      Accuracy |  0.57690984

Step  12100: Ran 100 train steps in 46.55 secs
Step  12100: train WeightedCategoryCrossEntropy |  1.24116862
Step  12100: eval  WeightedCategoryCrossEntropy |  1.34740726
Step  12100: eval                      Accuracy |  0.57368577

Step  12200: Ran 100 train steps in 46.56 secs
Step  12200: train WeightedCategoryCrossEntropy |  1.23870814
Step  12200: eval  WeightedCategoryCrossEntropy |  1.34412030
Step  12200: eval                      Accuracy |  0.57441618

Step  12300: Ran 100 train steps in 46.51 secs
Step  12300: train WeightedCategoryCrossEntropy |  1.23964739
Step  12300: eval  WeightedCategoryCrossEntropy |  1.31778471
Step  12300: eval                      Accuracy |  0.59404006

Step  12400: Ran 100 train steps in 46.57 secs
Step  12400: train WeightedCategoryCrossEntropy |  1.23977387
Step  12400: eval  WeightedCategoryCrossEntropy |  1.36329297
Step  12400: eval                      Accuracy |  0.56865372

Step  12500: Ran 100 train steps in 46.56 secs
Step  12500: train WeightedCategoryCrossEntropy |  1.24057162
Step  12500: eval  WeightedCategoryCrossEntropy |  1.32396106
Step  12500: eval                      Accuracy |  0.57749913

Step  12600: Ran 100 train steps in 46.57 secs
Step  12600: train WeightedCategoryCrossEntropy |  1.23996282
Step  12600: eval  WeightedCategoryCrossEntropy |  1.35980467
Step  12600: eval                      Accuracy |  0.57681503

Step  12700: Ran 100 train steps in 46.53 secs
Step  12700: train WeightedCategoryCrossEntropy |  1.23197782
Step  12700: eval  WeightedCategoryCrossEntropy |  1.35620030
Step  12700: eval                      Accuracy |  0.56576115

Step  12800: Ran 100 train steps in 46.54 secs
Step  12800: train WeightedCategoryCrossEntropy |  1.23929477
Step  12800: eval  WeightedCategoryCrossEntropy |  1.32664406
Step  12800: eval                      Accuracy |  0.57836610

Step  12900: Ran 100 train steps in 46.53 secs
Step  12900: train WeightedCategoryCrossEntropy |  1.24684954
Step  12900: eval  WeightedCategoryCrossEntropy |  1.35356160
Step  12900: eval                      Accuracy |  0.57247027

Step  13000: Ran 100 train steps in 46.54 secs
Step  13000: train WeightedCategoryCrossEntropy |  1.23555624
Step  13000: eval  WeightedCategoryCrossEntropy |  1.30849167
Step  13000: eval                      Accuracy |  0.58658669

Step  13100: Ran 100 train steps in 46.54 secs
Step  13100: train WeightedCategoryCrossEntropy |  1.23514199
Step  13100: eval  WeightedCategoryCrossEntropy |  1.32829968
Step  13100: eval                      Accuracy |  0.57877260

Step  13200: Ran 100 train steps in 46.57 secs
Step  13200: train WeightedCategoryCrossEntropy |  1.24334764
Step  13200: eval  WeightedCategoryCrossEntropy |  1.32007960
Step  13200: eval                      Accuracy |  0.58390542

Step  13300: Ran 100 train steps in 46.50 secs
Step  13300: train WeightedCategoryCrossEntropy |  1.23758221
Step  13300: eval  WeightedCategoryCrossEntropy |  1.33836234
Step  13300: eval                      Accuracy |  0.57748077

Step  13400: Ran 100 train steps in 46.53 secs
Step  13400: train WeightedCategoryCrossEntropy |  1.23699570
Step  13400: eval  WeightedCategoryCrossEntropy |  1.28857458
Step  13400: eval                      Accuracy |  0.59427991

Step  13500: Ran 100 train steps in 46.56 secs
Step  13500: train WeightedCategoryCrossEntropy |  1.24157882
Step  13500: eval  WeightedCategoryCrossEntropy |  1.33362718
Step  13500: eval                      Accuracy |  0.57985461

Step  13600: Ran 100 train steps in 46.57 secs
Step  13600: train WeightedCategoryCrossEntropy |  1.24225903
Step  13600: eval  WeightedCategoryCrossEntropy |  1.33033669
Step  13600: eval                      Accuracy |  0.58468521

Step  13700: Ran 100 train steps in 46.56 secs
Step  13700: train WeightedCategoryCrossEntropy |  1.24346125
Step  13700: eval  WeightedCategoryCrossEntropy |  1.31333911
Step  13700: eval                      Accuracy |  0.58795037

Step  13800: Ran 100 train steps in 46.56 secs
Step  13800: train WeightedCategoryCrossEntropy |  1.24078453
Step  13800: eval  WeightedCategoryCrossEntropy |  1.34135834
Step  13800: eval                      Accuracy |  0.57634938

Step  13900: Ran 100 train steps in 46.67 secs
Step  13900: train WeightedCategoryCrossEntropy |  1.23734236
Step  13900: eval  WeightedCategoryCrossEntropy |  1.36791305
Step  13900: eval                      Accuracy |  0.56584058

Step  14000: Ran 100 train steps in 46.56 secs
Step  14000: train WeightedCategoryCrossEntropy |  1.23029447
Step  14000: eval  WeightedCategoryCrossEntropy |  1.36097904
Step  14000: eval                      Accuracy |  0.56552213

Step  14100: Ran 100 train steps in 46.66 secs
Step  14100: train WeightedCategoryCrossEntropy |  1.23631048
Step  14100: eval  WeightedCategoryCrossEntropy |  1.32405988
Step  14100: eval                      Accuracy |  0.57309214

Step  14200: Ran 100 train steps in 46.68 secs
Step  14200: train WeightedCategoryCrossEntropy |  1.22712052
Step  14200: eval  WeightedCategoryCrossEntropy |  1.37027800
Step  14200: eval                      Accuracy |  0.55948075

Step  14300: Ran 100 train steps in 46.63 secs
Step  14300: train WeightedCategoryCrossEntropy |  1.23570395
Step  14300: eval  WeightedCategoryCrossEntropy |  1.30359221
Step  14300: eval                      Accuracy |  0.59196734

Step  14400: Ran 100 train steps in 46.63 secs
Step  14400: train WeightedCategoryCrossEntropy |  1.23788667
Step  14400: eval  WeightedCategoryCrossEntropy |  1.30524611
Step  14400: eval                      Accuracy |  0.58691663

Step  14500: Ran 100 train steps in 46.69 secs
Step  14500: train WeightedCategoryCrossEntropy |  1.23419011
Step  14500: eval  WeightedCategoryCrossEntropy |  1.36804922
Step  14500: eval                      Accuracy |  0.56866386

Step  14600: Ran 100 train steps in 46.65 secs
Step  14600: train WeightedCategoryCrossEntropy |  1.23835301
Step  14600: eval  WeightedCategoryCrossEntropy |  1.29339818
Step  14600: eval                      Accuracy |  0.59275184

Step  14700: Ran 100 train steps in 46.65 secs
Step  14700: train WeightedCategoryCrossEntropy |  1.23351562
Step  14700: eval  WeightedCategoryCrossEntropy |  1.32991219
Step  14700: eval                      Accuracy |  0.58760637

Step  14800: Ran 100 train steps in 46.64 secs
Step  14800: train WeightedCategoryCrossEntropy |  1.23453915
Step  14800: eval  WeightedCategoryCrossEntropy |  1.33311164
Step  14800: eval                      Accuracy |  0.57431032

Step  14900: Ran 100 train steps in 46.68 secs
Step  14900: train WeightedCategoryCrossEntropy |  1.23706901
Step  14900: eval  WeightedCategoryCrossEntropy |  1.34093809
Step  14900: eval                      Accuracy |  0.57359574

Step  15000: Ran 100 train steps in 46.61 secs
Step  15000: train WeightedCategoryCrossEntropy |  1.23998272
Step  15000: eval  WeightedCategoryCrossEntropy |  1.33679171
Step  15000: eval                      Accuracy |  0.57252198

Step  15100: Ran 100 train steps in 46.58 secs
Step  15100: train WeightedCategoryCrossEntropy |  1.23732710
Step  15100: eval  WeightedCategoryCrossEntropy |  1.29972788
Step  15100: eval                      Accuracy |  0.58468580

Step  15200: Ran 100 train steps in 46.60 secs
Step  15200: train WeightedCategoryCrossEntropy |  1.23871386
Step  15200: eval  WeightedCategoryCrossEntropy |  1.35088738
Step  15200: eval                      Accuracy |  0.57375431

Step  15300: Ran 100 train steps in 46.73 secs
Step  15300: train WeightedCategoryCrossEntropy |  1.23521864
Step  15300: eval  WeightedCategoryCrossEntropy |  1.30088254
Step  15300: eval                      Accuracy |  0.58499869

Step  15400: Ran 100 train steps in 46.65 secs
Step  15400: train WeightedCategoryCrossEntropy |  1.21270466
Step  15400: eval  WeightedCategoryCrossEntropy |  1.32416697
Step  15400: eval                      Accuracy |  0.58676630

Step  15500: Ran 100 train steps in 46.60 secs
Step  15500: train WeightedCategoryCrossEntropy |  1.20742071
Step  15500: eval  WeightedCategoryCrossEntropy |  1.31221966
Step  15500: eval                      Accuracy |  0.57679959

Step  15600: Ran 100 train steps in 46.54 secs
Step  15600: train WeightedCategoryCrossEntropy |  1.21754849
Step  15600: eval  WeightedCategoryCrossEntropy |  1.35318093
Step  15600: eval                      Accuracy |  0.57858366

Step  15700: Ran 100 train steps in 46.59 secs
Step  15700: train WeightedCategoryCrossEntropy |  1.20770407
Step  15700: eval  WeightedCategoryCrossEntropy |  1.33204349
Step  15700: eval                      Accuracy |  0.57040226

Step  15800: Ran 100 train steps in 46.58 secs
Step  15800: train WeightedCategoryCrossEntropy |  1.21227086
Step  15800: eval  WeightedCategoryCrossEntropy |  1.32108204
Step  15800: eval                      Accuracy |  0.58142904

Step  15900: Ran 100 train steps in 46.66 secs
Step  15900: train WeightedCategoryCrossEntropy |  1.20630026
Step  15900: eval  WeightedCategoryCrossEntropy |  1.34532928
Step  15900: eval                      Accuracy |  0.57363081

Step  16000: Ran 100 train steps in 46.58 secs
Step  16000: train WeightedCategoryCrossEntropy |  1.21732092
Step  16000: eval  WeightedCategoryCrossEntropy |  1.34888089
Step  16000: eval                      Accuracy |  0.57829400

Step  16100: Ran 100 train steps in 46.57 secs
Step  16100: train WeightedCategoryCrossEntropy |  1.20914495
Step  16100: eval  WeightedCategoryCrossEntropy |  1.34065656
Step  16100: eval                      Accuracy |  0.57866746

Step  16200: Ran 100 train steps in 46.57 secs
Step  16200: train WeightedCategoryCrossEntropy |  1.21117663
Step  16200: eval  WeightedCategoryCrossEntropy |  1.32027900
Step  16200: eval                      Accuracy |  0.58533911

Step  16300: Ran 100 train steps in 46.58 secs
Step  16300: train WeightedCategoryCrossEntropy |  1.21760499
Step  16300: eval  WeightedCategoryCrossEntropy |  1.30371308
Step  16300: eval                      Accuracy |  0.59620357

Step  16400: Ran 100 train steps in 46.52 secs
Step  16400: train WeightedCategoryCrossEntropy |  1.20953822
Step  16400: eval  WeightedCategoryCrossEntropy |  1.31595250
Step  16400: eval                      Accuracy |  0.58975597

Step  16500: Ran 100 train steps in 46.51 secs
Step  16500: train WeightedCategoryCrossEntropy |  1.22410822
Step  16500: eval  WeightedCategoryCrossEntropy |  1.33057849
Step  16500: eval                      Accuracy |  0.58313890

Step  16600: Ran 100 train steps in 46.57 secs
Step  16600: train WeightedCategoryCrossEntropy |  1.21633768
Step  16600: eval  WeightedCategoryCrossEntropy |  1.34370232
Step  16600: eval                      Accuracy |  0.56324571

Step  16700: Ran 100 train steps in 46.50 secs
Step  16700: train WeightedCategoryCrossEntropy |  1.21109343
Step  16700: eval  WeightedCategoryCrossEntropy |  1.34736327
Step  16700: eval                      Accuracy |  0.55796552

Step  16800: Ran 100 train steps in 46.53 secs
Step  16800: train WeightedCategoryCrossEntropy |  1.22027659
Step  16800: eval  WeightedCategoryCrossEntropy |  1.34284500
Step  16800: eval                      Accuracy |  0.58001840

Step  16900: Ran 100 train steps in 46.49 secs
Step  16900: train WeightedCategoryCrossEntropy |  1.21650743
Step  16900: eval  WeightedCategoryCrossEntropy |  1.31663891
Step  16900: eval                      Accuracy |  0.58754251

Step  17000: Ran 100 train steps in 46.52 secs
Step  17000: train WeightedCategoryCrossEntropy |  1.21804380
Step  17000: eval  WeightedCategoryCrossEntropy |  1.32078075
Step  17000: eval                      Accuracy |  0.57559681

Step  17100: Ran 100 train steps in 46.54 secs
Step  17100: train WeightedCategoryCrossEntropy |  1.22012901
Step  17100: eval  WeightedCategoryCrossEntropy |  1.28926949
Step  17100: eval                      Accuracy |  0.59518562

It looks like it's stuck.

Plotting Accuracy

frame = pandas.DataFrame(loop.history.get("eval", "metrics/Accuracy"),
                         columns="Batch Accuracy".split())
maximum = frame.loc[frame.Accuracy.idxmax()]
vline = holoviews.VLine(maximum.Batch).opts(opts.VLine(color=PLOT.red))
hline = holoviews.HLine(maximum.Accuracy).opts(opts.HLine(color=PLOT.red))
line = frame.hvplot(x="Batch", y="Accuracy").opts(opts.Curve(color=PLOT.blue))

plot = (line * hline * vline).opts(
                                   width=PLOT.width, height=PLOT.height, title="Evaluation Batch Accuracy",
                                   )
output = Embed(plot=plot, file_name="evaluation_accuracy")()
print(output)

Figure Missing

Plotting Loss

frame = pandas.DataFrame(loop.history.get("eval", "metrics/WeightedCategoryCrossEntropy"),
                         columns="Batch Loss".split())
minimum = frame.loc[frame.Loss.idxmin()]
vline = holoviews.VLine(minimum.Batch).opts(opts.VLine(color=PLOT.red))
hline = holoviews.HLine(minimum.Loss).opts(opts.HLine(color=PLOT.red))
line = frame.hvplot(x="Batch", y="Loss").opts(opts.Curve(color=PLOT.blue))

plot = (line * hline * vline).opts(
                                   width=PLOT.width, height=PLOT.height, title="Evaluation Batch Cross Entropy",
                                   )
output = Embed(plot=plot, file_name="evaluation_cross_entropy")()
print(output)

Figure Missing


Well, it looks like it's getting worse, not better. I'm probably overfitting. I guess this model isn't good enough to do better.

Deep N-Grams: Creating the Model

Defining the GRU Model

We're going to build a GRU model using trax. We'll do this by passing in "layers" to the Serial class:

  • Serial: Class that applies layers serially (by function composition).
    • You can pass in the layers as arguments to Serial, separated by commas.
    • For example: Serial(Embedding(...), Mean(...), Dense(...), LogSoftmax(...))

These are the layers that we'll be using:

  • ShiftRight: A layer that adds padding to shift the input (note that this is one of the Trax layers whose arguments have been renamed).
    • ShiftRight(n_positions=1, mode='train') = a layer that shifts the tensor to the right n_positions times.
    • Here you only need to specify the mode and not worry about n_positions (see the numpy sketch after this list).
  • Embedding: Initializes the embedding layer which maps tokens/IDs to vectors
    • Embedding(vocab_size, d_feature). In this case it is the size of the vocabulary by the dimension of the model.
    • vocab_size is the number of unique words in the given vocabulary.
    • d_feature is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
  • GRU: The Trax GRU layer.
  • Dense: A dense (fully-connected) layer.
    • Dense(n_units): The parameter n_units is the number of units chosen for this dense layer.
  • LogSoftmax: Log of the output probabilities.
    • Here, you don't need to set any parameters for LogSoftmax().
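
To make ShiftRight concrete, here is a rough numpy equivalent of what the layer does in 'train' mode (just an illustration, not Trax's actual implementation):

import numpy

tokens = numpy.array([[5, 6, 7, 8]])

# pad one zero onto the front of each sequence and drop the last token,
# so at position t the model only sees the tokens before t
shifted = numpy.pad(tokens, ((0, 0), (1, 0)))[:, :-1]
print(shifted)
[[0 5 6 7]]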

Imports

# pypi
from trax import layers

Middle

The GRU Model

def GRULM(vocab_size: int=256, d_model: int=512, n_layers: int=2, mode:str='train') -> layers.Serial:
    """Returns a GRU language model.

    Args:
       vocab_size (int, optional): Size of the vocabulary. Defaults to 256.
       d_model (int, optional): Depth of embedding (n_units in the GRU cell). Defaults to 512.
       n_layers (int, optional): Number of GRU layers. Defaults to 2.
       mode (str, optional): 'train', 'eval' or 'predict', predict mode is for fast inference. Defaults to "train".

    Returns:
       trax.layers.combinators.Serial: A GRU language model as a layer that maps from a tensor of tokens to activations over a vocab set.
    """
    model = layers.Serial(
        # the ``n_shifts`` argument appears to have been renamed to ``n_positions``,
        # so pass the shift amount positionally to stay compatible with both versions
        layers.ShiftRight(1, mode=mode),
        layers.Embedding(vocab_size, d_model),
        *[layers.GRU(d_model) for unit in range(n_layers)],
        layers.Dense(vocab_size),
        layers.LogSoftmax()
    )
    return model

Will It Build?

model = GRULM()
print(model)
Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]
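
Since it built, we can also push a dummy batch through it to sanity-check the output shape. This is just a smoke-test sketch, assuming the usual Trax init-from-signature workflow:

import numpy
from trax import shapes

# a dummy batch: two "sentences" of ten token IDs each
batch = numpy.zeros((2, 10), dtype=numpy.int32)

# initialize the weights from the input signature, then run a forward pass
model.init(shapes.signature(batch))
log_probabilities = model(batch)
print(log_probabilities.shape)
(2, 10, 256)

The last axis matches the default vocab_size of 256, so each position in each sequence gets log-probabilities over the whole vocabulary.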

Saving it for Later

It seems a little goofy to do this, but since I might forget some of the values, might as well.

Imports

# from pypi
from trax import layers

import attr

Model Builder

@attr.s(auto_attribs=True)
class GRUModel:
    """Builds the layers for the GRU model

    Args:
     shift_positions: amount of padding to add to the front of input
     vocabulary_size: the size of our learned vocabulary
     model_dimensions: the GRU and Embeddings dimensions
     gru_layers: how many GRU layers to create
     mode: train, eval, or predict
    """
    shift_positions: int=1
    vocabulary_size: int=256
    model_dimensions: int=512
    gru_layers: int=2
    mode: str="train"
    _model: layers.Serial=None

The Model

    @property
    def model(self) -> layers.Serial:
        """The GRU Model"""
        if self._model is None:
            self._model = layers.Serial(
                layers.ShiftRight(self.shift_positions, mode=self.mode),
                layers.Embedding(self.vocabulary_size, self.model_dimensions),
                *[layers.GRU(self.model_dimensions)
                  for gru_layer in range(self.gru_layers)],
                layers.Dense(self.vocabulary_size),
                layers.LogSoftmax()
            )
        return self._model
    

Check It Out

from neurotic.nlp.deep_rnn import GRUModel

gru = GRUModel()
print(gru.model)
Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]

Deep N-Grams: Loading the Data

Text to Tensor

In this section we're going to load the text data and transform it into tensors.

Imports

# python
from pathlib import Path

import os

# pypi
from dotenv import load_dotenv
from expects import (be_true,
                     contain_exactly,
                     equal,
                     expect)

Set Up

The path to the data is kept in a .env file so we'll load it into the environment here.

load_dotenv("posts/nlp/.env", override=True)
data_path = Path(os.environ["SHAKESPEARE"]).expanduser()
expect(data_path.is_dir()).to(be_true)

Middle

Loading the Data

We're going to be using the plays of Shakespeare. Unlike previously, this data source has them in separate files so we'll have to load each one separately. We're going to be generating characters, not words, so each character has to be given an integer ID. We'll use the Unicode values given to us by the built-in ord function.

lines = []
for filename in data_path.glob("*.txt"):
    with filename.open() as play:
        cleaned = (line.strip() for line in play)
        lines += [line for line in cleaned if line]

This only cleans out the leading and trailing whitespace; there are still other things, like tabs, left in there.
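
If we wanted to squash the tabs and repeated internal whitespace as well, a regular expression would do it. This is just a sketch of the idea using a made-up sample line; it isn't applied to our lines here:

import re

WHITESPACE = re.compile(r"\s+")

# collapse tabs and runs of spaces down to single spaces
sample = "prince henry\t\tson to the king."
print(WHITESPACE.sub(" ", sample))
prince henry son to the king.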

line_count = len(lines)
print(f"Number of lines: {line_count:,}")
print(f"Sample line at position 0: {lines[0]}")
print(f"Sample line at position 999: {lines[999]}")
Number of lines: 125,097
Sample line at position 0: king john
Sample line at position 999: as it makes harmful all that speak of it.

To make this a little easier, we'll convert all characters to lowercase. This way, for example, the model only needs to predict the likelihood that a letter is 'a' and not decide between uppercase 'A' and lowercase 'a'.

lines = [line.lower() for line in lines]

new_line_count = len(lines)
expect(new_line_count).to(equal(line_count))
print(f"Number of lines: {new_line_count:,}")
print(f"Sample line at position 0: {lines[0]}")
print(f"Sample line at position 999: {lines[999]}")
Number of lines: 125,097
Sample line at position 0: king john
Sample line at position 999: as it makes harmful all that speak of it.

Once again, we're going to do a straight split to create the training and validation data instead of using randomization.

SPLIT = 1000
validation = lines[-SPLIT:]
training = lines[:-SPLIT]

print(f"Number of lines for training: {len(training):,}")
print(f"Number of lines for validation: {len(validation):,}")
Number of lines for training: 124,097
Number of lines for validation: 1,000

To Tensors

Like I mentioned before, we're going to use python's ord function to convert the letters to integers.

for character in "abc xyz123":
    print(f"{character}: {ord(character)}")
a: 97
b: 98
c: 99
 : 32
x: 120
y: 121
z: 122
1: 49
2: 50
3: 51
def line_to_tensor(line: str, EOS_int: int=1) -> list:
    """Turns a line of text into a tensor

    Args:
     line: A single line of text.
     EOS_int: End-of-sentence integer. Defaults to 1.

    Returns:
     a list of integers (unicode values) for the characters in the ``line``.
    """
    tensor = []
    # for each character:
    for c in line:

        # convert to unicode int
        c_int = ord(c)

        # append the unicode integer to the tensor list
        tensor.append(c_int)

    # include the end-of-sentence integer
    tensor.append(EOS_int)
    return tensor

Test the Output

actual = line_to_tensor('abc xyz')
expected = [97, 98, 99, 32, 120, 121, 122, 1]

expect(actual).to(contain_exactly(*expected))
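
Since the IDs are just Unicode code points, going back from tensor to text is a matter of calling chr on each integer and dropping the end-of-sentence marker. A quick sketch:

# decode the tensor back to text, skipping the end-of-sentence integer (1)
tensor = line_to_tensor("abc xyz")
decoded = "".join(chr(code) for code in tensor if code != 1)
print(decoded)
abc xyz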

Bundle It Up

This is going to be needed in future posts so I'm going to put it in a class.

Imports

# python
from pathlib import Path

import os

# pypi
from dotenv import load_dotenv

import attr

The Data Loader

@attr.s(auto_attribs=True)
class DataLoader:
    """Load the data and convert it to 'tensors'

    Args:
     env_path: the path to the env file (as a string)
     env_key: the environmental variable with the path to the data
     validation_size: number for the validation set
     end_of_sentence: integer to use to indicate the end of a sentence
    """
    env_path: str="posts/nlp/.env"
    env_key: str="SHAKESPEARE"
    validation_size: int=1000
    end_of_sentence: int=1
    _data_path: Path=None
    _lines: list=None
    _training: list=None
    _validation: list=None

The Data Path

@property
def data_path(self) -> Path:
    """Loads the dotenv and converts the path

    Raises:
     assertion error if path doesn't exist
    """
    if self._data_path is None:
        load_dotenv(self.env_path, override=True)
        self._data_path = Path(os.environ[self.env_key]).expanduser()
        assert self.data_path.is_dir()
    return self._data_path

The Lines

@property
def lines(self) -> list:
    """The lines of text-data"""
    if self._lines is None:
        self._lines = []
        for filename in self.data_path.glob("*.txt"):
            with filename.open() as play:
                cleaned = (line.strip() for line in play)
                self._lines += [line.lower() for line in cleaned if line]
    return self._lines

The Training Set

@property
def training(self) -> list:
    """Subset of the lines for training"""
    if self._training is None:
        self._training = self.lines[:-self.validation_size]
    return self._training

The Validation Set

@property
def validation(self) -> list:
    """The validation subset of the lines"""
    if self._validation is None:
        self._validation = self.lines[-self.validation_size:]
    return self._validation

To Tensor

def to_tensor(self, line: str) -> list:
    """Converts the line to the unicode value

    Args:
     line: the text to convert
    Returns:
     line converted to unicode integer encodings
    """
    return [ord(character) for character in line] + [self.end_of_sentence]

Check the Data Loader

from neurotic.nlp.deep_rnn.data_loader import DataLoader

loader = DataLoader()

expect(len(loader.lines)).to(equal(line_count))
expect(len(loader.validation)).to(equal(SPLIT))
expect(len(loader.training)).to(equal(line_count - SPLIT))

actual = loader.to_tensor('abc xyz')
expected = [97, 98, 99, 32, 120, 121, 122, 1]

expect(actual).to(contain_exactly(*expected))
for line in loader.lines[:10]:
    print(line)
king john
dramatis personae
king john:
prince henry    son to the king.
arthur  duke of bretagne, nephew to the king.
the earl of
pembroke        (pembroke:)
the earl of essex       (essex:)
the earl of
salisbury       (salisbury:)

Deep N-Grams

Deep N-Grams

This is an exploration of Recurrent Neural Networks (RNN) using trax. We're going to predict the next set of characters in a sentence given the previous characters.

Since this is so long I'm going to break it up into separate posts.

First up: Loading the Data.