Tensorflow Docker Setup

Beginning

I recently re-started using tensorflow and the python interpreter kept crashing. It appears that they compiled the latest version to require AVX2, and the server I was using has AVX but not AVX2. I couldn't find any documentation about this requirement, but running the code on a different machine that has both AVX and AVX2 got rid of the problem. This might be a transient problem, as the nightly build doesn't crash on either machine, but trying to run the nightly build with other code is a nightmare, as it seems that every framework related to tensorflow tries to revert the version back to the broken one, so I gave up and changed machines.

The process of setting up cuda and tensorflow over and over again proved difficult, as there are different ways to do it (through apt, using nvidia installers, building from source) and each presents a different problem. The version apt installs, for instance, puts the folders in places the tensorflow configure.py file can't figure out (if you build tensorflow from source), and using the nvidia debian package for cudnn left my packages in a broken state, as it was trying to install something that then broke another package's requirements… Anyway, I'm going to try to avoid building tensorflow from source and run everything from docker containers.
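Since the crash comes without a useful error message, it helps to check the CPU flags yourself before blaming tensorflow. Here's a quick sketch (Linux-only, since it reads /proc/cpuinfo):

# Check whether this CPU advertises AVX and AVX2 support (Linux-only).
with open("/proc/cpuinfo") as reader:
    for line in reader:
        if line.startswith("flags"):
            # the flags line is a whitespace-separated list of capabilities
            flags = set(line.strip().split())
            break

print("AVX: ", "avx" in flags)
print("AVX2:", "avx2" in flags)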

Setting Up

I don't know for sure that this is necessary, but I followed nvidia's docker installation instructions. If nothing else you can use it to check that the setup works. After that I set up tensorflow's container with a dockerfile:

FROM tensorflow/tensorflow:latest-gpu-py3-jupyter
RUN apt-get update && \
        apt-get install openssh-server --yes && \
        echo "Adding neurotic user" && \
        useradd --create-home --shell /bin/bash neurotic
COPY authorized_keys /home/neurotic/.ssh/
ENTRYPOINT service ssh restart && bash

The latest tensorflow container comes with python 2.7 as the default for some reason, and all the dependencies are installed with it in mind, so to get python 3 (3.6 as of now) you need to specify the py3 tag like I did in the FROM line. Additionally, I use ssh-forwarding for jupyter kernels so that I can work with them in emacs, so I installed the ssh-server and also created a non-root user to run jupyter. The last line (ENTRYPOINT service ssh restart && bash) makes sure the ssh-server is running and opens up a bash shell. To build the container I used this command:

docker build -t neurotic-tensorflow .

This creates an image named neurotic-tensorflow. To run it I use this command:

docker run --gpus all -p 2222:22 --name data-neurotic \
       --mount type=bind,source=$HOME/projects/neurotic-networks,target=/home/neurotic/neurotic-networks \
       --mount type=bind,source=/media/data,target=/home/neurotic/data \
       -it neurotic-tensorflow bash

The --gpus all option makes the GPUs available inside the container. The -p 2222:22 flag maps the ssh-server in the container to port 2222 on the host. This allows you to ssh into the container using ssh neurotic@localhost -p 2222 without knowing the IP address of the container. You can also grab the IP address and then ssh into it like it's another machine on the network:

docker inspect --format "{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}" data-neurotic

Here data-neurotic is the name given to the container in the docker run command. The advantages of the port mapping are that:

  • You don't need to know the address of the container if you are on the host machine.
  • You can ssh into the container from another machine by substituting the host's IP address for localhost in the ssh command.

The --mount options bind some folders into the container so we can share files between the host and the container.

Once you've run it you can restart it at any time using:

docker start data-neurotic

And if you need to run something as root you can attach to the running container:

docker attach data-neurotic

NOTE: The python 3 container has cuda 10.1 installed but the latest version of tensorflow expects 11.0 - and tensorflow seems to use hard-coded names. So to make it work you either have to upgrade cuda or symlink the file and rename it to look like the newer version.

ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1 /usr/lib/x86_64-linux-gnu/libcudart.so.11.0
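If you want to check that the renamed symlink actually satisfies the loader before re-running tensorflow, you can try loading the library directly (a quick sketch using ctypes from the standard library):

# Attempt to dlopen the library under the name tensorflow expects.
# If the symlink is missing or broken, this raises an OSError.
import ctypes

cudart = ctypes.CDLL("libcudart.so.11.0")
print("Loaded:", cudart)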

Tensorflow dependencies are incredibly convoluted and broken all over the place.

Sentiment Analysis: Testing the Model

Beginning

Having trained our Deep Learning model for Sentiment Analysis previously, we're now going to test how well it did.

Imports

# python
from argparse import Namespace
from functools import partial
from pathlib import Path

# pypi
import nltk
import trax.fastmath.numpy as numpy
import trax.layers as trax_layers

# this project
from neurotic.nlp.twitter.sentiment_network import SentimentNetwork
from neurotic.nlp.twitter.tensor_generator import TensorBuilder, TensorGenerator

Set Up

Download

All the trouble getting trax and tensorflow working with CUDA means I have to keep re-building the Docker container I'm using, so the dataset has to be re-downloaded each time.

data_path = Path("~/data/datasets/nltk_data/").expanduser()
nltk.download("twitter_samples", download_dir=str(data_path))

The Data Generators

BATCH_SIZE = 16
converter = TensorBuilder()
train_generator = partial(TensorGenerator, converter,
                          positive_data=converter.positive_training,
                          negative_data=converter.negative_training,
                          batch_size=BATCH_SIZE)
valid_generator = partial(TensorGenerator, converter,
                          positive_data=converter.positive_validation,
                          negative_data=converter.negative_validation,
                          batch_size=BATCH_SIZE)

TRAINING_GENERATOR = train_generator()
VALIDATION_GENERATOR = valid_generator()
SIZE_OF_VOCABULARY = len(converter.vocabulary)
TRAINING_LOOPS = 100

OUTPUT_PATH = Path("~/models").expanduser()
if not OUTPUT_PATH.is_dir():
    OUTPUT_PATH.mkdir()

The Model Builder

trainer = SentimentNetwork(
    training_generator=TRAINING_GENERATOR,
    validation_generator=VALIDATION_GENERATOR,
    vocabulary_size=SIZE_OF_VOCABULARY,
    training_loops=TRAINING_LOOPS,
    output_path=OUTPUT_PATH)
trainer.fit()
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

Step    110: Ran 10 train steps in 4.89 secs
Step    110: train CrossEntropyLoss |  0.00662578
Step    110: eval  CrossEntropyLoss |  0.00139236
Step    110: eval          Accuracy |  1.00000000

Step    120: Ran 10 train steps in 2.61 secs
Step    120: train CrossEntropyLoss |  0.03323080
Step    120: eval  CrossEntropyLoss |  0.00684100
Step    120: eval          Accuracy |  1.00000000

Step    130: Ran 10 train steps in 1.27 secs
Step    130: train CrossEntropyLoss |  0.11124543
Step    130: eval  CrossEntropyLoss |  0.00011413
Step    130: eval          Accuracy |  1.00000000

Step    140: Ran 10 train steps in 0.71 secs
Step    140: train CrossEntropyLoss |  0.03609489
Step    140: eval  CrossEntropyLoss |  0.00000590
Step    140: eval          Accuracy |  1.00000000

Step    150: Ran 10 train steps in 1.92 secs
Step    150: train CrossEntropyLoss |  0.08605278
Step    150: eval  CrossEntropyLoss |  0.00003427
Step    150: eval          Accuracy |  1.00000000

Step    160: Ran 10 train steps in 1.31 secs
Step    160: train CrossEntropyLoss |  0.04926774
Step    160: eval  CrossEntropyLoss |  0.00003597
Step    160: eval          Accuracy |  1.00000000

Step    170: Ran 10 train steps in 1.30 secs
Step    170: train CrossEntropyLoss |  0.00986138
Step    170: eval  CrossEntropyLoss |  0.00026259
Step    170: eval          Accuracy |  1.00000000

Step    180: Ran 10 train steps in 0.76 secs
Step    180: train CrossEntropyLoss |  0.00773767
Step    180: eval  CrossEntropyLoss |  0.00038017
Step    180: eval          Accuracy |  1.00000000

Step    190: Ran 10 train steps in 1.35 secs
Step    190: train CrossEntropyLoss |  0.00555876
Step    190: eval  CrossEntropyLoss |  0.00000706
Step    190: eval          Accuracy |  1.00000000

Step    200: Ran 10 train steps in 0.76 secs
Step    200: train CrossEntropyLoss |  0.00381955
Step    200: eval  CrossEntropyLoss |  0.00000122
Step    200: eval          Accuracy |  1.00000000

The Accuracy

This is from the last post. I haven't figured out how to arrange all the code yet.

def compute_accuracy(preds: numpy.ndarray,
                     y: numpy.ndarray,
                     y_weights: numpy.ndarray) -> tuple:
    """Compute a batch accuracy

    Args: 
       preds: a tensor of shape (dim_batch, output_dim) 
       y: a tensor of shape (dim_batch,) with the true labels
       y_weights: a numpy.ndarray with a weight for each example

    Returns: 
       accuracy: a float between 0-1 
       weighted_num_correct (np.float32): Sum of the weighted correct predictions
       sum_weights (np.float32): Sum of the weights
    """
    # Create an array of booleans, 
    # True if the probability of positive sentiment is greater than
    # the probability of negative sentiment
    # else False
    is_pos =  preds[:, 1] > preds[:, 0]

    # convert the array of booleans into an array of np.int32
    is_pos_int = is_pos.astype(numpy.int32)

    # compare the array of predictions (as int32) with the target (labels) of type int32
    correct = is_pos_int == y

    # Count the sum of the weights.
    sum_weights = y_weights.sum()

    # convert the array of correct predictions (boolean) into an array of np.float32
    correct_float = correct.astype(numpy.float32)

    # Multiply each prediction with its corresponding weight.
    weighted_correct_float = correct_float.dot(y_weights)

    # Sum up the weighted correct predictions (of type np.float32); this goes
    # in the numerator.
    weighted_num_correct = weighted_correct_float.sum()

    # Divide the number of weighted correct predictions by the sum of the
    # weights.
    accuracy = weighted_num_correct/sum_weights

    return accuracy, weighted_num_correct, sum_weights
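As a sanity check, here's what compute_accuracy does with a tiny hand-made batch (the values are made up for illustration: row 0 predicts class 0, row 1 predicts class 1, and both match their labels):

toy_predictions = numpy.array([[-0.1, -2.0],
                               [-3.0, -0.05]])
toy_labels = numpy.array([0, 1])
toy_weights = numpy.array([1.0, 1.0])

# both rows are correct and equally weighted, so this should give an
# accuracy of 1.0, from 2.0 weighted-correct out of 2.0 total weight
print(compute_accuracy(toy_predictions, toy_labels, toy_weights))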

Middle

Testing the model on Validation Data

Now we'll test our model's prediction accuracy on validation data.

This program will take in a data generator and the model.

  • The generator allows us to get batches of data. You can use it with a for loop:
for batch in iterator: 
   # do something with that batch

Each batch is a tuple of (X, Y, weights):

  • Column 0 corresponds to the tweet as a tensor (input).
  • Column 1 corresponds to its target (actual label, positive or negative sentiment).
  • Column 2 corresponds to the weights associated (example weights)
  • You can feed the tweets into the model and it will return the predictions for the batch.
# UNQ_C8 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: test_model
def test_model(generator: TensorGenerator, model: trax_layers.Serial) -> float:
    """Calculate the accuracy of the model

    Args: 
       generator: an iterator instance that provides batches of inputs and targets
       model: a model instance 
    Returns: 
       accuracy: float corresponding to the accuracy
    """

    accuracy = 0.
    total_num_correct = 0
    total_num_pred = 0

    ### START CODE HERE (Replace instances of 'None' with your code) ###
    for batch in generator: 

        # Retrieve the inputs from the batch
        inputs = batch[0]

        # Retrieve the targets (actual labels) from the batch
        targets = batch[1]

        # Retrieve the example weight.
        example_weight = batch[2]

        # Make predictions using the inputs
        pred = model(inputs)

        # Calculate accuracy for the batch by comparing its predictions and targets
        batch_accuracy, batch_num_correct, batch_num_pred = compute_accuracy(
            pred, targets, example_weight)

        # Update the total number of correct predictions
        # by adding the number of correct predictions from this batch
        total_num_correct += batch_num_correct

        # Update the total number of predictions 
        # by adding the number of predictions made for the batch
        total_num_pred += batch_num_pred

    # Calculate accuracy over all examples
    accuracy = total_num_correct/total_num_pred

    ### END CODE HERE ###
    return accuracy
# DO NOT EDIT THIS CELL
# testing the accuracy of your model: this takes around 20 seconds
model = trainer.training_loop.eval_model

# we used all the data for the training and validation (oops)
# so we don't have any test data. Fix that later
#accuracy = test_model(VALIDATION_GENERATOR, model)
generator = valid_generator(infinite=False)
accuracy = test_model(generator, model)
print(f'The accuracy of your model on the validation set is {accuracy:.4f}', )
The accuracy of your model on the validation set is 0.9995

Testing Some Custom Input

Finally, let's test some custom input. You will see that deep nets are more powerful than the older methods we have used before. Although we got close to 100% accuracy using Naive Bayes and Logistic Regression, that was because the task was much easier.

This is used to predict on a new sentence.

def predict(sentence: str) -> tuple:
    """Predicts the sentiment of the sentence

    Args:
     sentence: the sentence to get the sentiment for

    Returns:
     (prediction, sentiment)
    """
    inputs = numpy.array(converter.to_tensor(sentence))

    # Batch size 1, add dimension for batch, to work with the model
    inputs = inputs.reshape(1, len(inputs))

    # predict with the model
    probabilities = model(inputs)

    # Turn probabilities into categories
    prediction = int(probabilities[0, 1] > probabilities[0, 0])

    sentiment = "positive" if prediction == 1 else "negative"

    return prediction, sentiment
sentence = "It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
inputs = numpy.array(converter.to_tensor(sentence))

A Positive Sentence

sentence = "It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
tmp_pred, tmp_sentiment = predict(sentence)
print(f"The sentiment of the sentence \n***\n\"{sentence}\"\n***\nis {tmp_sentiment}.")
The sentiment of the sentence 
***
"It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
***
is positive.

A Negative Sentence

sentence = "I hated my day, it was the worst, I'm so sad."
tmp_pred, tmp_sentiment = predict(sentence)
print(f"The sentiment of the sentence \n***\n\"{sentence}\"\n***\nis {tmp_sentiment}.")
The sentiment of the sentence 
***
"I hated my day, it was the worst, I'm so sad."
***
is negative.

Notice that the model works well even for complex sentences.

On Pooh

s = "Oh, bother!"
print(f"{s}: {predict(s)}")
Oh, bother!: (0, 'negative')

On Deep Nets

Deep nets let you capture dependencies that you would not have been able to capture with a simple linear regression or logistic regression.

  • They also let you make better use of pre-trained embeddings for classification, and they tend to generalize better.

End

So, there you have it, a Deep Learning Model for Sentiment Analysis built using Trax. Here are the prior posts in this series.

Sentiment Analysis: Training the Model

Training the Model

In the previous post we defined our Deep Learning model for Sentiment Analysis. Now we'll turn to training it on our data.

To train a model on a task, Trax defines an abstraction trax.supervised.training.TrainTask which packages the training data, loss and optimizer (among other things) together into an object.

Similarly to training a model, Trax defines an abstraction trax.supervised.training.EvalTask which packages the eval data and metrics (among other things) into another object.

The final piece tying things together is the trax.supervised.training.Loop abstraction, a very simple and flexible way to put everything together and train the model, all the while evaluating it and saving checkpoints. Using Loop will save you a lot of code compared to always writing the training loop by hand, like you did in courses 1 and 2. More importantly, you are less likely to have a bug in that code that would ruin your training.

Imports

# from python
from functools import partial
from pathlib import Path

import random

# from pypi
from trax.supervised import training

import nltk
import trax
import trax.layers as trax_layers
import trax.fastmath.numpy as numpy

# this project
from neurotic.nlp.twitter.tensor_generator import TensorBuilder, TensorGenerator

This next part (re-downloading the dataset) is just because I have to keep setting up new containers to get trax to work…

nltk.download("twitter_samples", download_dir="/home/neurotic/data/datasets/nltk_data/")

Middle

The Dataset

BATCH_SIZE = 16

converter = TensorBuilder()


train_generator = partial(TensorGenerator, converter,
                          positive_data=converter.positive_training,
                          negative_data=converter.negative_training,
                          batch_size=BATCH_SIZE)
training_generator = train_generator()

valid_generator = partial(TensorGenerator,
                          converter,
                          positive_data=converter.positive_validation,
                          negative_data=converter.negative_validation,
                          batch_size=BATCH_SIZE)
validation_generator = valid_generator()

size_of_vocabulary = len(converter.vocabulary)

Here's the Model

This was defined in the last post. It seems like too much trouble not to just copy it over.

def classifier(vocab_size: int=size_of_vocabulary,
               embedding_dim: int=256,
               output_dim: int=2) -> trax_layers.Serial:
    """Creates the classifier model

    Args:
     vocab_size: number of tokens in the training vocabulary
     embedding_dim: output dimension for the Embedding layer
     output_dim: dimension for the Dense layer

    Returns:
     the composed layer-model
    """
    embed_layer = trax_layers.Embedding(
        vocab_size=vocab_size, # Size of the vocabulary
        d_feature=embedding_dim)  # Embedding dimension

    mean_layer = trax_layers.Mean(axis=1)

    dense_output_layer = trax_layers.Dense(n_units = output_dim)

    log_softmax_layer = trax_layers.LogSoftmax()

    model = trax_layers.Serial(
      embed_layer,
      mean_layer,
      dense_output_layer,
      log_softmax_layer
    )
    return model

Now to train the model.

First define the TrainTask, EvalTask, and Loop in preparation for training the model.

random.seed(271)

train_task = training.TrainTask(
    labeled_data=training_generator,
    loss_layer=trax_layers.CrossEntropyLoss(),
    optimizer=trax.optimizers.Adam(0.01),
    n_steps_per_checkpoint=10,
)

eval_task = training.EvalTask(
    labeled_data=validation_generator,
    metrics=[trax_layers.CrossEntropyLoss(), trax_layers.Accuracy()],
)

model = classifier()

This defines a model trained using tl.CrossEntropyLoss, optimized with the trax.optimizers.Adam optimizer, all the while tracking accuracy with the tl.Accuracy metric. We also track tl.CrossEntropyLoss on the validation set.

Now let's make an output directory and train the model.

output_path = Path("~/models/").expanduser()
if not output_path.is_dir():
    output_path.mkdir()
def train_model(classifier, train_task, eval_task, n_steps, output_dir):
    """Create and run the training loop

    Args: 
       classifier - the model you are building
       train_task - Training task
       eval_task - Evaluation task
       n_steps - the number of training steps to run
       output_dir - folder to save your files
    Returns:
       training_loop - the trax training Loop (it holds the model)
    """
    training_loop = training.Loop(
                                model=classifier, # The learning model
                                tasks=train_task, # The training task
                                eval_tasks = eval_task, # The evaluation task
                                output_dir = output_dir) # The output directory

    training_loop.run(n_steps = n_steps)
    # Return the training_loop, since it has the model.
    return training_loop
training_loop = train_model(model, train_task, eval_task, 100, output_path)

Step    110: Ran 10 train steps in 6.06 secs
Step    110: train CrossEntropyLoss |  0.00527583
Step    110: eval  CrossEntropyLoss |  0.00304692
Step    110: eval          Accuracy |  1.00000000

Step    120: Ran 10 train steps in 2.06 secs
Step    120: train CrossEntropyLoss |  0.02130376
Step    120: eval  CrossEntropyLoss |  0.00000677
Step    120: eval          Accuracy |  1.00000000

Step    130: Ran 10 train steps in 0.75 secs
Step    130: train CrossEntropyLoss |  0.01026674
Step    130: eval  CrossEntropyLoss |  0.00424393
Step    130: eval          Accuracy |  1.00000000

Step    140: Ran 10 train steps in 1.33 secs
Step    140: train CrossEntropyLoss |  0.00172522
Step    140: eval  CrossEntropyLoss |  0.00004072
Step    140: eval          Accuracy |  1.00000000

Step    150: Ran 10 train steps in 0.77 secs
Step    150: train CrossEntropyLoss |  0.00002847
Step    150: eval  CrossEntropyLoss |  0.00000232
Step    150: eval          Accuracy |  1.00000000

Step    160: Ran 10 train steps in 0.78 secs
Step    160: train CrossEntropyLoss |  0.00002123
Step    160: eval  CrossEntropyLoss |  0.00104654
Step    160: eval          Accuracy |  1.00000000

Step    170: Ran 10 train steps in 0.79 secs
Step    170: train CrossEntropyLoss |  0.00001706
Step    170: eval  CrossEntropyLoss |  0.00000080
Step    170: eval          Accuracy |  1.00000000

Step    180: Ran 10 train steps in 0.83 secs
Step    180: train CrossEntropyLoss |  0.00001554
Step    180: eval  CrossEntropyLoss |  0.00000989
Step    180: eval          Accuracy |  1.00000000

Step    190: Ran 10 train steps in 0.85 secs
Step    190: train CrossEntropyLoss |  0.00639312
Step    190: eval  CrossEntropyLoss |  0.00255337
Step    190: eval          Accuracy |  1.00000000

Step    200: Ran 10 train steps in 0.85 secs
Step    200: train CrossEntropyLoss |  0.00124322
Step    200: eval  CrossEntropyLoss |  0.02190475
Step    200: eval          Accuracy |  1.00000000

Bundle It Up

<<imports>>


<<model-trainer>>

    <<the-model>>

    <<training-task>>

    <<eval-task>>

    <<training-loop>>

    <<fit-the-model>>

Imports

# python
from pathlib import Path

# from pypi
from trax.supervised import training

import attr
import trax
import trax.layers as trax_layers

The Trainer

@attr.s(auto_attribs=True)
class SentimentNetwork:
    """Builds and Trains the Sentiment Analysis Model

    Args:
     training_generator: generator of training batches
     validation_generator: generator of validation batches
     vocabulary_size: number of tokens in the training vocabulary
     training_loops: number of times to run the training loop
     output_path: path to where to store the model
     embedding_dimension: output dimension for the Embedding layer
     output_dimension: dimension for the Dense layer
    """
    vocabulary_size: int
    training_generator: object
    validation_generator: object
    training_loops: int
    output_path: Path
    embedding_dimension: int=256
    output_dimension: int=2
    _model: trax_layers.Serial=None
    _training_task: training.TrainTask=None
    _evaluation_task: training.EvalTask=None
    _training_loop: training.Loop=None
  • The Model
    @property
    def model(self) -> trax_layers.Serial:
        """The Embeddings model"""
        if self._model is None:
            self._model = trax_layers.Serial(
                trax_layers.Embedding(
                    vocab_size=self.vocabulary_size,
                    d_feature=self.embedding_dimension),
                trax_layers.Mean(axis=1),
                trax_layers.Dense(n_units=self.output_dimension),
                trax_layers.LogSoftmax(),
            )
        return self._model
    
  • The Training Task
    @property
    def training_task(self) -> training.TrainTask:
        """The training task for training the model"""
        if self._training_task is None:
            self._training_task = training.TrainTask(
                labeled_data=self.training_generator,
                loss_layer=trax_layers.CrossEntropyLoss(),
                optimizer=trax.optimizers.Adam(0.01),
                n_steps_per_checkpoint=10,
            )
        return self._training_task
    
  • Evaluation Task
    @property
    def evaluation_task(self) -> training.EvalTask:
        """The validation evaluation task"""
        if self._evaluation_task is None:
            self._evaluation_task = training.EvalTask(
                labeled_data=self.validation_generator,
                metrics=[trax_layers.CrossEntropyLoss(),
                         trax_layers.Accuracy()],
            )
        return self._evaluation_task
    
  • Training Loop
    @property
    def training_loop(self) -> training.Loop:
        """The thing to run the training"""
        if self._training_loop is None:
            self._training_loop = training.Loop(
                model=self.model,
                tasks=self.training_task,
                eval_tasks=self.evaluation_task,
                output_dir= self.output_path) 
        return self._training_loop
    
  • Fitting the Model
    def fit(self):
        """Runs the training loop"""
        self.training_loop.run(n_steps=self.training_loops)
        return
    
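A quick sketch of how this class gets used, re-using the generators and constants defined earlier in this post:

trainer = SentimentNetwork(
    training_generator=training_generator,
    validation_generator=validation_generator,
    vocabulary_size=size_of_vocabulary,
    training_loops=100,
    output_path=output_path)
trainer.fit()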

Practice In Making Predictions

Now that you have trained a model, you can access it as the training_loop.model object. We will actually use training_loop.eval_model, and in the coming weeks you will learn why we sometimes use a different model for evaluation, e.g., one without dropout. For now, make predictions with your model.

Use the training data just to see how the prediction process works.

  • Later, you will use validation data to evaluate your model's performance.

Create a generator object.

tmp_train_generator = train_generator(batch_size=16)

Get one batch.

tmp_batch = next(tmp_train_generator)

Position 0 has the model inputs (tweets as tensors). Position 1 has the targets (the actual labels).

tmp_inputs, tmp_targets, tmp_example_weights = tmp_batch

print(f"The batch is a tuple of length {len(tmp_batch)} because position 0 contains the tweets, and position 1 contains the targets.") 
print(f"The shape of the tweet tensors is {tmp_inputs.shape} (num of examples, length of tweet tensors)")
print(f"The shape of the labels is {tmp_targets.shape}, which is the batch size.")
print(f"The shape of the example_weights is {tmp_example_weights.shape}, which is the same as inputs/targets size.")
The batch is a tuple of length 3 because position 0 contains the tweets, and position 1 contains the targets.
The shape of the tweet tensors is (16, 14) (num of examples, length of tweet tensors)
The shape of the labels is (16,), which is the batch size.
The shape of the example_weights is (16,), which is the same as inputs/targets size.

Feed the tweet tensors into the model to get a prediction.

tmp_pred = training_loop.eval_model(tmp_inputs)
print(f"The prediction shape is {tmp_pred.shape}, num of tensor_tweets as rows")
print("Column 0 is the probability of a negative sentiment (class 0)")
print("Column 1 is the probability of a positive sentiment (class 1)")
print()
print("View the prediction array")
print(tmp_pred)
The prediction shape is (16, 2), num of tensor_tweets as rows
Column 0 is the probability of a negative sentiment (class 0)
Column 1 is the probability of a positive sentiment (class 1)

View the prediction array
[[-1.2960873e+01 -2.3841858e-06]
 [-5.6474457e+00 -3.5326481e-03]
 [-5.3460855e+00 -4.7781467e-03]
 [-7.6736917e+00 -4.6515465e-04]
 [-5.2682662e+00 -5.1658154e-03]
 [-1.0566207e+01 -2.5749207e-05]
 [-5.6388092e+00 -3.5634041e-03]
 [-3.9540453e+00 -1.9363165e-02]
 [ 0.0000000e+00 -2.0700916e+01]
 [ 0.0000000e+00 -2.2949795e+01]
 [ 0.0000000e+00 -2.3168846e+01]
 [ 0.0000000e+00 -2.4553205e+01]
 [-9.5367432e-07 -1.3878939e+01]
 [ 0.0000000e+00 -1.6655178e+01]
 [ 0.0000000e+00 -1.5975946e+01]
 [ 0.0000000e+00 -2.0577690e+01]]

To turn these probabilities into categories (negative or positive sentiment prediction), for each row:

  • Compare the probabilities in each column.
  • If column 1 has a value greater than column 0, classify that as a positive tweet.
  • Otherwise if column 1 is less than or equal to column 0, classify that example as a negative tweet.

Turn probabilities into category predictions.

tmp_is_positive = tmp_pred[:,1] > tmp_pred[:,0]
for i, p in enumerate(tmp_is_positive):
    print(f"Neg log prob {tmp_pred[i,0]:.4f}\tPos log prob {tmp_pred[i,1]:.4f}\t is positive? {p}\t actual {tmp_targets[i]}")
Neg log prob -12.9609   Pos log prob -0.0000     is positive? True       actual 1
Neg log prob -5.6474    Pos log prob -0.0035     is positive? True       actual 1
Neg log prob -5.3461    Pos log prob -0.0048     is positive? True       actual 1
Neg log prob -7.6737    Pos log prob -0.0005     is positive? True       actual 1
Neg log prob -5.2683    Pos log prob -0.0052     is positive? True       actual 1
Neg log prob -10.5662   Pos log prob -0.0000     is positive? True       actual 1
Neg log prob -5.6388    Pos log prob -0.0036     is positive? True       actual 1
Neg log prob -3.9540    Pos log prob -0.0194     is positive? True       actual 1
Neg log prob 0.0000     Pos log prob -20.7009    is positive? False      actual 0
Neg log prob 0.0000     Pos log prob -22.9498    is positive? False      actual 0
Neg log prob 0.0000     Pos log prob -23.1688    is positive? False      actual 0
Neg log prob 0.0000     Pos log prob -24.5532    is positive? False      actual 0
Neg log prob -0.0000    Pos log prob -13.8789    is positive? False      actual 0
Neg log prob 0.0000     Pos log prob -16.6552    is positive? False      actual 0
Neg log prob 0.0000     Pos log prob -15.9759    is positive? False      actual 0
Neg log prob 0.0000     Pos log prob -20.5777    is positive? False      actual 0

Notice that since you are making a prediction using a training batch, it's more likely that the model's predictions match the actual targets (labels).

  • Every prediction that the tweet is positive is also matching the actual target of 1 (positive sentiment).
  • Similarly, all predictions that the sentiment is not positive match the actual target of 0 (negative sentiment).

One more useful thing to know is how to check whether the prediction matches the actual target (label).

  • The result of the is_positive calculation is an array of booleans.
  • The targets are of type trax.fastmath.numpy.int32.
  • If you expect to be doing division, you may prefer to work with decimal numbers, i.e. the data type trax.fastmath.numpy.float32.

View the array of booleans.

print("Array of booleans")
display(tmp_is_positive)
Array of booleans
DeviceArray([ True,  True,  True,  True,  True,  True,  True,  True,
             False, False, False, False, False, False, False, False], dtype=bool)

Convert booleans to type int32.

  • True is converted to 1
  • False is converted to 0
tmp_is_positive_int = tmp_is_positive.astype(trax.fastmath.numpy.int32)

View the array of integers.

print("Array of integers")
display(tmp_is_positive_int)
Array of integers
DeviceArray([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

Convert boolean to type float32.

tmp_is_positive_float = tmp_is_positive.astype(numpy.float32)

View the array of floats.

print("Array of floats")
display(tmp_is_positive_float)
Array of floats
DeviceArray([1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
             0.], dtype=float32)
print(tmp_pred.shape)
(16, 2)

Note that Python usually does type conversion for you when you compare a boolean to an integer.

  • True compared to 1 is True; compared to any other integer it is False.
  • False compared to 0 is True; compared to any other integer it is False.
print(f"True == 1: {True == 1}")
print(f"True == 2: {True == 2}")
print(f"False == 0: {False == 0}")
print(f"False == 2: {False == 2}")
True == 1: True
True == 2: False
False == 0: True
False == 2: False

However, we recommend that you keep track of the data type of your variables to avoid unexpected outcomes. So it helps to convert the booleans into integers.

Compare 1 to 1 rather than comparing True to 1.

Hopefully you are now familiar with what kinds of inputs and outputs the model uses when making a prediction.

  • This will help you implement a function that estimates the accuracy of the model's predictions.

Evaluation

Computing the accuracy of a batch

You will now write a function that evaluates your model on the validation set and returns the accuracy.

  • preds contains the predictions.
  • Its dimensions are (batch_size, output_dim). output_dim is two in this case. Column 0 contains the probability that the tweet belongs to class 0 (negative sentiment). Column 1 contains the probability that it belongs to class 1 (positive sentiment).
  • If the probability in column 1 is greater than the probability in column 0, then interpret this as the model's prediction that the example has label 1 (positive sentiment).
  • Otherwise, if the probabilities are equal or the probability in column 0 is higher, the model's prediction is 0 (negative sentiment).
  • y contains the actual labels.
  • y_weights contains the weights to give to predictions.
def compute_accuracy(preds: numpy.ndarray,
                     y: numpy.ndarray,
                     y_weights: numpy.ndarray) -> tuple:
    """Compute a batch accuracy

    Args: 
       preds: a tensor of shape (dim_batch, output_dim) 
       y: a tensor of shape (dim_batch,) with the true labels
       y_weights: a numpy.ndarray with a weight for each example

    Returns: 
       accuracy: a float between 0-1 
       weighted_num_correct (np.float32): Sum of the weighted correct predictions
       sum_weights (np.float32): Sum of the weights
    """
    # Create an array of booleans, 
    # True if the probability of positive sentiment is greater than
    # the probability of negative sentiment
    # else False
    is_pos =  preds[:, 1] > preds[:, 0]

    # convert the array of booleans into an array of np.int32
    is_pos_int = is_pos.astype(numpy.int32)

    # compare the array of predictions (as int32) with the target (labels) of type int32
    correct = is_pos_int == y

    # Count the sum of the weights.
    sum_weights = y_weights.sum()

    # convert the array of correct predictions (boolean) into an array of np.float32
    correct_float = correct.astype(numpy.float32)

    # Multiply each prediction with its corresponding weight.
    weighted_correct_float = correct_float.dot(y_weights)

    # Sum up the weighted correct predictions (of type np.float32); this goes
    # in the numerator.
    weighted_num_correct = weighted_correct_float.sum()

    # Divide the number of weighted correct predictions by the sum of the
    # weights.
    accuracy = weighted_num_correct/sum_weights

    return accuracy, weighted_num_correct, sum_weights

Get one batch.

tmp_val_generator = valid_generator(batch_size=64)
tmp_batch = next(tmp_val_generator)

Position 0 has the model inputs (tweets as tensors), position 1 has the targets (the actual labels).

tmp_inputs, tmp_targets, tmp_example_weights = tmp_batch

Feed the tweet tensors into the model to get a prediction.

tmp_pred = training_loop.eval_model(tmp_inputs)
tmp_acc, tmp_num_correct, tmp_num_predictions = compute_accuracy(preds=tmp_pred, y=tmp_targets, y_weights=tmp_example_weights)

print(f"Model's prediction accuracy on a single training batch is: {100 * tmp_acc}%")
print(f"Weighted number of correct predictions {tmp_num_correct}; weighted number of total observations predicted {tmp_num_predictions}")
Model's prediction accuracy on a single validation batch is: 100.0%
Weighted number of correct predictions 64.0; weighted number of total observations predicted 64

End

Now that we have a trained model, in the next post we'll test how well it did.

Sentiment Analysis: Defining the Model

Beginning

This continues a series on sentiment analysis with deep learning. In the previous post we loaded and processed our data set. In this post we'll see about actually defining the Neural Network.

In this part we will write our own library of layers. It will be very similar to the ones used in Trax, and also to those in Keras and PyTorch. The intention is that writing our own small framework will help us understand how they all work so that we can use them more effectively in the future.

Imports

# from pypi
from expects import be_true, expect
from trax import fastmath

import attr
import numpy
import trax
import trax.layers as trax_layers

# this project
from neurotic.nlp.twitter.tensor_generator import TensorBuilder

Set Up

Some aliases to get closer to what the notebook has.

numpy_fastmath = fastmath.numpy
random = fastmath.random

Middle

The Base Layer Class

This will be the base class that the others will inherit from.

@attr.s(auto_attribs=True)
class Layer:
    """Base class for layers
    """
    def forward(self, x: numpy.ndarray):
        """The forward propagation method

       Raises:
        NotImplementedError - method is called but child hasn't implemented it
       """
        raise NotImplementedError

    def init_weights_and_state(self, input_signature, random_key):
        """method to initialize the weights
       based on the input signature and random key,
       to be implemented by subclasses of this Layer class
       """
        raise NotImplementedError

    def init(self, input_signature, random_key) -> numpy.ndarray:
        """initializes and returns the weights

       Note:
        This is just an alias for the ``init_weights_and_state``
       method for some reason

       Args: 
        input_signature: who knows?
        random_key: once again, who knows?

       Returns:
        the weights
       """
        self.init_weights_and_state(input_signature, random_key)
        return self.weights

    def __call__(self, x) -> numpy.ndarray:
        """This is an alias for the ``forward`` method

       Args:
        x: input array

       Returns:
        whatever the ``forward`` method does
       """
        return self.forward(x)

The ReLU class

Here's the ReLU function:

\[ \mathrm{ReLU}(x) = \mathrm{max}(0,x) \]

We'll implement the ReLU activation function below. The function will take in a matrix or vector and transform all the negative numbers into 0 while keeping all the positive numbers intact.

Please use numpy.maximum(A,k) to find the maximum between each element in A and a scalar k.

class Relu(Layer):
    """Relu activation function implementation"""
    def forward(self, x: numpy.ndarray) -> numpy.ndarray:
        """"Performs the activation

       Args: 
           - x: the input

       Returns:
           - activation: all positive or 0 version of x
       """
        return numpy.maximum(x, 0)

Test It

x = numpy.array([[-2.0, -1.0, 0.0], [0.0, 1.0, 2.0]], dtype=float)
relu_layer = Relu()
print("Test data is:")
print(x)
print("\nOutput of Relu is:")
actual = relu_layer(x)

print(actual)

expected = numpy.array([[0., 0., 0.],
                        [0., 1., 2.]])

expect(numpy.allclose(actual, expected)).to(be_true)
Test data is:
[[-2. -1.  0.]
 [ 0.  1.  2.]]

Output of Relu is:
[[0. 0. 0.]
 [0. 1. 2.]]

The Dense class

Implement the forward function of the Dense class.

  • The forward function multiplies the input to the layer (x) by the weight matrix (W).

\[ \mathrm{forward}(\mathbf{x},\mathbf{W}) = \mathbf{xW} \]

  • You can use numpy.dot to perform the matrix multiplication.

Note that for more efficient code execution, you will use the trax version of math, which includes a trax version of numpy and also random.

Implement the init_weights_and_state weight-initializer function

  • Weights are initialized with a random key.
  • The second parameter is a tuple for the desired shape of the weights (num_rows, num_cols)
  • The num of rows for weights should equal the number of columns in x, because for forward propagation, you will multiply x times weights.

Please use trax.fastmath.random.normal(key, shape, dtype=tf.float32) to generate random values for the weight matrix. The key difference between this function and the standard numpy randomness is the explicit use of random keys, which need to be passed in. While it can look tedious at first sight to pass the random key everywhere, you will learn in Course 4 why this is very helpful when implementing some advanced models.

  • key can be generated by calling random.get_prng(seed) and passing in a number for the seed.
  • shape is a tuple with the desired shape of the weight matrix.
    • The number of rows in the weight matrix should equal the number of columns in the variable x. Since x may have 2 dimensions if it represents a single training example (row, col), or three dimensions (batch_size, row, col), get the last dimension from the tuple that holds the dimensions of x.
    • The number of columns in the weight matrix is the number of units chosen for that dense layer. Look at the __init__ function to see which variable stores the number of units.
  • dtype is the data type of the values in the generated matrix; keep the default of tf.float32. In this case, don't explicitly set the dtype (just let it use the default value).

Set the standard deviation of the random values to 0.1

  • The values generated have a mean of 0 and standard deviation of 1.
  • Set the default standard deviation stdev to be 0.1 by multiplying the standard deviation to each of the values in the weight matrix.

See how the trax.fastmath.random.normal function works.

tmp_key = random.get_prng(seed=1)
print("The random seed generated by random.get_prng")
display(tmp_key)
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
The random seed generated by random.get_prng
DeviceArray([0, 1], dtype=uint32)

For some reason tensorflow can't find the GPU. Setting the log level to 0, like the message suggests, shows that it gives up after trying to find a TPU; there's no indication that it's looking for the GPU.

import tensorflow
print(tensorflow.test.gpu_device_name())

Hmmm. I'll have to troubleshoot that.
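In the meantime, here's a slightly more direct way to interrogate the build (a sketch using standard tensorflow 2.x functions):

import tensorflow

# was this tensorflow build compiled with CUDA support at all?
print("Built with CUDA:", tensorflow.test.is_built_with_cuda())

# which physical devices does the runtime actually see?
print("Devices:", tensorflow.config.list_physical_devices())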

print("choose a matrix with 2 rows and 3 columns")
tmp_shape=(2,3)
print(tmp_shape)
choose a matrix with 2 rows and 3 columns
(2, 3)

Generate a weight matrix. Note that you'll get an error if you try to set dtype to tf.float32 (where tf is tensorflow), so just avoid setting the dtype and allow it to use the default data type.

tmp_weight = random.normal(key=tmp_key, shape=tmp_shape)

print("Weight matrix generated with a normal distribution with mean 0 and stdev of 1")
display(tmp_weight)
Weight matrix generated with a normal distribution with mean 0 and stdev of 1
DeviceArray([[ 0.957307  , -0.9699291 ,  1.0070664 ],
             [ 0.36619022,  0.17294823,  0.29092228]], dtype=float32)
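To get the 0.1 standard deviation mentioned above, you just scale the generated values (a quick sketch re-using tmp_weight from the previous cell):

# the values start out with a standard deviation of 1,
# so multiplying by 0.1 rescales them to a standard deviation of 0.1
tmp_weight_scaled = tmp_weight * 0.1
display(tmp_weight_scaled)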
@attr.s(auto_attribs=True)
class Dense(Layer):
    """
    A dense (fully-connected) layer.

    Args:
     - n_units: the number of columns for our weight matrix
     - init_stdev: standard deviation for our initial weights
    """
    n_units: int
    init_stdev: float=0.1

    def forward(self, x: numpy.ndarray) -> numpy.ndarray:
        """The dot product of the input and the weights

       Args:
        x: input to multiply

       Returns:
        product of x and weights
       """
        return numpy.dot(x, self.weights)

    def init_weights_and_state(self, input_signature: tuple,
                               random_key: int) -> numpy.ndarray:
        """initializes the weights

       Args:
        input_signature: tuple whose final dimension will be the number of rows
        random_key: something to start the random normal generator with
       """
        input_shape = input_signature.shape

        # to allow for more than two-dimensional matrices,
        # we use the last column of the input shape, rather than assuming it's
        # column 1
        self.weights = (random.normal(key=random_key,
                                      shape=(input_shape[-1], self.n_units))
             * self.init_stdev)
        return self.weights
dense_layer = Dense(n_units=10)  # sets the number of units in the dense layer
random_key = random.get_prng(seed=0)  # sets the random seed
z = numpy.array([[2.0, 7.0, 25.0]])  # input array

dense_layer.init(z, random_key)
print("Weights are\n ",dense_layer.weights) #Returns randomly generated weights
output = dense_layer(z)
print("Foward function output is ", output) # Returns multiplied values of units and weights

expected_weights = numpy.array([
    [-0.02837108,  0.09368162, -0.10050076,  0.14165013,  0.10543301,  0.09108126,
     -0.04265672,  0.0986188,  -0.05575325,  0.00153249],
    [-0.20785688,  0.0554837,   0.09142365,  0.05744595,  0.07227863,  0.01210617,
     -0.03237354,  0.16234995,  0.02450038, -0.13809784],
    [-0.06111237,  0.01403724,  0.08410042, -0.1094358,  -0.10775021, -0.11396459,
     -0.05933381, -0.01557652, -0.03832145, -0.11144515]])

expected_output = numpy.array(
    [[-3.0395496,   0.9266802,   2.5414743,  -2.050473,   -1.9769388,  -2.582209,
      -1.7952735,   0.94427425, -0.8980402,  -3.7497487]])

expect(numpy.allclose(dense_layer.weights, expected_weights)).to(be_true)
expect(numpy.allclose(output, expected_output)).to(be_true)
Weights are
  [[-0.02837108  0.09368162 -0.10050076  0.14165013  0.10543301  0.09108126
  -0.04265672  0.0986188  -0.05575325  0.00153249]
 [-0.20785688  0.0554837   0.09142365  0.05744595  0.07227863  0.01210617
  -0.03237354  0.16234995  0.02450038 -0.13809784]
 [-0.06111237  0.01403724  0.08410042 -0.1094358  -0.10775021 -0.11396459
  -0.05933381 -0.01557652 -0.03832145 -0.11144515]]
Forward function output is  [[-3.03954965  0.92668021  2.54147445 -2.05047299 -1.97693891 -2.58220917
  -1.79527355  0.94427423 -0.89804017 -3.74974866]]

The Layers for the Trax-Based Model

For the model implementation we will use the Trax layers library. Trax layers are very similar to the ones we implemented above, but in addition to trainable weights they also have a non-trainable state. This state is used in layers like batch normalization and for inference - we will learn more about it later on.

Dense

First, look at the code of the Trax Dense layer and compare to the implementation above.

Another important layer that we will use a lot is the Serial layer, which allows us to execute one layer after another in sequence.

  • You can pass in the layers as arguments to Serial, separated by commas.
  • For example: tl.Serial(tl.Embedding(...), tl.Mean(...), tl.Dense(...), tl.LogSoftmax(...))

The layer classes have pretty good docstrings, unlike the fastmath stuff, so it might be useful to look at them - but they're too long to include here.

We're also going to use an Embedding

  • tl.Embedding(vocab_size, d_feature).
  • vocab_size is the number of unique words in the given vocabulary.
  • d_feature is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
tmp_embed = trax_layers.Embedding(vocab_size=3, d_feature=2)
display(tmp_embed)
Embedding_3_2

Another useful layer is the Mean which calculates means across an axis. In this case, use axis = 1 (across rows) to get an average embedding vector (an embedding vector that is an average of all words in the vocabulary).

  • For example, if the embedding matrix is 300 elements and vocab size is 10,000 words, taking the mean of the embedding matrix along axis=1 will yield a vector of 300 elements.

Pretend the embedding matrix uses 2 elements for embedding the meaning of a word and has a vocabulary size of 3, so it has shape (2,3).

tmp_embed = numpy.array([[1,2,3,],
                         [4,5,6]
                         ])

First take the mean along axis 0, which creates a vector whose length equals the vocabulary size (the number of columns).

display(numpy.mean(tmp_embed,axis=0))
array([2.5, 3.5, 4.5])

If you take the mean along axis 1 it creates a vector whose length equals the number of elements in a word embedding (the rows).

display(numpy.mean(tmp_embed,axis=1))
array([2., 5.])

Finally, a LogSoftmax layer gives you a log-softmax output.
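For reference, here's the standard definition of the log-softmax for an input vector x (the general formula, not anything Trax-specific):

\[ \mathrm{LogSoftmax}(x)_i = x_i - \log \sum_j e^{x_j} \]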

The Classifier Function

builder = TensorBuilder()
size_of_vocabulary = len(builder.vocabulary)
def classifier(vocab_size: int=size_of_vocabulary,
               embedding_dim: int=256,
               output_dim: int=2) -> trax_layers.Serial:
    """Creates the classifier model

    Args:
     vocab_size: number of tokens in the training vocabulary
     embedding_dim: output dimension for the Embedding layer
     output_dim: dimension for the Dense layer

    Returns:
     the composed layer-model
    """
    embed_layer = trax_layers.Embedding(
        vocab_size=vocab_size, # Size of the vocabulary
        d_feature=embedding_dim)  # Embedding dimension

    mean_layer = trax_layers.Mean(axis=1)

    dense_output_layer = trax_layers.Dense(n_units = output_dim)

    log_softmax_layer = trax_layers.LogSoftmax()

    model = trax_layers.Serial(
      embed_layer,
      mean_layer,
      dense_output_layer,
      log_softmax_layer
    )
    return model
tmp_model = classifier()
print(type(tmp_model))
display(tmp_model)
<class 'trax.layers.combinators.Serial'>
Serial[
  Embedding_9164_256
  Mean
  Dense_2
  LogSoftmax
]

Ending

Now that we have our Deep Learning model, we'll move on to training it.

Sentiment Analysis: Pre-processing the Data

Beginning

This is the next in a series about building a Deep Learning model for sentiment analysis. The first post was this one.

Imports

# from python
from argparse import Namespace

import random

# from pypi
from expects import contain_exactly, equal, expect
from nltk.corpus import twitter_samples

import nltk
import numpy

# this project
from neurotic.nlp.twitter.processor import TwitterProcessor

Set Up

The NLTK data has to be downloaded at least once.

nltk.download("twitter_samples", download_dir="~/data/datasets/nltk_data/")

Middle

The NLTK Data

positive = twitter_samples.strings('positive_tweets.json')
negative = twitter_samples.strings('negative_tweets.json')

print(f"Positive Tweets: {len(positive):,}")
print(f"Negative Tweets: {len(negative):,}")
Positive Tweets: 5,000
Negative Tweets: 5,000

Split It Up

Instead of randomly splitting the data we're going to do a straight slice.

SPLIT = 4000

Split positive set into validation and training

positive_validation   = positive[SPLIT:]
positive_training  = positive[:SPLIT]

Split negative set into validation and training

negative_validation = negative[SPLIT:]
negative_training  = negative[:SPLIT]

Combine the Data Sets

The X data.

train_x = positive_training + negative_training
validation_x = positive_validation + negative_validation

The labels (1 for positive, 0 for negative).

train_y = numpy.append(numpy.ones(len(positive_training)),
                       numpy.zeros(len(negative_training)))
validation_y  = numpy.append(numpy.ones(len(positive_validation)),
                             numpy.zeros(len(negative_validation)))

print(f"length of train_x {len(train_x):,}")
print(f"length of validation_x {len(validation_x):,}")
length of train_x 8,000
length of validation_x 2,000

Building the vocabulary

Now build the vocabulary.

  • Map each word in each tweet to an integer (an "index").
  • The following code does this for you, but please read it and understand what it's doing.
  • Note that you will build the vocabulary based on the training data.
  • To do so, you will assign an index to every word by iterating over your training set.

The vocabulary will also include some special tokens

  • __PAD__: padding
  • __</e>__: end of line
  • __UNK__: a token representing any word that is not in the vocabulary.
Tokens = Namespace(padding="__PAD__", ending="__</e>__", unknown="__UNK__")
process = TwitterProcessor()
vocabulary = {Tokens.padding: 0, Tokens.ending: 1, Tokens.unknown: 2}
for tweet in train_x:
    for token in process(tweet):
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)
print(f"Words in the vocabulary: {len(vocabulary):,}")

count = 0
for token in vocabulary:
    print(f"{count}: {token}: {vocabulary[token]}")
    count += 1
    if count == 5:
        break
Words in the vocabulary: 9,164
0: __PAD__: 0
1: __</e>__: 1
2: __UNK__: 2
3: followfriday: 3
4: top: 4

Converting a tweet to a tensor

Now we'll write a function that will convert each tweet to a tensor (a list of unique integer IDs representing the processed tweet).

  • Note, the returned data type will be a regular Python `list()`
    • You won't use TensorFlow in this function
    • You also won't use a numpy array
    • You also won't use trax.fastmath.numpy array
  • For words in the tweet that are not in the vocabulary, set them to the unique ID for the token `__UNK__`.

    For example, given this string:

'@happypuppy, is Maria happy?'

You first tokenize it.

['maria', 'happi']

Then convert each word into the index for it.

[2, 56]

Notice that the word "maria" is not in the vocabulary, so it is assigned the unique integer associated with the __UNK__ token, because it is considered "unknown."

# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: tweet_to_tensor
def tweet_to_tensor(tweet: str, vocab_dict: dict,
                    unk_token: str='__UNK__', verbose: bool=False):
    """Convert a tweet to a list of indices

    Args: 
       tweet - A string containing a tweet
       vocab_dict - The words dictionary
       unk_token - The special string for unknown tokens
       verbose - Print info during runtime

    Returns:
       tensor_l - A python list with indices for the tweet tokens
    """

    ### START CODE HERE (Replace instances of 'None' with your code) ###
    # Process the tweet into a list of words
    # where only important words are kept (stop words removed)
    word_l = process(tweet)

    if verbose:
        print("List of words from the processed tweet:")
        print(word_l)

    # Initialize the list that will contain the unique integer IDs of each word
    tensor_l = []

    # Get the unique integer ID of the __UNK__ token
    unk_ID = vocab_dict[unk_token]

    if verbose:
        print(f"The unique integer ID for the unk_token is {unk_ID}")

    # for each word in the list:
    for word in word_l:

        # Get the unique integer ID.
        # If the word doesn't exist in the vocab dictionary,
        # use the unique ID for __UNK__ instead.
        word_ID = vocab_dict.get(word, unk_ID)
    ### END CODE HERE ###

        # Append the unique integer ID to the tensor list.
        tensor_l.append(word_ID) 

    return tensor_l
print("Actual tweet is\n", positive_validation[0])
print("\nTensor of tweet:\n", tweet_to_tensor(positive_validation[0], vocab_dict=vocabulary))
Actual tweet is
 Bro:U wan cut hair anot,ur hair long Liao bo
Me:since ord liao,take it easy lor treat as save $ leave it longer :)
Bro:LOL Sibei xialan

Tensor of tweet:
 [1072, 96, 484, 2376, 750, 8220, 1132, 750, 53, 2, 2701, 796, 2, 2, 354, 606, 2, 3523, 1025, 602, 4599, 9, 1072, 158, 2, 2]
def test_tweet_to_tensor():
    test_cases = [

        {
            "name":"simple_test_check",
            "input": [positive_validation[1], vocabulary],
            "expected":[444, 2, 304, 567, 56, 9],
            "error":"The function gives bad output for val_pos[1]. Test failed"
        },
        {
            "name":"datatype_check",
            "input":[positive_validation[1], vocabulary],
            "expected":type([]),
            "error":"Datatype mismatch. Need only list not np.array"
        },
        {
            "name":"without_unk_check",
            "input":[positive_validation[1], vocabulary],
            "expected":6,
            "error":"Unk word check not done- Please check if you included mapping for unknown word"
        }
    ]
    count = 0
    for test_case in test_cases:        
        try:
            if test_case['name'] == "simple_test_check":
                assert test_case["expected"] == tweet_to_tensor(*test_case['input'])
                count += 1
            if test_case['name'] == "datatype_check":
                assert isinstance(tweet_to_tensor(*test_case['input']), test_case["expected"])
                count += 1
            if test_case['name'] == "without_unk_check":
                assert None not in tweet_to_tensor(*test_case['input'])
                count += 1

        except:
            print(test_case['error'])
    if count == 3:
        print("\033[92m All tests passed")
    else:
        print(count," Tests passed out of 3")
test_tweet_to_tensor()            
The function gives bad output for val_pos[1]. Test failed
2  Tests passed out of 3

Their tweet processor wipes out everything after the start of a URL, even when it isn't part of the URL, so they end up with fewer tokens and the indices won't match exactly.
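As a rough illustration of the difference (hypothetical regular expressions here, not their actual process_tweet code), a URL pattern that stops at whitespace removes only the URL, while one that runs to the end of the string takes the trailing words with it:

# hypothetical sketch: two ways to strip URLs from a tweet
import re

tweet = "check this out https://example.com so cool"

# stop at the first whitespace: only the URL itself is removed
print(re.sub(r"https?://\S+", "", tweet))

# run to the end of the string: the words after the URL disappear too
print(re.sub(r"https?://.*", "", tweet))
check this out  so cool
check this out 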

Creating a batch generator

Most of the time in Natural Language Processing, and in AI in general, we use batches when training our models.

  • If instead of training with batches of examples, you were to train a model with one example at a time, it would take a very long time to train the model.
  • You will now build a data generator that takes in the positive/negative tweets and returns a batch of training examples. It returns the model inputs, the targets (positive or negative labels), and the weight for each target (e.g. this lets us treat some examples as more important to get right than others, but commonly these will all be 1.0).

Once you create the generator, you could include it in a for loop:

for batch_inputs, batch_targets, batch_example_weights in data_generator:

You can also get a single batch like this:

batch_inputs, batch_targets, batch_example_weights = next(data_generator)

The generator returns the next batch each time it's called.

  • This generator returns the data in a format (tensors) that you could directly use in your model.
  • It returns a triple: the inputs, targets, and loss weights:

    – Inputs is a tensor that contains the batch of tweets we put into the model.
    – Targets is the corresponding batch of labels that we train to generate.
    – Loss weights here are just 1s with the same shape as the targets. Next week, you will use them to mask input padding.

data_generator

A batch of spaghetti.

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED: Data generator
def data_generator(data_pos: list, data_neg: list, batch_size: int,
                   loop: bool, vocab_dict: dict, shuffle: bool=False):
    """Generates batches of data

    Args: 
       data_pos - Set of positive examples
       data_neg - Set of negative examples
       batch_size - number of samples per batch. Must be even
       loop - True or False
       vocab_dict - The words dictionary
       shuffle - Shuffle the data order

    Yield:
       inputs - Subset of positive and negative examples
       targets - The corresponding labels for the subset
       example_weights - An array specifying the importance of each example        
    """
### START GIVEN CODE ###
    # make sure the batch size is an even number
    # to allow an equal number of positive and negative samples
    assert batch_size % 2 == 0

    # Number of positive examples in each batch is half of the batch size
    # same with number of negative examples in each batch
    n_to_take = batch_size // 2

    # Use pos_index to walk through the data_pos array
    # same with neg_index and data_neg
    pos_index = 0
    neg_index = 0

    len_data_pos = len(data_pos)
    len_data_neg = len(data_neg)

    # Get an array with the data indexes
    pos_index_lines = list(range(len_data_pos))
    neg_index_lines = list(range(len_data_neg))

    # shuffle lines if shuffle is set to True
    if shuffle:
        rnd.shuffle(pos_index_lines)
        rnd.shuffle(neg_index_lines)

    stop = False

    # Loop indefinitely
    while not stop:  

        # create a batch with positive and negative examples
        batch = []

        # First part: Pack n_to_take positive examples

        # Start from pos_index and increment i up to n_to_take
        for i in range(n_to_take):

            # If the positive index goes past the positive dataset length,
            if pos_index >= len_data_pos: 

                # If loop is set to False, break once we reach the end of the dataset
                if not loop:
                    stop = True
                    break

                # If user wants to keep re-using the data, reset the index
                pos_index = 0

                if shuffle:
                    # Shuffle the index of the positive sample
                    rnd.shuffle(pos_index_lines)

            # get the tweet at pos_index
            tweet = data_pos[pos_index_lines[pos_index]]

            # convert the tweet into tensors of integers representing the processed words
            tensor = tweet_to_tensor(tweet, vocab_dict)

            # append the tensor to the batch list
            batch.append(tensor)

            # Increment pos_index by one
            pos_index = pos_index + 1

### END GIVEN CODE ###

### START CODE HERE (Replace instances of 'None' with your code) ###

        # Second part: Pack n_to_take negative examples

        # Using the same batch list, pack n_to_take negative examples
        for i in range(n_to_take):

            # If the negative index goes past the negative dataset length,
            if neg_index >= len_data_neg:

                # If loop is set to False, break once we reach the end of the dataset
                if not loop:
                    stop = True
                    break

                # If user wants to keep re-using the data, reset the index
                neg_index = 0

                if shuffle:
                    # Shuffle the index of the negative sample
                    rnd.shuffle(neg_index_lines)
            # get the tweet at neg_index
            tweet = data_neg[neg_index_lines[neg_index]]

            # convert the tweet into tensors of integers representing the processed words
            tensor = tweet_to_tensor(tweet, vocab_dict)

            # append the tensor to the batch list
            batch.append(tensor)

            # Increment neg_index by one
            neg_index += 1

### END CODE HERE ###        

### START GIVEN CODE ###
        if stop:
            break

        # Update the start index for positive data 
        # so that it's n_to_take positions after the current pos_index
        pos_index += n_to_take

        # Update the start index for negative data 
        # so that it's n_to_take positions after the current neg_index
        neg_index += n_to_take

        # Get the max tweet length (the length of the longest tweet) 
        # (you will pad all shorter tweets to have this length)
        max_len = max([len(t) for t in batch]) 


        # Initialize the input_l, which will 
        # store the padded versions of the tensors
        tensor_pad_l = []
        # Pad shorter tweets with zeros
        for tensor in batch:
### END GIVEN CODE ###

### START CODE HERE (Replace instances of 'None' with your code) ###
            # Get the number of positions to pad for this tensor so that it will be max_len long
            n_pad = max_len - len(tensor)

            # Generate a list of zeros, with length n_pad
            pad_l = [0] * n_pad

            # concatenate the tensor and the list of padded zeros
            tensor_pad = tensor + pad_l

            # append the padded tensor to the list of padded tensors
            tensor_pad_l.append(tensor_pad)

        # convert the list of padded tensors to a numpy array
        # and store this as the model inputs
        inputs = numpy.array(tensor_pad_l)

        # Generate the list of targets for the positive examples (a list of ones)
        # The length is the number of positive examples in the batch
        target_pos = [1] * len(batch[:n_to_take])

        # Generate the list of targets for the negative examples (a list of zeros)
        # The length is the number of negative examples in the batch
        target_neg = [0] * len(batch[n_to_take:])

        # Concatenate the positive and negative targets
        target_l = target_pos + target_neg

        # Convert the target list into a numpy array
        targets = numpy.array(target_l)

        # Example weights: treat all examples as equally important. This should be an np.array (hint: use np.ones_like())
        example_weights = numpy.ones_like(targets)


### END CODE HERE ###

### GIVEN CODE ###
        # note we use yield and not return
        yield inputs, targets, example_weights

Now you can use your data generator to create a data generator for the training data, and another data generator for the validation data.

We will create a third data generator that does not loop, for testing the final accuracy of the model.

# Set the random number generator for the shuffle procedure
rnd = random
rnd.seed(30) 

# Create the training data generator
def train_generator(batch_size, shuffle = False):
    return data_generator(positive_training, negative_training,
                          batch_size, True, vocabulary, shuffle)

# Create the validation data generator
def val_generator(batch_size, shuffle = False):
    return data_generator(positive_validation, negative_validation,
                          batch_size, True, vocabulary, shuffle)

# Create the test data generator
def test_generator(batch_size, shuffle = False):
    return data_generator(positive_validation, negative_validation, batch_size,
                          False, vocabulary, shuffle)

# Get a batch from the train_generator and inspect.
inputs, targets, example_weights = next(train_generator(4, shuffle=True))
# this will print a list of 4 tensors padded with zeros
print(f'Inputs: {inputs}')
print(f'Targets: {targets}')
print(f'Example Weights: {example_weights}')
Inputs: [[2030 4492 3231    9    0    0    0    0    0    0    0]
 [5009  571 2025 1475 5233 3532  142 3532  132  464    9]
 [3798  111   96  587 2960 4007    0    0    0    0    0]
 [ 256 3798    0    0    0    0    0    0    0    0    0]]
Targets: [1 1 0 0]
Example Weights: [1 1 1 1]

Test the train_generator

Create a data generator for training data which produces batches of size 4 (for tensors and their respective targets).

tmp_data_gen = train_generator(batch_size = 4)

Call the data generator to get one batch and its targets.

tmp_inputs, tmp_targets, tmp_example_weights = next(tmp_data_gen)
print(f"The inputs shape is {tmp_inputs.shape}")
print(f"The targets shape is {tmp_targets.shape}")
print(f"The example weights shape is {tmp_example_weights.shape}")

for i,t in enumerate(tmp_inputs):
    print(f"input tensor: {t}; target {tmp_targets[i]}; example weights {tmp_example_weights[i]}")
The inputs shape is (4, 14)
The targets shape is (4,)
The example weights shape is (4,)
input tensor: [3 4 5 6 7 8 9 0 0 0 0 0 0 0]; target 1; example weights 1
input tensor: [10 11 12 13 14 15 16 17 18 19 20  9 21 22]; target 1; example weights 1
input tensor: [5807 2931 3798    0    0    0    0    0    0    0    0    0    0    0]; target 0; example weights 1
input tensor: [ 865  261 3689 5808  313 4499  571 1248 2795  333 1220 3798    0    0]; target 0; example weights 1

Bundle It Up

<<imports>>

<<defaults>>

<<nltk-settings>>

<<special-tokens>>

<<the-builder>>

    <<positive-tweets>>

    <<negative-tweets>>

    <<positive-training>>

    <<negative-training>>

    <<positive-validation>>

    <<negative-validation>>

    <<twitter-processor>>

    <<the-vocabulary>>

    <<x-train>>

    <<to-tensor>>


<<the-generator>>

    <<positive-indices>>

    <<negative-indices>>

    <<positives>>

    <<negatives>>

    <<positive-generator>>

    <<negative-generator>>

    <<the-iterator>>

    <<the-next>>

Imports

# python
from argparse import Namespace
from itertools import cycle

import random

# pypi
from nltk.corpus import twitter_samples

import attr
import numpy

# this project
from .processor import TwitterProcessor

Defaults

Defaults = Namespace(
    split = 4000,
)

NLTK Settings

NLTK = Namespace(
    corpus="twitter_samples",
    negative = "negative_tweets.json",
    positive="positive_tweets.json",
)

Special Tokens

SpecialTokens = Namespace(padding="__PAD__",
                          ending="__</e>__",
                          unknown="__UNK__")

SpecialIDs = Namespace(
    padding=0,
    ending=1,
    unknown=2,
)

The Builder

@attr.s(auto_attribs=True)
class TensorBuilder:
    """converts tweets to tensors

    Args: 
     - split: where to split the training and validation data
    """
    split: int=Defaults.split
    _positive: list=None
    _negative: list=None
    _positive_training: list=None
    _negative_training: list=None
    _positive_validation: list=None
    _negative_validation: list=None
    _process: TwitterProcessor=None
    _vocabulary: dict=None
    _x_train: list=None
  • Positive Tweets
    @property
    def positive(self) -> list:
        """The raw positive NLTK tweets"""
        if self._positive is None:
            self._positive = twitter_samples.strings(NLTK.positive)
        return self._positive
    
  • Negative Tweets
    @property
    def negative(self) -> list:
        """The raw negative NLTK tweets"""
        if self._negative is None:
            self._negative = twitter_samples.strings(NLTK.negative)
        return self._negative
    
  • Positive Training
    @property
    def positive_training(self) -> list:
        """The positive training data"""
        if self._positive_training is None:
            self._positive_training = self.positive[:self.split]
        return self._positive_training
    
  • Negative Training
    @property
    def negative_training(self) -> list:
        """The negative training data"""
        if self._negative_training is None:
            self._negative_training = self.negative[:self.split]
        return self._negative_training
    
  • Positive Validation
    @property
    def positive_validation(self) -> list:
        """The positive validation data"""
        if self._positive_validation is None:
            self._positive_validation = self.positive[self.split:]
        return self._positive_validation
    
  • Negative Validation
    @property
    def negative_validation(self) -> list:
        """The negative validation data"""
        if self._negative_validation is None:
            self._negative_validation = self.negative[self.split:]
        return self._negative_validation
    
  • Twitter Processor
    @property
    def process(self) -> TwitterProcessor:
        """processor for tweets"""
        if self._process is None:
            self._process = TwitterProcessor()
        return self._process
    
  • X Train
    @property
    def x_train(self) -> list:
        """The unprocessed training data"""
        if self._x_train is None:
            self._x_train = self.positive_training + self.negative_training
        return self._x_train
    
  • The Vocabulary
    @property
    def vocabulary(self) -> dict:
        """A map of token to numeric id"""
        if self._vocabulary is None:
            self._vocabulary = {SpecialTokens.padding: SpecialIDs.padding,
                                SpecialTokens.ending: SpecialIDs.ending,
                                SpecialTokens.unknown: SpecialIDs.unknown}
            for tweet in self.x_train:
                for token in self.process(tweet):
                    if token not in self._vocabulary:
                        self._vocabulary[token] = len(self._vocabulary)
        return self._vocabulary
    
  • To Tensor
    def to_tensor(self, tweet: str) -> list:
        """Converts tweet to list of numeric identifiers
    
        Args:
         tweet: the string to convert
    
        Returns:
         list of IDs for the tweet
        """
        tensor = [self.vocabulary.get(token, SpecialIDs.unknown)
                  for token in self.process(tweet)]
        return tensor
    

The Generator

@attr.s(auto_attribs=True)
class TensorGenerator:
    """Generates batches of vectorized-tweets

    Args:
     converter: TensorBuilder object
     positive_data: list of positive data
     negative_data: list of negative data
     batch_size: the size for each generated batch     
     shuffle: whether to shuffle the generated data
     infinite: whether to generate data forever
    """
    converter: TensorBuilder
    positive_data: list
    negative_data: list
    batch_size: int
    shuffle: bool=True
    infinite: bool = True
    _positive_indices: list=None
    _negative_indices: list=None
    _positives: iter=None
    _negatives: iter=None
  • Positive Indices
    @property
    def positive_indices(self) -> list:
        """The indices to use to grab the positive tweets"""
        if self._positive_indices is None:
            k = len(self.positive_data)
            if self.shuffle:
                self._positive_indices = random.sample(range(k), k=k)
            else:
                self._positive_indices = list(range(k))
        return self._positive_indices
    
  • Negative Indices
    @property
    def negative_indices(self) -> list:
        """Indices for the negative tweets"""
        if self._negative_indices is None:
            k = len(self.negative_data)
            if self.shuffle:
                self._negative_indices = random.sample(range(k), k=k)
            else:
                self._negative_indices = list(range(k))
        return self._negative_indices
    
  • Positives
    @property
    def positives(self):
        """The positive index generator"""
        if self._positives is None:
            self._positives = self.positive_generator()
        return self._positives
    
  • Negatives
    @property
    def negatives(self):
        """The negative index generator"""
        if self._negatives is None:
            self._negatives = self.negative_generator()
        return self._negatives
    
  • Positive Generator
    def positive_generator(self):
        """Generator of indices for positive tweets"""
        stop = len(self.positive_indices)
        index = 0
        while True:
            yield self.positive_indices[index]
            index += 1
            if index == stop:
                if not self.infinite:
                    break
                if self.shuffle:
                    self._positive_indices = None
                index = 0
        return
    
  • Negative Generator
    def negative_generator(self):
        """generator of indices for negative tweets"""
        stop = len(self.negative_indices)
        index = 0
        while True:
            yield self.negative_indices[index]
            index += 1
            if index == stop:
                if not self.infinite:
                    break
                if self.shuffle:
                    self._negative_indices = None
                index = 0
        return
    
  • The Iterator
    def __iter__(self):
        return self
    
  • The Next Method
    def __next__(self):
        assert self.batch_size % 2 == 0
        half_batch = self.batch_size // 2
    
        # get the indices
        positives = (next(self.positives) for index in range(half_batch))
        negatives = (next(self.negatives) for index in range(half_batch))
    
        # get the tweets
        positives = (self.positive_data[index] for index in positives)
        negatives = (self.negative_data[index] for index in negatives)
    
        # get the token ids
        try:    
            positives = [self.converter.to_tensor(tweet) for tweet in positives]
            negatives = [self.converter.to_tensor(tweet) for tweet in negatives]
        except RuntimeError:
            # the next(self.positives) in the first generator will raise a
            # RuntimeError if
            # we're not running this infinitely
            raise StopIteration
    
        batch = positives + negatives
    
        longest = max((len(tweet) for tweet in batch))
    
        paddings = (longest - len(tensor) for tensor in batch)
        paddings = ([0] * padding for padding in paddings)
    
        padded = [tensor + padding for tensor, padding in zip(batch, paddings)]
        inputs = numpy.array(padded)
    
        # the labels for the inputs
        targets = numpy.array([1] * half_batch + [0] * half_batch)
    
        assert len(targets) == len(batch)
    
        # default the weights to ones
        weights = numpy.ones_like(targets)    
        return inputs, targets, weights
    

Test It Out

from neurotic.nlp.twitter.tensor_generator import TensorBuilder, TensorGenerator

converter = TensorBuilder()
expect(len(converter.vocabulary)).to(equal(len(vocabulary)))
tweet = positive_validation[0]
expected = [1072, 96, 484, 2376, 750, 8220, 1132, 750, 53, 2, 2701, 796, 2, 2,
            354, 606, 2, 3523, 1025, 602, 4599, 9, 1072, 158, 2, 2]

actual = converter.to_tensor(tweet)
expect(actual).to(contain_exactly(*expected))
generator = TensorGenerator(converter, converter.positive_validation,
                            converter.negative_validation, batch_size=4)
print(next(generator))
(array([[ 749, 1019,  313, 1020,   75],
       [1009,    9,    0,    0,    0],
       [3540, 6030, 6031, 3798,    0],
       [  50,   96, 3798,    0,    0]]), array([1, 1, 0, 0]), array([1, 1, 1, 1]))
for count, batch in enumerate(generator):
    print(batch[0])
    print()
    if count == 5:
        break
print(next(generator))
[[  22 1228  434  354  227 2371    9]
 [ 267  160   89    0    0    0    0]
 [ 315 1008 8480 3798 2108  371 3233]
 [8232 8233  791 3798    0    0    0]]

[[1173 1061  586    9  896  729 1264  345 1062 1063]
 [3387  558  991 2166 3388 3231  558  238  120    0]
 [ 198 5997 3798    0    0    0    0    0    0    0]
 [ 223  310 3798    0    0    0    0    0    0    0]]

[[4015 4015 4015 4016  231 2117   57  422    9 4017 4018 4019   86   86]
 [2554   57  102  358   75    0    0    0    0    0    0    0    0    0]
 [  50   38  881 3798    0    0    0    0    0    0    0    0    0    0]
 [6729 6730 6731  382 3798    0    0    0    0    0    0    0    0    0]]

[[3479   75    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0]
 [4636 4637  233 4299  111  237 2626    9    0    0    0    0    0    0
     0    0    0]
 [  73  381  463 4321  142   96 7390 7391   92   85 1394 7392 5895 7393
    45 3798 7394]
 [8863 2844  991  127 5818    0    0    0    0    0    0    0    0    0
     0    0    0]]

[[ 226  615   22   75    0    0]
 [2135  703  237  435 3124    9]
 [2379 6264 3798    0    0    0]
 [6504 1912 2380 3798    0    0]]

[[5623  120    0    0    0    0    0    0    0    0]
 [ 133   54  102   63 1300   56    9   50   92 3181]
 [2094  383   73  464 3798    0    0    0    0    0]
 [ 223  101 8754  383 2085 5818 8755    0    0    0]]

(array([[ 374,   44, 2981,  435,  132,  111, 1040, 1382,    9,    0,    0,
           0],
       [ 369,  398,  283,    9, 2671, 1411,  136,  184,  769, 1262, 2061,
        3460],
       [1094, 9024,  315,  381, 3798,    0,    0,    0,    0,    0,    0,
           0],
       [9036, 3798,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0]]), array([1, 1, 0, 0]), array([1, 1, 1, 1]))

Ladies and gentlemen, we have ourselves a generator.

End

Now that we have our data, the next step will be to define the model.

Sentiment Analysis: Deep Learning Model

Beginning

Previously we created sentiment analysis models using the Logistic Regression and Naive Bayes algorithms. However, if we were to give those models an example like:

This movie was almost good.

The model would have predicted a positive sentiment for that review. That sentence, however, is expressing the negative sentiment that the movie was not good. To solve those kinds of misclassifications we will write a program that uses deep neural networks to identify sentiment in text.

This model will follow a similar structure to the Continuous Bag of Words Model (Introducing the CBOW Model) that we looked at previously - indeed most deep nets have a similar structure. The only things that change are the model architecture, the inputs, and the outputs. Although we looked at Trax and JAX in a previous post (Introducing Trax), we'll start off with a review of some of their features, and then in future posts we'll implement the actual model.

Imports

# from python
import os
import random

# from pypi
from trax import layers
import trax
import trax.fastmath.numpy as numpy

Set Up

The Random Seed

trax.supervised.trainer_lib.init_random_number_generators(31)

Middle

Trax Review

JAX Arrays

First, the JAX reimplementation of numpy (from Trax.fastmath).

an_array = numpy.array(5.0)
display(an_array)
print(type(an_array))
DeviceArray(5., dtype=float32)
<class 'jax.interpreters.xla._DeviceArray'>

Note: the trax library is strict about typing here, so 5 (an integer) won't work; it has to be a float.

Squaring

Now we'll create a function to square the array.

def square(x):
    return x**2
print(f"f({an_array}) -> {square(an_array)}")
f(5.0) -> 25.0

Gradients

The gradient (derivative) of our square function with respect to its input x is the derivative of \(x^2\).

  • The derivative of \(x^2\) is \(2x\).
  • When x is 5, then 2x=10.

You can calculate the gradient of a function by using trax.fastmath.grad(fun=) and passing in the name of the function.

  • In this case the function you want to take the gradient of is square.
  • The object returned (saved in square_gradient in this example) is a function that can calculate the gradient of square for a given trax.fastmath.numpy array.

Use trax.fastmath.grad to calculate the gradient (derivative) of the function.

square_gradient = trax.fastmath.grad(fun=square)

print(type(square_gradient))
<class 'function'>
gradient_calculation = square_gradient(an_array)
display(gradient_calculation)
DeviceArray(10., dtype=float32)

The function returned by trax.fastmath.grad takes in x=5 and calculates the gradient of square, which is 2x, which equals 10. The value is also stored as a DeviceArray from the jax library.
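The same recipe works for other scalar functions. As a quick sanity check of my own, here's the gradient of \(x^3\), whose derivative is \(3x^2\), so at x = 2 we expect 12:

# cube a scalar and take its gradient the same way as with square
def cube(x):
    return x ** 3

cube_gradient = trax.fastmath.grad(fun=cube)
print(cube_gradient(numpy.array(2.0)))
12.0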

End

Now that we've had a brief review of Trax let's move on to loading the data.

Raw

# import Layer from the utils.py file
from utils import Layer, load_tweets, process_tweet

Data Generators

Data generators

In Python, a generator is a function that behaves like an iterator. It will return the next item. In many AI applications, it is advantageous to have a data generator to handle loading and transforming data for different applications.

In the following example, we use a set of samples a to derive a new set of samples with more elements than the original set.

Note: Pay attention to the use of the index list (data_indices) and the variable index to traverse the original list.

Imports

# python
from itertools import cycle

import random

# pypi
from expects import be_true, expect
import numpy

Examples

An Example of a Circular List

This is sort of a fake generator that uses indices to make it look like it's infinite.

a = [1, 2, 3, 4]
a_size = len(a)
end = 10
index = 0                      # similar to index in data_generator below
for i in range(end):        # more iterations than a has elements, forcing a wrap
    print(a[index], end=",")
    index = (index + 1) % a_size    
1,2,3,4,1,2,3,4,1,2,

There's a python built-in that's equivalent to this called cycle.

index = 1
for item in cycle(a):
    print(item, end=",")
    if index == end:
        break
    index += 1    
1,2,3,4,1,2,3,4,1,2,

And if you wanted to make your own generator version you could use the yield keyword.

def infinite(a: list):
    """Generates elements infinitely

    Args:
     a: list

    Yields:
     elements of a
    """
    index = 0
    end = len(a)
    while True:
        yield a[index]
        index = (index + 1) % end
    return

a_infinite = infinite(a)
for index, item in enumerate(a_infinite):
    if index == end:
        break
    print(item, end=",")
1,2,3,4,1,2,3,4,1,2,

Shuffling the data order

In the next example, we will do the same as before, but shuffling the order of the elements in the output list. Note that here, our strategy of traversing using lines_index and index becomes very important, because we can simulate a shuffle in the input data, without doing that in reality.

a = tuple((1, 2, 3, 4))
a_size = len(a)
data_indices = list(range(a_size))
print(f"Original order of indices: {data_indices}")
Original order of indices: [0, 1, 2, 3]

If we shuffle the index list we can change the order of our circular list without modifying the order of our original data.

random.shuffle(data_indices) # Shuffle the order
print(f"Shuffled order of indices: {data_indices}")
Shuffled order of indices: [3, 0, 1, 2]

Now we create a list of random values from a that is larger than a.

b = [a[index] for index in data_indices]
b_size = 10

print(f"New value order for first batch: {b}")
batch_counter = 1
data_index = 0
for b_index in range(len(b), b_size):
    if data_index == 0:
        batch_counter += 1
        random.shuffle(data_indices)
        print(f"\nShuffled Indexes for Batch No. {batch_counter} :{data_indices}")
        print(f"Values for Batch No.{batch_counter} :{[a[index] for index in data_indices]}")

    b.append(a[data_indices[data_index]])
    data_index = (data_index + 1) % a_size

print(f"\nFinal value of b: {b} with {len(b)} items")
New value order for first batch: [1, 3, 4, 2]

Shuffled Indexes for Batch No. 2 :[1, 3, 2, 0]
Values for Batch No.2 :[2, 4, 3, 1]

Shuffled Indexes for Batch No. 3 :[0, 3, 2, 1]
Values for Batch No.3 :[1, 4, 3, 2]

Final value of b: [1, 3, 4, 2, 2, 4, 3, 1, 1, 4] with 10 items

Note: We call it an epoch each time the algorithm passes over all of the training examples. Shuffling the examples for each epoch is known to reduce variance, making the model more general and less prone to overfitting.
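To put that note in code, a minimal sketch of a per-epoch shuffle might look like this (train_on is a hypothetical stand-in for whatever update your model performs on one example):

def train_on(example):
    """A hypothetical stand-in for a single training update"""

for epoch in range(3):
    # pick a new traversal order each epoch without touching `a` itself
    random.shuffle(data_indices)
    for index in data_indices:
        train_on(a[index])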

Using random.sample instead.

data_indices = random.sample(range(a_size), k=a_size)
b = [a[index] for index in data_indices]
b_size = 10

print(f"New value order for first batch: {b}")
batch_counter = 1
data_index = 0
for b_index in range(len(b), b_size):
    if data_index == 0:
        batch_counter += 1
        data_indices = random.sample(data_indices, k=a_size)
        print(f"\nShuffled Indexes for Batch No. {batch_counter} :{data_indices}")
        print(f"Values for Batch No.{batch_counter} :{[a[index] for index in data_indices]}")

    b.append(a[data_indices[data_index]])
    data_index = (data_index + 1) % a_size

print(f"\nFinal value of b: {b} with {len(b)} items")
New value order for first batch: [1, 4, 3, 2]

Shuffled Indexes for Batch No. 2 :[3, 0, 1, 2]
Values for Batch No.2 :[4, 1, 2, 3]

Shuffled Indexes for Batch No. 3 :[2, 0, 1, 3]
Values for Batch No.3 :[3, 1, 2, 4]

Final value of b: [1, 4, 3, 2, 4, 1, 2, 3, 3, 1] with 10 items

Data Generator Function

This will be a data generator function that takes in batch_size, x, y, and shuffle, where x could be a large list of samples and y is the list of tags associated with those samples. It returns a subset of those inputs in a tuple of two arrays (X, Y), each of dimension (batch_size). If shuffle=True, the data will be traversed in random order.

It runs continuously in the fashion of generators, pausing each time it yields the next values. We will generate a batch_size output on each pass of the loop.

It has an inner loop that stores the data samples in temporary lists (X, Y) which will be included in the next batch.

There are three slightly out-of-the-ordinary features to this function.

  1. The first is the use of a list of a predefined size to store the data for each batch. Using a predefined size list reduces the computation time if the elements in the array are of a fixed size, like numbers. If the elements are of different sizes, it is better to use an empty array and append one element at a time during the loop.
  2. The second is tracking the current location in the incoming lists of samples. A generator's variables hold their values between invocations, so we create an index variable, initialize it to zero, and increment it by one for each sample included in a batch. However, we do not use the index to access the positions of the list of sentences directly. Instead, we use it to select one index from a list of indexes. In this way, we can change the order in which we traverse our original list, keeping our original list untouched.
  3. The third also relates to wrapping. Because batch_size and the length of the input lists are not aligned, gathering a batch_size group of inputs may involve wrapping back to the beginning of the input loop. In our approach, it is just enough to reset the index to 0. We can re-shuffle the list of indexes to produce different batches each time.
def data_generator(batch_size: int, data_x: list, data_y: list, shuffle: bool=True):
    """Infinite batch generator

      Args: 
       batch_size: the size to make batches
       data_x: list containing samples
       data_y: list containing labels
       shuffle: Shuffle the data order

      Yields:
       a tuple containing 2 elements:
       X - list of dim (batch_size) of samples
       Y - list of dim (batch_size) of labels
    """
    amount_of_data = len(data_x)
    assert amount_of_data == len(data_y)

    def re_shuffle(x):
        k = len(x)
        return random.sample(range(k), k=k)

    shuffler = re_shuffle if shuffle else lambda x: list(range(len(x)))
    source_indices = shuffler(data_x)

    source_location = 0
    while True:
        X = list(range(batch_size))
        Y = list(range(batch_size))

        for batch_location in range(batch_size):                            
            X[batch_location] = data_x[source_indices[source_location]]
            Y[batch_location] = data_y[source_indices[source_location]]
            source_location = (source_location + 1) % amount_of_data
            source_indices = (shuffler(data_x) if source_location == 0
                              else source_indices)            
        yield((X, Y))
    return
def test_data_generator() -> None:
    """Tests the un-shuffled version of the generator

    Raises:
     AssertionError: some value didn't match.
    """
    x = [1, 2, 3, 4]
    y = [xi ** 2 for xi in x]

    generator = data_generator(3, x, y, shuffle=False)
    for expected in (([1, 2, 3], [1, 4, 9]),
                     ([4, 1, 2], [16, 1, 4]),
                     ([3, 4, 1], [9, 16, 1]),
                     ([2, 3, 4], [4, 9, 16])):
        expect(numpy.allclose(next(generator), expected)).to(be_true)
    return
test_data_generator()
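The shuffled version is harder to pin down to exact values, but we can at least check (a quick test of my own, reusing the expects imports from above) that each label still matches its sample after shuffling:

random.seed(1)
x = [1, 2, 3, 4]
y = [xi ** 2 for xi in x]

shuffled = data_generator(3, x, y, shuffle=True)
X, Y = next(shuffled)

# whatever order the shuffle picks, the (sample, label) pairing survives
expect(all(xi ** 2 == yi for xi, yi in zip(X, Y))).to(be_true)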

Classes and Subclasses

Classes and Subclasses

In this notebook, I will show you the basics of classes and subclasses in Python. As you've seen in the lectures from this week, `Trax` uses layer classes as building blocks for deep learning models, so it is important to understand how classes and subclasses behave in order to be able to build custom layers when needed.

By completing this notebook, you will:

  • Be able to define classes and subclasses in Python
  • Understand how inheritance works in subclasses
  • Be able to work with instances

Imports

# from pypi
from expects import (
    equal,
    expect,
    raise_error
)
import attr

Middle

Part 1: Parameters, methods and instances

First, let's define a class SomeClass.

class SomeClass:
    x = None

SomeClass has one parameter x without any value. You can think of parameters as the variables that every object assigned to a class will have. So, at this point, any object of class SomeClass would have a variable x equal to None. To check this, I'll create two instances of that class and get the value of x for both of them.

instance_a = SomeClass()
instance_b = SomeClass()
print(f"Parameter x of instance_a: {instance_a.x}")
print(f"Parameter x of instance_b: {instance_b.x}")
Parameter x of instance_a: None
Parameter x of instance_b: None

For an existing instance you can assign new values for any of its parameters. In the next cell, assign a value of 5 to the parameter x of instance_a.

instance_a.x = 5
print(f"Parameter x of instance_a: {instance_a.x}")
Parameter x of instance_a: 5

The __init__ method

When you want to assign values to the parameters of your class when an instance is created, you need to define a special method: `__init__`. The `__init__` method is called when you create an instance of a class. It can have multiple arguments to initialize the parameters of your instance. In the next cell I will define `SomeClass` with an `__init__` (generated here by attrs) that takes the instance (`self`) and an initial value for `x` as inputs.
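Since I'm using attrs rather than writing the method out by hand, this is (roughly) the plain-python equivalent of what the attrs declaration below generates:

# a hand-written version of the attrs-generated __init__ (repr omitted)
class SomeClass:
    def __init__(self, x: int=None):
        self.x = x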

@attr.s(auto_attribs=True)
class SomeClass: 
    x: int=None
instance_c = SomeClass(10)
print(f"{instance_c}")
SomeClass(x=10)

The __call__ method

Another important method is the __call__ method, which runs whenever you call an initialized instance of a class. It can have multiple arguments and you can define it to do whatever you want, like

  • Change a parameter,
  • Print a message,
  • Create new variables, etc.
@attr.s(auto_attribs=True)
class SomeClass:
    x: int

    def __call__(self, z: int):
        self.x += z
        print(self.x)
instance_d = SomeClass(5)

And now, see what happens when instance_d is called with argument 10.

instance_d(10)
15

Now, you are ready to complete the following cell so any instance from SomeClass:

  • Is initialized taking two arguments and assigning them to x_1 and x_2, respectively. And,
  • When called, takes the values of the parameters x_1 and x_2, sums them, prints and returns the result.
@attr.s(auto_attribs=True)
class SomeClass: 
    x_1: int
    x_2: int

    def __call__(self) -> int:
        result = self.x_1 + self.x_2 
        print(f"Addition of {self.x_1} and {self.x_2} is {result}")
        return result

Run the next cell to check your implementation. If everything is correct, you shouldn't get any errors.

instance_e = SomeClass(x_1=10, x_2=15)

def test_class_definition():    
    expect(instance_e.x_1).to(equal(10))
    expect(instance_e.x_2).to(equal(15))
    expect(instance_e()).to(equal(25))
    return

test_class_definition()
Addition of 10 and 15 is 25

Custom methods

In addition to the __init__ and __call__ methods, your classes can have custom-built methods to do whatever you want when called. To define a custom method, you have to indicate its input arguments, the instructions that you want it to perform, and the values to return (if any). In the next cell, SomeClass is defined with some_method, which multiplies the values of x_1 and x_2, sums that product with an input w, and returns the result.

@attr.s(auto_attribs=True)
class SomeClass:
    x_1: int
    x_2: int

    def __call__(self) -> int:
        return self.x_1 - 2 * self.x_2 

    def some_method(self, w: int) -> int:
        return self.x_1 * self.x_2 + w

Create an instance instance_f of SomeClass with any integer values that you want for x_1 and x_2. For that instance, see the result of calling some_method with an argument w equal to 16.

instance_f = SomeClass(1, 10)
print(f"Output of some_method: {instance_f.some_method(16)}")
Output of some_method: 26

As you can corroborate in the previous cell, to call a custom method m with arguments args for an instance i, you write i.m(args). With that in mind, methods can call other methods within a class. In the following cell, define some_new_method, which calls some_method with v as its input argument.

@attr.s(auto_attribs=True)
class SomeClass: 
    x_1: int = None
    x_2: int = None

    def __call__(self) -> int:
        return self.x_1 - 2 * self.x_2 

    def some_method(self, w: int) -> int:
        return self.x_1 * self.x_2 + w

    def some_new_method(self, v: int) -> int:
        return self.some_method(v)
instance_g = SomeClass(1, 10)
print(f"Output of some_method: {instance_g.some_method(16)}")
print(f"Output of some_new_method: {instance_g.some_new_method(16)}")
Output of some_method: 26
Output of some_new_method: 26

Part 2: Subclasses and Inheritance

Trax uses classes and subclasses to define layers. The base class in Trax is Layer, which means that every layer of a deep learning model is defined as a subclass of the Layer class. In this part of the notebook, you are going to see how subclasses work. To define a subclass sub of a class super, you write class sub(super): and define any methods and parameters that you want for your subclass. In the next cell, I define SomeSub as a subclass of SomeClass with only one additional method (additional_method).

class SomeSub(SomeClass):
    def additional_method(self):
        print(self.x_1)
        return

Inheritance

When you define a subclass sub, every method and parameter is inherited from the super class, including the __init__ and __call__ methods. This means that any instance of sub can use the methods defined in super. Run the following cell and see for yourself.

instance_sub_a = SomeSub(1, 10)
print(f"Parameter x_1 of instance_sub_a: {instance_sub_a.x_1}")
print(f"Parameter x_2 of instance_sub_a: {instance_sub_a.x_2}")
print(f"Output of some_method of instance_sub_a: {instance_sub_a.some_method(16)}")
Parameter x_1 of instance_sub_a: 1
Parameter x_2 of instance_sub_a: 10
Output of some_method of instance_sub_a: 26

As you can see, SomeSub does not define its own __init__ method; it is inherited from SomeClass. However, you can overwrite any method you want by redefining it in the subclass. For instance, in the next cell define SomeSub with a redefined some_method that multiplies x_1 and x_2 but does not take any additional argument.

@attr.s(auto_attribs=True)
class SomeSub(SomeClass):
    def some_method(self):
        return self.x_1 * self.x_2 

To check your implementation run the following cell.

test = SomeSub(3, 10)
actual = test.some_method()
expect(actual).to(equal(30))

print(f"Output of overridden my_method of test: {actual}")

def bad_call():
    test.some_method(16)

expect(bad_call).to(raise_error(TypeError))
Output of overridden some_method of test: 30

In the next cell, two instances are created, one of SomeClass and another of SomeSub. The instances are initialized with equal x_1 and x_2 parameters.

y, z = 1, 10
instance_sub_a = SomeSub(y, z)
instance_a = SomeClass(y, z)
print(f"My_method for an instance of sub_c returns: {instance_sub_a.some_method()}")
print(f"My_method for an instance of My_Class returns: {instance_a.some_method(10)}")
My_method for an instance of sub_c returns: 10
My_method for an instance of My_Class returns: 20

As you can see, even though SomeSub is a subclass of SomeClass and both instances are initialized with the same values, some_method returns different results for each instance because you overwrote it for SomeSub.
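One aside that isn't in the original notebook: an overriding method can still reach the parent's version through super, which is handy when the subclass only needs to tweak the parent's behavior.

# a small sketch: the override delegates to the parent and adjusts the result
class SomeSubPlus(SomeClass):
    def some_method(self) -> int:
        # the parent's some_method(w) returns x_1 * x_2 + w; add 1 on top
        return super().some_method(1) + 1

instance_plus = SomeSubPlus(y, z)
expect(instance_plus.some_method()).to(equal(12))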

Introducing Trax

Background

This is going to be a first look at Trax, a deep learning framework built by the Google Brain team.

Why Trax and not TensorFlow or PyTorch?

TensorFlow and PyTorch are both extensive frameworks that can do almost anything in deep learning. They offer a lot of flexibility, but that often means verbosity of syntax and extra time to code.

Trax is much more concise. It runs on a TensorFlow backend but allows you to train models with one-line commands. Trax also runs end to end, allowing you to get data, build a model, and train it, each with a single terse statement. This means you can focus on learning, instead of spending hours on the idiosyncrasies of a big framework's implementation.

Why not Keras then?

Keras is part of TensorFlow itself from 2.0 onwards. Also, Trax is well suited to implementing new state-of-the-art algorithms like Transformers, Reformers, and BERT, because it is actively maintained by the Google Brain team for advanced deep learning tasks. It also runs smoothly on CPUs, GPUs, and TPUs, with comparatively few changes to your code.

How to Code in Trax

Building models in Trax relies on two key concepts: layers and combinators. Trax layers are simple objects that process data and perform computations. They can be chained together into composite layers using Trax combinators, allowing you to build layers and models of any complexity.

Trax, JAX, TensorFlow and Tensor2Tensor

You already know that Trax uses Tensorflow as a backend, but it also uses the JAX library to speed up computation too. You can view JAX as an enhanced and optimized version of numpy.

You import their version of numpy using import trax.fastmath.numpy. If you see this line, remember that when calling numpy you are really calling Trax's version of numpy, which is compatible with JAX.

As a result of this, where you used to encounter the type numpy.ndarray now you will find the type jax.interpreters.xla.DeviceArray. The documentation for JAX is here and specifically they have a page with the numpy functions implemented so far.

Tensor2Tensor is another name you might have heard. It started as an end-to-end solution much like how Trax is designed, but it grew unwieldy and complicated. You can view Trax as the new, improved version, one that is faster and simpler.

Installing Trax

Note that there is another library called TraX which is something different.

We're going to use Trax version 1.3.1 here, so to install it with pip:

pip install trax==1.3.1

Note the == for the version, not =. This is a very big install so maybe take a break after you run it. You aren't going to get the full benefit of JAX if you don't have CUDA set up (or can't use TPUs), so make sure to set up CUDA if you're not using Google Colab. I also had to install cmake to get trax to install.

Imports

# pypi
import numpy

from trax import layers
from trax import shapes
from trax import fastmath
  • Layers are the basic building blocks for Trax
  • shapes are used for data handling
  • fastmath is the JAX version of numpy that can run on GPUs and TPUs

Middle

Layers

Layers are the core building blocks in Trax; the Layer class is the base class that everything else builds on. Layers take inputs, compute functions/custom calculations, and return outputs.

Relu Layer

First we'll build a ReLU activation function as a layer. A layer like this is one of the simplest types. Notice that it has no weights to initialize, so it works just like a math function.

Note: Activation functions are also layers in Trax, which might look odd if you have been using other frameworks for a longer time.

relu = layers.Relu()

You can inspect the properties of a layer:

print("-- Properties --")
print("name :", relu.name)
print("expected inputs :", relu.n_in)
print("promised outputs :", relu.n_out, "\n")
-- Properties --
name : Relu
expected inputs : 1
promised outputs : 1 

We'll make an input for the layer using numpy.

x = numpy.array([-2, -1, 0, 1, 2])
print("-- Inputs --")
print("x :", x, "\n")
-- Inputs --
x : [-2 -1  0  1  2] 

And see what it puts out.

y = relu(x)
print("-- Outputs --")
print("y :", y)
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
-- Outputs --
y : [0 0 0 1 2]

I don't know why, but JAX doesn't think I have a GPU, even though tensorflow does. This whole thing is a little messed up right now because the current release of tensorflow doesn't work on Ubuntu 20.10. I'm running the nightly build (2.5), but I have to install all the Trax dependencies one at a time or it will clobber the tensorflow installation with the older version (the one that doesn't work), so there are a lot of places for error.

Concatenate Layer

Now a layer that takes 2 inputs. Notice the change in the expected inputs property from 1 to 2.

First create a concatenate trax layer and check out its properties.

concatenate = layers.Concatenate()
print("-- Properties --")
print("name :", concatenate.name)
print("expected inputs :", concatenate.n_in)
print("promised outputs :", concatenate.n_out, "\n")
-- Properties --
name : Concatenate
expected inputs : 2
promised outputs : 1 

Now create the two inputs.

x1 = numpy.array([-10, -20, -30])
x2 = x1 / -10
print("-- Inputs --")
print("x1 :", x1)
print("x2 :", x2, "\n")
-- Inputs --
x1 : [-10 -20 -30]
x2 : [1. 2. 3.] 

And now feed the inputs through the concatenate layer.

y = concatenate([x1, x2])
print("-- Outputs --")
print("y :", y)
-- Outputs --
y : [-10. -20. -30.   1.   2.   3.]

Configuring Layers

You can change the default settings of layers. For example, you can change the expected inputs for a concatenate layer from 2 to 3 using the optional parameter n_items.

concatenate_three = layers.Concatenate(n_items=3)
print("-- Properties --")
print("name :", concatenate_three.name)
print("expected inputs :", concatenate_three.n_in)
print("promised outputs :", concatenate_three.n_out, "\n")
-- Properties --
name : Concatenate
expected inputs : 3
promised outputs : 1 

Create some inputs.

x1 = numpy.array([-10, -20, -30])
x2 = x1 / -10
x3 = x2 * 0.99
print("-- Inputs --")
print("x1 :", x1)
print("x2 :", x2)
print("x3 :", x3, "\n")
-- Inputs --
x1 : [-10 -20 -30]
x2 : [1. 2. 3.]
x3 : [0.99 1.98 2.97] 

And now do the concatenation.

y = concatenate_three([x1, x2, x3])
print("-- Outputs --")
print("y :", y)
-- Outputs --
y : [-10.   -20.   -30.     1.     2.     3.     0.99   1.98   2.97]

Layer Weights

Some layer types include mutable weights and biases that are used in computation and training. Layers of this type require initialization before use.

For example the LayerNorm layer calculates normalized data, that is also scaled by weights and biases. During initialization you pass the data shape and data type of the inputs, so the layer can initialize compatible arrays of weights and biases.

Initialize it.

norm = layers.LayerNorm()

Now some input data.

x = numpy.array([0, 1, 2, 3], dtype="float")

Use the input data signature to get the shape and type for initializing the weights and biases. We need to convert the input datatype from the usual ndarray to a trax ShapeDtype.

norm.init(shapes.signature(x)) 
print("Normal shape:",x.shape, "Data Type:",type(x.shape))
print("Shapes Trax:",shapes.signature(x),"Data Type:",type(shapes.signature(x)))
Normal shape: (4,) Data Type: <class 'tuple'>
Shapes Trax: ShapeDtype{shape:(4,), dtype:float64} Data Type: <class 'trax.shapes.ShapeDtype'>

Here are its properties.

print("-- Properties --")
print("name :", norm.name)
print("expected inputs :", norm.n_in)
print("promised outputs :", norm.n_out)
-- Properties --
name : LayerNorm
expected inputs : 1
promised outputs : 1

And the weights and biases.

print("weights :", norm.weights[0])
print("biases :", norm.weights[1],)
weights : [1. 1. 1. 1.]
biases : [0. 0. 0. 0.]

We have our input array.

print("-- Inputs --")
print("x :", x)
-- Inputs --
x : [0. 1. 2. 3.]

So we can inspect what the layer did to it.

y = norm(x)
print("-- Outputs --")
print("y :", y)
-- Outputs --
y : [-1.3416404  -0.44721344  0.44721344  1.3416404 ]

If you look at it you can see that the positives cancel out the negatives, giving us a sum of 0. That's because layer normalization standardizes its input to zero mean and unit variance before applying the weights and biases.
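A quick by-hand check (my own addition; trax adds a small epsilon inside the square root for numerical stability, so the match is only approximate):

# standardize the input ourselves and compare to the layer's output
by_hand = (x - x.mean()) / x.std()
print("by hand :", by_hand)
print("matches the layer:", numpy.allclose(by_hand, y, atol=1e-5))
by hand : [-1.34164079 -0.4472136   0.4472136   1.34164079]
matches the layer: True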

Custom Layers

You can create your own custom layers too and define custom functions for computations by using layers.Fn. Let me show you how.

help(layers.Fn)
Help on function Fn in module trax.layers.base:

Fn(name, f, n_out=1)
    Returns a layer with no weights that applies the function `f`.
    
    `f` can take and return any number of arguments, and takes only positional
    arguments -- no default or keyword arguments. It often uses JAX-numpy (`jnp`).
    The following, for example, would create a layer that takes two inputs and
    returns two outputs -- element-wise sums and maxima:
    
        `Fn('SumAndMax', lambda x0, x1: (x0 + x1, jnp.maximum(x0, x1)), n_out=2)`
    
    The layer's number of inputs (`n_in`) is automatically set to number of
    positional arguments in `f`, but you must explicitly set the number of
    outputs (`n_out`) whenever it's not the default value 1.
    
    Args:
      name: Class-like name for the resulting layer; for use in debugging.
      f: Pure function from input tensors to output tensors, where each input
          tensor is a separate positional arg, e.g., `f(x0, x1) --> x0 + x1`.
          Output tensors must be packaged as specified in the `Layer` class
          docstring.
      n_out: Number of outputs promised by the layer; default value 1.
    
    Returns:
      Layer executing the function `f`.
  • Define a custom layer

    In this example we'll create a layer to calculate the input times 2.

    def double_it() -> layers.Fn:
        """A custom layer function that doubles any inputs
    
    
        Returns:
         a custom function that takes one numeric argument and doubles it
        """
        layer_name = "TimesTwo"
    
        # Custom function for the custom layer
        def func(x):
            return x * 2
    
        return layers.Fn(layer_name, func)
    
  • Test it
    double = double_it()
    
    print("-- Properties --")
    print("name :", double.name)
    print("expected inputs :", double.n_in)
    print("promised outputs :", double.n_out)
    
    -- Properties --
    name : TimesTwo
    expected inputs : 1
    promised outputs : 1
    
    x = numpy.array([1, 2, 3])
    print("-- Inputs --")
    print("x :", x, "\n")
    y = double(x)
    print("-- Outputs --")
    print("y :", y)
    
    -- Inputs --
    x : [1 2 3] 
    
    -- Outputs --
    y : [2 4 6]
    

Combinators

You can combine layers to build more complex layers. Trax provides a set of objects named combinator layers to make this happen. Combinators are themselves layers, so they behave like any other layer.

Serial Combinator

This is the most common and easiest to use. You could, for example, build a simple neural network by combining layers into a single layer using the Serial combinator. This new layer then acts just like a single layer, so you can inspect inputs, outputs, and weights, or even combine it into another layer! Combinators can then be used as trainable models. Try adding more layers.

Note: As you might have guessed, if there is a Serial combinator there must be a Parallel combinator as well (there's a short sketch of it after the Serial example below). Try exploring combinators and other layers in the trax documentation, and look at the repo to see how these layers are written.

serial = layers.Serial(
    layers.LayerNorm(),
    layers.Relu(),
    double,
    layers.Dense(n_units=2),
    layers.Dense(n_units=1),
    layers.LogSoftmax() 
)
  • Initialization
    x = numpy.array([-2, -1, 0, 1, 2]) #input
    serial.init(shapes.signature(x))
    
    print("-- Serial Model --")
    print(serial,"\n")
    print("-- Properties --")
    print("name :", serial.name)
    print("sublayers :", serial.sublayers)
    print("expected inputs :", serial.n_in)
    print("promised outputs :", serial.n_out)
    print("weights & biases:", serial.weights, "\n")
    
    -- Serial Model --
    Serial[
      LayerNorm
      Relu
      TimesTwo
      Dense_2
      Dense_1
      LogSoftmax
    ] 
    
    -- Properties --
    name : Serial
    sublayers : [LayerNorm, Relu, TimesTwo, Dense_2, Dense_1, LogSoftmax]
    expected inputs : 1
    promised outputs : 1
    weights & biases: [(DeviceArray([1, 1, 1, 1, 1], dtype=int32), DeviceArray([0, 0, 0, 0, 0], dtype=int32)), (), (), (DeviceArray([[ 0.19178385,  0.1832077 ],
                 [-0.36949775, -0.03924937],
                 [ 0.43800744,  0.788491  ],
                 [ 0.43107533, -0.3623491 ],
                 [ 0.6186575 ,  0.04764405]], dtype=float32), DeviceArray([-3.0051979e-06,  1.4359505e-06], dtype=float32)), (DeviceArray([[-0.6747592],
                 [-0.8550365]], dtype=float32), DeviceArray([-8.9325863e-07], dtype=float32)), ()] 
    
    print("-- Inputs --")
    print("x :", x, "\n")
    
    y = serial(x)
    print("-- Outputs --")
    print("y :", y)
    
    -- Inputs --
    x : [-2 -1  0  1  2] 
    
    -- Outputs --
    y : [0.]
    
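And, as promised in the note above, here's a minimal sketch of the Parallel combinator (my own example, using weightless layers so no initialization is needed). Each sub-layer gets its own input, side by side:

parallel = layers.Parallel(layers.Relu(), layers.Relu())
print("expected inputs :", parallel.n_in)
print("promised outputs :", parallel.n_out)

y1, y2 = parallel([numpy.array([-2, 2]), numpy.array([-3, 3])])
print("y1 :", y1)
print("y2 :", y2)
expected inputs : 2
promised outputs : 2
y1 : [0 2]
y2 : [0 3]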

JAX

Just remember to look out for which numpy you are using, the regular numpy or Trax's JAX-compatible numpy. Watch those import blocks; numpy and fastmath.numpy have different data types.

Regular numpy.

x_numpy = numpy.array([1, 2, 3])
print("good old numpy : ", type(x_numpy), "\n")
good old numpy :  <class 'numpy.ndarray'> 

Fastmath and jax numpy.

x_jax = fastmath.numpy.array([1, 2, 3])
print("jax trax numpy : ", type(x_jax))
jax trax numpy :  <class 'jax.interpreters.xla._DeviceArray'>

End

  • Trax is a concise framework, built on TensorFlow, for end to end machine learning. The key building blocks are layers and combinators.
  • This was a lab that was part of coursera's Natural Language Processing with Sequence Models course put up by DeepLearning.AI.

Word Embeddings: Visualizing the Embeddings

Extracting and Visualizing the Embeddings

In the previous post we built a Continuous Bag of Words model that predicts a word from the words surrounding it within a window (e.g. from the relative frequency of each of the four words surrounding the target word). Now we're going to use the weights of the model as word embeddings and see if we can visualize them.

Imports

# python
from argparse import Namespace
from functools import partial

# pypi
from sklearn.decomposition import PCA

import holoviews
import hvplot.pandas
import pandas

# this project
from neurotic.nlp.word_embeddings import (
    Batches,
    CBOW,
    DataCleaner,
    MetaData,
    TheTrainer,
    )
# my other stuff
from graeae import EmbedHoloviews, Timer

Set Up

cleaner = DataCleaner()
meta = MetaData(cleaner.processed)
TIMER = Timer(speak=False)
SLUG = "word-embeddings-visualizing-the-embeddings"
Embed = partial(EmbedHoloviews, folder_path=f"files/posts/nlp/{SLUG}")
Plot = Namespace(
    width=990,
    height=780,
    fontscale=2,
    tan="#ddb377",
    blue="#4687b7",
    red="#ce7b6d",
 )
hidden_layer = 50
half_window = 2
batch_size = 128
repetitions = 250
vocabulary_size = len(meta.vocabulary)

model = CBOW(hidden=hidden_layer, vocabulary_size=vocabulary_size)
batches = Batches(data=cleaner.processed, word_to_index=meta.word_to_index,
                  half_window=half_window, batch_size=batch_size, batches=repetitions)

trainer = TheTrainer(model, batches, emit_point=50, verbose=True)
with TIMER:
    trainer()
2020-12-16 16:32:17,189 graeae.timers.timer start: Started: 2020-12-16 16:32:17.189213
50: loss=9.88889093658385
new learning rate: 0.0198
100: loss=9.138356897918037
150: loss=9.149555378031549
new learning rate: 0.013068000000000001
200: loss=9.077599951734605
2020-12-16 16:32:37,403 graeae.timers.timer end: Ended: 2020-12-16 16:32:37.403860
2020-12-16 16:32:37,405 graeae.timers.timer end: Elapsed: 0:00:20.214647
250: loss=8.607763835003631
print(trainer.best_loss)
8.186490214727549

Middle

Set It Up

We're going to use the method of averaging the weights of the two layers to form the embeddings.

embeddings = (trainer.best_weights.input_weights.T
              + trainer.best_weights.hidden_weights)/2

And now our words.

words = ["king", "queen","lord","man", "woman","dog","wolf",
         "rich","happy","sad"]

Now we need to translate the words into their indices so we can grab the rows in the embedding that match.

indices = [meta.word_to_index[word] for word in words]
X = embeddings[indices, :]
print(X.shape, indices) 
(10, 50) [2745, 3951, 2961, 3023, 5675, 1452, 5674, 4191, 2316, 4278]

There are 10 rows to match our ten words and 50 columns to match the number chosen for the hidden layer.

Visualizing

We're going to use sklearn's PCA for Principal Component Analysis. The n_components argument is the number of components it will keep - we'll keep 2.

pca = PCA(n_components=2)
reduced = pca.fit(X).transform(X)
pca_data = pandas.DataFrame(
    reduced,
    columns=["X", "Y"])

pca_data["Word"] = words
points = pca_data.hvplot.scatter(x="X",
                                 y="Y", color=Plot.red)
labels = pca_data.hvplot.labels(x="X", y="Y", text="Word", text_baseline="top")
plot = (points * labels).opts(
    title="PCA Embeddings",
    height=Plot.height,
    width=Plot.width,
    fontscale=Plot.fontscale,
)
outcome = Embed(plot=plot, file_name="embeddings_pca")()
print(outcome)

Figure Missing

Well, that's pretty horrible. Might need work.

End

This is the final post in the series looking at using a Continuous Bag of Words model to create word embeddings.