Sentiment Analysis: Defining the Model

Beginning

This continues a series on sentiment analysis with deep learning. In the previous post we loaded and processed our data set. In this post we'll actually define the neural network.

In this part we'll write our own small library of layers, very similar to the ones used in Trax, and also in Keras and PyTorch. The intention is that writing our own small framework will help us understand how those libraries work and use them more effectively in the future.

Imports

# from pypi
from expects import be_true, expect
from trax import fastmath

import attr
import numpy
import trax
import trax.layers as trax_layers

# this project
from neurotic.nlp.twitter.tensor_generator import TensorBuilder

Set Up

Some aliases to get closer to what the notebook has.

numpy_fastmath = fastmath.numpy
random = fastmath.random

Middle

The Base Layer Class

This will be the base class that the others will inherit from.

@attr.s(auto_attribs=True)
class Layer:
    """Base class for layers
    """
    def forward(self, x: numpy.ndarray):
        """The forward propagation method

       Raises:
        NotImplementedError - method is called but child hasn't implemented it
       """
        raise NotImplementedError

    def init_weights_and_state(self, input_signature, random_key):
        """method to initialize the weights
       based on the input signature and random key,
       be implemented by subclasses of this Layer class
       """
        raise NotImplementedError

    def init(self, input_signature, random_key) -> numpy.ndarray:
        """initializes and returns the weights

       Note:
        This is just an alias for the ``init_weights_and_state``
        method, mirroring the interface of trax's layers

       Args: 
        input_signature: an array (or similar) whose shape describes the expected input
        random_key: a PRNG key (e.g. one created by ``random.get_prng``)

       Returns:
        the weights
       """
        self.init_weights_and_state(input_signature, random_key)
        return self.weights

    def __call__(self, x) -> numpy.ndarray:
        """This is an alias for the ``forward`` method

       Args:
        x: input array

       Returns:
        whatever the ``forward`` method does
       """
        return self.forward(x)

The ReLU class

Here's the ReLU function:

\[ \mathrm{ReLU}(x) = \mathrm{max}(0,x) \]

We'll implement the ReLU activation function below. The function takes in a matrix or vector and transforms all the negative numbers into 0 while keeping the positive numbers intact.

We can use numpy.maximum(A, k), which takes the element-wise maximum of each element in A and the scalar k.

class Relu(Layer):
    """Relu activation function implementation"""
    def forward(self, x: numpy.ndarray) -> numpy.ndarray:
        """"Performs the activation

       Args: 
           - x: the input

       Returns:
           - activation: all positive or 0 version of x
       """
        return numpy.maximum(x, 0)

Test It

x = numpy.array([[-2.0, -1.0, 0.0], [0.0, 1.0, 2.0]], dtype=float)
relu_layer = Relu()
print("Test data is:")
print(x)
print("\nOutput of Relu is:")
actual = relu_layer(x)

print(actual)

expected = numpy.array([[0., 0., 0.],
                        [0., 1., 2.]])

expect(numpy.allclose(actual, expected)).to(be_true)
Test data is:
[[-2. -1.  0.]
 [ 0.  1.  2.]]

Output of Relu is:
[[0. 0. 0.]
 [0. 1. 2.]]

The Dense class

Implement the forward function of the Dense class.

  • The forward function multiplies the input to the layer (x) by the weight matrix (W).

\[ \mathrm{forward}(\mathbf{x},\mathbf{W}) = \mathbf{xW} \]

  • You can use numpy.dot to perform the matrix multiplication (there's a quick shape check below).
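
As a quick, hedged sanity check of those shapes (the names here are just for illustration), the dot product only works when the number of columns in x matches the number of rows in W, and the result takes its column count from W:

tmp_x = numpy.ones((1, 3))   # one example with three features
tmp_w = numpy.ones((3, 10))  # three rows to match x's columns, ten units
print(numpy.dot(tmp_x, tmp_w).shape)  # (1, 10)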

Note that for more efficient code execution, you will use the trax version of math, which includes a trax version of numpy and also random.

Implement the weight initializer (called init_weights_and_state here)

  • Weights are initialized with a random key.
  • The second parameter is a tuple for the desired shape of the weights (num_rows, num_cols)
  • The number of rows in the weights should equal the number of columns in x, because for forward propagation you will multiply x by the weights.

Use trax.fastmath.random.normal(key, shape) to generate random values for the weight matrix. The key difference between this function and the standard numpy randomness is the explicit use of random keys, which need to be passed in. While it can look tedious at first sight to pass the random key everywhere, this turns out to be very helpful when implementing more advanced models.

  • key can be generated by calling random.get_prng(seed) and passing in a number for the seed.
  • shape is a tuple with the desired shape of the weight matrix.
    • The number of rows in the weight matrix should equal the number of columns in the variable x. Since x may have 2 dimensions if it represents a single training example (row, col), or three dimensions (batch_size, row, col), get the last dimension from the tuple that holds the dimensions of x.
    • The number of columns in the weight matrix is the number of units chosen for that dense layer. Look at the __init__ function to see which variable stores the number of units.
  • dtype is the data type of the values in the generated matrix; don't explicitly set it - just let it use the default (float32).

Set the standard deviation of the random values to 0.1

  • The values generated have a mean of 0 and a standard deviation of 1.
  • To get a standard deviation of 0.1, multiply each value in the weight matrix by 0.1 (there's a quick sketch of this after the list).
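
Here's a minimal sketch of that scaling (the variable names are just for illustration): draws from a standard normal have a standard deviation of about 1, and multiplying them by 0.1 shrinks it to about 0.1.

demo_key = random.get_prng(seed=42)
demo_values = random.normal(key=demo_key, shape=(10000,))
print(demo_values.std())          # roughly 1.0
print((demo_values * 0.1).std())  # roughly 0.1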

See how the trax.fastmath.random.normal function works.

tmp_key = random.get_prng(seed=1)
print("The random seed generated by random.get_prng")
display(tmp_key)
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
The random key generated by random.get_prng
DeviceArray([0, 1], dtype=uint32)

For some reason tensorflow can't find the GPU. Setting the log level to 0 as the message suggests shows that it gives up after trying to find a TPU; there's no indication that it's looking for the GPU.

import tensorflow
print(tensorflow.test.gpu_device_name())

Hmmm. I'll have to troubleshoot that.

print("choose a matrix with 2 rows and 3 columns")
tmp_shape=(2,3)
print(tmp_shape)
choose a matrix with 2 rows and 3 columns
(2, 3)

Generate a weight matrix. Note that you'll get an error if you try to set dtype to tf.float32 (where tf is tensorflow), so just avoid setting the dtype and allow it to use the default data type.

tmp_weight = random.normal(key=tmp_key, shape=tmp_shape)

print("Weight matrix generated with a normal distribution with mean 0 and stdev of 1")
display(tmp_weight)
Weight matrix generated with a normal distribution with mean 0 and stdev of 1
DeviceArray([[ 0.957307  , -0.9699291 ,  1.0070664 ],
             [ 0.36619022,  0.17294823,  0.29092228]], dtype=float32)

@attr.s(auto_attribs=True)
class Dense(Layer):
    """
    A dense (fully-connected) layer.

    Args:
     - n_units: the number of columns for our weight matrix
     - init_stdev: standard deviation for our initial weights
    """
    n_units: int
    init_stdev: float=0.1

    def forward(self, x: numpy.ndarray) -> numpy.ndarray:
        """The dot product of the input and the weights

       Args:
        x: the input to multiply

       Returns:
        product of x and weights
       """
        return numpy.dot(x, self.weights)

    def init_weights_and_state(self, input_signature: numpy.ndarray,
                               random_key) -> numpy.ndarray:
        """initializes the weights

       Args:
        input_signature: array whose final dimension gives the number of rows for the weights
        random_key: a PRNG key to seed the random normal generator
       """
        input_shape = input_signature.shape

        # to allow for inputs with more than two dimensions,
        # we use the last dimension of the input shape rather than
        # assuming the input is a two-dimensional (row, column) matrix
        self.weights = (random.normal(key=random_key,
                                      shape=(input_shape[-1], self.n_units))
                        * self.init_stdev)
        return self.weights
dense_layer = Dense(n_units=10)  #sets  number of units in dense layer
random_key = random.get_prng(seed=0)  # sets random seed
z = numpy.array([[2.0, 7.0, 25.0]]) # input array 

dense_layer.init(z, random_key)
print("Weights are\n ",dense_layer.weights) #Returns randomly generated weights
output = dense_layer(z)
print("Foward function output is ", output) # Returns multiplied values of units and weights

expected_weights = numpy.array([
    [-0.02837108,  0.09368162, -0.10050076,  0.14165013,  0.10543301,  0.09108126,
     -0.04265672,  0.0986188,  -0.05575325,  0.00153249],
    [-0.20785688,  0.0554837,   0.09142365,  0.05744595,  0.07227863,  0.01210617,
     -0.03237354,  0.16234995,  0.02450038, -0.13809784],
    [-0.06111237,  0.01403724,  0.08410042, -0.1094358,  -0.10775021, -0.11396459,
     -0.05933381, -0.01557652, -0.03832145, -0.11144515]])

expected_output = numpy.array(
    [[-3.0395496,   0.9266802,   2.5414743,  -2.050473,   -1.9769388,  -2.582209,
      -1.7952735,   0.94427425, -0.8980402,  -3.7497487]])

expect(numpy.allclose(dense_layer.weights, expected_weights)).to(be_true)
expect(numpy.allclose(output, expected_output)).to(be_true)
Weights are
  [[-0.02837108  0.09368162 -0.10050076  0.14165013  0.10543301  0.09108126
  -0.04265672  0.0986188  -0.05575325  0.00153249]
 [-0.20785688  0.0554837   0.09142365  0.05744595  0.07227863  0.01210617
  -0.03237354  0.16234995  0.02450038 -0.13809784]
 [-0.06111237  0.01403724  0.08410042 -0.1094358  -0.10775021 -0.11396459
  -0.05933381 -0.01557652 -0.03832145 -0.11144515]]
Forward function output is  [[-3.03954965  0.92668021  2.54147445 -2.05047299 -1.97693891 -2.58220917
  -1.79527355  0.94427423 -0.89804017 -3.74974866]]

The Layers for the Trax-Based Model

For the model implementation we will use the Trax layers library. Trax layers are very similar to the ones we implemented above, but in addition to trainable weights they also have a non-trainable state. This state is used in layers like batch normalization and for inference - we will learn more about it later on.

Dense

First, look at the code of the Trax Dense layer and compare to the implementation above.
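
We can't reproduce the whole Trax implementation here, but as a small, hedged illustration (the shapes are just an example), we can initialize a Trax Dense layer from an input signature and inspect its weights and its state. Note that, unlike our version, it also learns a bias vector, and its state is empty.

from trax import shapes

tmp_dense = trax_layers.Dense(4)
tmp_weights, tmp_state = tmp_dense.init(shapes.signature(numpy.zeros((1, 3), dtype=numpy.float32)))
print([w.shape for w in tmp_weights])  # [(3, 4), (4,)] - the kernel and the bias
print(tmp_state)                       # () - Dense keeps no state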

Another important layer that we will use a lot is the Serial layer, which allows us to execute one layer after another in sequence.

  • You can pass in the layers as arguments to Serial, separated by commas.
  • For example: tl.Serial(tl.Embedding(...), tl.Mean(...), tl.Dense(...), tl.LogSoftmax(...))

The layer classes have pretty good docstrings (unlike the fastmath stuff), so it might be useful to look at them - but they're too long to include here.
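
Here's a minimal, hedged example of Serial (arbitrary layers, not the model we'll actually build) - the layers run in the order you pass them in.

tiny_model = trax_layers.Serial(trax_layers.Dense(4), trax_layers.Relu())
display(tiny_model)  # shows something like: Serial[ Dense_4 Relu ]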

We're also going to use an Embedding

  • tl.Embedding(vocab_size, d_feature).
  • vocab_size is the number of unique words in the given vocabulary.
  • d_feature is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
tmp_embed = trax_layers.Embedding(vocab_size=3, d_feature=2)
display(tmp_embed)
Embedding_3_2
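
As a hedged sanity check (the token IDs are made up), we can initialize this embedding layer from a dummy batch of token IDs and look at its output shape - one 2-element vector per token:

from trax import shapes

token_ids = numpy.array([[0, 1, 2]], dtype=numpy.int32)
tmp_embed.init(shapes.signature(token_ids))
print(tmp_embed(token_ids).shape)  # (1, 3, 2) - (batch, tokens, embedding elements)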

Another useful layer is the Mean, which calculates the mean across an axis. In this case, use axis=1 to get an average embedding vector. (In our model, this will average the embeddings of the words in each tweet, giving one vector per tweet rather than one per word.)

  • For example, if each word embedding has 300 elements and the vocabulary holds 10,000 words, the embedding matrix in this example has shape (300, 10000), and taking the mean along axis=1 yields a vector of 300 elements.

Pretend the embedding matrix uses 2 elements for embedding the meaning of a word and has a vocabulary size of 3, so it has shape (2,3).

tmp_embed = numpy.array([[1,2,3,],
                         [4,5,6]
                         ])

First take the mean along axis 0, which creates a vector whose length equals the vocabulary size (the number of columns).

display(numpy.mean(tmp_embed,axis=0))
array([2.5, 3.5, 4.5])

If you take the mean along axis 1, it creates a vector whose length equals the number of elements in a word embedding (the number of rows).

display(numpy.mean(tmp_embed,axis=1))
array([2., 5.])

Finally, a LogSoftmax layer gives you a log-softmax output.
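
Here's a small, hedged example (arbitrary scores; the values shown are approximate). LogSoftmax exponentiates and normalizes the scores and then takes the log, so the outputs are log-probabilities:

log_softmax_layer = trax_layers.LogSoftmax()
scores = numpy.array([[1.0, 2.0, 3.0]])
display(log_softmax_layer(scores))  # roughly [[-2.41, -1.41, -0.41]]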

The Classifier Function

builder = TensorBuilder()
size_of_vocabulary = len(builder.vocabulary)
def classifier(vocab_size: int=size_of_vocabulary,
               embedding_dim: int=256,
               output_dim: int=2) -> trax_layers.Serial:
    """Creates the classifier model

    Args:
     vocab_size: number of tokens in the training vocabulary
     embedding_dim: output dimension for the Embedding layer
     output_dim: dimension for the Dense layer

    Returns:
     the composed layer-model
    """
    embed_layer = trax_layers.Embedding(
        vocab_size=vocab_size, # Size of the vocabulary
        d_feature=embedding_dim)  # Embedding dimension

    mean_layer = trax_layers.Mean(axis=1)

    dense_output_layer = trax_layers.Dense(n_units = output_dim)

    log_softmax_layer = trax_layers.LogSoftmax()

    model = trax_layers.Serial(
      embed_layer,
      mean_layer,
      dense_output_layer,
      log_softmax_layer
    )
    return model
tmp_model = classifier()
print(type(tmp_model))
display(tmp_model)
<class 'trax.layers.combinators.Serial'>
Serial[
  Embedding_9164_256
  Mean
  Dense_2
  LogSoftmax
]
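
As a quick, hedged sanity check (the batch size and sequence length here are made up, not taken from our actual data pipeline), we can initialize the model from a dummy batch of token IDs and confirm that it produces two log-probabilities - one per sentiment class - for each tweet:

from trax import shapes

dummy_batch = numpy.zeros((4, 15), dtype=numpy.int32)  # 4 "tweets", 15 token IDs each
tmp_model.init(shapes.signature(dummy_batch))
log_probabilities = tmp_model(dummy_batch)
print(log_probabilities.shape)  # (4, 2)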

Ending

Now that we have our Deep Learning model, we'll move on to training it.