Sentiment Analysis: Testing the Model

Beginning

Having trained our Deep Learning model for Sentiment Analysis in the previous post, we're now going to test how well it did.

Imports

# python
from argparse import Namespace
from functools import partial
from pathlib import Path

# pypi
import nltk
import trax.fastmath.numpy as numpy
import trax.layers as trax_layers

# this project
from neurotic.nlp.twitter.sentiment_network import SentimentNetwork
from neurotic.nlp.twitter.tensor_generator import TensorBuilder, TensorGenerator

Set Up

Download

I'm re-downloading the dataset here because all the trouble getting trax and tensorflow working with CUDA means I have to keep re-building the Docker container I'm using.

data_path = Path("~/data/datasets/nltk_data/").expanduser()
nltk.download("twitter_samples", download_dir=str(data_path))

The Data Generators

BATCH_SIZE = 16
converter = TensorBuilder()
train_generator = partial(TensorGenerator, converter,
                          positive_data=converter.positive_training,
                          negative_data=converter.negative_training,
                          batch_size=BATCH_SIZE)
valid_generator = partial(TensorGenerator, converter,
                          positive_data=converter.positive_validation,
                          negative_data=converter.negative_validation,
                          batch_size=BATCH_SIZE)

TRAINING_GENERATOR = train_generator()
VALIDATION_GENERATOR = valid_generator()
SIZE_OF_VOCABULARY = len(converter.vocabulary)
TRAINING_LOOPS = 100

OUTPUT_PATH = Path("~/models").expanduser()
if not OUTPUT_PATH.is_dir():
    OUTPUT_PATH.mkdir()
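
As a quick sanity check (not part of the original post), we can pull a single batch from a fresh validation generator and inspect its shapes. This assumes the generator accepts the infinite argument and yields indexable (inputs, targets, weights) batches, which is how the test_model function later in this post uses it.

# not in the original: peek at one batch to confirm the generator's output
sanity_batch = next(iter(valid_generator(infinite=False)))
print(sanity_batch[0].shape,  # padded tweet tensors
      sanity_batch[1].shape,  # labels
      sanity_batch[2].shape)  # example weights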

The Model Builder

trainer = SentimentNetwork(
    training_generator=TRAINING_GENERATOR,
    validation_generator=VALIDATION_GENERATOR,
    vocabulary_size=SIZE_OF_VOCABULARY,
    training_loops=TRAINING_LOOPS,
    output_path=OUTPUT_PATH)
trainer.fit()
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

Step    110: Ran 10 train steps in 4.89 secs
Step    110: train CrossEntropyLoss |  0.00662578
Step    110: eval  CrossEntropyLoss |  0.00139236
Step    110: eval          Accuracy |  1.00000000

Step    120: Ran 10 train steps in 2.61 secs
Step    120: train CrossEntropyLoss |  0.03323080
Step    120: eval  CrossEntropyLoss |  0.00684100
Step    120: eval          Accuracy |  1.00000000

Step    130: Ran 10 train steps in 1.27 secs
Step    130: train CrossEntropyLoss |  0.11124543
Step    130: eval  CrossEntropyLoss |  0.00011413
Step    130: eval          Accuracy |  1.00000000

Step    140: Ran 10 train steps in 0.71 secs
Step    140: train CrossEntropyLoss |  0.03609489
Step    140: eval  CrossEntropyLoss |  0.00000590
Step    140: eval          Accuracy |  1.00000000

Step    150: Ran 10 train steps in 1.92 secs
Step    150: train CrossEntropyLoss |  0.08605278
Step    150: eval  CrossEntropyLoss |  0.00003427
Step    150: eval          Accuracy |  1.00000000

Step    160: Ran 10 train steps in 1.31 secs
Step    160: train CrossEntropyLoss |  0.04926774
Step    160: eval  CrossEntropyLoss |  0.00003597
Step    160: eval          Accuracy |  1.00000000

Step    170: Ran 10 train steps in 1.30 secs
Step    170: train CrossEntropyLoss |  0.00986138
Step    170: eval  CrossEntropyLoss |  0.00026259
Step    170: eval          Accuracy |  1.00000000

Step    180: Ran 10 train steps in 0.76 secs
Step    180: train CrossEntropyLoss |  0.00773767
Step    180: eval  CrossEntropyLoss |  0.00038017
Step    180: eval          Accuracy |  1.00000000

Step    190: Ran 10 train steps in 1.35 secs
Step    190: train CrossEntropyLoss |  0.00555876
Step    190: eval  CrossEntropyLoss |  0.00000706
Step    190: eval          Accuracy |  1.00000000

Step    200: Ran 10 train steps in 0.76 secs
Step    200: train CrossEntropyLoss |  0.00381955
Step    200: eval  CrossEntropyLoss |  0.00000122
Step    200: eval          Accuracy |  1.00000000

The Accuracy

This is from the last post. I haven't figured out how to arrange all the code yet.

def compute_accuracy(preds: numpy.ndarray,
                     y: numpy.ndarray,
                     y_weights: numpy.ndarray) -> tuple:
    """Compute a batch accuracy

    Args: 
       preds: a tensor of shape (dim_batch, output_dim) 
       y: a tensor of shape (dim_batch,) with the true labels
       y_weights: an ndarray with a weight for each example

    Returns: 
       accuracy: a float between 0-1 
       weighted_num_correct (np.float32): Sum of the weighted correct predictions
       sum_weights (np.float32): Sum of the weights
    """
    # Create an array of booleans, 
    # True if the probability of positive sentiment is greater than
    # the probability of negative sentiment
    # else False
    is_pos = preds[:, 1] > preds[:, 0]

    # convert the array of booleans into an array of np.int32
    is_pos_int = is_pos.astype(numpy.int32)

    # compare the array of predictions (as int32) with the target (labels) of type int32
    correct = is_pos_int == y

    # Count the sum of the weights.
    sum_weights = y_weights.sum()

    # convert the array of correct predictions (boolean) into an array of np.float32
    correct_float = correct.astype(numpy.float32)

    # Multiply each prediction with its corresponding weight.
    weighted_correct_float = correct_float.dot(y_weights)

    # Sum up the weighted correct predictions (of type np.float32), to go in the
    # numerator.
    weighted_num_correct = weighted_correct_float.sum()

    # Divide the number of weighted correct predictions by the sum of the
    # weights.
    accuracy = weighted_num_correct/sum_weights

    return accuracy, weighted_num_correct, sum_weights
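
To make the weighting concrete, here's a tiny illustrative check (not from the original post): two made-up predictions with equal example weights, one classified correctly and one not, so the weighted accuracy should come out to 0.5.

# illustrative only: two fake predictions, equal example weights
toy_predictions = numpy.array([[0.2, 0.8],   # column 1 > column 0: predicted positive
                               [0.3, 0.7]])  # also predicted positive
toy_targets = numpy.array([1, 0])            # the second example is actually negative
toy_weights = numpy.array([1.0, 1.0])
print(compute_accuracy(toy_predictions, toy_targets, toy_weights))
# expect roughly (0.5, 1.0, 2.0)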

Middle

Testing the model on Validation Data

Now we'll test our model's prediction accuracy on validation data.

This function takes in a data generator and the model.

  • The generator allows us to get batches of data. You can use it with a for loop:
for batch in iterator: 
   # do something with that batch

Each batch is a tuple of (inputs, targets, weights).

  • Position 0 holds the tweets as a tensor (the inputs).
  • Position 1 holds the targets (the actual labels: positive or negative sentiment).
  • Position 2 holds the associated example weights.
  • You can feed the inputs into the model and it will return the predictions for the whole batch (see the short sketch after this list).
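
Here's a rough sketch (not part of the original post) of that last step: pull one batch from a fresh validation generator and feed its inputs to the trained model. It assumes the trainer.training_loop.eval_model attribute that the evaluation code below also uses, and that the model returns two columns per example (negative and positive scores).

# not in the original: run the eval model on a single validation batch
demo_batch = next(iter(valid_generator(infinite=False)))
demo_predictions = trainer.training_loop.eval_model(demo_batch[0])
print(demo_predictions.shape)  # expect (batch_size, 2): column 0 negative, column 1 positive
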
def test_model(generator: TensorGenerator, model: trax_layers.Serial) -> float:
    """Calculate the accuracy of the model

    Args: 
       generator: an iterator instance that provides batches of inputs and targets
       model: a model instance 
    Returns: 
       accuracy: float corresponding to the accuracy
    """

    accuracy = 0.
    total_num_correct = 0
    total_num_pred = 0

    for batch in generator: 

        # Retrieve the inputs from the batch
        inputs = batch[0]

        # Retrieve the targets (actual labels) from the batch
        targets = batch[1]

        # Retrieve the example weight.
        example_weight = batch[2]

        # Make predictions using the inputs
        pred = model(inputs)

        # Calculate accuracy for the batch by comparing its predictions and targets
        batch_accuracy, batch_num_correct, batch_num_pred = compute_accuracy(
            pred, targets, example_weight)

        # Update the total number of correct predictions
        # by adding the number of correct predictions from this batch
        total_num_correct += batch_num_correct

        # Update the total number of predictions 
        # by adding the number of predictions made for the batch
        total_num_pred += batch_num_pred

    # Calculate accuracy over all examples
    accuracy = total_num_correct/total_num_pred

    return accuracy
# testing the accuracy of your model: this takes around 20 seconds
model = trainer.training_loop.eval_model

# we used all the data for the training and validation (oops)
# so we don't have any test data. Fix that later
#accuracy = test_model(VALIDATION_GENERATOR, model)
generator = valid_generator(infinite=False)
accuracy = test_model(generator, model)
print(f'The accuracy of your model on the validation set is {accuracy:.4f}')
The accuracy of your model on the validation set is 0.9995

Testing Some Custom Input

Finally, let's test some custom input. You will see that deep nets are more powerful than the older methods we used before. Although we got close to 100% accuracy using Naive Bayes and Logistic Regression, that was because the task was much easier.

This is used to predict on a new sentence.

def predict(sentence: str) -> tuple:
    """Predicts the sentiment of the sentence

    Args:
     sentence: the text to get the sentiment for

    Returns:
     predictions, sentiment
    """
    inputs = numpy.array(converter.to_tensor(sentence))

    # Batch size 1, add dimension for batch, to work with the model
    inputs = inputs.reshape(1, len(inputs))

    # predict with the model
    probabilities = model(inputs)

    # Turn probabilities into categories
    prediction = int(probabilities[0, 1] > probabilities[0, 0])

    sentiment = "positive" if prediction == 1 else "negative"

    return prediction, sentiment
sentence = "It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
inputs = numpy.array(converter.to_tensor(sentence))

A Positive Sentence

sentence = "It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
tmp_pred, tmp_sentiment = predict(sentence)
print(f"The sentiment of the sentence \n***\n\"{sentence}\"\n***\nis {tmp_sentiment}.")
The sentiment of the sentence 
***
"It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
***
is positive.

A Negative Sentence

sentence = "I hated my day, it was the worst, I'm so sad."
tmp_pred, tmp_sentiment = predict(sentence)
print(f"The sentiment of the sentence \n***\n\"{sentence}\"\n***\nis {tmp_sentiment}.")
The sentiment of the sentence 
***
"I hated my day, it was the worst, I'm so sad."
***
is negative.

Notice that the model works well even for complex sentences.

On Pooh

s = "Oh, bother!"
print(f"{s}: {predict(s)}")
Oh, bother!: (0, 'negative')

On Deep Nets

Deep nets allow you to understand and capture dependencies that you would not have been able to capture with simple linear regression or logistic regression.

  • They also let you make better use of pre-trained embeddings for classification and tend to generalize better.

End

So, there you have it, a Deep Learning Model for Sentiment Analysis built using Trax. Here are the prior posts in this series.