# Tensorflow Docker Setup

## Beginning

I recently restarted using TensorFlow and the Python interpreter kept crashing. It appears that the latest version was compiled to require AVX2, and the server I was using has AVX but not AVX2. I couldn't find any documentation about this requirement, but running the code on a different machine that has both AVX and AVX2 got rid of the problem. This might be a transient problem, since the nightly build doesn't crash on either machine. However, running the nightly build with other code is a nightmare, because every framework related to TensorFlow seems to try to revert the version back to the broken one. So I gave up and changed machines.

Setting up CUDA and TensorFlow over and over again proved difficult. There are several ways to do it (through apt, using NVIDIA's installers, building from source) and each presents a different problem. For instance, the version apt installs puts the folders in places TensorFlow's configure.py script can't find (if you build TensorFlow from source). Using NVIDIA's Debian package for cuDNN left my packages in a broken state, because it tried to install something that then broke another package's requirements… Anyway, I'm going to try to avoid building TensorFlow from source and run everything from Docker containers.
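If you suspect the same AVX2 problem, one way to check a Linux machine before installing a prebuilt TensorFlow build is to look for the `avx2` flag in `/proc/cpuinfo`. The sketch below assumes an x86 Linux host (other platforms report CPU features differently), and `cpu_flags` is a hypothetical helper, not part of TensorFlow.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from /proc/cpuinfo-style text (x86 Linux layout assumed)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux kernels list features on lines like "flags\t\t: fpu vme ... avx avx2"
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = cpu_flags(cpuinfo.read_text())
        for feature in ("avx", "avx2"):
            print(f"{feature}: {'yes' if feature in flags else 'no'}")
    else:
        print("No /proc/cpuinfo here (probably not Linux)")
```

A machine like the first server would print `avx: yes` and `avx2: no`.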

## Setting Up

I don't know for sure that this is necessary, but I followed NVIDIA's Docker installation instructions. If nothing else, you can use them to check that the setup works. After that I set up TensorFlow's container with a Dockerfile:

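```dockerfile
FROM tensorflow/tensorflow:latest-gpu-py3-jupyter
# The original RUN chain ended with "&& \" right before COPY, which Docker
# can't parse. The useradd call below is one way to create the non-root user
# (an assumption; the original listing didn't show the command).
RUN apt-get update && \
    apt-get install openssh-server --yes && \
    echo "Adding neurotic user" && \
    useradd --create-home --shell /bin/bash neurotic
COPY authorized_keys /home/neurotic/.ssh/
ENTRYPOINT service ssh restart && bash
```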


The latest TensorFlow container comes with Python 2.7 as the default for some reason, and all the dependencies are installed with it in mind. To get Python 3 (3.6 as of now) you need to specify the py3 tag, as I did in the FROM line. Additionally, I use SSH forwarding for Jupyter kernels so I can work with them in Emacs. That's why I installed the SSH server and created a non-root user to run Jupyter. The last line, `ENTRYPOINT service ssh restart && bash`, makes sure the SSH server is running and opens up a bash shell. To build the image I used this command:

```bash
docker build -t neurotic-tensorflow .
```


This creates an image named neurotic-tensorflow. To run it I use this command:

```bash
docker run --gpus all -p 2222:22 --name data-neurotic \
    --mount type=bind,source=$HOME/projects/neurotic-networks,target=/home/neurotic/neurotic-networks \
    --mount type=bind,source=/media/data,target=/home/neurotic/data \
    -it neurotic-tensorflow bash
```

The `--gpus all` flag makes the GPUs available inside the container. The `-p 2222:22` flag maps the container's SSH server (port 22) to port 2222 on the host. This lets you ssh into the container using `ssh neurotic@localhost -p 2222` without knowing the container's IP address. You can also grab the IP address and then ssh into the container as if it were another machine on the network:

```bash
docker inspect --format "{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}" data-neurotic
```

Here `data-neurotic` is the name given to the container in the `docker run` command. The advantages of the port mapping are that:

- You don't need to know the address of the container if you are on the host machine.
- You can ssh into the container from another machine by substituting the host's IP address for `localhost` in the ssh command.

The `--mount` options bind some host folders into the container so we can share files. Once you've run it, you can restart it at any time using:

```bash
docker start data-neurotic
```

And if you need to run something as root, you can attach to the running container:

```bash
docker attach data-neurotic
```

NOTE: The Python 3 container has CUDA 10.1 installed, but the latest version of TensorFlow expects 11.0, and TensorFlow seems to use hard-coded library names. So to make it work you either have to upgrade CUDA or create a symlink to the old library under the newer version's name:

```bash
ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1 /usr/lib/x86_64-linux-gnu/libcudart.so.11.0
```

TensorFlow dependencies are incredibly convoluted and broken all over the place.

# Sentiment Analysis: Testing the Model

## Beginning

Having trained our Deep Learning model for Sentiment Analysis previously, we're now going to test how well it did.
### Imports

```python
# python
from functools import partial
from pathlib import Path

# pypi
import nltk
import trax.fastmath.numpy as numpy
import trax.layers as trax_layers

# this project
from neurotic.nlp.twitter.sentiment_network import SentimentNetwork
from neurotic.nlp.twitter.tensor_generator import TensorBuilder, TensorGenerator
```

### Set Up

#### Download

I'm re-downloading the data because all the trouble getting Trax and TensorFlow working with CUDA means I keep having to rebuild the Docker container I'm using.

```python
data_path = Path("~/data/datasets/nltk_data/").expanduser()
nltk.download("twitter_samples", download_dir=str(data_path))
```

#### The Data Generators

```python
BATCH_SIZE = 16
converter = TensorBuilder()

train_generator = partial(TensorGenerator, converter,
                          positive_data=converter.positive_training,
                          negative_data=converter.negative_training,
                          batch_size=BATCH_SIZE)

valid_generator = partial(TensorGenerator, converter,
                          positive_data=converter.positive_validation,
                          negative_data=converter.negative_validation,
                          batch_size=BATCH_SIZE)

TRAINING_GENERATOR = train_generator()
VALIDATION_GENERATOR = valid_generator()
SIZE_OF_VOCABULARY = len(converter.vocabulary)
TRAINING_LOOPS = 100
OUTPUT_PATH = Path("~/models").expanduser()
if not OUTPUT_PATH.is_dir():
    OUTPUT_PATH.mkdir()
```

#### The Model Builder

```python
trainer = SentimentNetwork(
    training_generator=TRAINING_GENERATOR,
    validation_generator=VALIDATION_GENERATOR,
    vocabulary_size=SIZE_OF_VOCABULARY,
    training_loops=TRAINING_LOOPS,
    output_path=OUTPUT_PATH)

trainer.fit()
```

```
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Step 110: Ran 10 train steps in 4.89 secs
Step 110: train CrossEntropyLoss | 0.00662578
Step 110: eval CrossEntropyLoss | 0.00139236
Step 110: eval Accuracy | 1.00000000

Step 120: Ran 10 train steps in 2.61 secs
Step 120: train CrossEntropyLoss | 0.03323080
Step 120: eval CrossEntropyLoss | 0.00684100
Step 120: eval Accuracy | 1.00000000

Step 130: Ran 10 train steps in 1.27 secs
Step 130: train CrossEntropyLoss | 0.11124543
Step 130: eval CrossEntropyLoss | 0.00011413
Step 130: eval Accuracy | 1.00000000

Step 140: Ran 10 train steps in 0.71 secs
Step 140: train CrossEntropyLoss | 0.03609489
Step 140: eval CrossEntropyLoss | 0.00000590
Step 140: eval Accuracy | 1.00000000

Step 150: Ran 10 train steps in 1.92 secs
Step 150: train CrossEntropyLoss | 0.08605278
Step 150: eval CrossEntropyLoss | 0.00003427
Step 150: eval Accuracy | 1.00000000

Step 160: Ran 10 train steps in 1.31 secs
Step 160: train CrossEntropyLoss | 0.04926774
Step 160: eval CrossEntropyLoss | 0.00003597
Step 160: eval Accuracy | 1.00000000

Step 170: Ran 10 train steps in 1.30 secs
Step 170: train CrossEntropyLoss | 0.00986138
Step 170: eval CrossEntropyLoss | 0.00026259
Step 170: eval Accuracy | 1.00000000

Step 180: Ran 10 train steps in 0.76 secs
Step 180: train CrossEntropyLoss | 0.00773767
Step 180: eval CrossEntropyLoss | 0.00038017
Step 180: eval Accuracy | 1.00000000

Step 190: Ran 10 train steps in 1.35 secs
Step 190: train CrossEntropyLoss | 0.00555876
Step 190: eval CrossEntropyLoss | 0.00000706
Step 190: eval Accuracy | 1.00000000

Step 200: Ran 10 train steps in 0.76 secs
Step 200: train CrossEntropyLoss | 0.00381955
Step 200: eval CrossEntropyLoss | 0.00000122
Step 200: eval Accuracy | 1.00000000
```

#### The Accuracy

This is from the last post. I haven't figured out how to arrange all the code yet.
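To make the weighting in `compute_accuracy` (defined next) concrete, here is a small worked example in plain NumPy with made-up log probabilities for four tweets. `weighted_accuracy` is a hypothetical stand-in that follows the same steps: compare the two columns, check the result against the labels, then divide the weighted count of correct predictions by the sum of the weights.

```python
import numpy as np

# Hypothetical log probabilities for four tweets (column 0 = negative, column 1 = positive)
preds = np.log(np.array([[0.9, 0.1],
                         [0.2, 0.8],
                         [0.6, 0.4],
                         [0.3, 0.7]]))
y = np.array([0, 1, 1, 1], dtype=np.int32)


def weighted_accuracy(preds, y, weights):
    """Same arithmetic as compute_accuracy, written out with plain NumPy."""
    is_pos = (preds[:, 1] > preds[:, 0]).astype(np.int32)  # predicted labels: [0, 1, 0, 1]
    correct = (is_pos == y).astype(np.float32)             # right or wrong:   [1, 1, 0, 1]
    return correct.dot(weights) / weights.sum()


# Equal weights: 3 of the 4 predictions are right
print(weighted_accuracy(preds, y, np.ones(4)))                     # → 0.75
# A weight of 0 drops the third (wrong) example from numerator and denominator
print(weighted_accuracy(preds, y, np.array([1.0, 1.0, 0.0, 1.0])))  # → 1.0
```

With all weights equal to 1, as in this series, the weighted accuracy is just the ordinary fraction of correct predictions.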
```python
def compute_accuracy(preds: numpy.ndarray, y: numpy.ndarray,
                     y_weights: numpy.ndarray) -> tuple:
    """Compute a batch accuracy

    Args:
     preds: a tensor of shape (dim_batch, output_dim)
     y: a tensor of shape (dim_batch,) with the true labels
     y_weights: an array with a weight for each example

    Returns:
     accuracy: a float between 0 and 1
     weighted_num_correct (np.float32): Sum of the weighted correct predictions
     sum_weights (np.float32): Sum of the weights
    """
    # Create an array of booleans:
    # True if the (log) probability of positive sentiment is greater than
    # the (log) probability of negative sentiment, else False
    is_pos = preds[:, 1] > preds[:, 0]

    # convert the array of booleans into an array of int32
    is_pos_int = is_pos.astype(numpy.int32)

    # compare the predictions (as int32) with the targets (labels) of type int32
    correct = is_pos_int == y

    # Sum of the weights (the denominator)
    sum_weights = y_weights.sum()

    # convert the array of correct predictions (boolean) into float32
    correct_float = correct.astype(numpy.float32)

    # The dot product multiplies each prediction by its weight and sums them,
    # giving the weighted number of correct predictions (the numerator)
    weighted_num_correct = correct_float.dot(y_weights)

    # Divide the weighted number of correct predictions by the sum of the weights
    accuracy = weighted_num_correct / sum_weights

    return accuracy, weighted_num_correct, sum_weights
```

## Middle

### Testing the model on Validation Data

Now we'll test our model's prediction accuracy on validation data. The `test_model` function takes a data generator and the model.

- The generator gives us batches of data. You can use it with a for loop:

```python
for batch in generator:
    ...  # do something with that batch
```

Each `batch` is a tuple `(X, Y, weights)`:

- Position 0 is the tweet as a tensor (the input).
- Position 1 is its target (the actual label, positive or negative sentiment).
- Position 2 is the associated example weights.
- You can feed the tweet tensors into the model and it will return the predictions for the batch.

```python
def test_model(generator: TensorGenerator, model: trax_layers.Serial) -> float:
    """Calculate the accuracy of the model

    Args:
     generator: an iterator instance that provides batches of inputs and targets
     model: a model instance

    Returns:
     accuracy: float corresponding to the accuracy
    """
    total_num_correct = 0
    total_num_pred = 0

    for batch in generator:
        # Unpack the inputs, the targets (actual labels) and the example weights
        inputs, targets, example_weight = batch

        # Make predictions using the inputs
        pred = model(inputs)

        # Calculate accuracy for the batch by comparing its predictions and targets
        batch_accuracy, batch_num_correct, batch_num_pred = compute_accuracy(
            pred, targets, example_weight)

        # Add this batch's (weighted) correct predictions and predictions to the totals
        total_num_correct += batch_num_correct
        total_num_pred += batch_num_pred

    # Calculate accuracy over all examples
    accuracy = total_num_correct / total_num_pred
    return accuracy
```

```python
# testing the accuracy of the model
model = trainer.training_loop.eval_model

# we used all the data for the training and validation (oops)
# so we don't have any test data. Fix that later.
# infinite=False gives a generator that stops after one pass over the data,
# so the for loop in test_model ends
generator = valid_generator(infinite=False)
accuracy = test_model(generator, model)

print(f'The accuracy of your model on the validation set is {accuracy:.4f}')
```

```
The accuracy of your model on the validation set is 0.9995
```

### Testing Some Custom Input

Finally, let's test some custom input. You will see that deep nets are more powerful than the older methods we have used before. Although we got close to 100% accuracy using Naive Bayes and Logistic Regression, that was because the task was much easier.

The `predict` function runs the model on a new sentence.

```python
def predict(sentence: str) -> tuple:
    """Predicts the sentiment of the sentence

    Args:
     sentence: the sentence to get the sentiment for

    Returns:
     prediction, sentiment
    """
    inputs = numpy.array(converter.to_tensor(sentence))

    # Batch size 1: add a batch dimension so the input works with the model
    inputs = inputs.reshape(1, len(inputs))

    # predict with the model (the outputs are log probabilities)
    probabilities = model(inputs)

    # Turn the log probabilities into categories
    prediction = int(probabilities[0, 1] > probabilities[0, 0])

    sentiment = "positive" if prediction == 1 else "negative"

    return prediction, sentiment
```

```python
sentence = "It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
inputs = numpy.array(converter.to_tensor(sentence))
```

#### A Positive Sentence

```python
sentence = "It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
tmp_pred, tmp_sentiment = predict(sentence)
print(f"The sentiment of the sentence \n***\n\"{sentence}\"\n***\nis {tmp_sentiment}.")
```

```
The sentiment of the sentence
***
"It's such a nice day, think i'll be taking Sid to Ramsgate fish and chips for lunch at Peter's fish factory and then the beach maybe"
***
is positive.
```

#### A Negative Sentence

```python
sentence = "I hated my day, it was the worst, I'm so sad."
tmp_pred, tmp_sentiment = predict(sentence)
print(f"The sentiment of the sentence \n***\n\"{sentence}\"\n***\nis {tmp_sentiment}.")
```

```
The sentiment of the sentence
***
"I hated my day, it was the worst, I'm so sad."
***
is negative.
```

Notice that the model works well even for complex sentences.

#### On Pooh

```python
s = "Oh, bother!"
print(f"{s}: {predict(s)}")
```

```
Oh, bother!: (0, 'negative')
```

### On Deep Nets

Deep nets allow you to understand and capture dependencies that you would not have been able to capture with a simple linear regression or logistic regression.

- They also let you make better use of pre-trained embeddings for classification and tend to generalize better.

## End

So, there you have it, a Deep Learning Model for Sentiment Analysis built using Trax. Here are the prior posts in this series.

# Sentiment Analysis: Training the Model

## Training the Model

In the previous post we defined our Deep Learning model for Sentiment Analysis. Now we'll turn to training it on our data.

To train a model on a task, Trax defines an abstraction `trax.supervised.training.TrainTask`, which packages the training data, loss and optimizer (among other things) together into an object.

Similarly, to evaluate a model, Trax defines an abstraction `trax.supervised.training.EvalTask`, which packages the eval data and metrics (among other things) into another object.

The final piece tying things together is the `trax.supervised.training.Loop` abstraction, a very simple and flexible way to put everything together and train the model, all the while evaluating it and saving checkpoints. Using `Loop` will save you a lot of code compared to always writing the training loop by hand, like you did in courses 1 and 2. More importantly, you are less likely to have a bug in that code that would ruin your training.
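For a sense of what writing the loop by hand involves, here is a minimal NumPy sketch of such a loop for a toy logistic-regression model (not the sentiment model, and not how Trax implements `Loop`). Each iteration takes a gradient step, and every `steps_per_checkpoint` steps it evaluates loss and accuracy: the bookkeeping that `TrainTask`, `EvalTask` and `Loop` handle for you. `train_by_hand` and its parameters are hypothetical, and for brevity it evaluates on the training data.

```python
import numpy as np


def train_by_hand(features, labels, n_steps=100, steps_per_checkpoint=10, learning_rate=0.5):
    """Hand-written training loop: update every step, evaluate periodically."""
    weights = np.zeros(features.shape[1])
    bias = 0.0
    history = []
    for step in range(1, n_steps + 1):
        probabilities = 1 / (1 + np.exp(-(features @ weights + bias)))
        error = probabilities - labels
        # Gradient step on the mean cross-entropy loss
        weights -= learning_rate * features.T @ error / len(labels)
        bias -= learning_rate * error.mean()
        if step % steps_per_checkpoint == 0:
            # "Evaluation": recompute the loss and accuracy with the updated parameters
            probabilities = 1 / (1 + np.exp(-(features @ weights + bias)))
            loss = -np.mean(labels * np.log(probabilities)
                            + (1 - labels) * np.log(1 - probabilities))
            accuracy = np.mean((probabilities > 0.5) == labels)
            history.append((step, loss, accuracy))
    return weights, bias, history


features = np.array([[-2.0], [-1.0], [1.0], [2.0]])
labels = np.array([0.0, 0.0, 1.0, 1.0])
weights, bias, history = train_by_hand(features, labels)
for step, loss, accuracy in history:
    print(f"Step {step}: loss {loss:.4f}, accuracy {accuracy:.2f}")
```

Even this toy version needs its own update rule, loss and metric code, which is exactly where bugs creep in.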
### Imports

```python
# from python
from functools import partial
from pathlib import Path
import random

# from pypi
from trax.supervised import training
import nltk
import trax
import trax.layers as trax_layers
import trax.fastmath.numpy as numpy

# this project
from neurotic.nlp.twitter.tensor_generator import TensorBuilder, TensorGenerator
```

This next part (re-downloading the dataset) is only needed because I have to keep setting up new containers to get Trax to work…

```python
nltk.download("twitter_samples", download_dir="/home/neurotic/data/datasets/nltk_data/")
```

## Middle

### The Dataset

```python
BATCH_SIZE = 16
converter = TensorBuilder()

train_generator = partial(TensorGenerator, converter,
                          positive_data=converter.positive_training,
                          negative_data=converter.negative_training,
                          batch_size=BATCH_SIZE)
training_generator = train_generator()

valid_generator = partial(TensorGenerator, converter,
                          positive_data=converter.positive_validation,
                          negative_data=converter.negative_validation,
                          batch_size=BATCH_SIZE)
validation_generator = valid_generator()

size_of_vocabulary = len(converter.vocabulary)
```

### Here's the Model

This was defined in the last post. It was less trouble to just copy it over.

```python
def classifier(vocab_size: int=size_of_vocabulary,
               embedding_dim: int=256,
               output_dim: int=2) -> trax_layers.Serial:
    """Creates the classifier model

    Args:
     vocab_size: number of tokens in the training vocabulary
     embedding_dim: output dimension for the Embedding layer
     output_dim: dimension for the Dense layer

    Returns:
     the composed layer-model
    """
    embed_layer = trax_layers.Embedding(
        vocab_size=vocab_size,   # Size of the vocabulary
        d_feature=embedding_dim) # Embedding dimension

    mean_layer = trax_layers.Mean(axis=1)

    dense_output_layer = trax_layers.Dense(n_units=output_dim)

    log_softmax_layer = trax_layers.LogSoftmax()

    model = trax_layers.Serial(
        embed_layer,
        mean_layer,
        dense_output_layer,
        log_softmax_layer
    )
    return model
```

Now to train the model.
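As an aside before training, the four layers in `classifier` can be sketched in plain NumPy to show the shapes involved: an embedding lookup, an average over each tweet's tokens, a dense layer, and a log-softmax. This is an illustration with random, untrained weights, not Trax's implementation; `classifier_forward` and the toy sizes are hypothetical. Like `Mean(axis=1)`, it averages over every position, padding included.

```python
import numpy as np


def classifier_forward(token_ids, embedding, dense_weights, dense_bias):
    """Plain-NumPy sketch of Embedding -> Mean(axis=1) -> Dense -> LogSoftmax."""
    embedded = embedding[token_ids]                 # (batch, tweet_length, embedding_dim)
    averaged = embedded.mean(axis=1)                # (batch, embedding_dim)
    logits = averaged @ dense_weights + dense_bias  # (batch, output_dim)
    # Numerically stable log-softmax over the class axis
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


rng = np.random.default_rng(0)
vocab_size, embedding_dim, output_dim = 10, 4, 2
embedding = rng.normal(size=(vocab_size, embedding_dim))
dense_weights = rng.normal(size=(embedding_dim, output_dim))
dense_bias = np.zeros(output_dim)

batch = np.array([[1, 2, 3, 0],
                  [4, 5, 0, 0]])  # two "tweets" padded with 0 to length 4
log_probs = classifier_forward(batch, embedding, dense_weights, dense_bias)
print(log_probs.shape)                # (2, 2): one row per tweet, one column per class
print(np.exp(log_probs).sum(axis=1))  # each row of probabilities sums to (about) 1
```

Because the last layer is a log-softmax, the model's outputs are log probabilities, which is why the prediction arrays later in this post contain values at or below zero.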
First define the `TrainTask`, `EvalTask` and `Loop` in preparation for training the model.

```python
random.seed(271)

train_task = training.TrainTask(
    labeled_data=training_generator,
    loss_layer=trax_layers.CrossEntropyLoss(),
    optimizer=trax.optimizers.Adam(0.01),
    n_steps_per_checkpoint=10,
)

eval_task = training.EvalTask(
    labeled_data=validation_generator,
    metrics=[trax_layers.CrossEntropyLoss(), trax_layers.Accuracy()],
)

model = classifier()
```

This sets up the model to be trained with `trax_layers.CrossEntropyLoss` using the `trax.optimizers.Adam` optimizer, while tracking both `trax_layers.CrossEntropyLoss` and `trax_layers.Accuracy` on the validation set.

Now let's make an output directory and train the model.

```python
output_path = Path("~/models/").expanduser()
if not output_path.is_dir():
    output_path.mkdir()
```

```python
def train_model(classifier: trax_layers.Serial,
                train_task: training.TrainTask,
                eval_task: training.EvalTask,
                n_steps: int,
                output_dir: Path) -> training.Loop:
    """Create and run the training loop

    Args:
     classifier: the model you are building
     train_task: Training task
     eval_task: Evaluation task
     n_steps: the number of training steps to run
     output_dir: folder to save your files

    Returns:
     training_loop: the trax training Loop (it holds the trained model)
    """
    training_loop = training.Loop(
        model=classifier,       # The learning model
        tasks=train_task,       # The training task
        eval_tasks=eval_task,   # The evaluation task
        output_dir=output_dir)  # The output directory

    training_loop.run(n_steps=n_steps)

    # Return the training_loop, since it has the model.
    return training_loop
```

```python
training_loop = train_model(model, train_task, eval_task, 100, output_path)
```

```
Step 110: Ran 10 train steps in 6.06 secs
Step 110: train CrossEntropyLoss | 0.00527583
Step 110: eval CrossEntropyLoss | 0.00304692
Step 110: eval Accuracy | 1.00000000

Step 120: Ran 10 train steps in 2.06 secs
Step 120: train CrossEntropyLoss | 0.02130376
Step 120: eval CrossEntropyLoss | 0.00000677
Step 120: eval Accuracy | 1.00000000

Step 130: Ran 10 train steps in 0.75 secs
Step 130: train CrossEntropyLoss | 0.01026674
Step 130: eval CrossEntropyLoss | 0.00424393
Step 130: eval Accuracy | 1.00000000

Step 140: Ran 10 train steps in 1.33 secs
Step 140: train CrossEntropyLoss | 0.00172522
Step 140: eval CrossEntropyLoss | 0.00004072
Step 140: eval Accuracy | 1.00000000

Step 150: Ran 10 train steps in 0.77 secs
Step 150: train CrossEntropyLoss | 0.00002847
Step 150: eval CrossEntropyLoss | 0.00000232
Step 150: eval Accuracy | 1.00000000

Step 160: Ran 10 train steps in 0.78 secs
Step 160: train CrossEntropyLoss | 0.00002123
Step 160: eval CrossEntropyLoss | 0.00104654
Step 160: eval Accuracy | 1.00000000

Step 170: Ran 10 train steps in 0.79 secs
Step 170: train CrossEntropyLoss | 0.00001706
Step 170: eval CrossEntropyLoss | 0.00000080
Step 170: eval Accuracy | 1.00000000

Step 180: Ran 10 train steps in 0.83 secs
Step 180: train CrossEntropyLoss | 0.00001554
Step 180: eval CrossEntropyLoss | 0.00000989
Step 180: eval Accuracy | 1.00000000

Step 190: Ran 10 train steps in 0.85 secs
Step 190: train CrossEntropyLoss | 0.00639312
Step 190: eval CrossEntropyLoss | 0.00255337
Step 190: eval Accuracy | 1.00000000

Step 200: Ran 10 train steps in 0.85 secs
Step 200: train CrossEntropyLoss | 0.00124322
Step 200: eval CrossEntropyLoss | 0.02190475
Step 200: eval Accuracy | 1.00000000
```

### Bundle It Up

```
<<imports>>


<<model-trainer>>

    <<the-model>>

    <<training-task>>

    <<eval-task>>

    <<training-loop>>

    <<fit-the-model>>
```

#### Imports

```python
# python
from pathlib import Path

# from pypi
from trax.supervised import training
import attr
import trax
import trax.layers as trax_layers
```

#### The Trainer

```python
@attr.s(auto_attribs=True)
class SentimentNetwork:
    """Builds and Trains the Sentiment Analysis Model

    Args:
     training_generator: generator of training batches
     validation_generator: generator of validation batches
     vocabulary_size: number of tokens in the training vocabulary
     training_loops: number of training steps to run
     output_path: path to where to store the model
     embedding_dimension: output dimension for the Embedding layer
     output_dimension: dimension for the Dense layer
    """
    vocabulary_size: int
    training_generator: object
    validation_generator: object
    training_loops: int
    output_path: Path
    embedding_dimension: int=256
    output_dimension: int=2
    _model: trax_layers.Serial=None
    _training_task: training.TrainTask=None
    _evaluation_task: training.EvalTask=None
    _training_loop: training.Loop=None
```

- The Model

```python
    @property
    def model(self) -> trax_layers.Serial:
        """The Embeddings model"""
        if self._model is None:
            self._model = trax_layers.Serial(
                trax_layers.Embedding(
                    vocab_size=self.vocabulary_size,
                    d_feature=self.embedding_dimension),
                trax_layers.Mean(axis=1),
                trax_layers.Dense(n_units=self.output_dimension),
                trax_layers.LogSoftmax(),
            )
        return self._model
```

- The Training Task

```python
    @property
    def training_task(self) -> training.TrainTask:
        """The training task for training the model"""
        if self._training_task is None:
            self._training_task = training.TrainTask(
                labeled_data=self.training_generator,
                loss_layer=trax_layers.CrossEntropyLoss(),
                optimizer=trax.optimizers.Adam(0.01),
                n_steps_per_checkpoint=10,
            )
        return self._training_task
```

- Evaluation Task

```python
    @property
    def evaluation_task(self) -> training.EvalTask:
        """The validation evaluation task"""
        if self._evaluation_task is None:
            self._evaluation_task = training.EvalTask(
                labeled_data=self.validation_generator,
                metrics=[trax_layers.CrossEntropyLoss(),
                         trax_layers.Accuracy()],
            )
        return self._evaluation_task
```

- Training Loop
```python
    @property
    def training_loop(self) -> training.Loop:
        """The thing to run the training"""
        if self._training_loop is None:
            self._training_loop = training.Loop(
                model=self.model,
                tasks=self.training_task,
                eval_tasks=self.evaluation_task,
                output_dir=self.output_path)
        return self._training_loop
```

- Fitting the Model

```python
    def fit(self):
        """Runs the training loop"""
        self.training_loop.run(n_steps=self.training_loops)
        return
```

### Practice In Making Predictions

Now that you have trained a model, you can access it as the `training_loop.model` object. We will actually use `training_loop.eval_model`; in the next weeks you will learn why we sometimes use a different model for evaluation, e.g., one without dropout. For now, make predictions with your model.

Use the training data just to see how the prediction process works.

- Later, you will use validation data to evaluate your model's performance.

Create a generator object.

```python
tmp_train_generator = train_generator(batch_size=16)
```

Get one batch.

```python
tmp_batch = next(tmp_train_generator)
```

Position 0 has the model inputs (tweets as tensors), position 1 has the targets (the actual labels) and position 2 has the example weights.

```python
tmp_inputs, tmp_targets, tmp_example_weights = tmp_batch

print(f"The batch is a tuple of length {len(tmp_batch)} because position 0 contains the tweets, position 1 contains the targets and position 2 contains the example weights.")
print(f"The shape of the tweet tensors is {tmp_inputs.shape} (num of examples, length of tweet tensors)")
print(f"The shape of the labels is {tmp_targets.shape}, which is the batch size.")
print(f"The shape of the example_weights is {tmp_example_weights.shape}, which is the same as inputs/targets size.")
```

```
The batch is a tuple of length 3 because position 0 contains the tweets, position 1 contains the targets and position 2 contains the example weights.
The shape of the tweet tensors is (16, 14) (num of examples, length of tweet tensors)
The shape of the labels is (16,), which is the batch size.
The shape of the example_weights is (16,), which is the same as inputs/targets size.
```
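The `(16, 14)` input shape means every tweet in the batch has been brought to a common length. Assuming the generator pads each tweet with 0s up to the longest tweet in its batch (an assumption about `TensorGenerator`, not a description of its code), a batch tuple could be assembled like this. `pad_batch` and the token ids are hypothetical.

```python
import numpy as np


def pad_batch(tweets, pad_id=0):
    """Hypothetical helper: pad token-id lists into a (batch, longest_tweet) array."""
    longest = max(len(tweet) for tweet in tweets)
    inputs = np.full((len(tweets), longest), pad_id, dtype=np.int32)
    for row, tweet in enumerate(tweets):
        inputs[row, :len(tweet)] = tweet
    return inputs


tweets = [[5, 8, 2], [7, 1], [3, 9, 4, 6]]
labels = np.array([1, 1, 0], dtype=np.int32)
inputs = pad_batch(tweets)
example_weights = np.ones(len(tweets), dtype=np.float32)  # every example counts equally
batch = (inputs, labels, example_weights)
print(inputs.shape)  # (3, 4): three tweets, padded to the longest one (4 tokens)
```

The labels and weights stay one-dimensional, one entry per tweet, which matches the `(16,)` shapes printed above.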
Feed the tweet tensors into the model to get a prediction.

```python
tmp_pred = training_loop.eval_model(tmp_inputs)
print(f"The prediction shape is {tmp_pred.shape}, num of tensor_tweets as rows")
print("Column 0 is the log probability of a negative sentiment (class 0)")
print("Column 1 is the log probability of a positive sentiment (class 1)")
print()
print("View the prediction array")
print(tmp_pred)
```

```
The prediction shape is (16, 2), num of tensor_tweets as rows
Column 0 is the log probability of a negative sentiment (class 0)
Column 1 is the log probability of a positive sentiment (class 1)

View the prediction array
[[-1.2960873e+01 -2.3841858e-06]
 [-5.6474457e+00 -3.5326481e-03]
 [-5.3460855e+00 -4.7781467e-03]
 [-7.6736917e+00 -4.6515465e-04]
 [-5.2682662e+00 -5.1658154e-03]
 [-1.0566207e+01 -2.5749207e-05]
 [-5.6388092e+00 -3.5634041e-03]
 [-3.9540453e+00 -1.9363165e-02]
 [ 0.0000000e+00 -2.0700916e+01]
 [ 0.0000000e+00 -2.2949795e+01]
 [ 0.0000000e+00 -2.3168846e+01]
 [ 0.0000000e+00 -2.4553205e+01]
 [-9.5367432e-07 -1.3878939e+01]
 [ 0.0000000e+00 -1.6655178e+01]
 [ 0.0000000e+00 -1.5975946e+01]
 [ 0.0000000e+00 -2.0577690e+01]]
```

To turn these log probabilities into categories (negative or positive sentiment prediction), for each row:

- Compare the values in each column. Since the logarithm is increasing, comparing log probabilities gives the same answer as comparing the probabilities themselves.
- If column 1 has a value greater than column 0, classify that as a positive tweet.
- Otherwise, if column 1 is less than or equal to column 0, classify that example as a negative tweet.

Turn the log probabilities into category predictions.

```python
tmp_is_positive = tmp_pred[:, 1] > tmp_pred[:, 0]
for i, p in enumerate(tmp_is_positive):
    print(f"Neg log prob {tmp_pred[i,0]:.4f}\tPos log prob {tmp_pred[i,1]:.4f}\t is positive? {p}\t actual {tmp_targets[i]}")
```

```
Neg log prob -12.9609   Pos log prob -0.0000    is positive? True    actual 1
Neg log prob -5.6474    Pos log prob -0.0035    is positive? True    actual 1
Neg log prob -5.3461    Pos log prob -0.0048    is positive? True    actual 1
Neg log prob -7.6737    Pos log prob -0.0005    is positive? True    actual 1
Neg log prob -5.2683    Pos log prob -0.0052    is positive? True    actual 1
Neg log prob -10.5662   Pos log prob -0.0000    is positive? True    actual 1
Neg log prob -5.6388    Pos log prob -0.0036    is positive? True    actual 1
Neg log prob -3.9540    Pos log prob -0.0194    is positive? True    actual 1
Neg log prob 0.0000     Pos log prob -20.7009   is positive? False   actual 0
Neg log prob 0.0000     Pos log prob -22.9498   is positive? False   actual 0
Neg log prob 0.0000     Pos log prob -23.1688   is positive? False   actual 0
Neg log prob 0.0000     Pos log prob -24.5532   is positive? False   actual 0
Neg log prob -0.0000    Pos log prob -13.8789   is positive? False   actual 0
Neg log prob 0.0000     Pos log prob -16.6552   is positive? False   actual 0
Neg log prob 0.0000     Pos log prob -15.9759   is positive? False   actual 0
Neg log prob 0.0000     Pos log prob -20.5777   is positive? False   actual 0
```

Notice that since you are making a prediction using a training batch, it's more likely that the model's predictions match the actual targets (labels).

- Every prediction that the tweet is positive also matches the actual target of 1 (positive sentiment).
- Similarly, every prediction that the sentiment is not positive matches the actual target of 0 (negative sentiment).

One more useful thing to know is how to check whether the prediction matches the actual target (label).

- The result of the `is_positive` calculation is a boolean.
- The target is of type `trax.fastmath.numpy.int32`.
- If you expect to be doing division, you may prefer to work with floating-point numbers of type `trax.fastmath.numpy.float32`.

View the array of booleans.

```python
from IPython.display import display  # already available inside Jupyter

print("Array of booleans")
display(tmp_is_positive)
```

```
Array of booleans
DeviceArray([ True,  True,  True,  True,  True,  True,  True,  True,
             False, False, False, False, False, False, False, False], dtype=bool)
```

Convert the booleans to type int32.

- `True` is converted to 1
- `False` is converted to 0

```python
tmp_is_positive_int = tmp_is_positive.astype(trax.fastmath.numpy.int32)
```

View the array of integers.
```python
print("Array of integers")
display(tmp_is_positive_int)
```

```
Array of integers
DeviceArray([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
```

Convert the booleans to type float32.

```python
tmp_is_positive_float = tmp_is_positive.astype(numpy.float32)
```

View the array of floats.

```python
print("Array of floats")
display(tmp_is_positive_float)
```

```
Array of floats
DeviceArray([1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
```

```python
print(tmp_pred.shape)
```

```
(16, 2)
```

Note that Python usually does type conversion for you when you compare a boolean to an integer.

- `True` compared to 1 is `True`; compared to any other integer it is `False`.
- `False` compared to 0 is `True`; compared to any other integer it is `False`.

```python
print(f"True == 1: {True == 1}")
print(f"True == 2: {True == 2}")
print(f"False == 0: {False == 0}")
print(f"False == 2: {False == 2}")
```

```
True == 1: True
True == 2: False
False == 0: True
False == 2: False
```

However, we recommend that you keep track of the data type of your variables to avoid unexpected outcomes. So it helps to convert the booleans into integers, and compare 1 to 1 rather than comparing `True` to 1.

Hopefully you are now familiar with what kinds of inputs and outputs the model uses when making a prediction.

- This will help you implement a function that estimates the accuracy of the model's predictions.

### Evaluation

#### Computing the accuracy of a batch

You will now write a function that evaluates your model on a batch and returns the accuracy.

- `preds` contains the predictions.
  - Its dimensions are `(batch_size, output_dim)`. `output_dim` is two in this case. Column 0 contains the log probability that the tweet belongs to class 0 (negative sentiment). Column 1 contains the log probability that it belongs to class 1 (positive sentiment).
  - If the value in column 1 is greater than the value in column 0, interpret this as the model's prediction that the example has label 1 (positive sentiment).
  - Otherwise, if the values are equal or the value in column 0 is higher, the model's prediction is 0 (negative sentiment).
- `y` contains the actual labels.
- `y_weights` contains the weights to give to predictions.

```python
def compute_accuracy(preds: numpy.ndarray, y: numpy.ndarray,
                     y_weights: numpy.ndarray) -> tuple:
    """Compute a batch accuracy

    Args:
     preds: a tensor of shape (dim_batch, output_dim)
     y: a tensor of shape (dim_batch,) with the true labels
     y_weights: an array with a weight for each example

    Returns:
     accuracy: a float between 0 and 1
     weighted_num_correct (np.float32): Sum of the weighted correct predictions
     sum_weights (np.float32): Sum of the weights
    """
    # Create an array of booleans:
    # True if the (log) probability of positive sentiment is greater than
    # the (log) probability of negative sentiment, else False
    is_pos = preds[:, 1] > preds[:, 0]

    # convert the array of booleans into an array of int32
    is_pos_int = is_pos.astype(numpy.int32)

    # compare the predictions (as int32) with the targets (labels) of type int32
    correct = is_pos_int == y

    # Sum of the weights (the denominator)
    sum_weights = y_weights.sum()

    # convert the array of correct predictions (boolean) into float32
    correct_float = correct.astype(numpy.float32)

    # The dot product multiplies each prediction by its weight and sums them,
    # giving the weighted number of correct predictions (the numerator)
    weighted_num_correct = correct_float.dot(y_weights)

    # Divide the weighted number of correct predictions by the sum of the weights
    accuracy = weighted_num_correct / sum_weights

    return accuracy, weighted_num_correct, sum_weights
```

Get one batch.
tmp_val_generator = valid_generator(batch_size=64)
tmp_batch = next(tmp_val_generator)

Position 0 has the model inputs (tweets as tensors), position 1 has the targets (the actual labels), and position 2 has the example weights.

tmp_inputs, tmp_targets, tmp_example_weights = tmp_batch

Feed the tweet tensors into the model to get a prediction.

tmp_pred = training_loop.eval_model(tmp_inputs)

tmp_acc, tmp_num_correct, tmp_num_predictions = compute_accuracy(preds=tmp_pred, y=tmp_targets, y_weights=tmp_example_weights)

print(f"Model's prediction accuracy on a single training batch is: {100 * tmp_acc}%")
print(f"Weighted number of correct predictions {tmp_num_correct}; weighted number of total observations predicted {tmp_num_predictions}")

Model's prediction accuracy on a single training batch is: 100.0%
Weighted number of correct predictions 64.0; weighted number of total observations predicted 64

## End

Now that we have a trained model, in the next post we'll test how well it did.

# Sentiment Analysis: Defining the Model

## Beginning

This continues a series on sentiment analysis with deep learning. In the previous post we loaded and processed our data set. In this post we'll see about actually defining the Neural Network.

In this part we will write our own library of layers. It will be very similar to the one used in Trax and also in Keras and PyTorch. The intention is that writing our own small framework will help us understand how they all work and use them more effectively in the future.

### Imports

# from pypi
from expects import be_true, expect
from trax import fastmath
import attr
import numpy
import trax
import trax.layers as trax_layers

# this project
from neurotic.nlp.twitter.tensor_generator import TensorBuilder

### Set Up

Some aliases to get closer to what the notebook has.

numpy_fastmath = fastmath.numpy
random = fastmath.random

## Middle

### The Base Layer Class

This will be the base class that the others will inherit from.
@attr.s(auto_attribs=True)
class Layer:
    """Base class for layers
    """
    def forward(self, x: numpy.ndarray):
        """The forward propagation method

        Raises:
            NotImplementedError - method is called but child hasn't implemented it
        """
        raise NotImplementedError

    def init_weights_and_state(self, input_signature, random_key):
        """method to initialize the weights based on the input signature and random key,
        to be implemented by subclasses of this Layer class
        """
        raise NotImplementedError

    def init(self, input_signature, random_key) -> numpy.ndarray:
        """initializes and returns the weights

        Note:
            This is just an alias for the init_weights_and_state method for some reason

        Args:
            input_signature: an example input (the Dense layer below uses its shape)
            random_key: the PRNG key for the random initializer

        Returns:
            the weights
        """
        self.init_weights_and_state(input_signature, random_key)
        return self.weights

    def __call__(self, x) -> numpy.ndarray:
        """This is an alias for the forward method

        Args:
            x: input array

        Returns:
            whatever the forward method does
        """
        return self.forward(x)

### The ReLU class

Here's the ReLU function:

$\mathrm{ReLU}(x) = \mathrm{max}(0,x)$

We'll implement the ReLU activation function below. The function will take in a matrix or vector and transform all the negative numbers into 0 while keeping all the positive numbers intact. Please use numpy.maximum(A, k) to find the maximum between each element in A and a scalar k.

class Relu(Layer):
    """Relu activation function implementation"""
    def forward(self, x: numpy.ndarray) -> numpy.ndarray:
        """Performs the activation

        Args:
            - x: the input

        Returns:
            - activation: all positive or 0 version of x
        """
        return numpy.maximum(x, 0)

#### Test It

x = numpy.array([[-2.0, -1.0, 0.0], [0.0, 1.0, 2.0]], dtype=float)
relu_layer = Relu()
print("Test data is:")
print(x)
print("\nOutput of Relu is:")
actual = relu_layer(x)
print(actual)

expected = numpy.array([[0., 0., 0.],
                        [0., 1., 2.]])
expect(numpy.allclose(actual, expected)).to(be_true)

Test data is:
[[-2. -1.  0.]
 [ 0.  1.  2.]]

Output of Relu is:
[[0. 0. 0.]
 [0. 1. 2.]]

### The Dense class

Implement the forward function of the Dense class.

• The forward function multiplies the input to the layer (x) by the weight matrix (W).

$\mathrm{forward}(\mathbf{x},\mathbf{W}) = \mathbf{xW}$

• You can use numpy.dot to perform the matrix multiplication.

Note that for more efficient code execution, you will use the trax version of math, which includes a trax version of numpy and also random.

Implement the weight initializer (the init_weights_and_state method):

• Weights are initialized with a random key.
• The second parameter is a tuple for the desired shape of the weights (num_rows, num_cols).
• The number of rows for the weights should equal the number of columns in x, because for forward propagation you will multiply x by the weights.

Please use trax.fastmath.random.normal(key, shape, dtype=tf.float32) to generate random values for the weight matrix. The key difference between this function and the standard numpy randomness is the explicit use of random keys, which need to be passed in. While it can look tedious at first sight to pass the random key everywhere, you will learn in Course 4 why this is very helpful when implementing some advanced models.

• key can be generated by calling random.get_prng(seed) and passing in a number for the seed.
• shape is a tuple with the desired shape of the weight matrix.
  – The number of rows in the weight matrix should equal the number of columns in the variable x. Since x may have 2 dimensions if it represents a single training example (row, col), or three dimensions (batch_size, row, col), get the last dimension from the tuple that holds the dimensions of x.
  – The number of columns in the weight matrix is the number of units chosen for that dense layer. In the Dense class below (an attrs class, so there's no hand-written __init__) this is stored in n_units.
• dtype is the data type of the values in the generated matrix; keep the default of tf.float32. In this case, don't explicitly set the dtype (just let it use the default value).
Set the standard deviation of the random values to 0.1.

• The values generated have a mean of 0 and a standard deviation of 1.
• Set the default standard deviation stdev to 0.1 by multiplying each of the values in the weight matrix by it.

See how the trax.fastmath.random.normal function works.

tmp_key = random.get_prng(seed=1)
print("The random seed generated by random.get_prng")
display(tmp_key)

WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
The random seed generated by random.get_prng
DeviceArray([0, 1], dtype=uint32)

For some reason the GPU isn't being found (the warning comes from JAX, which trax's fastmath runs on by default). Setting the log level to 0 like the message suggests shows that it gives up after trying to find a TPU; there's no indication that it's looking for the GPU.

import tensorflow
print(tensorflow.test.gpu_device_name())

Hmm. I'll have to troubleshoot that.

print("choose a matrix with 2 rows and 3 columns")
tmp_shape = (2, 3)
print(tmp_shape)

choose a matrix with 2 rows and 3 columns
(2, 3)

Generate a weight matrix. Note that you'll get an error if you try to set dtype to tf.float32, where tf is tensorflow. Just avoid setting the dtype and allow it to use the default data type.

tmp_weight = random.normal(key=tmp_key, shape=tmp_shape)

print("Weight matrix generated with a normal distribution with mean 0 and stdev of 1")
display(tmp_weight)

Weight matrix generated with a normal distribution with mean 0 and stdev of 1
DeviceArray([[ 0.957307 , -0.9699291 , 1.0070664 ],
             [ 0.36619022, 0.17294823, 0.29092228]], dtype=float32)

@attr.s(auto_attribs=True)
class Dense(Layer):
    """
    A dense (fully-connected) layer.

    Args:
        - n_units: the number of columns for our weight matrix
        - init_stdev: standard deviation for our initial weights
    """
    n_units: int
    init_stdev: float=0.1

    def forward(self, x: numpy.ndarray) -> numpy.ndarray:
        """The dot product of the input and the weights

        Args:
            x: input to multiply

        Returns:
            product of x and weights
        """
        return numpy.dot(x, self.weights)

    def init_weights_and_state(self, input_signature: numpy.ndarray,
                               random_key: numpy.ndarray) -> numpy.ndarray:
        """initializes the weights

        Args:
            input_signature: array whose final dimension will be the number of rows
            random_key: the PRNG key to start the random normal generator with
        """
        input_shape = input_signature.shape

        # to allow for more than two-dimensional matrices,
        # we use the last dimension of the input shape, rather than assuming
        # it's dimension 1
        self.weights = (random.normal(key=random_key,
                                      shape=(input_shape[-1], self.n_units))
                        * self.init_stdev)
        return self.weights

dense_layer = Dense(n_units=10)  # sets number of units in dense layer
random_key = random.get_prng(seed=0)  # sets random seed
z = numpy.array([[2.0, 7.0, 25.0]])  # input array

dense_layer.init(z, random_key)
print("Weights are\n ", dense_layer.weights)  # Returns randomly generated weights
output = dense_layer(z)
print("Forward function output is ", output)  # Returns multiplied values of units and weights

expected_weights = numpy.array([
    [-0.02837108, 0.09368162, -0.10050076, 0.14165013, 0.10543301, 0.09108126,
     -0.04265672, 0.0986188, -0.05575325, 0.00153249],
    [-0.20785688, 0.0554837, 0.09142365, 0.05744595, 0.07227863, 0.01210617,
     -0.03237354, 0.16234995, 0.02450038, -0.13809784],
    [-0.06111237, 0.01403724, 0.08410042, -0.1094358, -0.10775021, -0.11396459,
     -0.05933381, -0.01557652, -0.03832145, -0.11144515]])

expected_output = numpy.array(
    [[-3.0395496, 0.9266802, 2.5414743, -2.050473, -1.9769388, -2.582209,
      -1.7952735, 0.94427425, -0.8980402, -3.7497487]])

expect(numpy.allclose(dense_layer.weights, expected_weights)).to(be_true)
expect(numpy.allclose(output, expected_output)).to(be_true)

Weights are
  [[-0.02837108 0.09368162 -0.10050076 0.14165013 0.10543301 0.09108126 -0.04265672 0.0986188 -0.05575325 0.00153249]
 [-0.20785688 0.0554837 0.09142365 0.05744595 0.07227863 0.01210617 -0.03237354 0.16234995 0.02450038 -0.13809784]
 [-0.06111237 0.01403724 0.08410042 -0.1094358 -0.10775021 -0.11396459 -0.05933381 -0.01557652 -0.03832145 -0.11144515]]
Forward function output is  [[-3.03954965 0.92668021 2.54147445 -2.05047299 -1.97693891 -2.58220917 -1.79527355 0.94427423 -0.89804017 -3.74974866]]

### The Layers for the Trax-Based Model

For the model implementation we will use the Trax layers library. Trax layers are very similar to the ones we implemented above, but in addition to trainable weights they also have a non-trainable state. This state is used in layers like batch normalization and for inference; we will learn more about it later on.

### Dense

First, look at the code of the Trax Dense layer and compare it to the implementation above.

Another important layer that we will use a lot is the Serial layer, which allows us to execute one layer after another in sequence.

• You can pass in the layers as arguments to Serial, separated by commas.
• For example: trax_layers.Serial(trax_layers.Embedding(...), trax_layers.Mean(...), trax_layers.Dense(...), trax_layers.LogSoftmax(...))

The layer classes have pretty good docstrings, unlike the fastmath stuff, so it might be useful to look at them, but they're too long to include here.

We're also going to use an Embedding layer.

• trax_layers.Embedding(vocab_size, d_feature).
  – vocab_size is the number of unique words in the given vocabulary.
  – d_feature is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).

tmp_embed = trax_layers.Embedding(vocab_size=3, d_feature=2)
display(tmp_embed)

Embedding_3_2

Another useful layer is the Mean, which calculates means across an axis.
In this case, use axis = 1 to get an average embedding vector. In the classifier, the Embedding layer outputs an array of shape (batch_size, tweet_length, d_feature), so averaging along axis 1 collapses the words of each tweet into a single vector of d_feature elements.

• For example, if the embedding matrix has 300 elements per word and the vocabulary has 10,000 words (shape (300, 10000)), taking the mean along axis=1 will yield a vector of 300 elements.

Pretend the embedding matrix uses 2 elements for embedding the meaning of a word and has a vocabulary size of 3, so it has shape (2, 3).

tmp_embed = numpy.array([[1, 2, 3],
                         [4, 5, 6]])

First take the mean along axis 0, which creates a vector whose length equals the vocabulary size (the number of columns).

display(numpy.mean(tmp_embed, axis=0))

array([2.5, 3.5, 4.5])

If you take the mean along axis 1 it creates a vector whose length equals the number of elements in a word embedding (the rows).

display(numpy.mean(tmp_embed, axis=1))

array([2., 5.])

Finally, a LogSoftmax layer gives you a log-softmax output, $x_i - \log\sum_j e^{x_j}$ for each element $x_i$ of its input.

#### Online Documentation

For completeness, here are some links to the Read the Docs documentation for these layers.
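To see what the classifier below will compose with Serial, here is a plain-NumPy sketch of the Embedding → Mean → Dense → LogSoftmax forward pass. It's an illustration under assumptions: the toy vocabulary size, random weights, and token IDs are made up, and the dense step has no bias (matching the Dense class above, whereas Trax's Dense also adds a bias).

```python
import numpy


def log_softmax(x: numpy.ndarray) -> numpy.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - numpy.log(numpy.exp(shifted).sum(axis=-1, keepdims=True))


def toy_forward(token_ids, embeddings, weights):
    """Embedding -> Mean -> Dense (no bias) -> LogSoftmax for padded tweets.

    token_ids: (batch_size, max_len) integer IDs
    embeddings: (vocab_size, d_feature) lookup table
    weights: (d_feature, output_dim) dense-layer weights
    """
    embedded = embeddings[token_ids]   # (batch_size, max_len, d_feature)
    averaged = embedded.mean(axis=1)   # (batch_size, d_feature)
    logits = averaged.dot(weights)     # (batch_size, output_dim)
    return log_softmax(logits)


rng = numpy.random.default_rng(0)
vocab_size, d_feature, output_dim = 10, 4, 2
toy_embeddings = rng.normal(size=(vocab_size, d_feature))
toy_weights = rng.normal(size=(d_feature, output_dim)) * 0.1
toy_batch = numpy.array([[3, 4, 5, 0], [6, 7, 0, 0]])  # 0 plays the padding ID

toy_log_probs = toy_forward(toy_batch, toy_embeddings, toy_weights)
print(toy_log_probs.shape)  # (2, 2)
print(numpy.exp(toy_log_probs).sum(axis=1))  # each row sums to 1, up to rounding
```

Like Trax's Mean(axis=1), this sketch averages over the padding positions too; it doesn't mask them out.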
### The Classifier Function

builder = TensorBuilder()
size_of_vocabulary = len(builder.vocabulary)

def classifier(vocab_size: int=size_of_vocabulary,
               embedding_dim: int=256,
               output_dim: int=2) -> trax_layers.Serial:
    """Creates the classifier model

    Args:
        vocab_size: number of tokens in the training vocabulary
        embedding_dim: output dimension for the Embedding layer
        output_dim: dimension for the Dense layer

    Returns:
        the composed layer-model
    """
    embed_layer = trax_layers.Embedding(
        vocab_size=vocab_size,  # Size of the vocabulary
        d_feature=embedding_dim)  # Embedding dimension

    mean_layer = trax_layers.Mean(axis=1)

    dense_output_layer = trax_layers.Dense(n_units=output_dim)

    log_softmax_layer = trax_layers.LogSoftmax()

    model = trax_layers.Serial(
        embed_layer,
        mean_layer,
        dense_output_layer,
        log_softmax_layer
    )
    return model

tmp_model = classifier()

print(type(tmp_model))
display(tmp_model)

<class 'trax.layers.combinators.Serial'>
Serial[
  Embedding_9164_256
  Mean
  Dense_2
  LogSoftmax
]

## Ending

Now that we have our Deep Learning model, we'll move on to training it.

# Sentiment Analysis: Pre-processing the Data

## Beginning

This is the next in a series about building a Deep Learning model for sentiment analysis. The first post was this one.

#### Imports

# from python
from argparse import Namespace
import random

# from pypi
from expects import contain_exactly, equal, expect
from nltk.corpus import twitter_samples
import nltk
import numpy

# this project
from neurotic.nlp.twitter.processor import TwitterProcessor

### Set Up

The NLTK data has to be downloaded at least once.
nltk.download("twitter_samples", download_dir="~/data/datasets/nltk_data/")

## Middle

### The NLTK Data

positive = twitter_samples.strings('positive_tweets.json')
negative = twitter_samples.strings('negative_tweets.json')
print(f"Positive Tweets: {len(positive):,}")
print(f"Negative Tweets: {len(negative):,}")

Positive Tweets: 5,000
Negative Tweets: 5,000

### Split It Up

Instead of randomly splitting the data we're going to do a straight slice: the first 4,000 tweets of each set go to training and the remaining 1,000 go to validation.

SPLIT = 4000

#### Split positive set into validation and training

positive_validation = positive[SPLIT:]
positive_training = positive[:SPLIT]

#### Split negative set into validation and training

negative_validation = negative[SPLIT:]
negative_training = negative[:SPLIT]

#### Combine the Data Sets

The X data.

train_x = positive_training + negative_training
validation_x = positive_validation + negative_validation

The labels (1 for positive, 0 for negative).

train_y = numpy.append(numpy.ones(len(positive_training)), numpy.zeros(len(negative_training)))
validation_y = numpy.append(numpy.ones(len(positive_validation)), numpy.zeros(len(negative_validation)))

print(f"length of train_x {len(train_x):,}")
print(f"length of validation_x {len(validation_x):,}")

length of train_x 8,000
length of validation_x 2,000

### Building the vocabulary

Now build the vocabulary.

• Map each word in each tweet to an integer (an "index").
• The following code does this for you, but please read it and understand what it's doing.
• Note that you will build the vocabulary based on the training data.
• To do so, you will assign an index to every word by iterating over your training set.

The vocabulary will also include some special tokens:

• __PAD__: padding
• __</e>__: end of line
• __UNK__: a token representing any word that is not in the vocabulary.
Tokens = Namespace(padding="__PAD__", ending="__</e>__", unknown="__UNK__")
process = TwitterProcessor()
vocabulary = {Tokens.padding: 0, Tokens.ending: 1, Tokens.unknown: 2}

for tweet in train_x:
    for token in process(tweet):
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)

print(f"Words in the vocabulary: {len(vocabulary):,}")
count = 0
for token in vocabulary:
    print(f"{count}: {token}: {vocabulary[token]}")
    count += 1
    if count == 5:
        break

Words in the vocabulary: 9,164
0: __PAD__: 0
1: __</e>__: 1
2: __UNK__: 2
3: followfriday: 3
4: top: 4

### Converting a tweet to a tensor

Now we'll write a function that will convert each tweet to a tensor (a list of unique integer IDs representing the processed tweet).

• Note, the returned data type will be a regular Python list()
  – You won't use TensorFlow in this function
  – You also won't use a numpy array
  – You also won't use a trax.fastmath.numpy array
• For words in the tweet that are not in the vocabulary, set them to the unique ID for the token __UNK__.

For example, given this string:

'@happypuppy, is Maria happy?'

You first tokenize it.

['maria', 'happi']

Then convert each word into the index for it.

[2, 56]

Notice that the word "maria" is not in the vocabulary, so it is assigned the unique integer associated with the __UNK__ token, because it is considered "unknown."
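The lookup in that example boils down to `dict.get` with the `__UNK__` ID as the fallback. A minimal sketch with a hypothetical toy vocabulary (only the IDs 2 and 56 come from the example above, and the tokens are assumed to be what the processor returned):

```python
# Hypothetical toy vocabulary; the real one is built from the training tweets
toy_vocabulary = {"__PAD__": 0, "__</e>__": 1, "__UNK__": 2, "happi": 56}

# Tokens as the processor is assumed to return them for '@happypuppy, is Maria happy?'
toy_tokens = ["maria", "happi"]

toy_unk_id = toy_vocabulary["__UNK__"]

# dict.get returns the fallback ID when the token isn't in the vocabulary
toy_tensor = [toy_vocabulary.get(token, toy_unk_id) for token in toy_tokens]
print(toy_tensor)  # [2, 56]
```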
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: tweet_to_tensor
def tweet_to_tensor(tweet: str, vocab_dict: dict, unk_token: str='__UNK__',
                    verbose: bool=False):
    """Convert a tweet to a list of indices

    Args:
        tweet - A string containing a tweet
        vocab_dict - The words dictionary
        unk_token - The special string for unknown tokens
        verbose - Print info during runtime

    Returns:
        tensor_l - A python list with indices for the tweet tokens
    """
    ### START CODE HERE (Replace instances of 'None' with your code) ###
    # Process the tweet into a list of words
    # where only important words are kept (stop words removed)
    # (process is the TwitterProcessor created for the vocabulary)
    word_l = process(tweet)

    if verbose:
        print("List of words from the processed tweet:")
        print(word_l)

    # Initialize the list that will contain the unique integer IDs of each word
    tensor_l = []

    # Get the unique integer ID of the __UNK__ token
    unk_ID = vocab_dict[unk_token]

    if verbose:
        print(f"The unique integer ID for the unk_token is {unk_ID}")

    # for each word in the list:
    for word in word_l:

        # Get the unique integer ID.
        # If the word doesn't exist in the vocab dictionary,
        # use the unique ID for __UNK__ instead.
        word_ID = vocab_dict.get(word, unk_ID)
        ### END CODE HERE ###

        # Append the unique integer ID to the tensor list.
        tensor_l.append(word_ID)

    return tensor_l

print("Actual tweet is\n", positive_validation[0])
print("\nTensor of tweet:\n", tweet_to_tensor(positive_validation[0], vocab_dict=vocabulary))

Actual tweet is
Bro:U wan cut hair anot,ur hair long Liao bo Me:since ord liao,take it easy lor treat as save$ leave it longer :)
Bro:LOL Sibei xialan

Tensor of tweet:
[1072, 96, 484, 2376, 750, 8220, 1132, 750, 53, 2, 2701, 796, 2, 2, 354, 606, 2, 3523, 1025, 602, 4599, 9, 1072, 158, 2, 2]

def test_tweet_to_tensor():
    test_cases = [

        {
            "name": "simple_test_check",
            "input": [positive_validation[1], vocabulary],
            "expected": [444, 2, 304, 567, 56, 9],
            "error": "The function gives bad output for val_pos[1]. Test failed"
        },
        {
            "name": "datatype_check",
            "input": [positive_validation[1], vocabulary],
            "expected": type([]),
            "error": "Datatype mismatch. Need only list not np.array"
        },
        {
            "name": "without_unk_check",
            "input": [positive_validation[1], vocabulary],
            "expected": 6,
            "error": "Unk word check not done- Please check if you included mapping for unknown word"
        }
    ]
    count = 0
    for test_case in test_cases:
        try:
            if test_case['name'] == "simple_test_check":
                assert test_case["expected"] == tweet_to_tensor(*test_case['input'])
                count += 1
            if test_case['name'] == "datatype_check":
                assert isinstance(tweet_to_tensor(*test_case['input']), test_case["expected"])
                count += 1
            if test_case['name'] == "without_unk_check":
                assert None not in tweet_to_tensor(*test_case['input'])
                count += 1

        except AssertionError:
            print(test_case['error'])
    if count == 3:
        print("\033[92m All tests passed")
    else:
        print(count, " Tests passed out of 3")

test_tweet_to_tensor()

The function gives bad output for val_pos[1]. Test failed
2  Tests passed out of 3


Their tweet processor wipes out everything after the start of a URL, even the text that isn't part of the URL, so it produces fewer tokens than ours and the expected indices won't match exactly. That's why the first test fails while the other two pass.
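The difference comes down to how far the URL pattern reaches. As a sketch (the greedy pattern is an assumption standing in for their processor, and the tweet is made up), compare a pattern that runs to the end of the line with one that stops at the first whitespace:

```python
import re

sample_tweet = "Loving this http://t.co/abc123 new song :)"

# Greedy: ".*" runs to the end of the line, so everything after the URL is lost
greedy = re.sub(r"https?://.*[\r\n]*", "", sample_tweet)

# Narrow: "\S+" stops at the first whitespace, so only the URL itself is removed
narrow = re.sub(r"https?://\S+", "", sample_tweet)

print(repr(greedy))  # 'Loving this '
print(repr(narrow))  # 'Loving this  new song :)'
```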

### Creating a batch generator

Most of the time in Natural Language Processing, and in AI in general, we use batches when training on our data sets.

• If instead of training with batches of examples, you were to train a model with one example at a time, it would take a very long time to train the model.
• You will now build a data generator that takes in the positive/negative tweets and returns a batch of training examples. It returns the model inputs, the targets (positive or negative labels) and the weight for each target (for example, this lets us treat some examples as more important to get right than others, but commonly the weights will all be 1.0).

Once you create the generator, you could include it in a for loop:

for batch_inputs, batch_targets, batch_example_weights in data_generator:
    ...  # train on the batch here


You can also get a single batch like this:

batch_inputs, batch_targets, batch_example_weights = next(data_generator)


The generator returns the next batch each time it's called.

• This generator returns the data in a format (tensors) that you could directly use in your model.
• It returns a triple: the inputs, targets, and loss weights:

  – Inputs is a tensor that contains the batch of tweets we put into the model.
  – Targets is the corresponding batch of labels that we train to generate.
  – Loss weights here are just 1s with the same shape as the targets. Next week, you will use them to mask input padding.
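Python generators are what make both the `for` loop and the `next(...)` call above work: each `next` runs the function up to its next `yield` and hands back what was yielded. A toy sketch, using a hypothetical `toy_batches` function with fake token IDs in place of real tweets, that yields the same (inputs, targets, weights) triple:

```python
import numpy


def toy_batches(batch_size: int):
    """Hypothetical stand-in for the real generator: yields batches forever."""
    assert batch_size % 2 == 0
    half = batch_size // 2
    step = 0
    while True:
        # fake token IDs; a real generator would pack padded tweets here
        inputs = numpy.full((batch_size, 3), step)
        # first half positive (1), second half negative (0)
        targets = numpy.array([1] * half + [0] * half)
        # every example counts equally
        weights = numpy.ones_like(targets)
        step += 1
        yield inputs, targets, weights


toy_generator = toy_batches(batch_size=4)

# next() pulls exactly one batch
toy_inputs, toy_targets, toy_weights = next(toy_generator)
print(toy_targets)  # [1 1 0 0]

# a for loop keeps pulling batches; this generator never ends, so break out
for count, (toy_inputs, toy_targets, toy_weights) in enumerate(toy_generator):
    print(toy_inputs[0, 0], toy_targets, toy_weights)
    if count == 1:
        break
```

Because `toy_batches` loops forever, the `for` loop needs its own `break`; the `loop=False` option of the real generator is what lets a loop over it end on its own.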

#### data_generator

A batch of spaghetti.

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def data_generator(data_pos: list, data_neg: list, batch_size: int,
                   loop: bool, vocab_dict: dict, shuffle: bool=False):
    """Generates batches of data

    Args:
        data_pos - Set of positive examples
        data_neg - Set of negative examples
        batch_size - number of samples per batch. Must be even
        loop - True or False
        vocab_dict - The words dictionary
        shuffle - Shuffle the data order

    Yield:
        inputs - Subset of positive and negative examples
        targets - The corresponding labels for the subset
        example_weights - An array specifying the importance of each example
    """
    ### START GIVEN CODE ###
    # make sure the batch size is an even number
    # to allow an equal number of positive and negative samples
    assert batch_size % 2 == 0

    # Number of positive examples in each batch is half of the batch size
    # same with number of negative examples in each batch
    n_to_take = batch_size // 2

    # Use pos_index to walk through the data_pos array
    # same with neg_index and data_neg
    pos_index = 0
    neg_index = 0

    len_data_pos = len(data_pos)
    len_data_neg = len(data_neg)

    # Get an array with the data indexes
    pos_index_lines = list(range(len_data_pos))
    neg_index_lines = list(range(len_data_neg))

    # shuffle lines if shuffle is set to True
    if shuffle:
        rnd.shuffle(pos_index_lines)
        rnd.shuffle(neg_index_lines)

    stop = False

    # Loop indefinitely
    while not stop:

        # create a batch with positive and negative examples
        batch = []

        # First part: Pack n_to_take positive examples

        # Start from pos_index and increment i up to n_to_take
        for i in range(n_to_take):

            # If the positive index goes past the positive dataset length,
            if pos_index >= len_data_pos:

                # If loop is set to False, break once we reach the end of the dataset
                if not loop:
                    stop = True
                    break

                # If user wants to keep re-using the data, reset the index
                pos_index = 0

                if shuffle:
                    # Shuffle the index of the positive sample
                    rnd.shuffle(pos_index_lines)

            # get the tweet at pos_index
            tweet = data_pos[pos_index_lines[pos_index]]

            # convert the tweet into tensors of integers representing the processed words
            tensor = tweet_to_tensor(tweet, vocab_dict)

            # append the tensor to the batch list
            batch.append(tensor)

            # Increment pos_index by one
            pos_index = pos_index + 1

        ### END GIVEN CODE ###

        ### START CODE HERE (Replace instances of 'None' with your code) ###

        # Second part: Pack n_to_take negative examples

        # Using the same batch list, take n_to_take negative examples
        # (neg_index tracks the position in the data, so the range starts at 0)
        for i in range(n_to_take):

            # If the negative index goes past the negative dataset length
            # (>= rather than >, otherwise neg_index can run off the end of the list),
            if neg_index >= len_data_neg:

                # If loop is set to False, break once we reach the end of the dataset
                if not loop:
                    stop = True
                    break

                # If user wants to keep re-using the data, reset the index
                neg_index = 0

                if shuffle:
                    # Shuffle the index of the negative sample
                    rnd.shuffle(neg_index_lines)

            # get the tweet at neg_index
            tweet = data_neg[neg_index_lines[neg_index]]

            # convert the tweet into tensors of integers representing the processed words
            tensor = tweet_to_tensor(tweet, vocab_dict)

            # append the tensor to the batch list
            batch.append(tensor)

            # Increment neg_index by one
            neg_index += 1

        ### END CODE HERE ###

        ### START GIVEN CODE ###
        if stop:
            break

        # The two loops above already moved pos_index and neg_index forward by
        # n_to_take each, so they aren't advanced again here (adding n_to_take
        # a second time would skip every other chunk of tweets).

        # Get the max tweet length (the length of the longest tweet)
        # (you will pad all shorter tweets to have this length)
        max_len = max([len(t) for t in batch])

        # Initialize the input_l, which will
        # store the padded versions of the tensors
        input_l = []

        # Pad shorter tweets with zeros
        for tensor in batch:
            ### END GIVEN CODE ###

            ### START CODE HERE (Replace instances of 'None' with your code) ###
            # Get the number of positions to pad for this tensor so that it will be max_len long
            n_pad = max_len - len(tensor)

            # Generate a list of zeros, with length n_pad
            pad_l = [0] * n_pad

            # concatenate the tensor and the list of padded zeros
            tensor_pad = tensor + pad_l

            # add the padded tensor to the list of padded tensors
            input_l.append(tensor_pad)

        # convert the list of padded tensors to a numpy array
        # and store this as the model inputs
        inputs = numpy.array(input_l)

        # Generate the list of targets for the positive examples (a list of ones)
        # The length is the number of positive examples in the batch
        target_pos = [1] * len(batch[:n_to_take])

        # Generate the list of targets for the negative examples (a list of zeros)
        # The length is the number of negative examples in the batch
        target_neg = [0] * len(batch[n_to_take:])

        # Concatenate the positive and negative targets
        target_l = target_pos + target_neg

        # Convert the target list into a numpy array
        targets = numpy.array(target_l)

        # Example weights: Treat all examples equally importantly. It should return an np.array. Hint: Use np.ones_like()
        example_weights = numpy.ones_like(targets)

        ### END CODE HERE ###

        ### GIVEN CODE ###
        # note we use yield and not return
        yield inputs, targets, example_weights
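The padding step can be checked on its own. A minimal sketch, assuming 0 is the padding ID (it is `__PAD__`'s index in the vocabulary built earlier); `pad_batch` is a hypothetical helper, not part of the assignment:

```python
import numpy


def pad_batch(batch: list, pad_id: int = 0) -> numpy.ndarray:
    """Right-pad each list of token IDs with pad_id up to the longest one."""
    max_len = max(len(tensor) for tensor in batch)
    padded = [tensor + [pad_id] * (max_len - len(tensor)) for tensor in batch]
    return numpy.array(padded)


print(pad_batch([[3, 4, 5], [10, 11], [7]]).tolist())
# [[3, 4, 5], [10, 11, 0], [7, 0, 0]]
```

Padding per batch (rather than to the longest tweet in the whole data set) keeps each batch only as wide as it needs to be, which is why the printed batches below have different widths.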


Now you can use your data generator to create a data generator for the training data, and another data generator for the validation data.

We will create a third data generator that does not loop, for testing the final accuracy of the model.

# Set the random number generator for the shuffle procedure
rnd = random
rnd.seed(30)

# Create the training data generator
def train_generator(batch_size, shuffle=False):
    return data_generator(positive_training, negative_training,
                          batch_size, True, vocabulary, shuffle)

# Create the validation data generator
def val_generator(batch_size, shuffle=False):
    return data_generator(positive_validation, negative_validation,
                          batch_size, True, vocabulary, shuffle)

# Create the test data generator (it doesn't loop)
def test_generator(batch_size, shuffle=False):
    return data_generator(positive_validation, negative_validation, batch_size,
                          False, vocabulary, shuffle)

# Get a batch from the train_generator and inspect.
inputs, targets, example_weights = next(train_generator(4, shuffle=True))

# this will print a list of 4 tensors padded with zeros
print(f'Inputs: {inputs}')
print(f'Targets: {targets}')
print(f'Example Weights: {example_weights}')

Inputs: [[2030 4492 3231    9    0    0    0    0    0    0    0]
[5009  571 2025 1475 5233 3532  142 3532  132  464    9]
[3798  111   96  587 2960 4007    0    0    0    0    0]
[ 256 3798    0    0    0    0    0    0    0    0    0]]
Targets: [1 1 0 0]
Example Weights: [1 1 1 1]


#### Test the train_generator

Create a data generator for training data which produces batches of size 4 (for tensors and their respective targets).

tmp_data_gen = train_generator(batch_size = 4)


Call the data generator to get one batch and its targets.

tmp_inputs, tmp_targets, tmp_example_weights = next(tmp_data_gen)

print(f"The inputs shape is {tmp_inputs.shape}")
print(f"The targets shape is {tmp_targets.shape}")
print(f"The example weights shape is {tmp_example_weights.shape}")

for i, t in enumerate(tmp_inputs):
    print(f"input tensor: {t}; target {tmp_targets[i]}; example weights {tmp_example_weights[i]}")

The inputs shape is (4, 14)
The targets shape is (4,)
The example weights shape is (4,)
input tensor: [3 4 5 6 7 8 9 0 0 0 0 0 0 0]; target 1; example weights 1
input tensor: [10 11 12 13 14 15 16 17 18 19 20  9 21 22]; target 1; example weights 1
input tensor: [5807 2931 3798    0    0    0    0    0    0    0    0    0    0    0]; target 0; example weights 1
input tensor: [ 865  261 3689 5808  313 4499  571 1248 2795  333 1220 3798    0    0]; target 0; example weights 1


### Bundle It Up

<<imports>>

<<defaults>>

<<nltk-settings>>

<<special-tokens>>

<<the-builder>>

<<positive-tweets>>

<<negative-tweets>>

<<positive-training>>

<<negative-training>>

<<positive-validation>>

<<negative-validation>>

<<the-vocabulary>>

<<x-train>>

<<to-tensor>>

<<the-generator>>

<<positive-indices>>

<<negative-indices>>

<<positives>>

<<negatives>>

<<positive-generator>>

<<negative-generator>>

<<the-iterator>>

<<the-next>>


#### Imports

# python
from argparse import Namespace
from itertools import cycle

import random

# pypi

import attr
import numpy

# this project


#### Defaults

Defaults = Namespace(
split = 4000,
)


#### NLTK Settings

NLTK = Namespace(
negative = "negative_tweets.json",
positive="positive_tweets.json",
)


#### Special Tokens

SpecialTokens = Namespace(padding="__PAD__",
ending="__</e>__",
unknown="__UNK__")

SpecialIDs = Namespace(
ending=1,
unknown=2,
)


#### The Builder

@attr.s(auto_attribs=True)
class TensorBuilder:
    """converts tweets to tensors

    Args:
        - split: where to split the training and validation data
    """
    split: int=Defaults.split
    _positive: list=None
    _negative: list=None
    _positive_training: list=None
    _negative_training: list=None
    _positive_validation: list=None
    _negative_validation: list=None
    _process: "TwitterProcessor"=None
    _vocabulary: dict=None
    _x_train: list=None

• Positive Tweets
    @property
    def positive(self) -> list:
        """The raw positive NLTK tweets"""
        if self._positive is None:
            self._positive = twitter_samples.strings(NLTK.positive)
        return self._positive

• Negative Tweets
    @property
    def negative(self) -> list:
        """The raw negative NLTK tweets"""
        if self._negative is None:
            self._negative = twitter_samples.strings(NLTK.negative)
        return self._negative

• Positive Training
    @property
    def positive_training(self) -> list:
        """The positive training data"""
        if self._positive_training is None:
            self._positive_training = self.positive[:self.split]
        return self._positive_training

• Negative Training
    @property
    def negative_training(self) -> list:
        """The negative training data"""
        if self._negative_training is None:
            self._negative_training = self.negative[:self.split]
        return self._negative_training

• Positive Validation
    @property
    def positive_validation(self) -> list:
        """The positive validation data"""
        if self._positive_validation is None:
            self._positive_validation = self.positive[self.split:]
        return self._positive_validation

• Negative Validation
    @property
    def negative_validation(self) -> list:
        """The negative validation data"""
        if self._negative_validation is None:
            self._negative_validation = self.negative[self.split:]
        return self._negative_validation

    @property
    def process(self) -> "TwitterProcessor":
        """processor for tweets"""
        if self._process is None:
            self._process = TwitterProcessor()
        return self._process

• X Train
@property
def x_train(self) -> list:
"""The unprocessed training data"""
if self._x_train is None:
self._x_train = self.positive_training + self.negative_training
return self._x_train

• The Vocabulary
@property
def vocabulary(self) -> dict:
"""A map of token to numeric id"""
if self._vocabulary is None:
SpecialTokens.ending: SpecialIDs.ending,
SpecialTokens.unknown: SpecialIDs.unknown}
for tweet in self.x_train:
for token in self.process(tweet):
if token not in self._vocabulary:
self._vocabulary[token] = len(self._vocabulary)
return self._vocabulary

• To Tensor
def to_tensor(self, tweet: str) -> list:
"""Converts tweet to list of numeric identifiers

Args:
tweet: the string to convert

Returns:
list of IDs for the tweet
"""
tensor = [self.vocabulary.get(token, SpecialIDs.unknown)
for token in self.process(tweet)]
return tensor
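The vocabulary and `to_tensor` logic can be tried out without NLTK or the tweet processor. This is a minimal standalone sketch, assuming plain whitespace splitting as a stand-in for the real tokenizer; `tokenize`, `build_vocabulary`, and `to_ids` are hypothetical helper names, and the reserved ids mirror the special tokens above (padding assumed to be 0).

```python
PADDING, ENDING, UNKNOWN = "__PAD__", "__</e>__", "__UNK__"


def tokenize(text: str) -> list:
    """Stand-in tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(texts: list) -> dict:
    """Map each token to an integer id, reserving 0-2 for the special tokens."""
    vocabulary = {PADDING: 0, ENDING: 1, UNKNOWN: 2}
    for text in texts:
        for token in tokenize(text):
            if token not in vocabulary:
                # the next free id is the current size of the vocabulary
                vocabulary[token] = len(vocabulary)
    return vocabulary


def to_ids(text: str, vocabulary: dict) -> list:
    """Convert text to ids, falling back to the unknown id for unseen tokens."""
    return [vocabulary.get(token, vocabulary[UNKNOWN]) for token in tokenize(text)]


vocabulary = build_vocabulary(["good movie", "bad movie"])
print(to_ids("good popcorn movie", vocabulary))  # [3, 2, 4]
```

"popcorn" never appeared in the training texts, so it maps to the unknown id (2), which is also why the validation tweet above has several 2s in it.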


#### The Generator

@attr.s(auto_attribs=True)
class TensorGenerator:
    """Generates batches of vectorized-tweets

    Args:
     converter: TensorBuilder object
     positive_data: list of positive data
     negative_data: list of negative data
     batch_size: the size for each generated batch
     shuffle: whether to shuffle the generated data
     infinite: whether to generate data forever
    """
    converter: TensorBuilder
    positive_data: list
    negative_data: list
    batch_size: int
    shuffle: bool = True
    infinite: bool = True
    _positive_indices: list = None
    _negative_indices: list = None
    _positives: iter = None
    _negatives: iter = None

• Positive Indices
    @property
    def positive_indices(self) -> list:
        """The indices to use to grab the positive tweets"""
        if self._positive_indices is None:
            k = len(self.positive_data)
            if self.shuffle:
                self._positive_indices = random.sample(range(k), k=k)
            else:
                self._positive_indices = list(range(k))
        return self._positive_indices

• Negative Indices
    @property
    def negative_indices(self) -> list:
        """Indices for the negative tweets"""
        if self._negative_indices is None:
            k = len(self.negative_data)
            if self.shuffle:
                self._negative_indices = random.sample(range(k), k=k)
            else:
                self._negative_indices = list(range(k))
        return self._negative_indices

• Positives
    @property
    def positives(self):
        """The positive index generator"""
        if self._positives is None:
            self._positives = self.positive_generator()
        return self._positives

• Negatives
    @property
    def negatives(self):
        """The negative index generator"""
        if self._negatives is None:
            self._negatives = self.negative_generator()
        return self._negatives

• Positive Generator
    def positive_generator(self):
        """Generator of indices for positive tweets"""
        stop = len(self.positive_indices)
        index = 0
        while True:
            yield self.positive_indices[index]
            index += 1
            if index == stop:
                if not self.infinite:
                    break
                if self.shuffle:
                    # clearing the indices makes the property re-shuffle them
                    self._positive_indices = None
                index = 0
        return

• Negative Generator
    def negative_generator(self):
        """generator of indices for negative tweets"""
        stop = len(self.negative_indices)
        index = 0
        while True:
            yield self.negative_indices[index]
            index += 1
            if index == stop:
                if not self.infinite:
                    break
                if self.shuffle:
                    # clearing the indices makes the property re-shuffle them
                    self._negative_indices = None
                index = 0
        return

• The Iterator
    def __iter__(self):
        return self

• The Next Method
    def __next__(self):
        assert self.batch_size % 2 == 0
        half_batch = self.batch_size // 2

        # get the indices
        positives = (next(self.positives) for index in range(half_batch))
        negatives = (next(self.negatives) for index in range(half_batch))

        # get the tweets
        positives = (self.positive_data[index] for index in positives)
        negatives = (self.negative_data[index] for index in negatives)

        # get the token ids
        try:
            positives = [self.converter.to_tensor(tweet) for tweet in positives]
            negatives = [self.converter.to_tensor(tweet) for tweet in negatives]
        except RuntimeError:
            # when we aren't running infinitely, the exhausted index generator's
            # StopIteration is raised inside the generator expressions above,
            # which turns it into a RuntimeError (PEP 479), so convert it back
            raise StopIteration

        batch = positives + negatives

        longest = max((len(tweet) for tweet in batch))

        paddings = (longest - len(tensor) for tensor in batch)

        # right-pad every tweet with the padding id (0) so they're all the same length
        inputs = numpy.array([tensor + [0] * padding
                              for tensor, padding in zip(batch, paddings)])

        # the labels for the inputs
        targets = numpy.array([1] * half_batch + [0] * half_batch)

        assert len(targets) == len(batch)

        # default the weights to ones
        weights = numpy.ones_like(targets)
        return inputs, targets, weights
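The padding step is the part that turns a ragged list of tweets into a rectangular array. Here it is in isolation as a small sketch; `pad_batch` is a hypothetical helper, and the padding id of 0 is assumed to match the `__PAD__` entry in the vocabulary.

```python
import numpy


def pad_batch(batch: list, padding_id: int = 0) -> numpy.ndarray:
    """Right-pad each list of ids to the longest one and stack them into an array."""
    longest = max(len(tensor) for tensor in batch)
    padded = [tensor + [padding_id] * (longest - len(tensor)) for tensor in batch]
    return numpy.array(padded)


inputs = pad_batch([[749, 1019, 313], [1009, 9], [3540]])
print(inputs.tolist())  # [[749, 1019, 313], [1009, 9, 0], [3540, 0, 0]]
```

Padding to the longest tweet in each batch (rather than the longest tweet overall) is why the batches below come out with different widths.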


### Test It Out

from neurotic.nlp.twitter.tensor_generator import TensorBuilder, TensorGenerator

converter = TensorBuilder()
expect(len(converter.vocabulary)).to(equal(len(vocabulary)))

tweet = positive_validation[0]
expected = [1072, 96, 484, 2376, 750, 8220, 1132, 750, 53, 2, 2701, 796, 2, 2,
354, 606, 2, 3523, 1025, 602, 4599, 9, 1072, 158, 2, 2]

actual = converter.to_tensor(tweet)
expect(actual).to(contain_exactly(*expected))

generator = TensorGenerator(converter, converter.positive_training,
                            converter.negative_training, batch_size=4)

print(next(generator))

(array([[ 749, 1019,  313, 1020,   75],
[1009,    9,    0,    0,    0],
[3540, 6030, 6031, 3798,    0],
[  50,   96, 3798,    0,    0]]), array([1, 1, 0, 0]), array([1, 1, 1, 1]))

for count, batch in enumerate(generator):
    print(batch[0])
    print()
    if count == 5:
        break
print(next(generator))

[[  22 1228  434  354  227 2371    9]
[ 267  160   89    0    0    0    0]
[ 315 1008 8480 3798 2108  371 3233]
[8232 8233  791 3798    0    0    0]]

[[1173 1061  586    9  896  729 1264  345 1062 1063]
[3387  558  991 2166 3388 3231  558  238  120    0]
[ 198 5997 3798    0    0    0    0    0    0    0]
[ 223  310 3798    0    0    0    0    0    0    0]]

[[4015 4015 4015 4016  231 2117   57  422    9 4017 4018 4019   86   86]
[2554   57  102  358   75    0    0    0    0    0    0    0    0    0]
[  50   38  881 3798    0    0    0    0    0    0    0    0    0    0]
[6729 6730 6731  382 3798    0    0    0    0    0    0    0    0    0]]

[[3479   75    0    0    0    0    0    0    0    0    0    0    0    0
0    0    0]
[4636 4637  233 4299  111  237 2626    9    0    0    0    0    0    0
0    0    0]
[  73  381  463 4321  142   96 7390 7391   92   85 1394 7392 5895 7393
45 3798 7394]
[8863 2844  991  127 5818    0    0    0    0    0    0    0    0    0
0    0    0]]

[[ 226  615   22   75    0    0]
[2135  703  237  435 3124    9]
[2379 6264 3798    0    0    0]
[6504 1912 2380 3798    0    0]]

[[5623  120    0    0    0    0    0    0    0    0]
[ 133   54  102   63 1300   56    9   50   92 3181]
[2094  383   73  464 3798    0    0    0    0    0]
[ 223  101 8754  383 2085 5818 8755    0    0    0]]

(array([[ 374,   44, 2981,  435,  132,  111, 1040, 1382,    9,    0,    0,
0],
[ 369,  398,  283,    9, 2671, 1411,  136,  184,  769, 1262, 2061,
3460],
[1094, 9024,  315,  381, 3798,    0,    0,    0,    0,    0,    0,
0],
[9036, 3798,    0,    0,    0,    0,    0,    0,    0,    0,    0,
0]]), array([1, 1, 0, 0]), array([1, 1, 1, 1]))


Ladies and gentlemen, we have ourselves a generator.

## End

Now that we have our data, the next step will be to define the model.

# Sentiment Analysis: Deep Learning Model

## Beginning

Previously we created sentiment analysis models using the Logistic Regression and Naive Bayes algorithms. However, if we were to give those models an example like:

This movie was almost good.

Those models would have predicted a positive sentiment for that review, since they score the individual words (like "good") without modeling how the words around them change their meaning. That sentence, however, is expressing the negative sentiment that the movie was not good. To solve those kinds of misclassifications we will write a program that uses deep neural networks to identify sentiment in text.

This model will follow a similar structure to the Continuous Bag of Words Model (Introducing the CBOW Model) that we looked at previously; indeed, most deep nets have a similar structure. What changes are the model architecture, the inputs, and the outputs. Although we looked at Trax and JAX in a previous post (Introducing Trax), we'll start off with a review of some of their features, and then in future posts we'll implement the actual model.

### Imports

# from python
import os
import random

# from pypi
from trax import layers
import trax
import trax.fastmath.numpy as numpy


### Set Up

#### The Random Seed

trax.supervised.trainer_lib.init_random_number_generators(31)


## Middle

### Trax Review

#### JAX Arrays

First, the JAX reimplementation of numpy (from trax.fastmath).

an_array = numpy.array(5.0)
display(an_array)
print(type(an_array))

DeviceArray(5., dtype=float32)
<class 'jax.interpreters.xla._DeviceArray'>


Note: JAX is strict about types here. Taking a gradient only works with floating-point inputs, so an integer array like numpy.array(5) won't work; it has to be a float (5.0).

#### Squaring

Now we'll create a function to square the array.

def square(x):
    return x**2

print(f"f({an_array}) -> {square(an_array)}")

f(5.0) -> 25.0


The gradient (derivative) of function f with respect to its input x is the derivative of $x^2$.

• The derivative of $x^2$ is $2x$.
• When x is 5, then 2x=10.
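Before reaching for JAX, that result can be sanity-checked numerically. This plain-Python sketch (not part of Trax; `central_difference` is a hypothetical helper) approximates the derivative with a central finite difference, $(f(x+h) - f(x-h)) / 2h$, which should land very close to $2x = 10$ at $x = 5$.

```python
def square(x: float) -> float:
    return x ** 2


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Approximate f'(x) using the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


approximate = central_difference(square, 5.0)
print(approximate)  # very close to 10.0
```

For $x^2$ the symmetric quotient is exactly $2x$ algebraically, so any difference from 10 is floating-point rounding; automatic differentiation, which is what `grad` does, avoids choosing an `h` at all.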

You can calculate the gradient of a function by passing the function itself (not a string with its name) to trax.fastmath.grad(fun=...).

• In this case the function you want to take the gradient of is square.
• The object returned (saved in square_gradient in this example) is a function that can calculate the gradient of square for a given trax.fastmath.numpy array.

Use trax.fastmath.grad to calculate the gradient (derivative) of the function.

square_gradient = trax.fastmath.grad(fun=square)
print(type(square_gradient))


<class 'function'>

gradient_calculation = square_gradient(an_array)
display(gradient_calculation)

DeviceArray(10., dtype=float32)


The function returned by trax.fastmath.grad takes in x=5 and calculates the gradient of square, which is 2x, which equals 10. The value is also stored as a DeviceArray from the jax library.

## Raw

# import Layer from the utils.py file
from utils import Layer, load_tweets, process_tweet



# Data Generators

## Data generators

In Python, a generator is a function that behaves like an iterator: each time you ask it for the next item (with next() or a for loop), it runs until its next yield and hands back that value. In many AI applications, it's useful to have a data generator to handle loading and transforming data.

In the following example, we use a set of samples, a, to derive a new set of samples with more elements than the original set.

Note: Pay attention to how the index variables (and, in the shuffling examples, the list of indices data_indices) are used to traverse the original list.

### Imports

# python
from itertools import cycle

import random

# pypi
from expects import be_true, expect
import numpy


## Examples

### An Example of a Circular List

This is sort of a fake generator that uses indices to make it look like it's infinite.

a = [1, 2, 3, 4]
a_size = len(a)
end = 10
index = 0  # similar to the index in data_generator below
for _ in range(end):  # end is longer than a, forcing a wrap
    print(a[index], end=",")
    index = (index + 1) % a_size

1,2,3,4,1,2,3,4,1,2,


There's a standard-library equivalent to this: cycle, from the itertools module.

index = 1
for item in cycle(a):
    print(item, end=",")
    if index == end:
        break
    index += 1

1,2,3,4,1,2,3,4,1,2,
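Instead of counting iterations by hand, itertools.islice can take the first `end` items from the otherwise infinite cycle. A small sketch of that alternative:

```python
from itertools import cycle, islice

a = [1, 2, 3, 4]
end = 10

# islice stops the infinite cycle after `end` items, so no manual counter is needed
wrapped = list(islice(cycle(a), end))
print(wrapped)  # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
```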


And if you want to write your own generator version, you can use the yield keyword.

def infinite(a: list):
    """Generates elements infinitely

    Args:
     a: list

    Yields:
     elements of a
    """
    index = 0
    end = len(a)
    while True:
        yield a[index]
        index = (index + 1) % end
    return

a_infinite = infinite(a)
for index, item in enumerate(a_infinite):
    if index == end:
        break
    print(item, end=",")

1,2,3,4,1,2,3,4,1,2,


### Shuffling the data order

In the next example, we'll do the same as before but shuffle the order of the elements in the output list. Here our strategy of traversing the data through a list of indices (data_indices) and a position in that list (data_index) becomes very important, because it lets us simulate a shuffle of the input data without actually reordering it.

a = tuple((1, 2, 3, 4))
a_size = len(a)
data_indices = list(range(a_size))
print(f"Original order of indices: {data_indices}")

Original order of indices: [0, 1, 2, 3]


If we shuffle data_indices we can change the order of our circular list without modifying the order of our original data.

random.shuffle(data_indices) # Shuffle the order
print(f"Shuffled order of indices: {data_indices}")

Shuffled order of indices: [3, 0, 1, 2]


Now we create a list, b, of values drawn from a in shuffled order, which ends up larger than a.

b = [a[index] for index in data_indices]
b_size = 10

print(f"New value order for first batch: {b}")
batch_counter = 1
data_index = 0
for b_index in range(len(b), b_size):
    if data_index == 0:
        batch_counter += 1
        random.shuffle(data_indices)
        print(f"\nShuffled Indexes for Batch No. {batch_counter} :{data_indices}")
        print(f"Values for Batch No.{batch_counter} :{[a[index] for index in data_indices]}")

    b.append(a[data_indices[data_index]])
    data_index = (data_index + 1) % a_size

print(f"\nFinal value of b: {b} with {len(b)} items")

New value order for first batch: [1, 3, 4, 2]

Shuffled Indexes for Batch No. 2 :[1, 3, 2, 0]
Values for Batch No.2 :[2, 4, 3, 1]

Shuffled Indexes for Batch No. 3 :[0, 3, 2, 1]
Values for Batch No.3 :[1, 4, 3, 2]

Final value of b: [1, 3, 4, 2, 2, 4, 3, 1, 1, 4] with 10 items


Note: We call an epoch each time that an algorithm passes over all the training examples. Shuffling the examples for each epoch is known to reduce variance, making the models more general and overfit less.

data_indices = random.sample(range(a_size), k=a_size)
b = [a[index] for index in data_indices]
b_size = 10

print(f"New value order for first batch: {b}")
batch_counter = 1
data_index = 0
for b_index in range(len(b), b_size):
    if data_index == 0:
        batch_counter += 1
        data_indices = random.sample(data_indices, k=a_size)
        print(f"\nShuffled Indexes for Batch No. {batch_counter} :{data_indices}")
        print(f"Values for Batch No.{batch_counter} :{[a[index] for index in data_indices]}")

    b.append(a[data_indices[data_index]])
    data_index = (data_index + 1) % a_size

print(f"\nFinal value of b: {b} with {len(b)} items")

New value order for first batch: [1, 4, 3, 2]

Shuffled Indexes for Batch No. 2 :[3, 0, 1, 2]
Values for Batch No.2 :[4, 1, 2, 3]

Shuffled Indexes for Batch No. 3 :[2, 0, 1, 3]
Values for Batch No.3 :[3, 1, 2, 4]

Final value of b: [1, 4, 3, 2, 4, 1, 2, 3, 3, 1] with 10 items
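Reshuffling once per epoch can also be packaged as a generator. This is a minimal sketch: `epoch_indices` is a hypothetical helper, and the seeded random.Random is an assumption added to make runs reproducible. Every consecutive block of `size` indices is a complete permutation, so each epoch visits every example exactly once, just in a different order.

```python
import random
from itertools import islice


def epoch_indices(size: int, seed: int = 0):
    """Yield indices forever, drawing a fresh permutation at the start of each epoch."""
    rng = random.Random(seed)  # a private generator keeps the sketch reproducible
    while True:
        yield from rng.sample(range(size), k=size)


indices = list(islice(epoch_indices(4), 12))
epochs = [indices[start:start + 4] for start in range(0, 12, 4)]
print(epochs)
```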


### Data Generator Function

This will be a data generator function that takes in batch_size, x, y, and shuffle, where x could be a large list of samples and y is a list of the tags associated with those samples. It returns a subset of those inputs as a tuple of two lists (X, Y), each of length batch_size. If shuffle=True, the data will be traversed in a random order.

The function runs continuously in the fashion of generators, pausing each time it yields the next values. It generates one batch_size output on each pass of its outer loop.

It has an inner loop that stores the data samples in temporary lists (X, Y) which will be included in the next batch.

There are three slightly out-of-the-ordinary features to this function.

1. The first is the use of a list of a predefined size to store the data for each batch. Using a predefined size list reduces the computation time if the elements in the array are of a fixed size, like numbers. If the elements are of different sizes, it is better to use an empty array and append one element at a time during the loop.
2. The second is tracking the current location in the incoming lists of samples. Generator variables hold their values between invocations, so we create an index variable, initialize it to zero, and increment it by one for each sample included in a batch. However, we do not use the index to access the positions of the list of sentences directly. Instead, we use it to select one index from a list of indexes. In this way, we can change the order in which we traverse our original list while keeping the original list untouched.
3. The third also relates to wrapping. Because batch_size and the length of the input lists are not aligned, gathering a batch_size group of inputs may involve wrapping back to the beginning of the input loop. In our approach, it is just enough to reset the index to 0. We can re-shuffle the list of indexes to produce different batches each time.

def data_generator(batch_size: int, data_x: list, data_y: list, shuffle: bool=True):
    """Infinite batch generator

    Args:
     batch_size: the size to make batches
     data_x: list containing samples
     data_y: list containing labels
     shuffle: Shuffle the data order

    Yields:
     a tuple containing 2 elements:
     X - list of dim (batch_size) of samples
     Y - list of dim (batch_size) of labels
    """
    amount_of_data = len(data_x)
    assert amount_of_data == len(data_y)

    def re_shuffle(x):
        k = len(x)
        return random.sample(range(k), k=k)

    shuffler = re_shuffle if shuffle else lambda x: list(range(len(x)))
    source_indices = shuffler(data_x)

    source_location = 0
    while True:
        # pre-allocate the batch lists (the placeholder values get overwritten)
        X = list(range(batch_size))
        Y = list(range(batch_size))

        for batch_location in range(batch_size):
            X[batch_location] = data_x[source_indices[source_location]]
            Y[batch_location] = data_y[source_indices[source_location]]
            # wrap back to the start (and re-shuffle) when we run out of data
            source_location = (source_location + 1) % amount_of_data
            source_indices = (shuffler(data_x) if source_location == 0
                              else source_indices)
        yield X, Y
    return

def test_data_generator() -> None:
    """Tests the un-shuffled version of the generator

    Raises:
     AssertionError: some value didn't match.
    """
    x = [1, 2, 3, 4]
    y = [xi ** 2 for xi in x]

    generator = data_generator(3, x, y, shuffle=False)
    for expected in (([1, 2, 3], [1, 4, 9]),
                     ([4, 1, 2], [16, 1, 4]),
                     ([3, 4, 1], [9, 16, 1]),
                     ([2, 3, 4], [4, 9, 16])):
        expect(numpy.allclose(next(generator), expected)).to(be_true)
    return

test_data_generator()


# Classes and Subclasses

## Classes and Subclasses

In this notebook, I will show you the basics of classes and subclasses in Python. As you've seen in the lectures from this week, Trax uses layer classes as building blocks for deep learning models, so it is important to understand how classes and subclasses behave in order to be able to build custom layers when needed.

By completing this notebook, you will:

• Be able to define classes and subclasses in Python
• Understand how inheritance works in subclasses
• Be able to work with instances

### Imports

# from pypi
from expects import (
equal,
expect,
raise_error
)
import attr


## Middle

### Part 1: Parameters, methods and instances

First, let's define a class SomeClass.

class SomeClass:
    x = None


SomeClass has one parameter, x, without any value. You can think of parameters as the variables that every object of a class will have. So, at this point, any object of class SomeClass would have a variable x equal to None. To check this, I'll create two instances of that class and get the value of x for both of them.

instance_a = SomeClass()
instance_b = SomeClass()
print(f"Parameter x of instance_a: {instance_a.x}")
print(f"Parameter x of instance_b: {instance_b.x}")

Parameter x of instance_a: None
Parameter x of instance_b: None


For an existing instance you can assign new values for any of its parameters. In the next cell, assign a value of 5 to the parameter x of instance_a.

instance_a.x = 5
print(f"Parameter x of instance_a: {instance_a.x}")

Parameter x of instance_a: 5


#### The __init__ method

When you want to assign values to the parameters of your class when an instance is created, you define a special method: __init__. The __init__ method is called when you create an instance of a class, and it can take multiple arguments to initialize the parameters of your instance. Here, instead of writing __init__ by hand, I'm using the attrs library: the @attr.s(auto_attribs=True) decorator generates an __init__ (and a __repr__) from the annotated attributes, so SomeClass(10) sets x to 10.

@attr.s(auto_attribs=True)
class SomeClass:
    x: int = None

instance_c = SomeClass(10)
print(f"{instance_c}")

SomeClass(x=10)
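For comparison, here is a sketch of roughly what attrs writes for us, done by hand. `PlainClass` is a hypothetical name, and this only mirrors the generated __init__ and __repr__ (attrs also generates comparison methods, which are left out here).

```python
class PlainClass:
    """A hand-written counterpart of the attrs-decorated SomeClass."""

    def __init__(self, x: int = None):
        # __init__ runs when PlainClass(...) is called and sets up the new instance
        self.x = x

    def __repr__(self) -> str:
        return f"PlainClass(x={self.x!r})"


instance_plain = PlainClass(10)
print(instance_plain)  # PlainClass(x=10)
```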


#### The __call__ method

Another important method is the __call__ method. It is performed whenever you call an initialized instance of a class. It can have multiple arguments and you can define it to do whatever you want like

• Change a parameter,
• Print a message,
• Create new variables, etc.

@attr.s(auto_attribs=True)
class SomeClass:
    x: int

    def __call__(self, z: int):
        self.x += z
        print(self.x)

instance_d = SomeClass(5)


And now, see what happens when instance_d is called with argument 10.

instance_d(10)

15


Now, you are ready to complete the following cell so any instance from SomeClass:

• Is initialized with two arguments and assigns them to x_1 and x_2, respectively. And,
• When called, takes the values of the parameters x_1 and x_2, sums them, prints and returns the result.

@attr.s(auto_attribs=True)
class SomeClass:
    x_1: int
    x_2: int

    def __call__(self) -> int:
        result = self.x_1 + self.x_2
        print(f"Addition of {self.x_1} and {self.x_2} is {result}")
        return result


Run the next cell to check your implementation. If everything is correct, you shouldn't get any errors.

instance_e = SomeClass(x_1=10, x_2=15)

def test_class_definition():
    expect(instance_e.x_1).to(equal(10))
    expect(instance_e.x_2).to(equal(15))
    expect(instance_e()).to(equal(25))
    return

test_class_definition()

Addition of 10 and 15 is 25


#### Custom methods

In addition to the __init__ and __call__ methods, your classes can have custom-built methods that do whatever you want when called. To define a custom method, you have to indicate its input arguments, the instructions that you want it to perform, and the values to return (if any). In the next cell, SomeClass is defined with some_method, which multiplies the values of x_1 and x_2, adds an input w to that product, and returns the result.

@attr.s(auto_attribs=True)
class SomeClass:
    x_1: int
    x_2: int

    def __call__(self) -> int:
        return self.x_1 - 2 * self.x_2

    def some_method(self, w: int) -> int:
        return self.x_1 * self.x_2 + w


Create an instance, instance_f, of SomeClass with any integer values that you want for x_1 and x_2. For that instance, see the result of calling some_method with an argument w equal to 16.

instance_f = SomeClass(1, 10)
print(f"Output of some_method: {instance_f.some_method(16)}")

Output of some_method: 26


As you can corroborate in the previous cell, to call a custom method m with arguments args on an instance i, you write i.m(args). With that in mind, methods can call other methods within a class. In the following cell, try to define some_new_method, which calls some_method with v as its input argument, on your own.

@attr.s(auto_attribs=True)
class SomeClass:
    x_1: int = None
    x_2: int = None

    def __call__(self) -> int:
        return self.x_1 - 2 * self.x_2

    def some_method(self, w: int) -> int:
        return self.x_1 * self.x_2 + w

    def some_new_method(self, v: int) -> int:
        return self.some_method(v)

instance_g = SomeClass(1, 10)
print(f"Output of some_method: {instance_g.some_method(16)}")
print(f"Output of some_new_method: {instance_g.some_new_method(16)}")

Output of some_method: 26
Output of some_new_method: 26


### Part 2: Subclasses and Inheritance

Trax uses classes and subclasses to define layers. The base class in Trax is Layer, which means that every layer in a deep learning model is defined as a subclass of the Layer class. In this part of the notebook, you are going to see how subclasses work. To define a subclass sub of a class super, you write class sub(super): and then define any methods and parameters that you want for your subclass. In the next cell, I define SomeSub as a subclass of SomeClass with only one method (additional_method).

class SomeSub(SomeClass):
    def additional_method(self):
        print(self.x_1)
        return


#### Inheritance

When you define a subclass sub, every method and parameter is inherited from the superclass, including the __init__ and __call__ methods. This means that any instance of sub can use the methods defined in super. Run the following cell and see for yourself.

instance_sub_a = SomeSub(1, 10)
print(f"Parameter x_1 of instance_sub_a: {instance_sub_a.x_1}")
print(f"Parameter x_2 of instance_sub_a: {instance_sub_a.x_2}")
print(f"Output of some_method of instance_sub_a: {instance_sub_a.some_method(16)}")

Parameter x_1 of instance_sub_a: 1
Parameter x_2 of instance_sub_a: 10
Output of some_method of instance_sub_a: 26


As you can see, SomeSub doesn't define an __init__ method; it inherits the one from SomeClass. However, you can override any method by defining it again in the subclass. For instance, in the next cell SomeSub is redefined with a some_method that multiplies x_1 and x_2 but doesn't take any additional argument.

@attr.s(auto_attribs=True)
class SomeSub(SomeClass):
    def some_method(self):
        return self.x_1 * self.x_2


To check your implementation run the following cell.

test = SomeSub(3, 10)
actual = test.some_method()
expect(actual).to(equal(30))

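print(f"Output of overridden some_method of test: {actual}")

# the override no longer takes w, so passing one raises a TypeError
expect(lambda: test.some_method(16)).to(raise_error(TypeError))


Output of overridden some_method of test: 30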


In the next cell, two instances are created, one of SomeClass and another one of SomeSub. The instances are initialized with equal x_1 and x_2 parameters.

y, z = 1, 10
instance_sub_a = SomeSub(y, z)
instance_a = SomeClass(y, z)
print(f"some_method for an instance of SomeSub returns: {instance_sub_a.some_method()}")
print(f"some_method for an instance of SomeClass returns: {instance_a.some_method(10)}")

some_method for an instance of SomeSub returns: 10
some_method for an instance of SomeClass returns: 20


As you can see, even though SomeSub is a subclass of SomeClass and both instances are initialized with the same values, some_method returns different results for each instance because SomeSub overrides some_method.
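Overriding replaces the parent's method completely. When the goal is to extend it instead, the subclass can call the parent's version through super(). A minimal sketch with hypothetical Base and Extended classes, written without attrs to keep it self-contained:

```python
class Base:
    def __init__(self, x_1: int, x_2: int):
        self.x_1 = x_1
        self.x_2 = x_2

    def some_method(self, w: int) -> int:
        return self.x_1 * self.x_2 + w


class Extended(Base):
    def some_method(self, w: int) -> int:
        # reuse the parent's calculation, then double it
        return 2 * super().some_method(w)


print(Base(1, 10).some_method(16))      # 26
print(Extended(1, 10).some_method(16))  # 52
```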

# Introducing Trax

## Background

This is going to be a first look at Trax, a deep learning framework built by the Google Brain team.

### Why Trax and not TensorFlow or PyTorch?

TensorFlow and PyTorch are both extensive frameworks that can do almost anything in deep learning. They offer a lot of flexibility, but that often means verbosity of syntax and extra time to code.

Trax is much more concise. It runs on a TensorFlow backend but allows you to train models with one-line commands. Trax also runs end to end, allowing you to get data, model, and train all with a single terse statement. This means you can focus on learning instead of spending hours on the idiosyncrasies of a big framework's implementation.

### Why not Keras then?

Keras has been part of TensorFlow itself since version 2.0. Also, Trax is good for implementing new state-of-the-art algorithms like Transformers, Reformers, and BERT because it is actively maintained by the Google Brain team for advanced deep learning tasks. It also runs smoothly on CPUs, GPUs, and TPUs with comparatively few changes to the code.

### How to Code in Trax

Building models in Trax relies on two key concepts: layers and combinators. Trax layers are simple objects that process data and perform computations. They can be chained together into composite layers using Trax combinators, allowing you to build layers and models of any complexity.
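To make the idea of a combinator concrete, here is a toy sketch in plain Python, not Trax's API: `serial` is a hypothetical stand-in for the kind of chaining Trax's Serial combinator does, feeding each layer-like function's output into the next one.

```python
from functools import reduce


def serial(*functions):
    """Toy combinator: run the functions in order, each on the previous one's output."""
    def combined(x):
        return reduce(lambda value, function: function(value), functions, x)
    return combined


def double(values: list) -> list:
    return [2 * value for value in values]


def relu(values: list) -> list:
    return [max(value, 0) for value in values]


model = serial(double, relu)
print(model([-2, -1, 0, 1, 2]))  # [0, 0, 0, 2, 4]
```

The result of `serial` is itself just another function, which is the property that lets combinators nest into layers and models of any complexity.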

### Trax, JAX, TensorFlow and Tensor2Tensor

You already know that Trax uses TensorFlow as a backend, but it also uses the JAX library to speed up computation. You can view JAX as an enhanced and optimized version of numpy.

You import their version of numpy with import trax.fastmath.numpy (in these posts, import trax.fastmath.numpy as numpy). If you see this line, remember that when calling numpy you are really calling Trax’s version of numpy, which is compatible with JAX.

As a result, where you used to encounter the type numpy.ndarray you will now find the type jax.interpreters.xla.DeviceArray. The JAX documentation includes a page listing the numpy functions implemented so far.

Tensor2Tensor is another name you might have heard. It started as an end-to-end solution much like Trax, but it grew unwieldy and complicated, so you can view Trax as its improved successor, one that is faster and simpler to use.

### Installing Trax

Note that there is another library called TraX, which is something different.

We're going to use Trax version 1.3.1 here, so to install it with pip:

pip install trax==1.3.1


Note the == for the version, not =. This is a very big install, so maybe take a break after you run it. You won't get the full benefit of JAX unless you have CUDA set up or can use TPUs, so make sure to set up CUDA if you're not using Google Colab. I also had to install cmake to get trax to install.

### Imports

# pypi
import numpy

from trax import layers
from trax import shapes
from trax import fastmath

• Layers are the basic building blocks for Trax
• shapes describes the shapes and data types of inputs (used, for example, to initialize layer weights)
• fastmath is the JAX version of numpy that can run on GPUs and TPUs

## Middle

### Layers

Layers are the core building blocks in Trax - they are the base classes. They take inputs, compute functions/custom calculations and return outputs.

#### Relu Layer

First we'll build a ReLU activation function as a layer. A layer like this is one of the simplest types. Notice there is no object initialization so it works just like a math function.

Note: Activation functions are also layers in Trax, which might look odd if you have been using other frameworks for a longer time.

relu = layers.Relu()


You can inspect the properties of a layer:

print("-- Properties --")
print("name :", relu.name)
print("expected inputs :", relu.n_in)
print("promised outputs :", relu.n_out, "\n")

-- Properties --
name : Relu
expected inputs : 1
promised outputs : 1



We'll make an input for the layer using numpy.

x = numpy.array([-2, -1, 0, 1, 2])
print("-- Inputs --")
print("x :", x, "\n")

-- Inputs --
x : [-2 -1  0  1  2]



And see what it puts out.

y = relu(x)
print("-- Outputs --")
print("y :", y)

WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
-- Outputs --
y : [0 0 0 1 2]
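The Relu layer's computation is an element-wise max(x, 0). A plain NumPy sketch (an illustration of the math, not how Trax implements the layer) reproduces the same output:

```python
import numpy

x = numpy.array([-2, -1, 0, 1, 2])

# ReLU keeps positive values and replaces everything else with 0
y = numpy.maximum(x, 0)
print(y)  # [0 0 0 1 2]
```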


I don't know why, but JAX doesn't think I have a GPU, even though TensorFlow does. This whole thing is a little messed up right now because the current release of TensorFlow doesn't work on Ubuntu 20.10. I'm running it with the nightly build (2.5), but I have to install all the Trax dependencies one at a time or installing them will clobber the TensorFlow installation with the older version (the one that doesn't work), so there are a lot of places for errors to creep in.

#### Concatenate Layer

Now a layer that takes 2 inputs. Notice the change in the expected inputs property from 1 to 2.

First create a concatenate trax layer and check out its properties.

concatenate = layers.Concatenate()
print("-- Properties --")
print("name :", concatenate.name)
print("expected inputs :", concatenate.n_in)
print("promised outputs :", concatenate.n_out, "\n")

-- Properties --
name : Concatenate
expected inputs : 2
promised outputs : 1



Now create the two inputs.

x1 = numpy.array([-10, -20, -30])
x2 = x1 / -10
print("-- Inputs --")
print("x1 :", x1)
print("x2 :", x2, "\n")

-- Inputs --
x1 : [-10 -20 -30]
x2 : [1. 2. 3.]



And now feed the inputs through the concatenate layer.

y = concatenate([x1, x2])
print("-- Outputs --")
print("y :", y)

-- Outputs --
y : [-10. -20. -30.   1.   2.   3.]
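The same result can be reproduced with NumPy's concatenate, joining the arrays end to end along the last axis (my understanding is that this matches the layer's default axis, but treat that as an assumption). Because x2 is a float array, the integers in x1 are upcast to floats, which is why the output shows -10. instead of -10.

```python
import numpy

x1 = numpy.array([-10, -20, -30])
x2 = x1 / -10

# join the inputs end to end along the last axis
y = numpy.concatenate([x1, x2], axis=-1)
print(y.tolist())  # [-10.0, -20.0, -30.0, 1.0, 2.0, 3.0]
```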


#### Configuring Layers

You can change the default settings of layers. For example, you can change the expected inputs for a concatenate layer from 2 to 3 using the optional parameter n_items.

concatenate_three = layers.Concatenate(n_items=3)
print("-- Properties --")
print("name :", concatenate_three.name)
print("expected inputs :", concatenate_three.n_in)
print("promised outputs :", concatenate_three.n_out, "\n")

-- Properties --
name : Concatenate
expected inputs : 3
promised outputs : 1



Create some inputs.

x1 = numpy.array([-10, -20, -30])
x2 = x1 / -10
x3 = x2 * 0.99
print("-- Inputs --")
print("x1 :", x1)
print("x2 :", x2)
print("x3 :", x3, "\n")

-- Inputs --
x1 : [-10 -20 -30]
x2 : [1. 2. 3.]
x3 : [0.99 1.98 2.97]



And now do the concatenation.

y = concatenate_three([x1, x2, x3])
print("-- Outputs --")
print("y :", y)

-- Outputs --
y : [-10.   -20.   -30.     1.     2.     3.     0.99   1.98   2.97]


#### Layer Weights

Some layer types include mutable weights and biases that are used in computation and training. Layers of this type require initialization before use.

For example the LayerNorm layer calculates normalized data, that is also scaled by weights and biases. During initialization you pass the data shape and data type of the inputs, so the layer can initialize compatible arrays of weights and biases.

Create the layer.

norm = layers.LayerNorm()


Now some input data.

x = numpy.array([0, 1, 2, 3], dtype="float")


Use the input data's signature to get the shape and type for initializing the weights and biases. We need to convert the input from the usual ndarray to a Trax ShapeDtype, which holds its shape and dtype; shapes.signature does that conversion.

norm.init(shapes.signature(x))

print("Normal shape:",x.shape, "Data Type:",type(x.shape))
print("Shapes Trax:",shapes.signature(x),"Data Type:",type(shapes.signature(x)))

Normal shape: (4,) Data Type: <class 'tuple'>
Shapes Trax: ShapeDtype{shape:(4,), dtype:float64} Data Type: <class 'trax.shapes.ShapeDtype'>


Here are its properties.

print("-- Properties --")
print("name :", norm.name)
print("expected inputs :", norm.n_in)
print("promised outputs :", norm.n_out)

-- Properties --
name : LayerNorm
expected inputs : 1
promised outputs : 1


And the weights and biases.

print("weights :", norm.weights[0])
print("biases :", norm.weights[1])

weights : [1. 1. 1. 1.]
biases : [0. 0. 0. 0.]
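Here's a hypothetical sketch of what that initialization amounts to, assuming (as the printouts suggest) that LayerNorm allocates one weight and one bias per element of the input's last axis. Weights start at 1 and biases at 0, so before any training the layer only normalizes. `init_layer_norm_params` is an illustrative helper, not a Trax function.

```python
def init_layer_norm_params(shape):
    """Hypothetical: build LayerNorm-style parameters from an input shape.

    Only the last axis matters; each feature gets a scale of 1 and a shift of 0.
    """
    features = shape[-1]
    weights = [1.0] * features  # scales start at 1 (identity)
    biases = [0.0] * features   # shifts start at 0 (no offset)
    return weights, biases


weights, biases = init_layer_norm_params((4,))
print("weights :", weights)  # → weights : [1.0, 1.0, 1.0, 1.0]
print("biases :", biases)    # → biases : [0.0, 0.0, 0.0, 0.0]
```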


We have our input array.

print("-- Inputs --")
print("x :", x)

-- Inputs --
x : [0. 1. 2. 3.]


So we can inspect what the layer did to it.

y = norm(x)
print("-- Outputs --")
print("y :", y)

-- Outputs --
y : [-1.3416404  -0.44721344  0.44721344  1.3416404 ]


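If you look at it you can see that the positives cancel out the negatives, giving us a sum of 0. That's the point of the normalization: LayerNorm subtracts the mean of the input (1.5 here) and divides by its standard deviation (plus a small epsilon for numerical stability), so the output has a mean of 0 and a variance of roughly 1 before the weights and biases are applied.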
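Here's a plain-Python version of the layer-normalization formula. It's my own sketch, not Trax's code, and the epsilon of 1e-6 is an assumption. With weights of 1 and biases of 0 it should reproduce the output above up to float32 rounding.

```python
import math


def layer_norm(x, weights, biases, epsilon=1e-6):
    """Normalize x to zero mean and unit variance, then scale and shift it."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((value - mean) ** 2 for value in x) / n  # population variance
    scale = math.sqrt(variance + epsilon)
    return [w * (value - mean) / scale + b
            for value, w, b in zip(x, weights, biases)]


y = layer_norm([0.0, 1.0, 2.0, 3.0], [1.0] * 4, [0.0] * 4)
print([round(value, 5) for value in y])  # → [-1.34164, -0.44721, 0.44721, 1.34164]
print(sum(y))                            # essentially zero: the values cancel out
```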

#### Custom Layers

You can create your own custom layers too and define custom functions for computations by using layers.Fn. Let me show you how.

help(layers.Fn)

Help on function Fn in module trax.layers.base:

Fn(name, f, n_out=1)
Returns a layer with no weights that applies the function f.

f can take and return any number of arguments, and takes only positional
arguments -- no default or keyword arguments. It often uses JAX-numpy (jnp).
The following, for example, would create a layer that takes two inputs and
returns two outputs -- element-wise sums and maxima:

Fn('SumAndMax', lambda x0, x1: (x0 + x1, jnp.maximum(x0, x1)), n_out=2)

The layer's number of inputs (n_in) is automatically set to number of
positional arguments in f, but you must explicitly set the number of
outputs (n_out) whenever it's not the default value 1.

Args:
name: Class-like name for the resulting layer; for use in debugging.
f: Pure function from input tensors to output tensors, where each input
tensor is a separate positional arg, e.g., f(x0, x1) --> x0 + x1.
Output tensors must be packaged as specified in the Layer class
docstring.
n_out: Number of outputs promised by the layer; default value 1.

Returns:
Layer executing the function f.
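The docstring's point that n_in is inferred from the positional arguments of f can be mimicked with Python's `inspect` module. `FnSketch` is hypothetical and only illustrates the bookkeeping; it is not how Trax implements Fn, and it uses plain `max` on scalars where the docstring's example uses `jnp.maximum` on arrays.

```python
import inspect


class FnSketch:
    """Hypothetical weightless layer wrapping a pure function f."""

    def __init__(self, name, f, n_out=1):
        self.name = name
        self.f = f
        # n_in is the number of positional parameters f declares
        self.n_in = len(inspect.signature(f).parameters)
        self.n_out = n_out

    def __call__(self, *inputs):
        if len(inputs) != self.n_in:
            raise ValueError(f"{self.name} expects {self.n_in} inputs")
        return self.f(*inputs)


sum_and_max = FnSketch("SumAndMax", lambda x0, x1: (x0 + x1, max(x0, x1)), n_out=2)
print(sum_and_max.n_in, sum_and_max.n_out)  # → 2 2
print(sum_and_max(3, 5))                     # → (8, 5)
```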

• Define a custom layer

In this example we'll create a layer to calculate the input times 2.

def double_it() -> layers.Layer:
    """A custom layer function that doubles any inputs

    Returns:
        a custom layer that takes one numeric argument and doubles it
    """
    layer_name = "TimesTwo"

    # Custom function for the custom layer
    def func(x):
        return x * 2

    return layers.Fn(layer_name, func)

• Test it
double = double_it()

print("-- Properties --")
print("name :", double.name)
print("expected inputs :", double.n_in)
print("promised outputs :", double.n_out)

-- Properties --
name : TimesTwo
expected inputs : 1
promised outputs : 1

x = numpy.array([1, 2, 3])
print("-- Inputs --")
print("x :", x, "\n")
y = double(x)
print("-- Outputs --")
print("y :", y)

-- Inputs --
x : [1 2 3]

-- Outputs --
y : [2 4 6]


### Combinators

You can combine layers to build more complex layers. Trax provides a set of objects named combinator layers to make this happen. Combinators are themselves layers, so anything you can do with a layer (call it, inspect it, combine it further) you can also do with a combinator.

#### Serial Combinator

Serial is the most common combinator and the easiest to use. You could, for example, build a simple neural network by combining layers into a single layer using the Serial combinator. This new layer then acts just like a single layer, so you can inspect its inputs, outputs and weights, or even combine it into another layer. Combinators can then be used as trainable models. Try adding more layers.

Note: As you might have guessed, if there is a serial combinator there is also a parallel one (Parallel). Explore the combinators and other layers in the Trax documentation, and look at the repository to understand how these layers are written.
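Conceptually, Serial feeds each sublayer's output into the next one. Here's a stripped-down sketch with plain functions. The `serial`, `relu`, and `double` helpers are hypothetical and ignore the weights and the multi-input/output data-stack handling that Trax's combinator also manages.

```python
from functools import reduce


def serial(*layers):
    """Hypothetical: compose single-input, single-output functions left to right."""
    def run(x):
        return reduce(lambda value, layer: layer(value), layers, x)
    return run


def relu(values):
    return [max(0, v) for v in values]


def double(values):
    return [2 * v for v in values]


model = serial(relu, double)
print(model([-2, -1, 0, 1, 2]))        # → [0, 0, 0, 2, 4]
# The composition is itself just a function, so it can be nested again.
print(serial(model, double)([-1, 3]))  # → [0, 12]
```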

serial = layers.Serial(
    layers.LayerNorm(),
    layers.Relu(),
    double,
    layers.Dense(n_units=2),
    layers.Dense(n_units=1),
    layers.LogSoftmax()
)

• Initialization
x = numpy.array([-2, -1, 0, 1, 2]) #input
serial.init(shapes.signature(x))

print("-- Serial Model --")
print(serial,"\n")
print("-- Properties --")
print("name :", serial.name)
print("sublayers :", serial.sublayers)
print("expected inputs :", serial.n_in)
print("promised outputs :", serial.n_out)
print("weights & biases:", serial.weights, "\n")

-- Serial Model --
Serial[
LayerNorm
Relu
TimesTwo
Dense_2
Dense_1
LogSoftmax
]

-- Properties --
name : Serial
sublayers : [LayerNorm, Relu, TimesTwo, Dense_2, Dense_1, LogSoftmax]
expected inputs : 1
promised outputs : 1
weights & biases: [(DeviceArray([1, 1, 1, 1, 1], dtype=int32), DeviceArray([0, 0, 0, 0, 0], dtype=int32)), (), (), (DeviceArray([[ 0.19178385,  0.1832077 ],
[-0.36949775, -0.03924937],
[ 0.43800744,  0.788491  ],
[ 0.43107533, -0.3623491 ],
[ 0.6186575 ,  0.04764405]], dtype=float32), DeviceArray([-3.0051979e-06,  1.4359505e-06], dtype=float32)), (DeviceArray([[-0.6747592],
[-0.8550365]], dtype=float32), DeviceArray([-8.9325863e-07], dtype=float32)), ()]

print("-- Inputs --")
print("x :", x, "\n")

y = serial(x)
print("-- Outputs --")
print("y :", y)

-- Inputs --
x : [-2 -1  0  1  2]

-- Outputs --
y : [0.]
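The output is 0 regardless of the input: the last Dense layer produces a single value, and LogSoftmax over one value is log(e^z / e^z) = log 1 = 0. A small log-softmax sketch (my own, stabilized by subtracting the maximum) shows this:

```python
import math


def log_softmax(values):
    """Log of the softmax, computed stably by shifting by the maximum."""
    top = max(values)
    log_total = math.log(sum(math.exp(v - top) for v in values))
    return [v - top - log_total for v in values]


print(log_softmax([-3.7]))      # → [0.0]
print(log_softmax([1.0, 2.0]))  # two units give non-trivial log-probabilities
```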


### JAX

Just remember to look out for which numpy you are using: the regular numpy or Trax's JAX-compatible numpy. Watch those import blocks, because numpy and fastmath.numpy produce different array types.

Regular numpy.

x_numpy = numpy.array([1, 2, 3])
print("good old numpy : ", type(x_numpy), "\n")

good old numpy :  <class 'numpy.ndarray'>



Fastmath and jax numpy.

x_jax = fastmath.numpy.array([1, 2, 3])
print("jax trax numpy : ", type(x_jax))

jax trax numpy :  <class 'jax.interpreters.xla._DeviceArray'>


## End

• Trax is a concise framework for end-to-end machine learning, built on JAX and TensorFlow. The key building blocks are layers and combinators.
• This was a lab that was part of Coursera's Natural Language Processing with Sequence Models course put up by DeepLearning.AI.

# Word Embeddings: Visualizing the Embeddings

## Extracting and Visualizing the Embeddings

In the previous post we built a Continuous Bag of Words (CBOW) model that predicts a word from its context: each word within a window around the target is represented by the fraction of the window it makes up (e.g. with a half-window of 2, the fraction of the four surrounding words that each word accounts for). Now we're going to use the weights of the model as word embeddings and see if we can visualize them.
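Here's a small sketch of that context representation: averaging the one-hot vectors of the surrounding words gives each vocabulary entry the fraction of the window it occupies. The toy sentence and the `context_vector` helper are made up for illustration and aren't from the `neurotic` package.

```python
def context_vector(tokens, center, half_window, word_to_index):
    """Average of one-hot vectors for the words around tokens[center].

    Assumes the window fits inside the token list (no edge handling).
    """
    context = (tokens[center - half_window:center]
               + tokens[center + 1:center + 1 + half_window])
    vector = [0.0] * len(word_to_index)
    for word in context:
        vector[word_to_index[word]] += 1 / len(context)  # each word adds its share
    return vector


tokens = ["i", "am", "happy", "because", "i", "am", "learning"]
word_to_index = {word: index for index, word in enumerate(sorted(set(tokens)))}
# sorted vocabulary: am, because, happy, i, learning
print(context_vector(tokens, center=2, half_window=2, word_to_index=word_to_index))
# → [0.25, 0.25, 0.0, 0.5, 0.0]
```

"i" shows up twice among the four context words of "happy", so it gets half the weight.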

### Imports

# python
from argparse import Namespace
from functools import partial

# pypi
from sklearn.decomposition import PCA

import holoviews
import hvplot.pandas
import pandas

# this project
from neurotic.nlp.word_embeddings import (
Batches,
CBOW,
DataCleaner,
TheTrainer,
)
# my other stuff
from graeae import EmbedHoloviews, Timer


### Set Up

cleaner = DataCleaner()
TIMER = Timer(speak=False)
SLUG = "word-embeddings-visualizing-the-embeddings"
Embed = partial(EmbedHoloviews, folder_path=f"files/posts/nlp/{SLUG}")
Plot = Namespace(
width=990,
height=780,
fontscale=2,
tan="#ddb377",
blue="#4687b7",
red="#ce7b6d",
)

hidden_layer = 50
half_window = 2
batch_size = 128
repetitions = 250
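# assumes `meta` (with `vocabulary` and `word_to_index`) was built as in the earlier posts; it isn't defined in the code shown here
vocabulary_size = len(meta.vocabulary)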

model = CBOW(hidden=hidden_layer, vocabulary_size=vocabulary_size)
batches = Batches(data=cleaner.processed, word_to_index=meta.word_to_index,
half_window=half_window, batch_size=batch_size, batches=repetitions)

trainer = TheTrainer(model, batches, emit_point=50, verbose=True)
with TIMER:
    trainer()

2020-12-16 16:32:17,189 graeae.timers.timer start: Started: 2020-12-16 16:32:17.189213
50: loss=9.88889093658385
new learning rate: 0.0198
100: loss=9.138356897918037
150: loss=9.149555378031549
new learning rate: 0.013068000000000001
200: loss=9.077599951734605
2020-12-16 16:32:37,403 graeae.timers.timer end: Ended: 2020-12-16 16:32:37.403860
2020-12-16 16:32:37,405 graeae.timers.timer end: Elapsed: 0:00:20.214647
250: loss=8.607763835003631

print(trainer.best_loss)

8.186490214727549


## Middle

### Set It Up

We're going to use the method of averaging the weights of the two layers to form the embeddings.
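As a shape check, here's the same averaging on toy nested lists. I'm assuming, from the transpose in the embeddings code, that the input weights are stored as hidden × vocabulary and the hidden-layer weights as vocabulary × hidden, so after transposing the first, both are vocabulary × hidden and each row is one word's embedding. `average_embeddings` is a hypothetical helper.

```python
def average_embeddings(input_weights, hidden_weights):
    """Average W1 transposed (vocab x hidden) with W2 (vocab x hidden)."""
    # hidden x vocab -> vocab x hidden
    transposed = [list(column) for column in zip(*input_weights)]
    return [[(a + b) / 2 for a, b in zip(row_1, row_2)]
            for row_1, row_2 in zip(transposed, hidden_weights)]


# Toy sizes: hidden = 2, vocabulary = 3
input_weights = [[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]]  # 2 x 3
hidden_weights = [[0.0, 0.0],
                  [2.0, 2.0],
                  [4.0, 4.0]]      # 3 x 2
print(average_embeddings(input_weights, hidden_weights))
# → [[0.5, 2.0], [2.0, 3.5], [3.5, 5.0]]
```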

embeddings = (trainer.best_weights.input_weights.T
+ trainer.best_weights.hidden_weights)/2


And now our words.

words = ["king", "queen","lord","man", "woman","dog","wolf",


Now we need to translate the words into their indices so we can grab the matching rows in the embeddings.

indices = [meta.word_to_index[word] for word in words]
X = embeddings[indices, :]
print(X.shape, indices)

(10, 50) [2745, 3951, 2961, 3023, 5675, 1452, 5674, 4191, 2316, 4278]


There are 10 rows to match our ten words and 50 columns to match the number chosen for the hidden layer.

### Visualizing

We're going to use sklearn's PCA for Principal Component Analysis. The n_components argument is the number of components it will keep; we'll keep 2 so we can plot them.
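For intuition about what PCA does, here's a dependency-free sketch that centers the data, builds the covariance matrix, and finds the top eigenvectors by power iteration with deflation. This is a swapped-in technique for illustration (scikit-learn's PCA uses a singular value decomposition), and the signs of the components can differ from sklearn's.

```python
import math
import random


def pca(rows, n_components=2, iterations=500, seed=0):
    """Project rows onto their top principal components (power-iteration sketch)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Covariance matrix (d x d) of the centered data
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        vector = [rng.random() for _ in range(d)]
        for _ in range(iterations):
            product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in product))
            if norm == 0:  # no variance left in any direction
                break
            vector = [x / norm for x in product]
        # Rayleigh quotient gives the eigenvalue for this direction
        eigenvalue = sum(vector[i] * cov[i][j] * vector[j]
                         for i in range(d) for j in range(d))
        components.append(vector)
        # Deflate: remove this direction so the next pass finds the next one
        cov = [[cov[i][j] - eigenvalue * vector[i] * vector[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


toy_rows = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
for projected in pca(toy_rows):
    print([round(value, 6) for value in projected])
```

The first axis carries the most variance and the second the next most, so the toy rows should land at about ±2 on the first component and ±1 on the second.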

pca = PCA(n_components=2)
reduced = pca.fit(X).transform(X)
pca_data = pandas.DataFrame(
reduced,
columns=["X", "Y"])

pca_data["Word"] = words

points = pca_data.hvplot.scatter(x="X",
y="Y", color=Plot.red)
labels = pca_data.hvplot.labels(x="X", y="Y", text="Word", text_baseline="top")
plot = (points * labels).opts(
title="PCA Embeddings",
height=Plot.height,
width=Plot.width,
fontscale=Plot.fontscale,
)
outcome = Embed(plot=plot, file_name="embeddings_pca")()

print(outcome)


Well, that's pretty horrible. Might need work.

## End

This is the final post in the series looking at using a Continuous Bag of Words model to create word embeddings. Here are the other posts.