Handwriting Recognition Exercise

Beginning

The goal of this exercise is to train a classifier on the MNIST dataset that reaches 99% accuracy during training, stopping as soon as it does rather than running for a fixed number of training epochs.

Imports

Python

from argparse import Namespace
from functools import partial
import random

From PyPi

import holoviews
import tensorflow

My Stuff

from graeae.visualization.embed import EmbedHoloview

The Plotting

embed = partial(
    EmbedHoloview, 
    folder_path="../../files/posts/keras/handwriting-recognition-exercise/")
Plot = Namespace(
    size=600,
)
holoviews.extension("bokeh")

The Dataset

(training_images, training_labels), (testing_images, testing_labels) = (
    tensorflow.keras.datasets.mnist.load_data())

Middle

The Dataset

What do we have here?

rows, x, y = training_images.shape
print(f"Training Images: {rows:,} ({x} x {y})")
Training Images: 60,000 (28 x 28)

The Fashion MNIST dataset that I looked at previously was meant to be a drop-in replacement for this dataset, so it has the same number of images and the images are the same size.
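As a quick sanity check (a sketch using only what was loaded above, plus numpy, which comes along with tensorflow), the test split and the label values can be inspected the same way.

import numpy

rows, x, y = testing_images.shape
# MNIST ships with a 10,000-image test split at the same 28 x 28 size
print(f"Testing Images: {rows:,} ({x} x {y})")

# the labels are the digits 0 through 9
print(f"Labels: {numpy.unique(training_labels)}")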

index = random.randrange(len(training_images))
image = training_images[index]
plot = holoviews.Image(
    image,
).opts(
    tools=["hover"],
    title=f"MNIST Handwritten {training_labels[index]}",
    width=Plot.size,
    height=Plot.size,
    )
embed(plot=plot, file_name="sample_image")()

[Plot: a randomly selected MNIST handwritten digit]

The dataset is a set of hand-written digits (one per image) that we want to be able to classify.

print(training_images.min())
print(training_images.max())
0
255

The images are 28 x 28 matrices of values from 0 (representing black) to 255 (representing white).

Normalizing the Data

We want the values to be from 0 to 1, so I'm going to normalize them by dividing by the maximum pixel value (255).

training_images_normalized = training_images/255
testing_images_normalized = testing_images/255
print(training_images_normalized.max())
print(training_images_normalized.min())
1.0
0.0

The Model

This is going to be a model with one hidden layer.

def build_model(units: int=128):
    """Builds a sequential model with one hidden layer

    Args:
     units: number of units in the hidden layer

    Returns:
     the (uncompiled) model
    """
    model = tensorflow.keras.models.Sequential()

    # flatten each 28 x 28 image into a 784-entry vector
    model.add(tensorflow.keras.layers.Flatten())

    # the hidden layer
    model.add(tensorflow.keras.layers.Dense(units=units,
                                            activation=tensorflow.nn.relu))

    # the output layer (one unit per digit class)
    model.add(tensorflow.keras.layers.Dense(units=10,
                                            activation=tensorflow.nn.softmax))
    return model
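To peek at the layers and parameter counts (a sketch, not part of the original run; "preview" is just a throwaway name), the model can be built explicitly and summarized. Since the Flatten layer wasn't given an input_shape, build has to be told the shape first.

preview = build_model()
preview.build(input_shape=(None, 28, 28))
preview.summary()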

The Callback

To make the training stop once it reaches 99% accuracy I'll add a callback.

class Stop(tensorflow.keras.callbacks.Callback):
    """Stops training once the training accuracy reaches 99%"""
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print(logs)
        # older tensorflow versions log the metric as "acc", newer ones as "accuracy"
        accuracy = logs.get("acc", logs.get("accuracy"))
        if accuracy is not None and accuracy >= 0.99:
            print(f"Stopping point reached at epoch {epoch}")
            print(f"Model Accuracy: {accuracy}")
            self.model.stop_training = True

def train(units=128):
    """Builds and trains the model

    Args:
     units: number of neurons in the hidden layer

    Returns:
     the trained model
    """
    callbacks = Stop()
    model = build_model(units)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    model.fit(training_images_normalized, training_labels,
              epochs=100, callbacks=[callbacks], verbose=2)
    return model

def test(model, outcome_key):
    """Tests the model and prints a sample prediction

    Args:
     model: the trained model to evaluate
     outcome_key: key to store the (loss, accuracy) under in the outcomes dict
    """
    # note: this evaluates on the raw (un-normalized) test images
    loss, accuracy = model.evaluate(testing_images, testing_labels, verbose=0)
    outcomes[outcome_key] = (loss, accuracy)
    print(f"Testing: Loss={loss}, Accuracy: {accuracy}")

    print("\nTesting A Prediction")
    classifications = model.predict(testing_images)
    index = random.randrange(len(classifications))
    selected = classifications[index]
    print(selected)

    print(f"expected label: {testing_labels[index]}")
    print(f"actual label: {selected.argmax()}")
    return

Trying Some Models

128 Nodes

model = train()
outcomes = {}
test(model, "128 Nodes")
Epoch 1/100
{'loss': 0.2586968289529284, 'acc': 0.92588335}
60000/60000 - 2s - loss: 0.2587 - acc: 0.9259
Epoch 2/100
{'loss': 0.11452680859503647, 'acc': 0.9655833}
60000/60000 - 2s - loss: 0.1145 - acc: 0.9656
Epoch 3/100
{'loss': 0.0795439642144988, 'acc': 0.97606665}
60000/60000 - 2s - loss: 0.0795 - acc: 0.9761
Epoch 4/100
{'loss': 0.05808031236998116, 'acc': 0.9816667}
60000/60000 - 2s - loss: 0.0581 - acc: 0.9817
Epoch 5/100
{'loss': 0.04466566459426346, 'acc': 0.98588336}
60000/60000 - 2s - loss: 0.0447 - acc: 0.9859
Epoch 6/100
{'loss': 0.03590909656824855, 'acc': 0.9885333}
60000/60000 - 2s - loss: 0.0359 - acc: 0.9885
Epoch 7/100
{'loss': 0.02741284582785641, 'acc': 0.9912}
Stopping point reached at epoch 6
Model Accuracy: 0.9912
60000/60000 - 2s - loss: 0.0274 - acc: 0.9912
Testing: Loss=15.376291691160201, Accuracy: 0.9764000177383423

Testing A Prediction
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
expected label: 0
actual label: 0

Well, here we can see why the Fashion MNIST dataset was created: even with this simple network we were able to reach our goal in seven epochs. The testing accuracy was pretty good too, although the testing loss looks inflated because the evaluation used the raw (un-normalized) test images rather than the normalized ones the model was trained on.
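As a follow-up (a sketch, not part of the run above), re-running the evaluation on the normalized test images should bring the loss back in line with the training loss, since that matches the scale the model was trained on.

loss, accuracy = model.evaluate(testing_images_normalized, testing_labels,
                                verbose=0)
print(f"Normalized Testing: Loss={loss}, Accuracy: {accuracy}")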

End

Source