Convolutional Neural Networks and Fashion MNIST

Beginning

The goal of this exercise is to create a model that can classify the Fashion MNIST data better than our previous single hidden-layer model.

Imports

PyPI

import matplotlib.pyplot as pyplot
import numpy
import seaborn
import tensorflow

My Stuff

from graeae.timers import Timer

Set Up

The Timer

TIMER = Timer()

The Data

(training_images, training_labels), (testing_images, testing_labels) = (
    tensorflow.keras.datasets.fashion_mnist.load_data())

The pixel values run from 0 to 255, so we scale them down to the 0-1 range to make things easier on the network.

training_images = training_images / 255
testing_images = testing_images / 255

Plotting
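
A little styling for the plots later on (the specific style here is just a matter of taste - any will do).

seaborn.set(style="whitegrid")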

Middle

Some Exploratory Work

A Baseline Model

Our baseline that we want to beat is a model with a single dense hidden layer with 128 nodes.

model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.layers.Flatten())
model.add(tensorflow.keras.layers.Dense(128, activation=tensorflow.nn.relu))
model.add(tensorflow.keras.layers.Dense(10, activation=tensorflow.nn.softmax))

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", 
              metrics=["accuracy"])
with TIMER:
    model.fit(training_images, training_labels, epochs=10, verbose=2)
loss, accuracy = model.evaluate(testing_images, testing_labels, verbose=0)
print(f"Testing Loss: {loss:.2f} Testing Accuracy: {accuracy: .2f}")
2019-07-03 11:50:56,502 graeae.timers.timer start: Started: 2019-07-03 11:50:56.502607
Epoch 1/10
60000/60000 - 5s - loss: 0.4973 - acc: 0.8252
Epoch 2/10
60000/60000 - 5s - loss: 0.3742 - acc: 0.8656
Epoch 3/10
60000/60000 - 5s - loss: 0.3382 - acc: 0.8775
Epoch 4/10
60000/60000 - 5s - loss: 0.3146 - acc: 0.8839
Epoch 5/10
60000/60000 - 4s - loss: 0.2976 - acc: 0.8897
Epoch 6/10
60000/60000 - 4s - loss: 0.2818 - acc: 0.8963
Epoch 7/10
60000/60000 - 4s - loss: 0.2707 - acc: 0.9002
Epoch 8/10
60000/60000 - 5s - loss: 0.2597 - acc: 0.9039
Epoch 9/10
60000/60000 - 5s - loss: 0.2502 - acc: 0.9066
Epoch 10/10
60000/60000 - 5s - loss: 0.2409 - acc: 0.9094
2019-07-03 11:51:42,904 graeae.timers.timer end: Ended: 2019-07-03 11:51:42.904683
2019-07-03 11:51:42,907 graeae.timers.timer end: Elapsed: 0:00:46.402076
Testing Loss: 0.36 Testing Accuracy:  0.87

A Convolutional Neural Network

The convolutional layer expects a four-dimensional tensor - (samples, rows, columns, channels) - but load_data gives us three-dimensional grayscale arrays, so we need to reshape the input to add the single channel dimension.

training_images = training_images.reshape(60000, 28, 28, 1)
testing_images = testing_images.reshape(10000, 28, 28, 1)

Our model starts with a Conv2D layer. The arguments we're using are:

  • filters: the dimensionality of the output space (the number of output filters in the convolution)
  • kernel_size: The height and width of the convolution window
  • activation: The activation function for the output
  • input_shape: If this is the first layer in the model you have to tell it what the input shape is

The output of each convolutional layer goes to a MaxPool2D layer. The only argument we're passing in is pool_size, the factors by which to downsize the input. Using (2, 2) halves each spatial dimension, so each pooling layer keeps a quarter of the cells. After the convolutions and pooling are applied, the output is sent through a version of the fully-connected network that we were using before (see the baseline model above).
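
To make the shape bookkeeping concrete, here's a little sketch (using the same settings as the model below) that builds a single convolution-and-pooling pair and prints its output shapes.

demonstration = tensorflow.keras.models.Sequential([
    tensorflow.keras.layers.Conv2D(32, kernel_size=(3, 3),
                                   activation="relu",
                                   input_shape=(28, 28, 1)),
    tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2)),
])

# the un-padded convolution trims a one-pixel border: 28 x 28 -> 26 x 26
# the (2, 2) pooling then halves each dimension: 26 x 26 -> 13 x 13
for layer in demonstration.layers:
    print(layer.output_shape)  # (None, 26, 26, 32) then (None, 13, 13, 32)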

  • A Model Builder

    Something to make it a little easier to re-use things. Note that in the original notebook the first example has 64 filters in the CNN, but later it says that it's better to start with 32 (and the exercises expect that you used 32) so I'm using that as the default value.

    def get_stop(loss=0.02):
        class Stop(tensorflow.keras.callbacks.Callback):
            """Callback to end training once the loss drops below a threshold"""
            def on_epoch_end(self, epoch, logs=None):
                logs = logs or {}
                if logs.get("loss", float("inf")) < loss:
                    print(f"Stopping point reached at epoch {epoch}")
                    self.model.stop_training = True
        return Stop()
    
    class ModelBuilder:
        """Builds, trains, and tests our model
    
        Args:
         training_images: images to train on
         training_labels: labels for the training data
         testing_images: images to test the trained model with
         testing_labels: labels for the testing data
         additional_convolutions: convolutions besides the input convolution
         epochs: number of times to repeat training
         filters: number of filters in the output of the convolutional layers
         use_callback: use the Stop callback to end training
         callback_loss: loss to use for the callback
        """
        def __init__(self, training_images: numpy.ndarray=training_images,
                     training_labels: numpy.ndarray=training_labels,
                     testing_images: numpy.ndarray=testing_images,
                     testing_labels: numpy.ndarray=testing_labels,
                     additional_convolutions: int=1, 
                     epochs: int=10, 
                     filters: int=32,
                     use_callback: bool=False,
                     callback_loss: float=0) -> None:
            self.training_images = training_images
            self.training_labels = training_labels
            self.testing_images = testing_images
            self.testing_labels = testing_labels
    
            self.additional_convolutions = additional_convolutions
            self.epochs = epochs
            self.filters = filters
            self.use_callback = use_callback
            self.callback_loss = callback_loss
            self._model = None
            self._callback = None
            return
    
        @property
        def callback(self) -> tensorflow.keras.callbacks.Callback:
            """The callback to stop the training"""
            if self._callback is None:
                self._callback = get_stop(self.callback_loss)
            return self._callback
    
        @property
        def model(self) -> tensorflow.keras.models.Sequential:
            """Our CNN Model"""
            if self._model is None:
                self._model = tensorflow.keras.models.Sequential()
                self._model.add(tensorflow.keras.layers.Conv2D(
                    self.filters, (3, 3), 
                    activation="relu", 
                    input_shape=(28, 28, 1)))
                self._model.add(tensorflow.keras.layers.MaxPooling2D(2, 2))
    
                for convolution in range(self.additional_convolutions):
                    self._model.add(tensorflow.keras.layers.Conv2D(self.filters, (3, 3), 
                                                                   activation="relu"))
                    self._model.add(tensorflow.keras.layers.MaxPooling2D(2, 2))
                self._model.add(tensorflow.keras.layers.Flatten())
                self._model.add(tensorflow.keras.layers.Dense(128, activation="relu"))
                self._model.add(tensorflow.keras.layers.Dense(10, activation="softmax"))
                self._model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", 
                                    metrics=["accuracy"])
            return self._model
    
        def print_summary(self):
            """Print out the summary for the model"""
            print(self.model.summary())
            return
    
        def fit(self):
            """Fit the model to the training data"""
            if self.use_callback:
                self.model.fit(self.training_images, self.training_labels,
                               epochs=self.epochs, verbose=2,
                               callbacks=[self.callback])
            else:
                self.model.fit(self.training_images, self.training_labels,
                               epochs=self.epochs, verbose=2)
            return

        def test(self) -> tuple:
            """Check the loss and accuracy of the model against the testing set

            Returns:
             (loss, accuracy): the output of the evaluation of the testing data
            """
            return self.model.evaluate(self.testing_images, self.testing_labels, verbose=0)
    
        def __call__(self):
            """Builds and tests the model"""
            self.fit()
            loss, accuracy = self.test()
            print(f"Testing Loss: {loss:.2f}  Testing Accuracy: {accuracy:.2f}")
            return
    
    builder = ModelBuilder(epochs=5)
    builder.print_summary()
    
    Model: "sequential_17"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    conv2d_32 (Conv2D)           (None, 26, 26, 32)        320       
    _________________________________________________________________
    max_pooling2d_32 (MaxPooling (None, 13, 13, 32)        0         
    _________________________________________________________________
    conv2d_33 (Conv2D)           (None, 11, 11, 32)        9248      
    _________________________________________________________________
    max_pooling2d_33 (MaxPooling (None, 5, 5, 32)          0         
    _________________________________________________________________
    flatten_17 (Flatten)         (None, 800)               0         
    _________________________________________________________________
    dense_34 (Dense)             (None, 128)               102528    
    _________________________________________________________________
    dense_35 (Dense)             (None, 10)                1290      
    =================================================================
    Total params: 113,386
    Trainable params: 113,386
    Non-trainable params: 0
    _________________________________________________________________
    None
    

Layer By Layer

  • Our input is a set of 28 x 28 images.
  • Because we didn't pad the images, the convolutional layer "trims" off one row and column on each side (the center of the 3 x 3 kernel can't reach the outermost cells) so we get a 26 x 26 grid with 32 filters (the number we set up in the definition).
  • The Max Pooling layer then halves the grid so we have a 13 x 13 grid with 32 filters.
  • The next convolution layer once again trims off one row and column on each side so we have an 11 x 11 grid with 32 filters.
  • Then the Max Pooling halves the grid once again - rounding down - so we have a 5 x 5 grid with 32 filters.
  • The Flatten layer outputs a vector with 800 cells (5 x 5 x 32 = 800).
  • The first Dense layer has 128 neurons in it so that's the size of its output.
  • And the final Dense layer converts it to 10 outputs to match the number of labels we have (the parameter counts in the summary above follow from these shapes - see the check below).
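
A quick back-of-the-envelope check of the parameter counts in the summary: each convolution filter has one weight per kernel cell per input channel plus a bias, and each dense neuron has one weight per input plus a bias.

conv_1 = (3 * 3 * 1 + 1) * 32    # 320
conv_2 = (3 * 3 * 32 + 1) * 32   # 9,248
dense_1 = (800 + 1) * 128        # 102,528
dense_2 = (128 + 1) * 10         # 1,290
print(conv_1 + conv_2 + dense_1 + dense_2)  # 113,386
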
builder()
Epoch 1/5
60000/60000 - 17s - loss: 0.4671 - acc: 0.8290
Epoch 2/5
60000/60000 - 17s - loss: 0.3149 - acc: 0.8844
Epoch 3/5
60000/60000 - 17s - loss: 0.2688 - acc: 0.9003
Epoch 4/5
60000/60000 - 17s - loss: 0.2414 - acc: 0.9112
Epoch 5/5
60000/60000 - 17s - loss: 0.2175 - acc: 0.9198
Testing Loss: 0.28  Testing Accuracy: 0.89

Using the Convolutional Neural Network we've gone from 87% to 89% testing accuracy - with half as many training epochs.

10 Epochs

After five epochs the loss still appears to be going down while the accuracy is going up. What happens with ten epochs?

builder_10 = ModelBuilder(epochs=10)
builder_10()
Epoch 1/10
60000/60000 - 16s - loss: 0.4807 - acc: 0.8242
Epoch 2/10
60000/60000 - 16s - loss: 0.3233 - acc: 0.8825
Epoch 3/10
60000/60000 - 15s - loss: 0.2776 - acc: 0.8976
Epoch 4/10
60000/60000 - 16s - loss: 0.2474 - acc: 0.9082
Epoch 5/10
60000/60000 - 16s - loss: 0.2273 - acc: 0.9155
Epoch 6/10
60000/60000 - 16s - loss: 0.2030 - acc: 0.9240
Epoch 7/10
60000/60000 - 16s - loss: 0.1854 - acc: 0.9314
Epoch 8/10
60000/60000 - 16s - loss: 0.1693 - acc: 0.9361
Epoch 9/10
60000/60000 - 15s - loss: 0.1540 - acc: 0.9419
Epoch 10/10
60000/60000 - 16s - loss: 0.1419 - acc: 0.9467
Testing Loss: 0.26  Testing Accuracy: 0.91

It looks like it's still learning - both the testing loss and the testing accuracy improved.

15 Epochs

builder_15 = ModelBuilder(epochs=15)
builder_15()
Epoch 1/15
60000/60000 - 16s - loss: 0.4754 - acc: 0.8260
Epoch 2/15
60000/60000 - 16s - loss: 0.3155 - acc: 0.8834
Epoch 3/15
60000/60000 - 16s - loss: 0.2725 - acc: 0.9001
Epoch 4/15
60000/60000 - 16s - loss: 0.2447 - acc: 0.9096
Epoch 5/15
60000/60000 - 16s - loss: 0.2199 - acc: 0.9180
Epoch 6/15
60000/60000 - 16s - loss: 0.1996 - acc: 0.9248
Epoch 7/15
60000/60000 - 16s - loss: 0.1813 - acc: 0.9316
Epoch 8/15
60000/60000 - 16s - loss: 0.1666 - acc: 0.9372
Epoch 9/15
60000/60000 - 16s - loss: 0.1525 - acc: 0.9430
Epoch 10/15
60000/60000 - 15s - loss: 0.1374 - acc: 0.9484
Epoch 11/15
60000/60000 - 16s - loss: 0.1257 - acc: 0.9527
Epoch 12/15
60000/60000 - 15s - loss: 0.1135 - acc: 0.9569
Epoch 13/15
60000/60000 - 16s - loss: 0.1025 - acc: 0.9615
Epoch 14/15
60000/60000 - 15s - loss: 0.0937 - acc: 0.9647
Epoch 15/15
60000/60000 - 16s - loss: 0.0849 - acc: 0.9682
Testing Loss: 0.34  Testing Accuracy: 0.91

It looks like it's started to overfit: the testing accuracy held steady at 0.91, but the testing loss got worse (0.34 versus 0.26 at ten epochs).

20 Epochs

builder = ModelBuilder(epochs=20)
builder()
Epoch 1/20
60000/60000 - 16s - loss: 0.4759 - acc: 0.8264
Epoch 2/20
60000/60000 - 16s - loss: 0.3218 - acc: 0.8822
Epoch 3/20
60000/60000 - 16s - loss: 0.2767 - acc: 0.8982
Epoch 4/20
60000/60000 - 16s - loss: 0.2469 - acc: 0.9083
Epoch 5/20
60000/60000 - 16s - loss: 0.2218 - acc: 0.9177
Epoch 6/20
60000/60000 - 16s - loss: 0.2015 - acc: 0.9244
Epoch 7/20
60000/60000 - 16s - loss: 0.1848 - acc: 0.9309
Epoch 8/20
60000/60000 - 15s - loss: 0.1698 - acc: 0.9361
Epoch 9/20
60000/60000 - 14s - loss: 0.1525 - acc: 0.9424
Epoch 10/20
60000/60000 - 15s - loss: 0.1435 - acc: 0.9457
Epoch 11/20
60000/60000 - 16s - loss: 0.1306 - acc: 0.9504
Epoch 12/20
60000/60000 - 15s - loss: 0.1172 - acc: 0.9556
Epoch 13/20
60000/60000 - 15s - loss: 0.1079 - acc: 0.9594
Epoch 14/20
60000/60000 - 15s - loss: 0.0993 - acc: 0.9626
Epoch 15/20
60000/60000 - 15s - loss: 0.0900 - acc: 0.9658
Epoch 16/20
60000/60000 - 15s - loss: 0.0829 - acc: 0.9686
Epoch 17/20
60000/60000 - 15s - loss: 0.0746 - acc: 0.9720
Epoch 18/20
60000/60000 - 16s - loss: 0.0713 - acc: 0.9736
Epoch 19/20
60000/60000 - 15s - loss: 0.0638 - acc: 0.9760
Epoch 20/20
60000/60000 - 15s - loss: 0.0594 - acc: 0.9781
Testing Loss: 0.45  Testing Accuracy: 0.91

Now it's definitely overfitting - the testing loss went up again (0.45) while the accuracy stayed flat at 0.91.
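
If you want to watch for this while training, here's a sketch (not something I ran for this post): pass a validation set to fit and Keras will report the validation loss after every epoch, so you can see when it starts climbing. Re-using the testing set this way is just for illustration - normally you'd hold out a separate validation split.

model = ModelBuilder(epochs=20).model
model.fit(training_images, training_labels, epochs=20, verbose=2,
          validation_data=(testing_images, testing_labels))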

Visualizing the Convolutions and Pooling

First we print some of the testing labels so that we can pick out images to compare.

print(testing_labels[:100])
[9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0 2 5 7 9 1 4 6 0 9 3 8 8 3 3 8 0 7
 5 7 9 6 1 3 7 6 7 2 1 2 2 4 4 5 8 2 2 8 4 8 0 7 7 8 5 1 1 2 3 9 8 7 0 2 6
 2 3 1 2 8 4 1 8 5 9 5 0 3 2 0 6 5 3 6 7 1 8 0 1 4 2]
model = builder_10.model

FIRST_IMAGE = 0
SECOND_IMAGE = 7
THIRD_IMAGE = 26
CONVOLUTION_NUMBER = 1

# a model that emits every layer's output so we can look at the
# intermediate convolutions and poolings
layer_outputs = [layer.output for layer in model.layers]
activation_model = tensorflow.keras.models.Model(inputs=model.input,
                                                 outputs=layer_outputs)

# plot the outputs of the first four layers (the two convolution/pooling
# pairs) for each of the three images
figure, axis_array = pyplot.subplots(3, 4)
for row, image in enumerate((FIRST_IMAGE, SECOND_IMAGE, THIRD_IMAGE)):
    activations = activation_model.predict(
        testing_images[image].reshape(1, 28, 28, 1))
    for column in range(4):
        axis = axis_array[row, column]
        axis.imshow(activations[column][0, :, :, CONVOLUTION_NUMBER],
                    cmap="inferno")
        axis.grid(False)

layer_visualization.png

Exercises

1. Try editing the convolutions. Change the 32s to either 16 or 64. What impact will this have on accuracy and/or training time?

  • 16 Filters
    builder = ModelBuilder(filters=16)
    with TIMER:
        builder()
    
    2019-07-03 12:06:27,700 graeae.timers.timer start: Started: 2019-07-03 12:06:27.700578
    Epoch 1/10
    60000/60000 - 17s - loss: 0.5169 - acc: 0.8100
    Epoch 2/10
    60000/60000 - 17s - loss: 0.3536 - acc: 0.8714
    Epoch 3/10
    60000/60000 - 17s - loss: 0.3075 - acc: 0.8873
    Epoch 4/10
    60000/60000 - 17s - loss: 0.2808 - acc: 0.8959
    Epoch 5/10
    60000/60000 - 16s - loss: 0.2590 - acc: 0.9027
    Epoch 6/10
    60000/60000 - 17s - loss: 0.2419 - acc: 0.9100
    Epoch 7/10
    60000/60000 - 17s - loss: 0.2276 - acc: 0.9156
    Epoch 8/10
    60000/60000 - 17s - loss: 0.2140 - acc: 0.9182
    Epoch 9/10
    60000/60000 - 17s - loss: 0.2030 - acc: 0.9233
    Epoch 10/10
    60000/60000 - 17s - loss: 0.1934 - acc: 0.9266
    2019-07-03 12:09:18,226 graeae.timers.timer end: Ended: 2019-07-03 12:09:18.226577
    2019-07-03 12:09:18,229 graeae.timers.timer end: Elapsed: 0:02:50.525999
    Testing Loss: 0.29  Testing Accuracy: 0.90
    

    The smaller model had slightly more loss than the 32-filter model as well as a little less accuracy.

  • 64 Filters
    builder = ModelBuilder(filters=64)
    with TIMER:
        builder()
    
    2019-07-03 12:09:19,711 graeae.timers.timer start: Started: 2019-07-03 12:09:19.711082
    Epoch 1/10
    60000/60000 - 19s - loss: 0.4367 - acc: 0.8428
    Epoch 2/10
    60000/60000 - 18s - loss: 0.2923 - acc: 0.8929
    Epoch 3/10
    60000/60000 - 18s - loss: 0.2472 - acc: 0.9087
    Epoch 4/10
    60000/60000 - 18s - loss: 0.2156 - acc: 0.9205
    Epoch 5/10
    60000/60000 - 18s - loss: 0.1893 - acc: 0.9298
    Epoch 6/10
    60000/60000 - 18s - loss: 0.1665 - acc: 0.9380
    Epoch 7/10
    60000/60000 - 18s - loss: 0.1460 - acc: 0.9456
    Epoch 8/10
    60000/60000 - 18s - loss: 0.1285 - acc: 0.9500
    Epoch 9/10
    60000/60000 - 18s - loss: 0.1142 - acc: 0.9568
    Epoch 10/10
    60000/60000 - 18s - loss: 0.0972 - acc: 0.9621
    2019-07-03 12:12:23,275 graeae.timers.timer end: Ended: 2019-07-03 12:12:23.274851
    2019-07-03 12:12:23,277 graeae.timers.timer end: Elapsed: 0:03:03.563769
    Testing Loss: 0.32  Testing Accuracy: 0.91
    

    This has the same testing accuracy as the 32-filter model but with a slight increase in the loss.

2. Remove the final Convolution. What impact will this have on accuracy or training time?

builder = ModelBuilder(additional_convolutions=0)
with TIMER:
    builder()
2019-07-03 12:12:24,795 graeae.timers.timer start: Started: 2019-07-03 12:12:24.795249
Epoch 1/10
60000/60000 - 14s - loss: 0.3897 - acc: 0.8607
Epoch 2/10
60000/60000 - 14s - loss: 0.2642 - acc: 0.9042
Epoch 3/10
60000/60000 - 14s - loss: 0.2218 - acc: 0.9187
Epoch 4/10
60000/60000 - 14s - loss: 0.1883 - acc: 0.9306
Epoch 5/10
60000/60000 - 14s - loss: 0.1619 - acc: 0.9391
Epoch 6/10
60000/60000 - 14s - loss: 0.1387 - acc: 0.9482
Epoch 7/10
60000/60000 - 14s - loss: 0.1171 - acc: 0.9564
Epoch 8/10
60000/60000 - 14s - loss: 0.1000 - acc: 0.9629
Epoch 9/10
60000/60000 - 14s - loss: 0.0831 - acc: 0.9702
Epoch 10/10
60000/60000 - 14s - loss: 0.0728 - acc: 0.9729
2019-07-03 12:14:46,396 graeae.timers.timer end: Ended: 2019-07-03 12:14:46.396417
2019-07-03 12:14:46,400 graeae.timers.timer end: Elapsed: 0:02:21.601168
Testing Loss: 0.31  Testing Accuracy: 0.92

Once again the accuracy is a little better than the two-convolution model, but the testing loss is also a little higher - and each epoch ran a couple of seconds faster. We probably need more data.

3. How about adding more Convolutions? What impact do you think this will have? Experiment with it.

"results output"body
# Out[21]:
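I didn't keep the output for this one, but the ModelBuilder makes it easy to try. One thing to keep in mind: each convolution/pooling pair shrinks the feature maps, so with 28 x 28 inputs there's only room for one pair beyond the default before a 3 x 3 kernel no longer fits.

builder = ModelBuilder(additional_convolutions=2)
with TIMER:
    builder()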

4. In the previous lesson you implemented a callback to check on the loss function and to cancel training once it hit a certain amount. See if you can implement that here!

builder = ModelBuilder(use_callback=True, epochs=100, callback_loss=0.19)
with TIMER:
    builder()
2019-07-03 15:20:50,279 graeae.timers.timer start: Started: 2019-07-03 15:20:50.279833
Epoch 1/100
60000/60000 - 17s - loss: 0.4773 - acc: 0.8277
Epoch 2/100
60000/60000 - 17s - loss: 0.3204 - acc: 0.8840
Epoch 3/100
60000/60000 - 17s - loss: 0.2777 - acc: 0.8986
Epoch 4/100
60000/60000 - 17s - loss: 0.2463 - acc: 0.9089
Epoch 5/100
60000/60000 - 17s - loss: 0.2220 - acc: 0.9179
Epoch 6/100
60000/60000 - 17s - loss: 0.2029 - acc: 0.9250
Epoch 7/100
Stopping point reached at epoch 6
60000/60000 - 17s - loss: 0.1827 - acc: 0.9314
2019-07-03 15:22:51,538 graeae.timers.timer end: Ended: 2019-07-03 15:22:51.537895
2019-07-03 15:22:51,540 graeae.timers.timer end: Elapsed: 0:02:01.258062
Testing Loss: 0.25  Testing Accuracy: 0.91

This stopped after seven epochs and did about the same as the ten-epoch version, so we didn't save much time, but it gives us a way to stop without having to guess the number of epochs.
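
For what it's worth, Keras also ships a built-in EarlyStopping callback that takes a related approach - it stops when the monitored quantity stops improving, rather than when it crosses a fixed threshold like our Stop callback does. A sketch (the particular settings are just guesses):

early_stopping = tensorflow.keras.callbacks.EarlyStopping(
    monitor="loss",   # watch the training loss
    min_delta=0.01,   # improvements smaller than this don't count
    patience=2)       # allow two stagnant epochs before stopping
model = ModelBuilder().model
model.fit(training_images, training_labels, epochs=100, verbose=2,
          callbacks=[early_stopping])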

End

Source