Convolutional Neural Networks and Fashion MNIST


The goal of this exercise is to create a model that can classify the Fashion MNIST data better than our previous single hidden-layer model.



import matplotlib.pyplot as pyplot
import numpy
import seaborn
import tensorflow

My Stuff

from graeae.timers import Timer

Set Up

The Timer

TIMER = Timer()

The Data

(training_images, training_labels), (testing_images, testing_labels) = (
    tensorflow.keras.datasets.fashion_mnist.load_data())

training_images = training_images / 255
testing_images = testing_images / 255



Some Exploratory Work

A Baseline Model

Our baseline that we want to beat is a model with a single dense hidden layer with 128 nodes.

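model = tensorflow.keras.models.Sequential()
# flatten each 28 x 28 image into a 784-element vector for the dense layers
model.add(tensorflow.keras.layers.Flatten())
model.add(tensorflow.keras.layers.Dense(128, activation=tensorflow.nn.relu))
model.add(tensorflow.keras.layers.Dense(10, activation=tensorflow.nn.softmax))

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
with TIMER:
    model.fit(training_images, training_labels, epochs=10, verbose=2)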
loss, accuracy = model.evaluate(testing_images, testing_labels, verbose=0)
print(f"Testing Loss: {loss:.2f} Testing Accuracy: {accuracy: .2f}")
Epoch 1/10
60000/60000 - 5s - loss: 0.4973 - acc: 0.8252
Epoch 2/10
60000/60000 - 5s - loss: 0.3742 - acc: 0.8656
Epoch 3/10
60000/60000 - 5s - loss: 0.3382 - acc: 0.8775
Epoch 4/10
60000/60000 - 5s - loss: 0.3146 - acc: 0.8839
Epoch 5/10
60000/60000 - 4s - loss: 0.2976 - acc: 0.8897
Epoch 6/10
60000/60000 - 4s - loss: 0.2818 - acc: 0.8963
Epoch 7/10
60000/60000 - 4s - loss: 0.2707 - acc: 0.9002
Epoch 8/10
60000/60000 - 5s - loss: 0.2597 - acc: 0.9039
Epoch 9/10
60000/60000 - 5s - loss: 0.2502 - acc: 0.9066
Epoch 10/10
60000/60000 - 5s - loss: 0.2409 - acc: 0.9094
Testing Loss: 0.36 Testing Accuracy:  0.87

A Convolutional Neural Network

The convolutional layer expects a four-dimensional tensor (images, height, width, channels) rather than a stack of two-dimensional images, so you need to reshape the input to add a single grayscale channel.

training_images = training_images.reshape(60000, 28, 28, 1)
testing_images = testing_images.reshape(10000, 28, 28, 1)
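
As a small illustration of the reshape, here is a sketch that uses a hypothetical stand-in array of zeros (not the real data) and shows two equivalent ways to add the channel axis without hard-coding the number of images.

```python
import numpy

# hypothetical stand-in for a handful of 28 x 28 grayscale images (not the real data)
sample = numpy.zeros((6, 28, 28))

# -1 lets numpy infer the number of images from the array's size
reshaped = sample.reshape(-1, 28, 28, 1)

# expand_dims appends the channel axis without naming the other dimensions
expanded = numpy.expand_dims(sample, axis=-1)

print(reshaped.shape)
print(expanded.shape)
```

Either form would work for both the training and testing images, since the image count is never written out.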

Our model starts with a Conv2D layer. The arguments we're using are:

  • filters: the dimensionality of the output space (the number of output filters in the convolution)
  • kernel_size: The height and width of the convolution window
  • activation: The activation function for the output
  • input_shape: If this is the first layer in the model you have to tell it what the input shape is

The output of the convolutional layer goes to a MaxPooling2D layer (MaxPool2D is an alias for it). The only argument we're passing in is pool_size, the factors by which to downsize the input. Using (2, 2) halves both the height and the width, so each feature map keeps a quarter of its cells. After the convolutions and pooling are applied, the output is flattened and sent through a version of the fully-connected network that we were using before (see the baseline model above).
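
To make the two operations concrete, here is a rough numpy sketch of an unpadded convolution followed by 2 x 2 max pooling on a single-channel image. The function names, the 6 x 6 input, and the averaging kernel are made up for illustration; a real Conv2D layer also learns its kernels, adds a bias, and applies the activation.

```python
import numpy


def convolve_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation)"""
    kernel_rows, kernel_columns = kernel.shape
    rows = image.shape[0] - kernel_rows + 1
    columns = image.shape[1] - kernel_columns + 1
    output = numpy.zeros((rows, columns))
    for row in range(rows):
        for column in range(columns):
            window = image[row:row + kernel_rows, column:column + kernel_columns]
            output[row, column] = (window * kernel).sum()
    return output


def max_pool(image, size=2):
    """Keep the largest value in each non-overlapping size x size block"""
    rows, columns = image.shape[0] // size, image.shape[1] // size
    # drop any leftover row or column that doesn't fill a whole block
    trimmed = image[:rows * size, :columns * size]
    return trimmed.reshape(rows, size, columns, size).max(axis=(1, 3))


image = numpy.arange(36, dtype=float).reshape(6, 6)
kernel = numpy.ones((3, 3)) / 9  # a hypothetical averaging filter

convolved = convolve_valid(image, kernel)
pooled = max_pool(convolved)
print(convolved.shape, pooled.shape)
```

The 6 x 6 input shrinks to 4 x 4 after the 3 x 3 convolution and to 2 x 2 after pooling, which is the same pattern the 28 x 28 images follow in the model.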

  • A Model Builder

    Something to make it a little easier to re-use things. Note that in the original notebook the first example has 64 filters in the CNN, but later it says that it's better to start with 32 (and the exercises expect that you used 32) so I'm using that as the default value.

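    def get_stop(loss=0.02):
        """Builds a callback that stops training once the loss drops below ``loss``"""
        class Stop(tensorflow.keras.callbacks.Callback):
            def on_epoch_end(self, epoch, logs=None):
                # avoid a mutable default argument and a missing "loss" entry
                current = (logs or {}).get("loss")
                if current is not None and current < loss:
                    print(f"Stopping point reached at epoch {epoch}")
                    self.model.stop_training = True
        return Stop()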

    class ModelBuilder:
        """Builds, trains, and tests our model

        Args:
         training_images: images to train on
         training_labels: labels for the training data
         testing_images: images to test the trained model with
         testing_labels: labels for the testing data
         additional_convolutions: convolutions besides the input convolution
         epochs: number of times to repeat training
         filters: number of filters in the output of the convolutional layers
         use_callback: use the Stop callback to end training
         callback_loss: loss to use for the callback
        """
        def __init__(self, training_images: numpy.ndarray=training_images,
                     training_labels: numpy.ndarray=training_labels,
                     testing_images: numpy.ndarray=testing_images,
                     testing_labels: numpy.ndarray=testing_labels,
                     additional_convolutions: int=1, 
                     epochs: int=10, 
                     filters: int=32,
                     use_callback: bool=False,
                     callback_loss: float=0) -> None:
            self.training_images = training_images
            self.training_labels = training_labels
            self.testing_images = testing_images
            self.testing_labels = testing_labels
            self.additional_convolutions = additional_convolutions
            self.epochs = epochs
            self.filters = filters
            self.use_callback = use_callback
            self.callback_loss = callback_loss
            self._model = None
            self._callback = None

        @property
        def callback(self) -> tensorflow.keras.callbacks.Callback:
            """The callback to stop the training"""
            if self._callback is None:
                self._callback = get_stop(self.callback_loss)
            return self._callback

        @property
        def model(self) -> tensorflow.keras.models.Sequential:
            """Our CNN Model"""
            if self._model is None:
                self._model = tensorflow.keras.models.Sequential()
                self._model.add(tensorflow.keras.layers.Conv2D(
                    self.filters, (3, 3),
                    activation="relu",
                    input_shape=(28, 28, 1)))
                self._model.add(tensorflow.keras.layers.MaxPooling2D(2, 2))
                for convolution in range(self.additional_convolutions):
                    self._model.add(tensorflow.keras.layers.Conv2D(self.filters, (3, 3),
                                                                   activation="relu"))
                    self._model.add(tensorflow.keras.layers.MaxPooling2D(2, 2))
                # the dense layers need a vector, not a grid
                self._model.add(tensorflow.keras.layers.Flatten())
                self._model.add(tensorflow.keras.layers.Dense(128, activation="relu"))
                self._model.add(tensorflow.keras.layers.Dense(10, activation="softmax"))
                self._model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                                    metrics=["accuracy"])
            return self._model

        def print_summary(self):
            """Print out the summary for the model"""
            self.model.summary()

        def fit(self):
            """Fit the model to the training data"""
            if self.use_callback:
                self.model.fit(self.training_images, self.training_labels,
                               epochs=self.epochs, verbose=2,
                               callbacks=[self.callback])
            else:
                self.model.fit(self.training_images, self.training_labels,
                               epochs=self.epochs, verbose=2)

        def test(self) -> tuple:
            """Check the loss and accuracy of the model against the testing set

            Returns:
             (loss, accuracy): the output of the evaluation of the testing data
            """
            return self.model.evaluate(self.testing_images, self.testing_labels, verbose=0)

        def __call__(self):
            """Builds, trains, and tests the model"""
            self.fit()
            loss, accuracy = self.test()
            print(f"Testing Loss: {loss:.2f}  Testing Accuracy: {accuracy:.2f}")

    builder = ModelBuilder(epochs=5)
    builder.print_summary()
    Model: "sequential_17"
    Layer (type)                 Output Shape              Param #   
    conv2d_32 (Conv2D)           (None, 26, 26, 32)        320       
    max_pooling2d_32 (MaxPooling (None, 13, 13, 32)        0         
    conv2d_33 (Conv2D)           (None, 11, 11, 32)        9248      
    max_pooling2d_33 (MaxPooling (None, 5, 5, 32)          0         
    flatten_17 (Flatten)         (None, 800)               0         
    dense_34 (Dense)             (None, 128)               102528    
    dense_35 (Dense)             (None, 10)                1290      
    Total params: 113,386
    Trainable params: 113,386
    Non-trainable params: 0

Layer By Layer

  • Our input is a set of 28 x 28 images.
  • Because we didn't pad the images, the 3 x 3 convolutional layer "trims" off one row and column on each side (the center cell of the kernel can't reach the outermost cells) so we get a 26 x 26 grid with 32 filters (the default number of filters in the ModelBuilder).
  • The Max Pooling layer then halves the image so we have a 13 x 13 grid with 32 filters.
  • The next convolution layer once again trims off one row and column on each side so we have an 11 x 11 grid with 32 filters.
  • Then the Max Pooling halves the grid once again (rounding down) so we have a 5 x 5 grid with 32 filters.
  • The Flatten layer outputs a vector with 800 cells (5 x 5 x 32 = 800).
  • The first Dense layer has 128 neurons in it so that's the size of the output
  • And the final Dense layer converts it to 10 outputs to match the number of labels we have
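
The shapes and parameter counts in the summary can be checked by hand. The sketch below is plain arithmetic (the helper names are made up for illustration) and assumes 3 x 3 unpadded convolutions, 2 x 2 pooling, and 32 filters, as in the summary above.

```python
def convolution(size, channels, filters, kernel=3):
    """Output size, output channels, and parameter count for an unpadded convolution"""
    # each filter has kernel x kernel weights per input channel plus one bias
    parameters = (kernel * kernel * channels + 1) * filters
    return size - kernel + 1, filters, parameters


def pooling(size, pool=2):
    """Output size after non-overlapping pooling (a leftover row or column is dropped)"""
    return size // pool


def dense(inputs, units):
    """Parameter count for a fully-connected layer: weights plus one bias per unit"""
    return (inputs + 1) * units


size, channels, total = 28, 1, 0
for layer in range(2):
    size, channels, parameters = convolution(size, channels, filters=32)
    total += parameters
    print(f"Conv2D: {size} x {size} x {channels}, {parameters} parameters")
    size = pooling(size)
    print(f"MaxPooling2D: {size} x {size} x {channels}")

flattened = size * size * channels
print(f"Flatten: {flattened}")
for inputs, units in ((flattened, 128), (128, 10)):
    parameters = dense(inputs, units)
    total += parameters
    print(f"Dense: {units} outputs, {parameters} parameters")
print(f"Total parameters: {total}")
```

The totals agree with the summary: 320 and 9,248 parameters for the convolutions, 102,528 and 1,290 for the dense layers, and 113,386 overall.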

builder()
Epoch 1/5
60000/60000 - 17s - loss: 0.4671 - acc: 0.8290
Epoch 2/5
60000/60000 - 17s - loss: 0.3149 - acc: 0.8844
Epoch 3/5
60000/60000 - 17s - loss: 0.2688 - acc: 0.9003
Epoch 4/5
60000/60000 - 17s - loss: 0.2414 - acc: 0.9112
Epoch 5/5
60000/60000 - 17s - loss: 0.2175 - acc: 0.9198
Testing Loss: 0.28  Testing Accuracy: 0.89

Using the Convolutional Neural Network we've gone from 87% to 89% testing accuracy, and the testing loss dropped from 0.36 to 0.28, with half as many epochs as the baseline.

10 Epochs

Using five epochs it appears that the loss is still going down while the accuracy is going up. What happens with ten epochs?

builder_10 = ModelBuilder(epochs=10)
builder_10()
Epoch 1/10
60000/60000 - 16s - loss: 0.4807 - acc: 0.8242
Epoch 2/10
60000/60000 - 16s - loss: 0.3233 - acc: 0.8825
Epoch 3/10
60000/60000 - 15s - loss: 0.2776 - acc: 0.8976
Epoch 4/10
60000/60000 - 16s - loss: 0.2474 - acc: 0.9082
Epoch 5/10
60000/60000 - 16s - loss: 0.2273 - acc: 0.9155
Epoch 6/10
60000/60000 - 16s - loss: 0.2030 - acc: 0.9240
Epoch 7/10
60000/60000 - 16s - loss: 0.1854 - acc: 0.9314
Epoch 8/10
60000/60000 - 16s - loss: 0.1693 - acc: 0.9361
Epoch 9/10
60000/60000 - 15s - loss: 0.1540 - acc: 0.9419
Epoch 10/10
60000/60000 - 16s - loss: 0.1419 - acc: 0.9467
Testing Loss: 0.26  Testing Accuracy: 0.91

It looks like it's still learning: the testing loss went from 0.28 to 0.26 and the testing accuracy from 0.89 to 0.91.

15 Epochs

builder_15 = ModelBuilder(epochs=15)
builder_15()
Epoch 1/15
60000/60000 - 16s - loss: 0.4754 - acc: 0.8260
Epoch 2/15
60000/60000 - 16s - loss: 0.3155 - acc: 0.8834
Epoch 3/15
60000/60000 - 16s - loss: 0.2725 - acc: 0.9001
Epoch 4/15
60000/60000 - 16s - loss: 0.2447 - acc: 0.9096
Epoch 5/15
60000/60000 - 16s - loss: 0.2199 - acc: 0.9180
Epoch 6/15
60000/60000 - 16s - loss: 0.1996 - acc: 0.9248
Epoch 7/15
60000/60000 - 16s - loss: 0.1813 - acc: 0.9316
Epoch 8/15
60000/60000 - 16s - loss: 0.1666 - acc: 0.9372
Epoch 9/15
60000/60000 - 16s - loss: 0.1525 - acc: 0.9430
Epoch 10/15
60000/60000 - 15s - loss: 0.1374 - acc: 0.9484
Epoch 11/15
60000/60000 - 16s - loss: 0.1257 - acc: 0.9527
Epoch 12/15
60000/60000 - 15s - loss: 0.1135 - acc: 0.9569
Epoch 13/15
60000/60000 - 16s - loss: 0.1025 - acc: 0.9615
Epoch 14/15
60000/60000 - 15s - loss: 0.0937 - acc: 0.9647
Epoch 15/15
60000/60000 - 16s - loss: 0.0849 - acc: 0.9682
Testing Loss: 0.34  Testing Accuracy: 0.91

It looks like it's started to overfit. The testing accuracy is unchanged at 0.91, but the testing loss is a little worse (0.34 versus 0.26) even though the training loss kept falling.

20 Epochs

builder = ModelBuilder(epochs=20)
builder()
Epoch 1/20
60000/60000 - 16s - loss: 0.4759 - acc: 0.8264
Epoch 2/20
60000/60000 - 16s - loss: 0.3218 - acc: 0.8822
Epoch 3/20
60000/60000 - 16s - loss: 0.2767 - acc: 0.8982
Epoch 4/20
60000/60000 - 16s - loss: 0.2469 - acc: 0.9083
Epoch 5/20
60000/60000 - 16s - loss: 0.2218 - acc: 0.9177
Epoch 6/20
60000/60000 - 16s - loss: 0.2015 - acc: 0.9244
Epoch 7/20
60000/60000 - 16s - loss: 0.1848 - acc: 0.9309
Epoch 8/20
60000/60000 - 15s - loss: 0.1698 - acc: 0.9361
Epoch 9/20
60000/60000 - 14s - loss: 0.1525 - acc: 0.9424
Epoch 10/20
60000/60000 - 15s - loss: 0.1435 - acc: 0.9457
Epoch 11/20
60000/60000 - 16s - loss: 0.1306 - acc: 0.9504
Epoch 12/20
60000/60000 - 15s - loss: 0.1172 - acc: 0.9556
Epoch 13/20
60000/60000 - 15s - loss: 0.1079 - acc: 0.9594
Epoch 14/20
60000/60000 - 15s - loss: 0.0993 - acc: 0.9626
Epoch 15/20
60000/60000 - 15s - loss: 0.0900 - acc: 0.9658
Epoch 16/20
60000/60000 - 15s - loss: 0.0829 - acc: 0.9686
Epoch 17/20
60000/60000 - 15s - loss: 0.0746 - acc: 0.9720
Epoch 18/20
60000/60000 - 16s - loss: 0.0713 - acc: 0.9736
Epoch 19/20
60000/60000 - 15s - loss: 0.0638 - acc: 0.9760
Epoch 20/20
60000/60000 - 15s - loss: 0.0594 - acc: 0.9781
Testing Loss: 0.45  Testing Accuracy: 0.91

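It looks like it's overfitting: the training loss keeps falling and the training accuracy keeps rising, but the testing loss got worse again (0.45 versus 0.34) while the testing accuracy stayed at 0.91.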
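
To see the trend across the four runs in one place, here is a small sketch that tabulates the final training loss against the testing loss. The numbers are copied from the outputs above; each run trained a fresh model, so this is only a rough comparison.

```python
# epochs: (final training loss, testing loss, testing accuracy), copied from the runs above
results = {
    5: (0.2175, 0.28, 0.89),
    10: (0.1419, 0.26, 0.91),
    15: (0.0849, 0.34, 0.91),
    20: (0.0594, 0.45, 0.91),
}

for epochs, (training_loss, testing_loss, accuracy) in results.items():
    # a growing gap between testing and training loss is a sign of overfitting
    gap = testing_loss - training_loss
    print(f"{epochs:>2} epochs: train {training_loss:.2f}  test {testing_loss:.2f}  "
          f"gap {gap:.2f}  accuracy {accuracy:.2f}")

best = min(results, key=lambda epochs: results[epochs][1])
print(f"Lowest testing loss: {best} epochs")
```

The gap widens with every increase in epochs, and of the four runs the ten-epoch one has the lowest testing loss.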

Visualizing the Convolutions and Pooling

print(testing_labels[:100])
[9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0 2 5 7 9 1 4 6 0 9 3 8 8 3 3 8 0 7
 5 7 9 6 1 3 7 6 7 2 1 2 2 4 4 5 8 2 2 8 4 8 0 7 7 8 5 1 1 2 3 9 8 7 0 2 6
 2 3 1 2 8 4 1 8 5 9 5 0 3 2 0 6 5 3 6 7 1 8 0 1 4 2]
# Assumed values: the original definitions aren't shown. Indices 0, 23, and 28
# are all labeled 9 in the output above, and the filter index is arbitrary.
FIRST_IMAGE = 0
SECOND_IMAGE = 23
THIRD_IMAGE = 28
CONVOLUTION_NUMBER = 1

model = builder_10.model
figure, axis_array = pyplot.subplots(3, 4)

layer_outputs = [layer.output for layer in model.layers]

activation_model = tensorflow.keras.models.Model(inputs=model.input, outputs=layer_outputs)

# one row per image, one column for each of the first four layers
# (convolution, pooling, convolution, pooling)
for row, image_index in enumerate((FIRST_IMAGE, SECOND_IMAGE, THIRD_IMAGE)):
    activations = activation_model.predict(
        testing_images[image_index].reshape(1, 28, 28, 1))
    for column in range(4):
        axis_array[row, column].imshow(
            activations[column][0, :, :, CONVOLUTION_NUMBER], cmap="inferno")



1. Try editing the convolutions. Change the 32s to either 16 or 64. What impact will this have on accuracy and/or training time?

  • 16 Filters
    builder = ModelBuilder(filters=16)
    with TIMER:
        builder()
    Epoch 1/10
    60000/60000 - 17s - loss: 0.5169 - acc: 0.8100
    Epoch 2/10
    60000/60000 - 17s - loss: 0.3536 - acc: 0.8714
    Epoch 3/10
    60000/60000 - 17s - loss: 0.3075 - acc: 0.8873
    Epoch 4/10
    60000/60000 - 17s - loss: 0.2808 - acc: 0.8959
    Epoch 5/10
    60000/60000 - 16s - loss: 0.2590 - acc: 0.9027
    Epoch 6/10
    60000/60000 - 17s - loss: 0.2419 - acc: 0.9100
    Epoch 7/10
    60000/60000 - 17s - loss: 0.2276 - acc: 0.9156
    Epoch 8/10
    60000/60000 - 17s - loss: 0.2140 - acc: 0.9182
    Epoch 9/10
    60000/60000 - 17s - loss: 0.2030 - acc: 0.9233
    Epoch 10/10
    60000/60000 - 17s - loss: 0.1934 - acc: 0.9266
    Testing Loss: 0.29  Testing Accuracy: 0.90

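    The smaller model had slightly more testing loss than the 32-filter model trained for ten epochs (0.29 versus 0.26) as well as a little less testing accuracy (0.90 versus 0.91). The time per epoch was about the same.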

  • 64 Filters
    builder = ModelBuilder(filters=64)
    with TIMER:
        builder()
    Epoch 1/10
    60000/60000 - 19s - loss: 0.4367 - acc: 0.8428
    Epoch 2/10
    60000/60000 - 18s - loss: 0.2923 - acc: 0.8929
    Epoch 3/10
    60000/60000 - 18s - loss: 0.2472 - acc: 0.9087
    Epoch 4/10
    60000/60000 - 18s - loss: 0.2156 - acc: 0.9205
    Epoch 5/10
    60000/60000 - 18s - loss: 0.1893 - acc: 0.9298
    Epoch 6/10
    60000/60000 - 18s - loss: 0.1665 - acc: 0.9380
    Epoch 7/10
    60000/60000 - 18s - loss: 0.1460 - acc: 0.9456
    Epoch 8/10
    60000/60000 - 18s - loss: 0.1285 - acc: 0.9500
    Epoch 9/10
    60000/60000 - 18s - loss: 0.1142 - acc: 0.9568
    Epoch 10/10
    60000/60000 - 18s - loss: 0.0972 - acc: 0.9621
    Testing Loss: 0.32  Testing Accuracy: 0.91

    This has the same testing accuracy as the 32-filter model (0.91) but with a slight increase in the testing loss (0.32 versus 0.26), and each epoch took about two seconds longer.

2. Remove the final Convolution. What impact will this have on accuracy or training time?

builder = ModelBuilder(additional_convolutions=0)
with TIMER:
    builder()
Epoch 1/10
60000/60000 - 14s - loss: 0.3897 - acc: 0.8607
Epoch 2/10
60000/60000 - 14s - loss: 0.2642 - acc: 0.9042
Epoch 3/10
60000/60000 - 14s - loss: 0.2218 - acc: 0.9187
Epoch 4/10
60000/60000 - 14s - loss: 0.1883 - acc: 0.9306
Epoch 5/10
60000/60000 - 14s - loss: 0.1619 - acc: 0.9391
Epoch 6/10
60000/60000 - 14s - loss: 0.1387 - acc: 0.9482
Epoch 7/10
60000/60000 - 14s - loss: 0.1171 - acc: 0.9564
Epoch 8/10
60000/60000 - 14s - loss: 0.1000 - acc: 0.9629
Epoch 9/10
60000/60000 - 14s - loss: 0.0831 - acc: 0.9702
Epoch 10/10
60000/60000 - 14s - loss: 0.0728 - acc: 0.9729
Testing Loss: 0.31  Testing Accuracy: 0.92

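Without the second convolution each epoch is about two seconds faster and the testing accuracy is a little better than the two-convolution model (0.92 versus 0.91), but the testing loss is also a little higher (0.31 versus 0.26) while the training loss is much lower (0.07 versus 0.14), which suggests overfitting. We probably need more data.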

3. How about adding more Convolutions? What impact do you think this will have? Experiment with it.
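
Before experimenting, it helps to check how many extra convolutions the 28 x 28 images can support. This is a rough sketch, assuming the same 3 x 3 unpadded convolutions and 2 x 2 pooling as the ModelBuilder; the trace function is made up for illustration and only tracks the grid size.

```python
def trace(additional_convolutions, size=28, kernel=3, pool=2):
    """Grid size after each convolution and pooling pair

    Raises:
     ValueError: if the grid is too small for another convolution
    """
    sizes = []
    for layer in range(1 + additional_convolutions):
        if size < kernel:
            raise ValueError(
                f"convolution {layer + 1} needs at least {kernel} x {kernel} "
                f"but the grid is {size} x {size}")
        # the convolution trims (kernel - 1) cells, then pooling divides and rounds down
        size = (size - kernel + 1) // pool
        sizes.append(size)
    return sizes


for extra in range(4):
    try:
        print(extra, trace(extra))
    except ValueError as error:
        print(extra, error)
```

With two additional convolutions the grid is already down to 1 x 1, so a third wouldn't fit without padding or dropping a pooling layer, and I'd expect Keras to refuse to build that model.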

4. In the previous lesson you implemented a callback to check on the loss function and to cancel training once it hit a certain amount. See if you can implement that here!

builder = ModelBuilder(use_callback=True, epochs=100, callback_loss=0.19)
with TIMER:
    builder()
Epoch 1/100
60000/60000 - 17s - loss: 0.4773 - acc: 0.8277
Epoch 2/100
60000/60000 - 17s - loss: 0.3204 - acc: 0.8840
Epoch 3/100
60000/60000 - 17s - loss: 0.2777 - acc: 0.8986
Epoch 4/100
60000/60000 - 17s - loss: 0.2463 - acc: 0.9089
Epoch 5/100
60000/60000 - 17s - loss: 0.2220 - acc: 0.9179
Epoch 6/100
60000/60000 - 17s - loss: 0.2029 - acc: 0.9250
Epoch 7/100
Stopping point reached at epoch 6
60000/60000 - 17s - loss: 0.1827 - acc: 0.9314
Testing Loss: 0.25  Testing Accuracy: 0.91

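This does about the same as the 10 epoch version (a testing loss of 0.25 versus 0.26, with the same 0.91 testing accuracy) in seven epochs instead of ten, so we didn't save much, but it gives us a way to stop without guessing the number of epochs.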
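
The message says "epoch 6" while the progress line reads "Epoch 7/100" because Keras passes a zero-based epoch index to on_epoch_end. The sketch below replays the stopping rule on the training losses copied from the run above; first_stop is a made-up helper, not part of Keras.

```python
def first_stop(losses, threshold):
    """Zero-based index of the first epoch whose loss is below the threshold, or None"""
    for epoch, loss in enumerate(losses):
        if loss < threshold:
            return epoch
    return None


# training losses copied from the run above
losses = [0.4773, 0.3204, 0.2777, 0.2463, 0.2220, 0.2029, 0.1827]
stopped = first_stop(losses, threshold=0.19)
print(f"Callback reports epoch {stopped}, which Keras displays as Epoch {stopped + 1}")
```

Printing epoch + 1 in the callback would make the message line up with the progress output.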

