Simple Autoencoder
We'll start off by building a simple autoencoder to compress the MNIST dataset. With autoencoders, we pass input data through an encoder that makes a compressed representation of the input. Then, this representation is passed through a decoder to reconstruct the input data. Generally the encoder and decoder will be built with neural networks, then trained on example data.
Compressed Representation
A compressed representation can be great for saving and sharing any kind of data in a way that is more efficient than storing raw data. In practice, the compressed representation often holds key information about an input image and we can use it for denoising images or other kinds of reconstruction and transformation!
Set Up
In this notebook, we'll build a simple network architecture for the encoder and decoder. Let's get started by importing our libraries and getting the dataset.
from dotenv import load_dotenv
from torchvision import datasets
import matplotlib.pyplot as pyplot
import numpy
import seaborn
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
This Project
from neurotic.tangles.data_paths import DataPathTwo
get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'retina'")
seaborn.set(rc={"axes.grid": False,
                "font.family": ["sans-serif"],
                "font.sans-serif": ["Open Sans", "Latin Modern Sans", "Lato"],
                "figure.figsize": (8, 6)})
The Data
Data Transformer
transform = transforms.ToTensor()
Load the Data
path = DataPathTwo(folder_key="MNIST")
train_data = datasets.MNIST(root=path.folder, train=True,
download=True, transform=transform)
test_data = datasets.MNIST(root=path.folder, train=False,
download=True, transform=transform)
Training and Test Batch Loaders
- Some Constants
# number of subprocesses to use for data loading
NUM_WORKERS = 0
# how many samples per batch to load
BATCH_SIZE = 20
Prepare the loaders.
train_loader = torch.utils.data.DataLoader(train_data, batch_size=BATCH_SIZE, num_workers=NUM_WORKERS)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=BATCH_SIZE, num_workers=NUM_WORKERS)
Visualize the Data
Obtain One Batch of Training Images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()
Get One Image From the Batch
img = numpy.squeeze(images[0])
figure, axe = pyplot.subplots()
figure.suptitle("First Image", weight="bold")
image = axe.imshow(img, cmap='gray')
Linear Autoencoder
We'll train an autoencoder with these images by flattening them into 784 length vectors. The images from this dataset are already normalized such that the values are between 0 and 1. Let's start by building a simple autoencoder. The encoder and decoder should be made of one linear layer. The units that connect the encoder and decoder will be the compressed representation.
Since the images are normalized between 0 and 1, we need to use a sigmoid activation on the output layer to get values that match this input value range.
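To see why a sigmoid fits here, this is a minimal sketch in plain Python (the `sigmoid` helper and the sample values are my own illustration, not part of the notebook) showing that it squashes any real number into the open interval between 0 and 1, the same range as the pixel values.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical raw decoder outputs before the activation is applied
raw_outputs = [-6.0, -1.0, 0.0, 1.0, 6.0]
squashed = [sigmoid(z) for z in raw_outputs]

for z, s in zip(raw_outputs, squashed):
    print(f"{z:+.1f} -> {s:.4f}")
```

Large negative values land near 0 (a black pixel) and large positive values near 1 (a white pixel), so the output can be compared directly with the normalized input.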
- The input images will be flattened into 784 length vectors. The targets are the same as the inputs.
- The encoder and decoder will be made of one linear layer each.
- The depth dimensions should change as follows: 784 inputs > encoding_dim > 784 outputs.
- All layers will have ReLU activations applied except for the final output layer, which has a sigmoid activation.
The compressed representation should be a vector with dimension encoding_dim=32
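As a rough back-of-the-envelope check on what this specification implies, here is a small sketch of the arithmetic. It assumes one linear layer with a bias term on each side, and the constant names are my own.

```python
INPUT_DIM = 28 * 28   # flattened MNIST image: 784 values
ENCODING_DIM = 32     # size of the compressed representation

# How much smaller the code is than the input
compression_ratio = INPUT_DIM / ENCODING_DIM

# A linear layer has (inputs * outputs) weights plus one bias per output
encoder_parameters = INPUT_DIM * ENCODING_DIM + ENCODING_DIM
decoder_parameters = ENCODING_DIM * INPUT_DIM + INPUT_DIM
total_parameters = encoder_parameters + decoder_parameters

print(f"compression ratio: {compression_ratio}")    # 24.5
print(f"encoder parameters: {encoder_parameters}")  # 25120
print(f"decoder parameters: {decoder_parameters}")  # 25872
print(f"total parameters: {total_parameters}")      # 50992
```

So each image is squeezed to roughly a twenty-fifth of its original size before being reconstructed.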
Architecture Definition
rows, columns = img.shape
IMAGE_DIMENSION = rows * columns
class Autoencoder(nn.Module):
    """A simple autoencoder-decoder

    Args:
     encoding_dim: the dimension of the encoded image
    """
    def __init__(self, encoding_dim: int):
        super().__init__()
        self.encoder = nn.Linear(IMAGE_DIMENSION, encoding_dim)
        self.activation_one = nn.ReLU()
        self.decoder = nn.Linear(encoding_dim, IMAGE_DIMENSION)
        self.activation_output = nn.Sigmoid()
        return

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Does one feed-forward pass

        Args:
         x: flattened MNIST image

        Returns:
         the encoded-decoded version of the image
        """
        x = self.activation_one(self.encoder(x))
        return self.activation_output(self.decoder(x))
Initialize the Auto-Encoder
encoding_dim = 32
model = Autoencoder(encoding_dim)
Autoencoder(
  (encoder): Linear(in_features=784, out_features=32, bias=True)
  (activation_one): ReLU()
  (decoder): Linear(in_features=32, out_features=784, bias=True)
  (activation_output): Sigmoid()
)
Here I'll write a bit of code to train the network. I'm not too interested in validation here, so I'll just monitor the training loss and the test loss afterwards.
We are not concerned with labels in this case, just images, which we can get from the train_loader. Because we're comparing pixel values in input and output images, it will be best to use a loss that is meant for a regression task. Regression is all about comparing quantities rather than probabilistic values. So, in this case, I'll use MSELoss, which calculates the Mean-Squared Error between the predicted and the actual value, and compare output images and input images as follows:
loss = criterion(outputs, images)
Otherwise, this is pretty straightforward training with PyTorch. We flatten our images, pass them into the autoencoder, and record the training loss as we go.
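To make the Mean-Squared Error concrete, here is a minimal sketch in plain Python with a made-up four-pixel "image" and reconstruction. The `mean_squared_error` helper is only an illustration of the formula, not PyTorch's implementation.

```python
def mean_squared_error(predicted: list, actual: list) -> float:
    """Average of the squared differences between matching elements."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical pixel values, already scaled to the range 0 to 1
original = [0.0, 0.5, 1.0, 0.25]
reconstruction = [0.1, 0.5, 0.8, 0.25]

# (0.1**2 + 0 + 0.2**2 + 0) / 4, which is about 0.0125
loss = mean_squared_error(reconstruction, original)
print(f"MSE: {loss:.4f}")
```

A perfect reconstruction gives a loss of zero, and larger pixel differences are penalized more heavily because the errors are squared.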
Specify the Loss Function
criterion = nn.MSELoss()
Specify the Optimizer
We're going to use the Adam optimizer instead of Stochastic Gradient Descent.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
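For intuition about how Adam differs from plain gradient descent, here is a rough scalar sketch of its update rule in plain Python. The `adam_step` helper and the toy objective are my own illustration. The default values mirror the usual Adam defaults (betas of 0.9 and 0.999, epsilon of 1e-8) and the learning rate used above, but this is not PyTorch's code.

```python
import math


def adam_step(parameter: float, gradient: float, state: dict,
              lr: float = 0.001, beta1: float = 0.9,
              beta2: float = 0.999, eps: float = 1e-8) -> float:
    """One Adam update for a single scalar parameter."""
    state["t"] += 1
    # running averages of the gradient and of the squared gradient
    state["m"] = beta1 * state["m"] + (1 - beta1) * gradient
    state["v"] = beta2 * state["v"] + (1 - beta2) * gradient ** 2
    # bias correction, since both averages start at zero
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return parameter - lr * m_hat / (math.sqrt(v_hat) + eps)


# Toy objective: minimize f(w) = w**2, whose gradient is 2 * w
w = 1.0
state = {"t": 0, "m": 0.0, "v": 0.0}
history = [w]
for _ in range(100):
    w = adam_step(w, 2 * w, state)
    history.append(w)

print(f"w after 100 steps: {w:.4f}")
```

Because the step is the averaged gradient divided by the root of the averaged squared gradient, its size stays close to the learning rate regardless of the gradient's scale, whereas a plain SGD step is proportional to the raw gradient.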
And Now We Train
n_epochs = 20
for epoch in range(1, n_epochs+1):
    # monitor training loss
    train_loss = 0.0
    # train the model #
    for data in train_loader:
        # _ stands in for labels, here
        images, _ = data
        # flatten images
        images = images.view(images.size(0), -1)
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        outputs = model(images)
        # calculate the loss
        loss = criterion(outputs, images)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item()*images.size(0)
    # print avg training statistics
    # (this divides by the number of batches, not samples, so the value is the mean loss scaled by the batch size)
    train_loss = train_loss/len(train_loader)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))
Epoch: 1 	Training Loss: 0.622334
Epoch: 2 	Training Loss: 0.297601
Epoch: 3 	Training Loss: 0.258895
Epoch: 4 	Training Loss: 0.250710
Epoch: 5 	Training Loss: 0.247124
Epoch: 6 	Training Loss: 0.244808
Epoch: 7 	Training Loss: 0.243222
Epoch: 8 	Training Loss: 0.242119
Epoch: 9 	Training Loss: 0.241254
Epoch: 10 	Training Loss: 0.240563
Epoch: 11 	Training Loss: 0.239997
Epoch: 12 	Training Loss: 0.239529
Epoch: 13 	Training Loss: 0.239120
Epoch: 14 	Training Loss: 0.238747
Epoch: 15 	Training Loss: 0.238395
Epoch: 16 	Training Loss: 0.238030
Epoch: 17 	Training Loss: 0.237546
Epoch: 18 	Training Loss: 0.237213
Epoch: 19 	Training Loss: 0.236916
Epoch: 20 	Training Loss: 0.236473
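One thing worth noting about these numbers: the loop multiplies each batch's mean loss by the batch size but then divides the total by the number of batches rather than the number of images. Assuming MSELoss keeps its default mean reduction and every batch is full (60,000 MNIST training images split into batches of 20), the printed values are the per-pixel MSE scaled by the batch size. A small sketch of the conversion:

```python
BATCH_SIZE = 20
TRAINING_IMAGES = 60_000  # size of the MNIST training set

# every batch is full, so there is no smaller final batch to worry about
batches_per_epoch = TRAINING_IMAGES // BATCH_SIZE
leftover_images = TRAINING_IMAGES % BATCH_SIZE

# final value printed by the training loop above
reported_final_loss = 0.236473

# undo the extra factor of BATCH_SIZE to get the mean squared error per pixel
per_pixel_mse = reported_final_loss / BATCH_SIZE

print(f"batches per epoch: {batches_per_epoch}")
print(f"per-pixel MSE after the last epoch: {per_pixel_mse:.6f}")
```

The relative improvement from epoch to epoch is unaffected. Only the absolute scale changes.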
Checking out the results
Below I've plotted some of the test images along with their reconstructions. For the most part these look pretty good except for some blurriness in some parts.
Obtain One Batch Of Test Images
dataiter = iter(test_loader)
images, labels = next(dataiter)
images_flatten = images.view(images.size(0), -1)
# get sample outputs
output = model(images_flatten)
# prep images for display
images = images.numpy()
# output is resized into a batch of images
output = output.view(BATCH_SIZE, 1, 28, 28)
# use detach when it's an output that requires_grad
output = output.detach().numpy()
figure, axes = pyplot.subplots(nrows=2, ncols=10, sharex=True, sharey=True, figsize=(10,8))
# input images on top row, reconstructions on bottom
for batch, row in zip([images, output], axes):
    for image, ax in zip(batch, row):
        ax.imshow(numpy.squeeze(image), cmap='gray')
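The plots give a qualitative check. For a number to go with them, a test loss could be computed along these lines. This is only a sketch: `average_reconstruction_loss` is a hypothetical helper of mine, and random tensors plus an untrained stand-in network are used so that the block runs on its own. In the notebook you would pass in the trained model and test_loader instead.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def average_reconstruction_loss(model: nn.Module, loader: DataLoader) -> float:
    """Mean squared error per pixel over every image the loader yields."""
    criterion = nn.MSELoss(reduction="sum")
    total_error, total_pixels = 0.0, 0
    model.eval()
    # no gradients are needed when only measuring the loss
    with torch.no_grad():
        for images, _ in loader:
            flat = images.view(images.size(0), -1)
            total_error += criterion(model(flat), flat).item()
            total_pixels += flat.numel()
    return total_error / total_pixels


# Stand-ins (assumptions for illustration): random "images" and an untrained network
torch.manual_seed(0)
fake_images = torch.rand(40, 1, 28, 28)
fake_labels = torch.zeros(40, dtype=torch.long)
fake_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=20)
stand_in_model = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid())

test_loss = average_reconstruction_loss(stand_in_model, fake_loader)
print(f"Test loss: {test_loss:.6f}")
```

Summing the squared errors and dividing by the total pixel count gives a per-pixel average that does not depend on the batch size.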