CNN GAN

Deep Convolutional GAN (DCGAN)

We're going to build a Generative Adversarial Network to generate handwritten digits. Instead of using fully-connected layers, we'll use convolutional layers.

Here are the main features of a DCGAN.

  • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
  • Use BatchNorm in both the generator and the discriminator.
  • Remove fully connected hidden layers for deeper architectures.
  • ReLU activation in generator for all layers except for the output, which uses Tanh.
  • Use LeakyReLU activation in the discriminator for all layers.
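
To make the first bullet concrete, here is a minimal sketch comparing a pooling layer with a strided convolution, and showing a transposed (fractional-strided) convolution growing a feature map back. The batch size and channel counts are arbitrary choices for illustration, not values from the paper.

```python
import torch
from torch import nn

# Illustrative input: a batch of 8 single-channel 28x28 images
images = torch.randn(8, 1, 28, 28)

# Pooling downsamples with a fixed rule and has no learnable parameters
pooled = nn.MaxPool2d(kernel_size=2)(images)

# A strided convolution also downsamples, but its filters are learned
strided = nn.Conv2d(1, 4, kernel_size=4, stride=2)(images)

# A transposed convolution goes the other way and upsamples
upsampled = nn.ConvTranspose2d(4, 1, kernel_size=4, stride=2)(strided)

print(pooled.shape)     # torch.Size([8, 1, 14, 14])
print(strided.shape)    # torch.Size([8, 4, 13, 13])
print(upsampled.shape)  # torch.Size([8, 1, 28, 28])
```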

Imports

# python
from collections import namedtuple
from functools import partial
from pathlib import Path

# conda
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
from torchvision.utils import make_grid

import holoviews
import hvplot.pandas
import matplotlib.pyplot as pyplot
import pandas
import torch
# my stuff
from graeae import EmbedHoloviews, Timer

Set Up

The Random Seed

torch.manual_seed(0)

Plotting and Timing

TIMER = Timer()
slug = "cnn-gan"

Embed = partial(EmbedHoloviews, folder_path=f"files/posts/gans/{slug}")

Plot = namedtuple("Plot", ["width", "height", "fontscale", "tan", "blue", "red"])
PLOT = Plot(
    width=900,
    height=750,
    fontscale=2,
    tan="#ddb377",
    blue="#4687b7",
    red="#ce7b6d",
)

Helper Functions

A Plotter

def plot_image(image: torch.Tensor,
                filename: str,
                title: str,
                num_images: int=25,
                size: tuple=(1, 28, 28),
                folder: str=f"files/posts/gans/{slug}/") -> None:
    """Plot the image and save it

    Args:
     image: the tensor with the image to plot
     filename: name for the final image file
     title: title to put on top of the image
     num_images: how many images to put in the composite image
     size: the size for the image
     folder: sub-folder to save the file in
    """
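    # The images are in [-1, 1] (Tanh output, normalized data); rescale to [0, 1] for imshow
    unflattened_image = ((image + 1) / 2).detach().cpu().view(-1, *size)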
    image_grid = make_grid(unflattened_image[:num_images], nrow=5)

    pyplot.title(title)
    pyplot.grid(False)
    pyplot.imshow(image_grid.permute(1, 2, 0).squeeze())

    pyplot.tick_params(bottom=False, top=False, labelbottom=False,
                       right=False, left=False, labelleft=False)
    pyplot.savefig(folder + filename)
    print(f"[[file:{filename}]]")
    return

A Noise Maker

def make_some_noise(n_samples: int, z_dim: int, device: str="cpu") -> torch.Tensor:
    """create noise vectors

    Args:
       n_samples: the number of samples to generate, a scalar
       z_dim: the dimension of the noise vector, a scalar
       device: the device type (cpu or cuda)

    Returns:
     tensor with random numbers from the normal distribution.
    """

    return torch.randn(n_samples, z_dim, device=device)

Middle

The Generator

The first component you will make is the generator. You may notice that instead of passing in the image dimension, you will pass the number of image channels to the generator. This is because a DCGAN uses convolutions, whose parameters don't depend on the number of pixels in an image. The number of channels still matters, though, because it determines the size of the filters.
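
A quick way to see this is to count parameters. The helper names below are made up for illustration; the counts follow from the standard layer definitions (one weight per connection plus one bias per output).

```python
def conv_parameters(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a 2D convolution with a square kernel"""
    return in_channels * out_channels * kernel_size**2 + out_channels


def linear_parameters(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully-connected layer"""
    return in_features * out_features + out_features


# The convolution's size depends only on the channels and the kernel size
print(conv_parameters(in_channels=1, out_channels=16, kernel_size=4))  # 272

# A fully-connected layer grows with the number of pixels
print(linear_parameters(in_features=28 * 28, out_features=16))  # 12560
print(linear_parameters(in_features=64 * 64, out_features=16))  # 65552
```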

You will build a generator using 4 layers (3 hidden layers + 1 output layer). As before, you will need to write a function to create a single block for the generator's neural network. From the paper:

  • [u]se batchnorm in both the generator and the discriminator.
  • [u]se ReLU activation in generator for all layers except for the output, which uses Tanh.

Since in DCGAN the activation function will be different for the output layer, you will need to check what layer is being created.

At the end of the generator class, you are given a forward pass function that takes in a noise vector and generates an image of the output dimension using your neural network. You are also given a function to create a noise vector. These functions are the same as the ones from the last assignment.

The Generator Class

class Generator(nn.Module):
    """The DCGAN Generator

    Args:
       z_dim: the dimension of the noise vector
       im_chan: the number of channels in the images, fitted for the dataset used
             (MNIST is black-and-white, so 1 channel is your default)
       hidden_dim: the inner dimension
    """
    def __init__(self, z_dim: int=10, im_chan: int=1, hidden_dim: int=64):
        super().__init__()
        self.z_dim = z_dim
        # Build the neural network
        self.gen = nn.Sequential(
            self.make_gen_block(z_dim, hidden_dim * 4),
            self.make_gen_block(hidden_dim * 4, hidden_dim * 2, kernel_size=4, stride=1),
            self.make_gen_block(hidden_dim * 2, hidden_dim),
            self.make_gen_block(hidden_dim, im_chan, kernel_size=4, final_layer=True),
        )

    def make_gen_block(self, input_channels: int, output_channels: int,
                       kernel_size: int=3, stride: int=2,
                       final_layer: bool=False) -> nn.Sequential:
        """Creates a block for the generator (sub sequence)

       The parts
        - a transposed convolution
        - a batchnorm (except for in the last layer)
        - an activation.

       Args:
           input_channels: how many channels the input feature representation has
           output_channels: how many channels the output feature representation should have
           kernel_size: the size of each convolutional filter, equivalent to (kernel_size, kernel_size)
           stride: the stride of the convolution
           final_layer: a boolean, true if it is the final layer and false otherwise 
                     (affects activation and batchnorm)

       Returns:
        the sub-sequence of layers
       """
        if not final_layer:
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.BatchNorm2d(output_channels),
                nn.ReLU()
            )
        else: # Final Layer
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.Tanh()
            )

    def unsqueeze_noise(self, noise: torch.Tensor) -> torch.Tensor:
        """transforms the noise tensor

       Args:
           noise: a noise tensor with dimensions (n_samples, z_dim)

       Returns:
        copy of noise with width and height = 1 and channels = z_dim.
       """
        return noise.view(len(noise), self.z_dim, 1, 1)

    def forward(self, noise: torch.Tensor) -> torch.Tensor:
        """Complete a forward pass of the generator, turning noise into images

       Args:
        noise: a noise tensor with dimensions (n_samples, z_dim)

       Returns:
        generated images.
       """
        x = self.unsqueeze_noise(noise)
        return self.gen(x)
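
As a sanity check on the kernel sizes and strides chosen above, the spatial size of each transposed convolution's output can be worked out by hand. This sketch assumes no padding, output padding, or dilation (the defaults the class relies on), in which case the output size is (input - 1) * stride + kernel_size. The function name is just for illustration.

```python
def transposed_output_size(size: int, kernel_size: int, stride: int) -> int:
    """Output width of a transposed convolution without padding or dilation"""
    return (size - 1) * stride + kernel_size


# (kernel_size, stride) for the four generator blocks, in order
blocks = [(3, 2), (4, 1), (3, 2), (4, 2)]

size = 1  # the noise vector is reshaped to a 1x1 "image"
sizes = [size]
for kernel_size, stride in blocks:
    size = transposed_output_size(size, kernel_size, stride)
    sizes.append(size)

print(sizes)  # [1, 3, 6, 13, 28]
```

The last value matches the 28x28 MNIST images the generator is meant to imitate.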

Setup Testing

gen = Generator()
num_test = 100

# Test the hidden block
test_hidden_noise = make_some_noise(num_test, gen.z_dim)
test_hidden_block = gen.make_gen_block(10, 20, kernel_size=4, stride=1)
test_uns_noise = gen.unsqueeze_noise(test_hidden_noise)
hidden_output = test_hidden_block(test_uns_noise)

# Check that it works with other strides
test_hidden_block_stride = gen.make_gen_block(20, 20, kernel_size=4, stride=2)

test_final_noise = make_some_noise(num_test, gen.z_dim) * 20
test_final_block = gen.make_gen_block(10, 20, final_layer=True)
test_final_uns_noise = gen.unsqueeze_noise(test_final_noise)
final_output = test_final_block(test_final_uns_noise)

# Test the whole thing:
test_gen_noise = make_some_noise(num_test, gen.z_dim)
test_uns_gen_noise = gen.unsqueeze_noise(test_gen_noise)
gen_output = gen(test_uns_gen_noise)

Unit Tests

assert tuple(hidden_output.shape) == (num_test, 20, 4, 4)
assert hidden_output.max() > 1
assert hidden_output.min() == 0
assert hidden_output.std() > 0.2
assert hidden_output.std() < 1
assert hidden_output.std() > 0.5

assert tuple(test_hidden_block_stride(hidden_output).shape) == (num_test, 20, 10, 10)

assert final_output.max().item() == 1
assert final_output.min().item() == -1

assert tuple(gen_output.shape) == (num_test, 1, 28, 28)
assert gen_output.std() > 0.5
assert gen_output.std() < 0.8
print("Success!")

The Discriminator

The second component you need to create is the discriminator.

You will use 3 layers in your discriminator's neural network. As with the generator, you will need a method that creates a single neural network block for the discriminator.

From the paper:

  • [u]se LeakyReLU activation in the discriminator for all layers.
  • For the LeakyReLUs, "the slope of the leak was set to 0.2" in DCGAN.
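
For reference, a LeakyReLU with that slope passes positive values through unchanged and scales negative values by 0.2. The plain-Python function below only illustrates the rule; the model itself uses nn.LeakyReLU(0.2).

```python
def leaky_relu(x: float, negative_slope: float = 0.2) -> float:
    """Identity for positive inputs, a scaled-down copy for negative ones"""
    return x if x > 0 else negative_slope * x


print(leaky_relu(5.0))   # 5.0
print(leaky_relu(-5.0))  # -1.0
```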

The Discriminator Class

class Discriminator(nn.Module):
    """The DCGAN Discriminator

    Args:
     im_chan: the number of channels in the images, fitted for the dataset used
             (MNIST is black-and-white, so 1 channel is the default)
     hidden_dim: the inner dimension
    """
    def __init__(self, im_chan: int=1, hidden_dim: int=16):
        super(Discriminator, self).__init__()
        self.disc = nn.Sequential(
            self.make_disc_block(im_chan, hidden_dim),
            self.make_disc_block(hidden_dim, hidden_dim * 2),
            self.make_disc_block(hidden_dim * 2, 1, final_layer=True),
        )
        return

    def make_disc_block(self, input_channels: int, output_channels: int,
                        kernel_size: int=4, stride: int=2,
                        final_layer: bool=False) -> nn.Sequential:
        """Make a sub-block of layers for the discriminator

        - a convolution
        - a batchnorm (except for in the last layer)
        - an activation.

       Args:
         input_channels: how many channels the input feature representation has
         output_channels: how many channels the output feature representation should have
         kernel_size: the size of each convolutional filter, equivalent to (kernel_size, kernel_size)
         stride: the stride of the convolution
         final_layer: if true it is the final layer and otherwise not
                     (affects activation and batchnorm)
       """        
        # Build the neural block
        if not final_layer:
            return nn.Sequential(
                nn.Conv2d(input_channels, output_channels, kernel_size, stride),
                nn.BatchNorm2d(output_channels),
                nn.LeakyReLU(0.2)
            )
        else: # Final Layer
            return nn.Sequential(
                nn.Conv2d(input_channels, output_channels, kernel_size, stride),
            )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        """Complete a forward pass of the discriminator

       Args:
         image: an image tensor with dimensions (n_samples, im_chan, height, width)

       Returns:
        a tensor with dimensions (n_samples, 1) holding the real/fake logits.
       """
        disc_pred = self.disc(image)
        return disc_pred.view(len(disc_pred), -1)
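
To see why the final view leaves one value per image, the spatial size can be traced through the three blocks. This sketch assumes no padding or dilation (the defaults used above), so each convolution outputs floor((input - kernel_size) / stride) + 1 pixels per side. The function name is only for illustration.

```python
def conv_output_size(size: int, kernel_size: int, stride: int) -> int:
    """Output width of a convolution without padding or dilation"""
    return (size - kernel_size) // stride + 1


size = 28  # MNIST images are 28x28
sizes = [size]
for _ in range(3):  # three blocks, each with kernel_size=4 and stride=2
    size = conv_output_size(size, kernel_size=4, stride=2)
    sizes.append(size)

print(sizes)  # [28, 13, 5, 1]
```

With a single output channel and a 1x1 feature map, flattening gives a tensor with dimensions (n_samples, 1).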

Set Up Testing

num_test = 100

gen = Generator()
disc = Discriminator()
test_images = gen(make_some_noise(num_test, gen.z_dim))

# Test the hidden block
test_hidden_block = disc.make_disc_block(1, 5, kernel_size=6, stride=3)
hidden_output = test_hidden_block(test_images)

# Test the final block
test_final_block = disc.make_disc_block(1, 10, kernel_size=2, stride=5, final_layer=True)
final_output = test_final_block(test_images)

# Test the whole thing:
disc_output = disc(test_images)

Unit Testing

  • The Hidden Block
    assert tuple(hidden_output.shape) == (num_test, 5, 8, 8)
    # Because of the LeakyReLU slope
    assert -hidden_output.min() / hidden_output.max() > 0.15
    assert -hidden_output.min() / hidden_output.max() < 0.25
    assert hidden_output.std() > 0.5
    assert hidden_output.std() < 1
    
  • The Final Block
    assert tuple(final_output.shape) == (num_test, 10, 6, 6)
    assert final_output.max() > 1.0
    assert final_output.min() < -1.0
    assert final_output.std() > 0.3
    assert final_output.std() < 0.6
    
  • The Whole Thing
    assert tuple(disc_output.shape) == (num_test, 1)
    assert disc_output.std() > 0.25
    assert disc_output.std() < 0.5
    print("Success!")
    

Training The Model

Remember that these are your parameters:

  • criterion: the loss function
  • n_epochs: the number of times you iterate through the entire dataset when training
  • z_dim: the dimension of the noise vector
  • display_step: how often to display/visualize the images
  • batch_size: the number of images per forward/backward pass
  • lr: the learning rate
  • beta_1, beta_2: the momentum terms for the Adam optimizer
  • device: the device type
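
The criterion used in this post is nn.BCEWithLogitsLoss, which applies a sigmoid to the raw discriminator output and then takes the binary cross-entropy against the target. A hand-rolled, single-value version looks roughly like this. It is for illustration only: the library version is numerically more stable and averages over the batch.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy of sigmoid(logit) against a 0/1 target"""
    probability = 1 / (1 + math.exp(-logit))
    return -(target * math.log(probability)
             + (1 - target) * math.log(1 - probability))


# A confident, correct prediction gives a small loss
print(bce_with_logits(logit=4.0, target=1.0) < 0.02)  # True
# A confident, wrong prediction gives a large loss
print(bce_with_logits(logit=4.0, target=0.0) > 4.0)   # True
```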

Set Up The Data

criterion = nn.BCEWithLogitsLoss()
z_dim = 64
batch_size = 128
# A learning rate of 0.0002 works well on DCGAN
lr = 0.0002

# These parameters control the optimizer's momentum, which you can read more about here:
# https://distill.pub/2017/momentum/ but you don’t need to worry about it for this course!
beta_1 = 0.5 
beta_2 = 0.999
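# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"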

# You can transform the image values to be between -1 and 1 (the range of the tanh activation)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

path = Path("~/pytorch-data/MNIST").expanduser()
dataloader = DataLoader(
    MNIST(path, download=True, transform=transform),
    batch_size=batch_size,
    shuffle=True)
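
transforms.Normalize subtracts the mean and divides by the standard deviation for each channel, so with both set to 0.5 the [0, 1] range produced by ToTensor becomes [-1, 1]. The arithmetic, using a made-up helper name:

```python
def normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    """The per-pixel operation applied by Normalize((0.5,), (0.5,))"""
    return (pixel - mean) / std


print([normalize(pixel) for pixel in (0.0, 0.5, 1.0)])  # [-1.0, 0.0, 1.0]
```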

Set Up the GAN

gen = Generator(z_dim).to(device)
gen_opt = torch.optim.Adam(gen.parameters(), lr=lr, betas=(beta_1, beta_2))
disc = Discriminator().to(device) 
disc_opt = torch.optim.Adam(disc.parameters(), lr=lr, betas=(beta_1, beta_2))

A Weight Initializer

def initial_weights(m):
    """Initialize the weights to the normal distribution

     - mean 0
     - standard deviation 0.02

    Args:
     m: layer whose weights to initialize
    """
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    if isinstance(m, nn.BatchNorm2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
        torch.nn.init.constant_(m.bias, 0)
    return
gen = gen.apply(initial_weights)
disc = disc.apply(initial_weights)
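
One way to confirm that an initializer like this did what it should is to look at the spread of the weights afterwards. This sketch uses a stand-alone layer with arbitrary sizes rather than the networks above, and a local copy of just the convolution branch of the initializer.

```python
import torch
from torch import nn

torch.manual_seed(0)


def init_conv(m: nn.Module) -> None:
    """Draw convolution weights from a normal distribution (mean 0, std 0.02)"""
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, 0.0, 0.02)


layer = nn.Sequential(nn.Conv2d(16, 32, kernel_size=4))
layer.apply(init_conv)  # apply() visits every sub-module, including the Conv2d

weights = layer[0].weight
print(f"mean={weights.mean().item():.4f} std={weights.std().item():.4f}")
```

The printed mean should be close to 0 and the standard deviation close to 0.02.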

Train it

For each epoch, you will process the entire dataset in batches. For every batch, you will update the discriminator and generator. Then, you can see DCGAN's results!
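
One detail of the loop is worth isolating: the discriminator update scores fake.detach(), so its backward pass does not push gradients into the generator. The toy example here swaps the real networks for two small linear layers (an assumption made purely to keep it short) to show the effect.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny stand-ins for the generator and discriminator
toy_gen = nn.Linear(4, 8)
toy_disc = nn.Linear(8, 1)
criterion = nn.BCEWithLogitsLoss()

fake = toy_gen(torch.randn(16, 4))

# Discriminator step: detach() cuts the graph back to the generator
disc_pred = toy_disc(fake.detach())
disc_loss = criterion(disc_pred, torch.zeros_like(disc_pred))
disc_loss.backward()

print(toy_gen.weight.grad is None)       # True: nothing reached the generator
print(toy_disc.weight.grad is not None)  # True: the discriminator got gradients
```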

Here's roughly the progression you should expect. On a GPU this takes about 30 seconds per thousand steps. On a CPU it can take about 8 hours per thousand steps. You might notice that in the image for step 5000, the generator is disproportionately producing things that look like ones. If the discriminator didn't learn to detect this imbalance quickly enough, the generator could just keep producing more ones. It could then end up fooling the discriminator so well that there is no further improvement, which is known as mode collapse.

n_epochs = 100
cur_step = 0
display_step = 1000
mean_generator_loss = 0
mean_discriminator_loss = 0
generator_losses = []
discriminator_losses = []
steps = []

best_loss = float("inf")
best_step = 0
best_path = Path("~/models/gans/mnist-dcgan/best_model.pth").expanduser()

with TIMER:
    for epoch in range(n_epochs):
        # Dataloader returns the batches
        for real, _ in dataloader:
            cur_batch_size = len(real)
            real = real.to(device)

            ## Update discriminator ##
            disc_opt.zero_grad()
            fake_noise = make_some_noise(cur_batch_size, z_dim, device=device)
            fake = gen(fake_noise)
            disc_fake_pred = disc(fake.detach())
            disc_fake_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
            disc_real_pred = disc(real)
            disc_real_loss = criterion(disc_real_pred, torch.ones_like(disc_real_pred))
            disc_loss = (disc_fake_loss + disc_real_loss) / 2

            # Keep track of the average discriminator loss
            mean_discriminator_loss += disc_loss.item() / display_step
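            # Compute the gradients (the fake images were detached and the
            # generator step builds a fresh graph, so nothing needs retaining)
            disc_loss.backward()
            # Update the discriminator's weights
            disc_opt.step()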

            ## Update generator ##
            gen_opt.zero_grad()
            fake_noise_2 = make_some_noise(cur_batch_size, z_dim, device=device)
            fake_2 = gen(fake_noise_2)
            disc_fake_pred = disc(fake_2)
            gen_loss = criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
            gen_loss.backward()
            gen_opt.step()

            # Keep track of the average generator loss
            mean_generator_loss += gen_loss.item() / display_step
            ## Visualization code ##
            if cur_step % display_step == 0 and cur_step > 0:
                print(f"Epoch {epoch}, step {cur_step}: Generator loss:"
                        f" {mean_generator_loss}, discriminator loss:"
                        f" {mean_discriminator_loss}")

                steps.append(cur_step)
                generator_losses.append(mean_generator_loss)
                discriminator_losses.append(mean_discriminator_loss)

                # Only compare completed averages: the running sum is close
                # to zero right after each reset, so checking it on every
                # step would always favor the first batch of a window.
                if mean_generator_loss < best_loss:
                    best_loss, best_step = mean_generator_loss, cur_step
                    best_path.parent.mkdir(parents=True, exist_ok=True)
                    with best_path.open("wb") as writer:
                        torch.save(gen, writer)

                mean_generator_loss = 0
                mean_discriminator_loss = 0
            cur_step += 1
Started: 2021-04-21 12:45:12.452739
Epoch 2, step 1000: Generator loss: 1.2671969079673289, discriminator loss: 0.43014343224465823
Epoch 4, step 2000: Generator loss: 1.1353899730443968, discriminator loss: 0.5306872705817226
Epoch 6, step 3000: Generator loss: 0.8764803466945883, discriminator loss: 0.611450107574464
Epoch 8, step 4000: Generator loss: 0.7747784045338618, discriminator loss: 0.6631499938964849
Epoch 10, step 5000: Generator loss: 0.7640163034200661, discriminator loss: 0.6734729865789411
Epoch 12, step 6000: Generator loss: 0.7452541967928404, discriminator loss: 0.6805261079072958
Epoch 14, step 7000: Generator loss: 0.7337032879889016, discriminator loss: 0.6874966211915009
Epoch 17, step 8000: Generator loss: 0.7245009585618979, discriminator loss: 0.6908933531045917
Epoch 19, step 9000: Generator loss: 0.7180560626983646, discriminator loss: 0.6936621717810626
Epoch 21, step 10000: Generator loss: 0.7115822317004211, discriminator loss: 0.695760274052621
Epoch 23, step 11000: Generator loss: 0.7090291924774644, discriminator loss: 0.6962701203227039
Epoch 25, step 12000: Generator loss: 0.7059894913136957, discriminator loss: 0.6973492541313167
Epoch 27, step 13000: Generator loss: 0.7030480077862743, discriminator loss: 0.6978999735713001
Epoch 29, step 14000: Generator loss: 0.7028095332086096, discriminator loss: 0.6974007876515396
Epoch 31, step 15000: Generator loss: 0.7027116653919212, discriminator loss: 0.6965595571994787
Epoch 34, step 16000: Generator loss: 0.7005282629728309, discriminator loss: 0.6962912415862079
Epoch 36, step 17000: Generator loss: 0.7007142878770828, discriminator loss: 0.6961965024471283
Epoch 38, step 18000: Generator loss: 0.699474583208561, discriminator loss: 0.6952810400128371
Epoch 40, step 19000: Generator loss: 0.6989677719473828, discriminator loss: 0.6954642050266268
Epoch 42, step 20000: Generator loss: 0.6977452509403238, discriminator loss: 0.695180906951427
Epoch 44, step 21000: Generator loss: 0.6973587237596515, discriminator loss: 0.6950308464765543
Epoch 46, step 22000: Generator loss: 0.6960379970669743, discriminator loss: 0.6949119175076485
Epoch 49, step 23000: Generator loss: 0.6957966268062581, discriminator loss: 0.6948324624896048
Epoch 51, step 24000: Generator loss: 0.6958502059578898, discriminator loss: 0.6945331234931943
Epoch 53, step 25000: Generator loss: 0.6954856168627734, discriminator loss: 0.6943869084119801
Epoch 55, step 26000: Generator loss: 0.6957543395757682, discriminator loss: 0.694317172288894
Epoch 57, step 27000: Generator loss: 0.6947923063635825, discriminator loss: 0.694082073867321
Epoch 59, step 28000: Generator loss: 0.6945026598572728, discriminator loss: 0.6939926172494871
Epoch 61, step 29000: Generator loss: 0.6947789136767392, discriminator loss: 0.6938506522774704
Epoch 63, step 30000: Generator loss: 0.6946699734926227, discriminator loss: 0.6937169924378406
Epoch 66, step 31000: Generator loss: 0.6944284628629694, discriminator loss: 0.6936815274357805
Epoch 68, step 32000: Generator loss: 0.6940396347641948, discriminator loss: 0.6935891906023032
Epoch 70, step 33000: Generator loss: 0.6946771386265761, discriminator loss: 0.6937210547327995
Epoch 72, step 34000: Generator loss: 0.693429798424244, discriminator loss: 0.6937174627780922
Epoch 74, step 35000: Generator loss: 0.6937471128702157, discriminator loss: 0.6935204346776015
Epoch 76, step 36000: Generator loss: 0.6938841561675072, discriminator loss: 0.6934832554459566
Epoch 78, step 37000: Generator loss: 0.6934520475268362, discriminator loss: 0.6934578058719627
Epoch 81, step 38000: Generator loss: 0.6936635475754732, discriminator loss: 0.6934186050295835
Epoch 83, step 39000: Generator loss: 0.6936795052289972, discriminator loss: 0.6935187472105031
Epoch 85, step 40000: Generator loss: 0.6933113215565679, discriminator loss: 0.6933534587025645
Epoch 87, step 41000: Generator loss: 0.6934976277351385, discriminator loss: 0.6933284662365923
Epoch 89, step 42000: Generator loss: 0.6933313971757892, discriminator loss: 0.693348657488824
Epoch 91, step 43000: Generator loss: 0.6937436528205883, discriminator loss: 0.6933502901792529
Epoch 93, step 44000: Generator loss: 0.6943431540131578, discriminator loss: 0.6933887023925772
Epoch 95, step 45000: Generator loss: 0.6938722513914105, discriminator loss: 0.6932663491368296
Epoch 98, step 46000: Generator loss: 0.6933276618123067, discriminator loss: 0.6934270900487906
Ended: 2021-04-21 13:06:00.256725
Elapsed: 0:20:47.803986
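
Both losses settle close to 0.693, which is the natural log of 2. That is the binary cross-entropy you get when the discriminator outputs a logit of 0 (a probability of 0.5), so the numbers are consistent with a discriminator that can no longer tell real images from fake ones. A quick check of the arithmetic:

```python
import math

probability = 1 / (1 + math.exp(-0.0))  # sigmoid of a zero logit
loss = -math.log(probability)           # same value for a real or a fake target

print(probability)     # 0.5
print(round(loss, 4))  # 0.6931
```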

Looking at the Final model.

fake_noise = make_some_noise(cur_batch_size, z_dim, device=device)

best_model = torch.load(best_path)
fake = best_model(fake_noise)
plot_image(image=fake, filename="fake_digits.png", title="Fake Digits")

fake_digits.png

plot_image(real, filename="real_digits.png", title="Real Digits")

real_digits.png

plotting = pandas.DataFrame.from_dict({
    "Step": steps,
    "Generator Loss": generator_losses,
    "Discriminator Loss": discriminator_losses
})

best = plotting.iloc[plotting["Generator Loss"].argmin()]
best_line = holoviews.VLine(best.Step)
gen_plot = plotting.hvplot(x="Step", y="Generator Loss", color=PLOT.blue)
disc_plot = plotting.hvplot(x="Step", y="Discriminator Loss", color=PLOT.red)

plot = (gen_plot * disc_plot * best_line).opts(title="Training Losses",
                                               height=PLOT.height,
                                               width=PLOT.width,
                                               ylabel="Loss",
                                               fontscale=PLOT.fontscale)
output = Embed(plot=plot, file_name="losses")()
print(output)

Figure Missing

End

Sources

  • Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. 2015 Nov 19.