Part 8 - Transfer Learning
Table of Contents
Introduction
This is from Udacity's Deep Learning Repository which supports their Deep Learning Nanodegree.
In this notebook, you'll learn how to use pre-trained networks to solved challenging problems in computer vision. Specifically, you'll use networks trained on ImageNet (available from torchvision).
ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks using an architecture called convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please watch this.
Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near perfect accuracy.
With torchvision.models
you can download these pre-trained networks and use them in your applications. We'll include models
in our imports now.
Set Up
Imports
Python
from collections import OrderedDict
from datetime import datetime
PyPi
from torch import nn
from torch import optim
from torchvision import datasets, transforms, models
import torch
import torch.nn.functional as F
This Project
from neurotic.tangles.data_paths import DataPathTwo
from neurotic.models.fashion import (
test_only,
train_only,
)
Dotenv
For some reason dotenv has stopped working unless it's called in the notebook. Maybe this will fix it
The Data
Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately, the means are [0.485, 0.456, 0.406]
and the standard deviations are [0.229, 0.224, 0.225]
.
means = [0.485, 0.456, 0.406]
deviations = [0.229, 0.224, 0.225]
train_transforms = transforms.Compose([transforms.RandomRotation(30),
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize(means,
deviations)])
test_transforms = transforms.Compose([transforms.Resize(255),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(means,
deviations)])
Load the Data
We're going to load the Cat-Dog data set again.
train_path = DataPathTwo(folder_key="CAT_DOG_TRAIN")
test_path = DataPathTwo(folder_key="CAT_DOG_TEST")
train_data = datasets.ImageFolder(train_path.folder, transform=train_transforms)
test_data = datasets.ImageFolder(test_path.folder, transform=test_transforms)
train_batches = torch.utils.data.DataLoader(train_data, batch_size=64,
shuffle=True)
test_batches = torch.utils.data.DataLoader(test_data, batch_size=64)
The DenseNet Model
We are going to load the DenseNet model.
model = models.densenet121(pretrained=True)
print(model)
DenseNet( (features): Sequential( (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu0): ReLU(inplace) (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (denseblock1): _DenseBlock( (denselayer1): _DenseLayer( (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer2): _DenseLayer( (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(96, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer3): _DenseLayer( (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer4): _DenseLayer( (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer5): _DenseLayer( (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer6): _DenseLayer( (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) ) (transition1): _Transition( (norm): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace) (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (pool): AvgPool2d(kernel_size=2, stride=2, padding=0) ) (denseblock2): _DenseBlock( (denselayer1): _DenseLayer( (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer2): _DenseLayer( (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer3): _DenseLayer( (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer4): _DenseLayer( (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer5): _DenseLayer( (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer6): _DenseLayer( (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer7): _DenseLayer( (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer8): _DenseLayer( (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer9): _DenseLayer( (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer10): _DenseLayer( (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer11): _DenseLayer( (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer12): _DenseLayer( (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) ) (transition2): _Transition( (norm): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace) (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (pool): AvgPool2d(kernel_size=2, stride=2, padding=0) ) (denseblock3): _DenseBlock( (denselayer1): _DenseLayer( (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer2): _DenseLayer( (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer3): _DenseLayer( (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer4): _DenseLayer( (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer5): _DenseLayer( (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer6): _DenseLayer( (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer7): _DenseLayer( (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer8): _DenseLayer( (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer9): _DenseLayer( (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer10): _DenseLayer( (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer11): _DenseLayer( (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer12): _DenseLayer( (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer13): _DenseLayer( (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer14): _DenseLayer( (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer15): _DenseLayer( (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer16): _DenseLayer( (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer17): _DenseLayer( (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer18): _DenseLayer( (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer19): _DenseLayer( (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer20): _DenseLayer( (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer21): _DenseLayer( (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer22): _DenseLayer( (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer23): _DenseLayer( (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer24): _DenseLayer( (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) ) (transition3): _Transition( (norm): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace) (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (pool): AvgPool2d(kernel_size=2, stride=2, padding=0) ) (denseblock4): _DenseBlock( (denselayer1): _DenseLayer( (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer2): _DenseLayer( (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer3): _DenseLayer( (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer4): _DenseLayer( (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer5): _DenseLayer( (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer6): _DenseLayer( (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer7): _DenseLayer( (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer8): _DenseLayer( (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer9): _DenseLayer( (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer10): _DenseLayer( (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer11): _DenseLayer( (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer12): _DenseLayer( (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer13): _DenseLayer( (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer14): _DenseLayer( (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer15): _DenseLayer( (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) (denselayer16): _DenseLayer( (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace) (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) ) ) (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (classifier): Linear(in_features=1024, out_features=1000, bias=True) )
This model is built out of two main parts, the features and the classifier. The features part is a stack of convolutional layers and overall works as a feature detector that can be fed into a classifier. The classifier part is a single fully-connected layer (classifier): Linear(in_features=1024, out_features=1000)
. This layer was trained on the ImageNet dataset, so it won't work for our specific problem. That means we need to replace the classifier, but the features will work perfectly on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers.
Next we want to freeze the parameters so we don't backprop through them.
for param in model.parameters():
param.requires_grad = False
And now we build our classifier model.
classifier = nn.Sequential(OrderedDict([
('fc1', nn.Linear(1024, 500)),
('relu', nn.ReLU()),
('fc2', nn.Linear(500, 2)),
('output', nn.LogSoftmax(dim=1))
]))
Using CUDA
With our model built, we need to train the classifier. However, now we're using a really deep neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The linear algebra computations are done in parallel on the GPU leading to 100x increased training speeds. It's also possible to train on multiple GPUs, further decreasing training time.
PyTorch, along with pretty much every other deep learning framework, uses CUDA to efficiently compute the forward and backwards passes on the GPU. In PyTorch, you move your model parameters and other tensors to the GPU memory using model.to('cuda')
. You can move them back from the GPU with model.to('cpu')
which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU.
criterion = nn.NLLLoss()
device = "cpu"
# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)
model.to(device)
for index, (inputs, labels) in enumerate(train_batches):
# Move input and label tensors to the GPU
inputs, labels = inputs.to(device), labels.to(device)
start = datetime.now()
outputs = model.forward(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
if index==3:
break
print("Device = {}; Time per batch: {} seconds".format(
device, (datetime.now() - start)/3
))
Device = cpu; Time per batch: 0:00:12.372973 seconds
device = "cuda"
criterion = nn.NLLLoss()
# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)
model.to(device)
for index, (inputs, labels) in enumerate(train_batches):
# Move input and label tensors to the GPU
inputs, labels = inputs.to(device), labels.to(device)
start = datetime.now()
outputs = model.forward(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
if index==3:
break
print("Device = {}; Time per batch: {} seconds".format(
device, (datetime.now() - start)/3
))
Device = cuda; Time per batch: 0:00:00.008037 seconds
So, it takes less than a second compared to 12 seconds. Interestingly, I kept getting a CUDA out of memory error when I had seaborn and matplotlib imported at the top. I don't know what the conflict is, but it's something to watch out for.
You can write device agnostic code which will automatically use CUDA if it's enabled like so at the beginning of your code:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Then whenever you get a new Tensor or Module it won't copy if they are already on the desired device (it will just return the original object).
input = data.to(device)
model = MyModule(...).to(device)
First a short test to make sure this works.
train_iter = iter(train_batches)
train_small = [train_iter.next() for item in range(2)]
test_iter = iter(test_batches)
test_small = [test_iter.next() for item in range(2)]
outcome = train(model, optimizer, criterion, train_small, test_small, epochs=1, device="cuda")
Epoch: 1/30 Training loss: 0.43 Test Loss: 2.63 Test Accuracy: 0.56
Train the Model
Okay, so now for a long one. Time to get some coffee.
Setup CUDA If It's Available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
The Training
%time
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)
model.to(device)
outcome = train_only(model, optimizer, criterion, train_batches,
epochs=30, device=device)
torch.save(model.state_dict(), "cat_dog_model.pth")
The Accuracy
test_loss = 0
accuracy = 0
accuracies = []
test_losses = []
with torch.no_grad():
for inputs, labels in test_batches:
inputs, labels = inputs.to(device), labels.to(device)
output = model(inputs)
test_loss += criterion(output, labels).item()
probabilities = torch.exp(output)
top_p, top_class = probabilities.topk(1, dim=1)
equals = top_class == labels.view(*top_class.shape)
accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
mean_accuracy = accuracy/len(test_batches)
test_losses.append(test_loss/len(test_batches))
accuracies.append(mean_accuracy)
print("Final Loss: {:.2f}".format(test_losses[-1]))
print("Final Accuracy: {:.2f}".format(accuracies[-1]))
Final Loss: 1.22 Final Accuracy: 0.64
So still not quite good enough.
Train Some More
outcome = train_only(model, optimizer, criterion, train_batches,
epochs=10, device=device)
torch.save(model.state_dict(), "cat_dog_model.pth")
test_outcome = test_only(model, criterion, test_batches, devicej)
print(test_outcome.iloc[-1])
Test Loss 1.532174 Test Accuracy 0.630859 Name: 39, dtype: float64
So, it hasn't actually gotten better, if anything it got worse. Does this mean it's overfitting?
Another Model
I peeked at the solution notebook and it has fewer nodes in the first linear layer and adds dropout. Interestingly the lecture has more nodes in the first layer, but I'll try fewer first.
The Classifier
classifier = nn.Sequential(OrderedDict([
("fully_connected_layer", nn.Linear(1024, 256)),
('relu', nn.ReLU()),
("dropout", nn.Dropout(p=0.2)),
('fully_connected_2', nn.Linear(256, 2)),
('output', nn.LogSoftmax(dim=1))
]))
model.classifier = classifier
model.to(device)
Note that I had to do the model.to(device)
call again since I added the classifier. I think I could also have done classifier.to(device)
, but this seemed to work.
More Parallelization
I noticed on the pytorch data parallelization tutorial that they said you need to tell pytorch to use more than one GPU (if you want it to) so I'm going to try and add it here.
if torch.cuda.device_count() > 1:
print("Using {} GPUs".format(torch.cuda.device_count()))
model = nn.DataParallel(model)
model.to(device)
else:
print("Only 1 GPU available")
Only 1 GPU available
Oh, well.
The Criterion and Optimizer
The other notebook also used a slightly higher learning rate which I'll copy. It also managed to get 95% with one epoch, which is totally out of whack with what I'm seeing. I'll try it again.
LEARNING_RATE = 0.003
EPOCHS = 1
Our loss and optimizer.
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=LEARNING_RATE)
Now train on one epoch.
start = datetime.now()
outcome = train_only(model, optimizer, criterion, train_batches,
epochs=EPOCHS, device=device)
torch.save(model.state_dict(), "cat_dog_model.pth")
print("Training Time: {}".format(datetime.now() - start))
Training Time: 0:06:28.712052
start = datetime.now()
test_outcome = test_only(model, test_batches, device)
print("Test Time: {}".format(datetime.now() - start))
Test Time: 0:00:42.637106
print(test_outcome)
0.9776
Okay, so I changed the test_only function to use model.eval
instead of model.no_grad
like we were doing before and it went from 51% to 98%. Hmm…