PyTorch 60 Minute Blitz

The Departure

This is a replication of Deep Learning With Pytorch: A 60 Minute Blitz to get me back into using PyTorch.

Imports

PyPI

Although the project is called PyTorch, the package is named torch.

import torch
import torch.nn as neural_network
import torch.nn.functional as functional

And we're going to use numpy a little.

import numpy

The Initiation

What is PyTorch?

Tensors

In PyTorch, tensors are similar to numpy's ndarrays (n-dimensional arrays). You can create an uninitialized one using the empty function.

  • Empty
    empty_tensor = torch.empty(5, 3)
    print(empty_tensor)
    
    tensor([[-2.3492e+02,  4.5902e-41, -2.3492e+02],
            [ 4.5902e-41,  3.1766e+30,  1.7035e+25],
            [ 4.0498e-43,  0.0000e+00, -2.3492e+02],
            [ 4.5902e-41,  2.6417e-37,  0.0000e+00],
            [ 1.4607e-19,  1.8469e+25,  1.0901e+27]])
    

    Here's the docstring for empty:

    print(torch.empty.__doc__)
    
    
    empty(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
    
    Returns a tensor filled with uninitialized data. The shape of the tensor is
    defined by the variable argument :attr:`sizes`.
    
    Args:
        sizes (int...): a sequence of integers defining the shape of the output tensor.
            Can be a variable number of arguments or a collection like a list or tuple.
        out (Tensor, optional): the output tensor
        dtype (:class:`torch.dtype`, optional): the desired data type of returned tensor.
            Default: if ``None``, uses a global default (see :func:`torch.set_default_tensor_type`).
        layout (:class:`torch.layout`, optional): the desired layout of returned Tensor.
            Default: ``torch.strided``.
        device (:class:`torch.device`, optional): the desired device of returned tensor.
            Default: if ``None``, uses the current device for the default tensor type
            (see :func:`torch.set_default_tensor_type`). :attr:`device` will be the CPU
            for CPU tensor types and the current CUDA device for CUDA tensor types.
        requires_grad (bool, optional): If autograd should record operations on the
            returned tensor. Default: ``False``.
    
    Example::
    
        >>> torch.empty(2, 3)
        tensor(1.00000e-08 *
               [[ 6.3984,  0.0000,  0.0000],
                [ 0.0000,  0.0000,  0.0000]])
    
    
    
  • Random
    print(torch.rand(5, 3))
    
    tensor([[0.1767, 0.9520, 0.1488],
            [0.5592, 0.4836, 0.2645],
            [0.8066, 0.8864, 0.1083],
            [0.9206, 0.7311, 0.1278],
            [0.0140, 0.5370, 0.3123]])
    

    The arguments are the same as for empty.

  • Zeros

    Here we'll create a tensor of zeros as long integers.

    print(torch.zeros(5, 3, dtype=torch.long))
    
    tensor([[0, 0, 0],
            [0, 0, 0],
            [0, 0, 0],
            [0, 0, 0],
            [0, 0, 0]])
    

    Once again the arguments for zeros are the same as those for empty.

  • From Data
    print(torch.tensor([5.5, 3]))
    
    tensor([5.5000, 3.0000])
    
  • From A Tensor

    You can create a new tensor from a previously constructed one. This preserves any parameters you passed in that you don't subsequently override.

    x = torch.tensor([5, 3], dtype=torch.int)
    print(x)
    y = x.new_ones(5, 3)
    print(y)
    
    tensor([5, 3], dtype=torch.int32)
    tensor([[1, 1, 1],
            [1, 1, 1],
            [1, 1, 1],
            [1, 1, 1],
            [1, 1, 1]], dtype=torch.int32)
    

    PyTorch also has another way to create a random tensor based on an existing one, keeping its shape.

    print(torch.randn_like(x, dtype=torch.float))
    
    tensor([ 0.6447, -0.9750])
    

    So in this case it kept the shape but used our dtype. The values seemed odd at first, but that's because randn samples from the standard normal distribution (mean 0, variance 1), rather than from the half-open interval [0, 1) the way a regular random function like rand does.

  • Tensor Size

    Like pandas, the tensor has a shape, but confusingly it's called Size and can be accessed either from the size method or the shape attribute.

    print(y.size())
    
    torch.Size([5, 3])
    
    print(y.shape)
    
    torch.Size([5, 3])
    
    print(torch.Size.__base__)
    
    <class 'tuple'>
    

    The Size object inherits from tuple and supports all the tuple operations.
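
    Since it's a tuple subclass, you can unpack it like any other tuple (a small aside of mine, not part of the tutorial).

    rows, columns = y.shape
    print(rows, columns)    # 5 3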

Operations

  • Addition

    For some operations you can use either an operator (like +) or a method call. Here are two ways to do addition.

    SIZE = (5, 3)
    x = torch.rand(*SIZE)
    y = torch.rand(*SIZE)
    output = x + y
    print(output)
    print()
    print(torch.add(x, y))
    
    tensor([[0.4370, 1.4905, 0.8806],
            [1.7555, 0.9883, 0.8121],
            [1.1988, 0.6291, 1.2755],
            [1.2424, 1.1548, 1.1025],
            [0.8627, 0.9954, 1.1028]])
    
    tensor([[0.4370, 1.4905, 0.8806],
            [1.7555, 0.9883, 0.8121],
            [1.1988, 0.6291, 1.2755],
            [1.2424, 1.1548, 1.1025],
            [0.8627, 0.9954, 1.1028]])
    
  • Pre-Made Tensors

    One advantage of using the add function is that you can pass in an output tensor, rather than having PyTorch create it for you.

    summation = torch.empty(SIZE)
    torch.add(x, y, out=summation)
    print(summation)
    
    tensor([[0.4370, 1.4905, 0.8806],
            [1.7555, 0.9883, 0.8121],
            [1.1988, 0.6291, 1.2755],
            [1.2424, 1.1548, 1.1025],
            [0.8627, 0.9954, 1.1028]])
    
  • In-Place Operations

    Tensors also have methods that let you update them instead of creating a new tensor.

    x.add_(y)
    print(x)
    
    tensor([[0.4370, 1.4905, 0.8806],
            [1.7555, 0.9883, 0.8121],
            [1.1988, 0.6291, 1.2755],
            [1.2424, 1.1548, 1.1025],
            [0.8627, 0.9954, 1.1028]])
    
  • Slicing

    Slicing follows the same rules as numpy arrays. Here's how to get all the rows of the second column.

    print(x[:, 1])
    
    tensor([1.4905, 0.9883, 0.6291, 1.1548, 0.9954])
    
  • Reshaping

    You can create a new tensor with the same data but a different shape using the view method.

    y = x.view(15)
    z = x.view(-1, 5)
    print(x.shape)
    print(y.shape)
    print(z.shape)
    
    torch.Size([5, 3])
    torch.Size([15])
    torch.Size([3, 5])
    

    Using -1 tells PyTorch to infer that dimension from the total number of elements and the dimensions you did pass in.
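
    The view has to account for every element, and it shares the underlying data with the original tensor rather than copying it. Here's a small check of my own that an incompatible shape gets rejected.

    print(x.view(-1).shape)    # torch.Size([15]) - completely flattened
    try:
        x.view(-1, 4)          # 15 elements can't be split into rows of 4
    except RuntimeError as error:
        print(error)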

Torch to Numpy

While there are advantages to using torch for operations (it can use the GPU, for instance), there might be times when you want to convert the tensor to a numpy array.

x = torch.zeros(5)
print(x)
y = x.numpy()
print(y)
x.add_(1)
print(x)
print(y)
print(type(y))
tensor([0., 0., 0., 0., 0.])
[0. 0. 0. 0. 0.]
tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>

Updating the tensor in place updates the numpy array as well, even though it's an ndarray. That's because the tensor and the array share the same underlying memory (as long as the tensor lives on the CPU).

Numpy to Torch

You can go the other way as well.

x = numpy.zeros(5)
print(x)
y = torch.from_numpy(x)
print(y)
x += 5
print(y)
[0. 0. 0. 0. 0.]
tensor([0., 0., 0., 0., 0.], dtype=torch.float64)
tensor([5., 5., 5., 5., 5.], dtype=torch.float64)

So updating the array (in place) updates the tensor.
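
One caveat (my own check, not part of the tutorial): only in-place updates are shared. Rebinding the name to a brand-new array leaves the tensor pointing at the old buffer.

x = numpy.zeros(5)
y = torch.from_numpy(x)
x = x + 5        # makes a new array; the tensor still wraps the old one
print(y)         # tensor([0., 0., 0., 0., 0.], dtype=torch.float64)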

Cuda

As I mentioned before, an advantage of PyTorch tensors is that they can be run on the GPU. Unfortunately the computer I'm on is old and CUDA doesn't run on it, but we can check whether it's available first using torch.cuda.is_available().

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print(device)

x = torch.ones(5)

# pass in the device
y = torch.ones_like(x, device=device)

# or move the tensor to the device (not an inplace operation)
x = x.to(device)

z = x + y
print(z)
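
If CUDA were available, z would live on the GPU, and numpy() only works on CPU tensors, so you'd move it back first. This is the pattern the tutorial uses (a no-op here, since device fell back to the CPU).

# move the result back to the CPU (optionally changing the dtype on the way)
z = z.to("cpu", torch.double)
print(z.numpy())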

Autograd: Automatic Differentiation

The autograd module in PyTorch performs automatic differentiation for you. It works using define-by-run, meaning that as you run your forward pass through the network it tracks your calls, so you don't have to explicitly define anything for backpropagation to work. To enable or disable it you set the requires_grad attribute of the tensor you want to train.

tense = torch.ones(2, 2, requires_grad=True)
print(tense)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Now if you do a tensor operation:

tensed = tense + 1
print(tensed)
tensor([[2., 2.],
        [2., 2.]], grad_fn=<AddBackward0>)

Our new tensor has a gradient function set for it. If you do more operations on tensed:

tenser = tensed * 5
print(tenser)
tensor([[10., 10.],
        [10., 10.]], grad_fn=<MulBackward0>)
a = torch.ones(5, requires_grad=False)
b = a * 5
a.requires_grad_(True)
c = a * 6
print(b)
print(c)
tensor([5., 5., 5., 5., 5.])
tensor([6., 6., 6., 6., 6.], grad_fn=<MulBackward0>)

Two things to note: first, the gradient function is only set on tensors created while requires_grad is True; second, requires_grad_ only works on the leaves of the graph. You can set it on a and b but not on c, because c was created by multiplying a (which now requires a gradient) by 6, making it part of a's graph rather than a leaf. In short, you can't set it on tensors that are part of the backpropagation path.
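
A quick way to see which tensors count as leaves is the is_leaf attribute (my own check, not from the tutorial).

# a and b are leaves (b was created while a did not require a gradient),
# but c was produced by an operation on a gradient-tracking tensor
print(a.is_leaf, b.is_leaf, c.is_leaf)    # True True False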

Backpropagation

You run back-propagation by calling the backward method on the last tensor in the graph. In our case the last tensor we have (tenser) isn't a scalar, so we need to reduce it to a single value for backward to work without an explicit gradient argument.

output = tenser.mean()
output.backward()
print(tense.grad)
tensor([[1.2500, 1.2500],
        [1.2500, 1.2500]])

After one pass through the network (and back) our root-node tensor has some gradients.
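
The 1.25s check out if you work the chain rule by hand: output is the mean of the four entries of tenser, and each entry of tenser is 5 times the matching entry of tense plus 5.

\[ output = \frac{1}{4}\sum_{i,j} 5(tense_{ij} + 1) \qquad \frac{\partial\, output}{\partial\, tense_{ij}} = \frac{5}{4} = 1.25 \]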

Context Manager

If you need to temporarily turn gradient tracking off you can use the torch.no_grad context manager.

print((tense*2).requires_grad)
with torch.no_grad():
    print((tense* 2).requires_grad)
print((tense * 2).requires_grad)
True
False
True

Note that the root tensor will still have requires_grad set to True; it's the outputs of operations involving it that don't get a gradient function.

print(tense.requires_grad)
with torch.no_grad():
    print(tense.requires_grad)
print(tense.requires_grad)
True
True
True
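
If you want a tensor that's permanently excluded from the graph rather than temporarily, detach gives you one that shares the same data but has requires_grad set to False (the tutorial mentions this as well).

detached = tense.detach()
print(detached.requires_grad)      # False
print(detached.eq(tense).all())    # tensor(True) - same values, no graph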

Neural Networks

A Typical Model Training Procedure

  1. Define the neural network
  2. Iterate over a dataset of inputs
  3. Process each input through the network
  4. Compute the loss (how much error there is)
  5. Propagate the gradients back through the network's parameters
  6. Update the weights of the network

The most common way to update the weights is to use a simple formula. \[ weight = weight - \textit{learning rate} \times gradient \]
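
In code that rule is just a loop over the parameters. This is a sketch: it assumes model is the network defined below and that a backward() call has already populated the gradients.

# a minimal, manual version of stochastic gradient descent
learning_rate = 0.01
for parameter in model.parameters():
    parameter.data.sub_(parameter.grad.data * learning_rate)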

Defining the Network

This will be a network with five layers - two convolutional layers followed by three fully-connected layers. For the convolutional layers we're going to use Max-Pooling and for the fully-connected layers we'll use ReLU activation.

  • The Layers

    You can just create the layers in the constructor, but since I'm trying to re-learn what's going on I'm going to peel it apart a little more.

    The first layer is the input layer, so the inputs have to match whatever data you are going to get. In our case we are going to look at a black and white image so it has one input-channel. The three required arguments to the Conv2d constructor are:

    • in_channels
    • out_channels
    • kernel_size
    class LayerOne:
        inputs = 1
        outputs = 6
        convolution_size = 5
        layer = neural_network.Conv2d(inputs, outputs, convolution_size)
    
    class LayerTwo:
        inputs = LayerOne.outputs
        outputs = 16
        convolution_size = 5
        layer = neural_network.Conv2d(inputs, outputs, convolution_size)
    

    Layer Three is the first Linear layer. Linear layers do a linear transformation on the inputs.

    \[ y = x W^T + b \]

    Where x is the input, W is the weight matrix and b is a bias constant.

    class LayerThree:
        # 16 * 5 * 5 = 400: for a 32 x 32 input the feature maps coming out of
        # layer two are 5 x 5 after pooling (which happens to match the kernel size)
        inputs = (LayerTwo.outputs * LayerOne.convolution_size
                  * LayerTwo.convolution_size)
        outputs = 120
        layer = neural_network.Linear(inputs, outputs)
    
    class LayerFour:
        inputs = LayerThree.outputs
        outputs = 84
        layer = neural_network.Linear(inputs, outputs)
    

    This is the last layer so the outputs are the outputs for the model as a whole.

    class LayerFive:
        inputs = LayerFour.outputs
        outputs = 10
        layer = neural_network.Linear(inputs, outputs)
    

    For the forward-pass our convolutional layers will have their output pooled using max_pool2d and all the layers (except for the output layers) will use relu as the activation function to keep the model from being linear.

    class NeuralNetwork(neural_network.Module):
        """A five-layer Convolutional Neural Network"""
        def __init__(self):
            super().__init__()
            self.layer_one = LayerOne.layer
            self.layer_two = LayerTwo.layer
            self.layer_three = LayerThree.layer
            self.layer_four = LayerFour.layer
            self.layer_five = LayerFive.layer
            return
    
        def flattened_features_counts(self, x):
            """Counts the features in one sample (all dimensions except the batch)"""
            sizes = x.size()[1:]
            features = 1
            for size in sizes:
                features *= size
            return features
    
        def forward(self, x):
            """One forward pass through the network
    
            Args:
             x: a one-channel image
    
            Returns:
             a ten-output linear layer
            """
            x = functional.max_pool2d(functional.relu(self.layer_one(x)), (2, 2))
            x = functional.max_pool2d(functional.relu(self.layer_two(x)), 2)
            x = x.view(-1, self.flattened_features_counts(x))
            x = functional.relu(self.layer_three(x))
            x = functional.relu(self.layer_four(x))
            return self.layer_five(x)
    
    model = NeuralNetwork()
    print(model)
    
    NeuralNetwork(
      (layer_one): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
      (layer_two): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      (layer_three): Linear(in_features=400, out_features=120, bias=True)
      (layer_four): Linear(in_features=120, out_features=84, bias=True)
      (layer_five): Linear(in_features=84, out_features=10, bias=True)
    )
    

    The output shows the parameters for each layer in our model.

    A sample output.

    INPUT_SIZE = 32
    mock_image = torch.randn(1, 1, INPUT_SIZE, INPUT_SIZE)
    output = model(mock_image)
    print(output)
    
    tensor([[ 0.1163,  0.0882,  0.0529,  0.0546, -0.0196, -0.1215, -0.1736,  0.0659,
              0.0762, -0.0093]], grad_fn=<AddmmBackward>)
    

    This is the output after one forward pass. We don't actually want to train the network on this fake data, though, so we'll zero out the gradient buffers and back-propagate with random gradients (as the tutorial does) before moving on.

    model.zero_grad()
    output.backward(torch.randn(1, 10))
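
    While we're here, model.parameters() gives us the tensors that backward just attached gradients to; checking their count and shapes is a quick sanity test (the values in the comments are what I'd expect for this architecture).

    parameters = list(model.parameters())
    print(len(parameters))         # 10 - a weight and a bias tensor for each of the five layers
    print(parameters[0].size())    # torch.Size([6, 1, 5, 5]) - layer one's convolution kernels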
    

The Loss Function

Backpropagation

Update the Weights

Training a Classifier

Data Parallelism

The Return