PyTorch 60 Minute Blitz
The Departure
This is a replication of Deep Learning with PyTorch: A 60 Minute Blitz to get me back into using PyTorch.
Imports
PyPI
Although the project is called PyTorch, the package is named torch.
import torch
import torch.nn as neural_network
import torch.nn.functional as functional
And we're going to use numpy a little.
import numpy
The Initiation
What is PyTorch?
Tensors
In PyTorch, tensors are similar to numpy's ndarrays (n-dimensional arrays). You can create an uninitialized one using the empty function.
- Empty
empty_tensor = torch.empty(5, 3)
print(empty_tensor)
tensor([[-2.3492e+02,  4.5902e-41, -2.3492e+02],
        [ 4.5902e-41,  3.1766e+30,  1.7035e+25],
        [ 4.0498e-43,  0.0000e+00, -2.3492e+02],
        [ 4.5902e-41,  2.6417e-37,  0.0000e+00],
        [ 1.4607e-19,  1.8469e+25,  1.0901e+27]])
Here's the docstring for empty:

print(torch.empty.__doc__)
empty(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor

Returns a tensor filled with uninitialized data. The shape of the tensor is
defined by the variable argument :attr:`sizes`.

Args:
    sizes (int...): a sequence of integers defining the shape of the output tensor.
        Can be a variable number of arguments or a collection like a list or tuple.
    out (Tensor, optional): the output tensor
    dtype (:class:`torch.dtype`, optional): the desired data type of returned tensor.
        Default: if ``None``, uses a global default (see :func:`torch.set_default_tensor_type`).
    layout (:class:`torch.layout`, optional): the desired layout of returned Tensor.
        Default: ``torch.strided``.
    device (:class:`torch.device`, optional): the desired device of returned tensor.
        Default: if ``None``, uses the current device for the default tensor type
        (see :func:`torch.set_default_tensor_type`). :attr:`device` will be the CPU
        for CPU tensor types and the current CUDA device for CUDA tensor types.
    requires_grad (bool, optional): If autograd should record operations on the
        returned tensor. Default: ``False``.

Example::

    >>> torch.empty(2, 3)
    tensor(1.00000e-08 *
           [[ 6.3984,  0.0000,  0.0000],
            [ 0.0000,  0.0000,  0.0000]])
- Random
print(torch.rand(5, 3))
tensor([[0.1767, 0.9520, 0.1488],
        [0.5592, 0.4836, 0.2645],
        [0.8066, 0.8864, 0.1083],
        [0.9206, 0.7311, 0.1278],
        [0.0140, 0.5370, 0.3123]])
The arguments are the same as for empty.
- Zeros
Here we'll create a tensor of zeros as long integers.
print(torch.zeros(5, 3, dtype=torch.long))
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
Once again the arguments for zeros are the same as those for empty.

- From Data
print(torch.tensor([5.5, 3]))
tensor([5.5000, 3.0000])
- From A Tensor
You can create a new tensor from a previously constructed one. This preserves any parameters you passed in that you don't subsequently override.
x = torch.tensor([5, 3], dtype=torch.int)
print(x)

y = x.new_ones(5, 3)
print(y)
tensor([5, 3], dtype=torch.int32)
tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]], dtype=torch.int32)
PyTorch also has another syntax for creating a random tensor from another tensor.
print(torch.randn_like(x, dtype=torch.float))
tensor([ 0.6447, -0.9750])
So in this case it kept the shape but used our dtype. The values seemed odd at first, but that's because the randn indicates it comes from a standard-normal distribution centered at 0, not some value in the range from zero to one (non-inclusive) like a regular random function would.
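As a quick sanity check (just a sketch - the exact numbers vary from run to run), you can compare the two generators directly:

uniform = torch.rand(1000)
normal = torch.randn(1000)
# rand stays within [0, 1)
print(uniform.min(), uniform.max())
# randn is centered on zero with a standard deviation of roughly one
print(normal.mean(), normal.std())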
- Tensor Size

Like pandas, the tensor has a shape, but confusingly it's called Size and can be accessed either from the size method or the shape attribute.

print(y.size())
torch.Size([5, 3])
print(y.shape)
torch.Size([5, 3])
print(torch.Size.__base__)
<class 'tuple'>
The Size object inherits from tuples and supports all the tuple operations.
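For example (a quick sketch just to confirm it), you can unpack or index the shape like any other tuple:

rows, columns = y.shape
print(rows * columns)
print(y.shape[-1])
print(len(y.shape))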
Operations
- Addition
For some operations you can use either the operators (like +) or method calls. Here are two ways to do addition.

SIZE = (5, 3)
x = torch.rand(*SIZE)
y = torch.rand(*SIZE)

output = x + y
print(output)
print()
print(torch.add(x, y))
tensor([[0.4370, 1.4905, 0.8806],
        [1.7555, 0.9883, 0.8121],
        [1.1988, 0.6291, 1.2755],
        [1.2424, 1.1548, 1.1025],
        [0.8627, 0.9954, 1.1028]])

tensor([[0.4370, 1.4905, 0.8806],
        [1.7555, 0.9883, 0.8121],
        [1.1988, 0.6291, 1.2755],
        [1.2424, 1.1548, 1.1025],
        [0.8627, 0.9954, 1.1028]])
- Pre-Made Tensors
One advantage to using the function is that you can pass in a tensor, rather than having PyTorch create the output tensor for you.
summation = torch.empty(SIZE)
torch.add(x, y, out=summation)
print(summation)
tensor([[0.4370, 1.4905, 0.8806],
        [1.7555, 0.9883, 0.8121],
        [1.1988, 0.6291, 1.2755],
        [1.2424, 1.1548, 1.1025],
        [0.8627, 0.9954, 1.1028]])
- In-Place Operations
Tensors also have methods that let you update them instead of creating a new tensor.
x.add_(y)
print(x)
tensor([[0.4370, 1.4905, 0.8806],
        [1.7555, 0.9883, 0.8121],
        [1.1988, 0.6291, 1.2755],
        [1.2424, 1.1548, 1.1025],
        [0.8627, 0.9954, 1.1028]])
- Slicing
The slicing follows what numpy's arrays do. Here's how to get all the rows of the second column.
print(x[:, 1])
tensor([1.4905, 0.9883, 0.6291, 1.1548, 0.9954])
- Reshaping
You can create a new tensor with the same data but a different shape using the view method.
y = x.view(15)
z = x.view(-1, 5)
print(x.shape)
print(y.shape)
print(z.shape)
torch.Size([5, 3])
torch.Size([15])
torch.Size([3, 5])
Using -1 tells PyTorch to infer the dimension based on the original shape and the dimensions that you did pass in.
Torch to Numpy
While there are advantages to using torch for operations (it can use the GPU, for instance), there might be times when you want to convert the tensor to a numpy array.
x = torch.zeros(5)
print(x)
y = x.numpy()
print(y)
x.add_(1)
print(x)
print(y)
print(type(y))
tensor([0., 0., 0., 0., 0.])
[0. 0. 0. 0. 0.]
tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>
Updating the tensor in place updates the numpy array as well, even though it's an ndarray. This is because the tensor and the array share the same underlying memory (as long as the tensor lives on the CPU).
Numpy to Torch
You can go the other way as well.
x = numpy.zeros(5)
print(x)
y = torch.from_numpy(x)
print(y)
x += 5
print(y)
[0. 0. 0. 0. 0.]
tensor([0., 0., 0., 0., 0.], dtype=torch.float64)
tensor([5., 5., 5., 5., 5.], dtype=torch.float64)
So updating the array (in place) updates the tensor.
Cuda
As I mentioned before, an advantage of PyTorch tensors is that they can be run on the GPU. Unfortunately the computer I'm on is old and CUDA doesn't run on it, but we can check whether it will work first using torch.cuda.is_available().
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print(device)
x = torch.ones(5)
# pass in the device
y = torch.ones_like(x, device=device)
# or move the tensor to the device (not an inplace operation)
x = x.to(device)
z = x + y
print(z)
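On this machine everything stays on the CPU, but if the tensors had moved to the GPU, the to method could bring the result back (and change the dtype at the same time). A quick sketch:

# move the result back to the CPU, converting it to doubles along the way
print(z.to("cpu", torch.double))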
Autograd: Automatic Differentiation
The autograd module in PyTorch performs automatic differentiation for you. It works using define-by-run, meaning that as you run your forward pass through the network, it tracks your calls so you don't have to explicitly define anything for backpropagation to work. To enable or disable it you set the requires_grad attribute of the tensor you want to train.
tense = torch.ones(2, 2, requires_grad=True)
print(tense)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Now if you do a tensor operation:
tensed = tense + 1
print(tensed)
tensor([[2., 2.],
        [2., 2.]], grad_fn=<AddBackward0>)
Our new tensor has a gradient function set for it. If you do more operations on tensed:
tenser = tensed * 5
print(tenser)
tensor([[10., 10.],
        [10., 10.]], grad_fn=<MulBackward0>)
a = torch.ones(5, requires_grad=False)
b = a * 5
a.requires_grad_(True)
c = a * 6
print(b)
print(c)
tensor([5., 5., 5., 5., 5.])
tensor([6., 6., 6., 6., 6.], grad_fn=<MulBackward0>)
Two things to note: one is that the gradient function is only set while the requires_grad attribute is true, the other is that this only works on the leaves of the graph - you can set it on a and b but not on c. Since I set requires_grad to True on a, when I created c by multiplying a by 6, c became part of a's graph… I think. Anyway, you can't set it on tensors that are part of the backpropagation path.
Backpropagation
You run back-propagation by calling the backward method on the last tensor in the graph. In our case the last tensor we have (tenser) isn't a scalar, and backward can only be called without arguments on a scalar, so we need to create a final tensor that reduces it to a single number for back-propagation to work.
output = tenser.mean()
output.backward()
print(tense.grad)
tensor([[1.2500, 1.2500],
        [1.2500, 1.2500]])
After one pass through the network (and back) our root-node tensor has some gradients.
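The 1.25 values make sense if you work them out by hand: output is the mean of the four entries of 5(tense + 1), so each entry of tense contributes to the output with a weight of five quarters.

\[ \frac{\partial\, output}{\partial\, tense_{ij}} = \frac{5}{4} = 1.25 \]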
Context Manager
If you need to temporarily turn the gradient tracking on or off you can use a context manager.
print((tense*2).requires_grad)
with torch.no_grad():
print((tense* 2).requires_grad)
print((tense * 2).requires_grad)
True
False
True
Note that the root will still have requires_grad as True; it's the output of operations working with it that doesn't get the gradient set.
print(tense.requires_grad)
with torch.no_grad():
print(tense.requires_grad)
print(tense.requires_grad)
True
True
True
Neural Networks
A Typical Model Training Procedure
- Define the neural network
- Iterate over a dataset of inputs
- Process each input through the network
- Compute the loss (how much error there is)
- Update the weights of the network
The most common way to update the weights is to use a simple formula. \[ weight = weight - \textit{learning rate} \times gradient \]
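Here's a minimal sketch of that update rule in code, assuming a model whose parameters already have gradients from a backward pass (the model is defined below, and the learning rate here is just a made-up value):

learning_rate = 0.01
for parameter in model.parameters():
    # in-place update: weight = weight - learning_rate * gradient
    parameter.data.sub_(parameter.grad.data * learning_rate)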
Defining the Network
This will be a network with five layers - two convolutional layers followed by three fully-connected layers. For the convolutional layers we're going to use Max-Pooling and for the fully-connected layers we'll use ReLU activation.
- The Layers
You can just create the layers in the constructor, but since I'm trying to re-learn what's going on I'm going to peel it apart a little more.
The first layer is the input layer, so the inputs have to match whatever data you are going to get. In our case we are going to look at a black and white image so it has one input-channel. The three required arguments to the Conv2d constructor are:

- in_channels
- out_channels
- kernel_size
class LayerOne:
    inputs = 1
    outputs = 6
    convolution_size = 5
    layer = neural_network.Conv2d(inputs, outputs, convolution_size)
class LayerTwo:
    inputs = LayerOne.outputs
    outputs = 16
    convolution_size = 5
    layer = neural_network.Conv2d(inputs, outputs, convolution_size)
Layer Three is the first Linear layer. Linear layers do a linear transformation on the inputs.
\[ y = x W^T + b \]
Where x is the input, W is the weight matrix, and b is a bias constant. The number of inputs to this layer is 16 * 5 * 5: the 16 is the number of channels coming out of the second convolutional layer, and 5 * 5 is the spatial size of its feature map after pooling (which, for a 32 x 32 input image, happens to equal the kernel size, so the code below reuses the convolution_size values).
class LayerThree:
    inputs = (LayerTwo.outputs * LayerOne.convolution_size * LayerTwo.convolution_size)
    outputs = 120
    layer = neural_network.Linear(inputs, outputs)
class LayerFour:
    inputs = LayerThree.outputs
    outputs = 84
    layer = neural_network.Linear(inputs, outputs)
This is the last layer so the outputs are the outputs for the model as a whole.
class LayerFive:
    inputs = LayerFour.outputs
    outputs = 10
    layer = neural_network.Linear(inputs, outputs)
For the forward-pass our convolutional layers will have their output pooled using max_pool2d and all the layers (except for the output layer) will use relu as the activation function to keep the model from being linear.
class NeuralNetwork(neural_network.Module):
    """A five-layer Convolutional Neural Network"""
    def __init__(self):
        super().__init__()
        self.layer_one = LayerOne.layer
        self.layer_two = LayerTwo.layer
        self.layer_three = LayerThree.layer
        self.layer_four = LayerFour.layer
        self.layer_five = LayerFive.layer
        return

    def flattened_features_counts(self, x):
        sizes = x.size()[1:]
        features = 1
        for size in sizes:
            features *= size
        return features

    def forward(self, x):
        """One forward pass through the network

        Args:
         x: a one-channel image

        Returns:
         a ten-output linear layer
        """
        x = functional.max_pool2d(functional.relu(self.layer_one(x)), (2, 2))
        x = functional.max_pool2d(functional.relu(self.layer_two(x)), 2)
        x = x.view(-1, self.flattened_features_counts(x))
        x = functional.relu(self.layer_three(x))
        x = functional.relu(self.layer_four(x))
        return self.layer_five(x)
model = NeuralNetwork()
print(model)
NeuralNetwork(
  (layer_one): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (layer_two): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (layer_three): Linear(in_features=400, out_features=120, bias=True)
  (layer_four): Linear(in_features=120, out_features=84, bias=True)
  (layer_five): Linear(in_features=84, out_features=10, bias=True)
)
The output shows the parameters for each layer in our model.
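You can also inspect the learnable parameters directly through model.parameters() - a quick sketch (output omitted, since the weights are randomly initialized):

parameters = list(model.parameters())
print(len(parameters))
# the weights of the first convolutional layer
print(parameters[0].size())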
A sample output.
INPUT_SIZE = 32
mock_image = torch.randn(1, 1, INPUT_SIZE, INPUT_SIZE)
output = model(mock_image)
print(output)
tensor([[ 0.1163,  0.0882,  0.0529,  0.0546, -0.0196, -0.1215, -0.1736,  0.0659,
          0.0762, -0.0093]], grad_fn=<AddmmBackward>)
This is the output after one forward pass. We don't actually want to train the model on fake data, so we'll zero out the gradient buffers and then backpropagate with a random gradient just to see that it works.
model.zero_grad()
output.backward(torch.randn(1, 10))