# PyTorch Introduction

## First steps

(Reference: https://github.com/yunjey/pytorch-tutorial)

Importing PyTorch:

In [1]:
import torch 
import torchvision
import numpy as np
import torchvision.transforms as transforms
from sklearn.metrics import f1_score

Creating tensors:

In [2]:
x = torch.tensor(4., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

Build a computational graph:

In [3]:
y = w * x + b

Calling ```backward``` on ```y``` computes the gradient of ```y``` with respect to every tensor in its graph that was created with ```requires_grad=True```:

In [4]:
# Compute gradients.
y.backward()

# Print out the gradients.
print(x.grad)
print(w.grad)
print(b.grad)

tensor(2.)
tensor(4.)
tensor(1.)
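
These values match the derivatives of $y = w \cdot x + b$ computed by hand: $\partial y / \partial x = w = 2$, $\partial y / \partial w = x = 4$ and $\partial y / \partial b = 1$. A minimal sketch that compares autograd with the analytic result (the dictionaries ```expected``` and ```computed``` are only illustrative names):

```python
import torch

x = torch.tensor(4., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b
y.backward()

# Analytic derivatives of y = w * x + b: dy/dx = w, dy/dw = x, dy/db = 1.
expected = {"x": w.item(), "w": x.item(), "b": 1.0}
computed = {"x": x.grad.item(), "w": w.grad.item(), "b": b.grad.item()}
print(computed == expected)
```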


You can use the ```randn``` function to create a tensor filled with random numbers drawn from a standard normal distribution:

In [5]:
# Create tensors of shape (10, 3) and (10, 2).
x = torch.randn(10, 3)
y = torch.randn(10, 2)

The PyTorch library provides a number of pre-defined neural network layers.
They are located in ```torch.nn.*```.
One example is a fully connected layer.
You can create it as follows:

In [6]:
# Build a fully connected layer.
linear = torch.nn.Linear(3, 2)
print ('w: ', linear.weight)
print ('b: ', linear.bias)

w:  Parameter containing:
tensor([[-0.2975, -0.0542,  0.2297],
        [-0.4930, -0.3588,  0.2627]], requires_grad=True)
b:  Parameter containing:
tensor([ 0.2428, -0.1086], requires_grad=True)


To train a model, you combine a loss function (e.g. ```MSELoss```) with an optimizer (e.g. ```SGD```, ```Adam```, ...).
The example below performs a single optimization step:

In [7]:
# Build loss function and optimizer.
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# Forward pass.
pred = linear(x)

# Compute loss.
loss = criterion(pred, y)
print('loss: ', loss.item())

# 1-step gradient descent.
loss.backward()
optimizer.step()

# Print out the loss after 1-step gradient descent.
pred = linear(x)
loss = criterion(pred, y)
print('loss after 1 step optimization: ', loss.item())

loss:  0.8991695642471313
loss after 1 step optimization:  0.8922672867774963
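
When more than one step is taken, the gradients have to be reset before each backward pass, because PyTorch accumulates them in ```.grad``` by default. The following is a minimal sketch of such a loop, assuming the same layer sizes as above; the seed, the number of steps and the list ```losses``` are arbitrary choices made for illustration:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the sketch repeatable
x = torch.randn(10, 3)
y = torch.randn(10, 2)

linear = torch.nn.Linear(3, 2)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer.zero_grad()           # clear gradients left over from the previous step
    loss = criterion(linear(x), y)  # forward pass and loss
    loss.backward()                 # compute fresh gradients
    optimizer.step()                # update weight and bias
    losses.append(loss.item())

print("first loss: %.4f, last loss: %.4f" % (losses[0], losses[-1]))
```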


### Loading data from numpy

Below you will find an example of how to convert a numpy array to a PyTorch tensor and vice versa.

In [8]:
# Create a numpy array.
x = np.array([[1, 2], [3, 4]])

# Convert the numpy array to a torch tensor.
y = torch.from_numpy(x)

# Convert the torch tensor to a numpy array.
z = y.numpy()
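
Note that both conversions share memory with the original array instead of copying it, so an in-place change to one object is visible in the others. A short sketch of this behaviour, with ```torch.tensor``` shown as the copying alternative (the variable ```c``` is introduced here only for illustration):

```python
import numpy as np
import torch

x = np.array([[1, 2], [3, 4]])
y = torch.from_numpy(x)   # shares memory with x
z = y.numpy()             # shares memory with y (and therefore with x)

c = torch.tensor(x)       # torch.tensor copies the data instead

x[0, 0] = 100             # modify the original array in place
print(y[0, 0].item(), z[0, 0], c[0, 0].item())
```

Only ```c``` keeps the old value, because it owns its own copy of the data.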

### PyTorch Modules

In PyTorch you can define modules, which are essentially containers of operations.
For instance, ```Linear``` is a module.
Modules can also contain other modules, which allows you to nest them in a tree structure.
For instance, you can create a sequential module (```Sequential```) containing other modules.
```Sequential``` executes its modules (as the name suggests) sequentially, so that the output of the $n^{th}$ module is the input of the $(n+1)^{th}$ module:

In [9]:
model = torch.nn.Sequential(
    torch.nn.Linear(100, 50),
    torch.nn.Linear(50, 50),
    torch.nn.Linear(50, 100),
)
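
As a quick check of this chaining behaviour, the sketch below passes a random batch through the model and compares the result with calling the sub-modules one after another by hand. The batch size of 8 is an arbitrary choice:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(100, 50),
    torch.nn.Linear(50, 50),
    torch.nn.Linear(50, 100),
)

batch = torch.randn(8, 100)   # 8 samples with 100 features each
out = model(batch)            # runs the three layers in order
print(out.shape)

# The same computation, done by iterating over the sub-modules manually.
manual = batch
for layer in model:
    manual = layer(manual)
print(torch.allclose(out, manual))
```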

Alternatively, you can define the same model by subclassing ```torch.nn.Module```, which also allows non-sequential data flow and reusable modules:

In [10]:
class MyModule(torch.nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.fc1 = torch.nn.Linear(100, 50)
        self.fc2 = torch.nn.Linear(50, 50)
        self.fc3 = torch.nn.Linear(50, 100)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x

model = MyModule()
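
Every sub-module assigned as an attribute in ```__init__``` is registered automatically, so its parameters become part of the parent module. The sketch below lists them and counts the trainable values; the variable names ```names``` and ```n_params``` are only illustrative:

```python
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(100, 50)
        self.fc2 = torch.nn.Linear(50, 50)
        self.fc3 = torch.nn.Linear(50, 100)

    def forward(self, x):
        return self.fc3(self.fc2(self.fc1(x)))

model = MyModule()

# Parameter names are prefixed with the attribute name of the sub-module.
names = [name for name, _ in model.named_parameters()]
print(names)

# 100*50 + 50 for fc1, 50*50 + 50 for fc2, 50*100 + 100 for fc3.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```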

To run the model on a GPU, move both the model and its input to the GPU:

In [None]:
model.cuda()(torch.randn(1, 100).cuda())
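
The call above fails on a machine without a CUDA device. One common way around this, sketched here under the assumption that a CPU fallback is acceptable, is to pick a ```torch.device``` once and move everything to it. The single ```Linear``` layer is only a stand-in for the model defined earlier:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(100, 100).to(device)  # stand-in for the model above
inputs = torch.randn(1, 100, device=device)   # create the input on the same device

out = model(inputs)
print(out.device, out.shape)
```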

### Save and load the model

In [None]:
# Save and load the entire model.
torch.save(model, 'model.ckpt')
model = torch.load('model.ckpt')

# Save and load only the model parameters (recommended).
torch.save(model.state_dict(), 'params.ckpt')
model.load_state_dict(torch.load('params.ckpt'))
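
A small round trip illustrates the recommended variant: the parameters are written to disk and loaded into a freshly constructed module with the same architecture. This is only a sketch; the ```Linear``` layer stands in for any module, and the temporary directory is used so that no file is left behind:

```python
import os
import tempfile
import torch

model = torch.nn.Linear(100, 50)  # stand-in for any torch.nn.Module

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.ckpt")
    torch.save(model.state_dict(), path)

    restored = torch.nn.Linear(100, 50)  # must have the same architecture
    restored.load_state_dict(torch.load(path))

# Compare every saved tensor with its restored counterpart.
same = all(
    torch.equal(p, q)
    for p, q in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```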

## CIFAR10 in PyTorch

As in TensorFlow, there is an easy way to use some common datasets:

In [11]:
# Download and construct CIFAR-10 dataset.
train_dataset = torchvision.datasets.CIFAR10(root='/tmp/data/',
                                             train=True, 
                                             transform=transforms.ToTensor(),
                                             download=True)
test_dataset = torchvision.datasets.CIFAR10(root='/tmp/data/',
                                            train=False,
                                            transform=transforms.ToTensor(),
                                            download=True)

Files already downloaded and verified
Files already downloaded and verified


You can index the dataset like a list; each item is an (image, label) pair:

In [12]:
# Fetch one data pair (read data from disk).
image, label = train_dataset[0]
print (image.size())
print (label)

torch.Size([3, 32, 32])
6


PyTorch provides a ```DataLoader``` class.
It handles batching, shuffling, etc. for you:

In [13]:
# Data loader (this provides queues and threads in a very simple way).
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=64, shuffle=False)

When you iterate over the DataLoader, you get (shuffled) batches for training/testing:

In [14]:
# When iteration starts, queue and thread start to load data from files.
data_iter = iter(train_loader)

# Mini-batch images and labels.
images, labels = next(data_iter)

# Actual usage of the data loader is as below.
for images, labels in train_loader:
    # Training code should be written here.
    pass
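
To see what the loader yields without downloading anything, the sketch below wraps random tensors in a ```TensorDataset```. The 200 synthetic samples are an assumption made for illustration and are not CIFAR-10 data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR-10: 200 random "images" with labels in 0..9.
images = torch.randn(200, 3, 32, 32)
labels = torch.randint(0, 10, (200,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Collect the size of every batch produced in one pass over the data.
batch_sizes = [batch_images.shape[0] for batch_images, _ in loader]
print(len(loader), batch_sizes)
```

Because 200 is not a multiple of 64, the last batch contains only the remaining 8 samples; passing ```drop_last=True``` to the ```DataLoader``` would discard it instead.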

The following cell defines a function that trains a model given an optimizer, a criterion and a DataLoader:

In [15]:
def train(net, optimizer, criterion, train_loader, epochs=15):
    for epoch in range(epochs):
        epoch_loss = 0
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(net(inputs), labels)
            loss.backward()
            optimizer.step()

            epoch_loss += loss.item()
        print("epoch %d -> loss: %.3f" % (epoch, epoch_loss))
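
The summed loss depends on the number of batches, so one possible variation is to report the mean loss per batch instead. The sketch below shows such a variant on synthetic toy data; ```train_mean_loss```, the toy dataset and the hyperparameters are hypothetical and not part of the original notebook:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_mean_loss(net, optimizer, criterion, train_loader, epochs=5):
    """Hypothetical variant of train() that returns the mean batch loss per epoch."""
    history = []
    for epoch in range(epochs):
        epoch_loss = 0.0
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(net(inputs), labels)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        # Dividing by the number of batches makes epochs comparable across loaders.
        history.append(epoch_loss / len(train_loader))
    return history

torch.manual_seed(0)  # arbitrary seed, only to make the sketch repeatable
# Synthetic, linearly separable toy data (not CIFAR-10).
inputs = torch.randn(256, 20)
labels = (inputs[:, 0] > 0).long()
loader = DataLoader(TensorDataset(inputs, labels), batch_size=32, shuffle=True)

net = torch.nn.Linear(20, 2)
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)
history = train_mean_loss(net, optimizer, torch.nn.CrossEntropyLoss(), loader)
print(history)
```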

In the next cell we define a function that computes the macro-averaged F1 score of a trained model:

In [16]:
def test(net, test_loader):
    y_pred, y_true = [], []
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in test_loader:
            y_true.append(labels.numpy())
            # The predicted class is the index of the largest output.
            _, pred = torch.max(net(inputs), 1)
            y_pred.append(pred.numpy())
    f1 = f1_score(np.concatenate(y_true), np.concatenate(y_pred), average='macro')
    print(f1)
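
With ```average='macro'```, the F1 score is computed separately for each class and the per-class scores are then averaged without weighting. The sketch below reproduces that definition with plain NumPy on a tiny hand-made example; ```macro_f1``` is an illustrative helper and is not how scikit-learn implements it internally:

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (illustrative helper)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives for class c
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives for class c
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

In this example the per-class scores are 0.5, 0.8 and 2/3, which average to roughly 0.6556.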

Below we define the same baseline network as in the CIFAR10 competition:

In [17]:
class Baseline(torch.nn.Module):
    def __init__(self):
        super(Baseline, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 16, 5, padding=2)
        self.conv2 = torch.nn.Conv2d(16, 32, 5, padding=2)
        self.conv3 = torch.nn.Conv2d(32, 64, 5, padding=2)
        self.pool = torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.fc1 = torch.nn.Linear(64 * 4 * 4, 512)
        self.out = torch.nn.Linear(512, 10)

    def forward(self, x):
        x = self.pool(torch.nn.functional.relu(self.conv1(x)))
        x = self.pool(torch.nn.functional.relu(self.conv2(x)))
        x = self.pool(torch.nn.functional.relu(self.conv3(x)))
        x = x.view(-1, 64 * 4 * 4)
        x = torch.nn.functional.relu(self.fc1(x))
        x = self.out(x) # as before, softmax is applied together with the CrossEntropyLoss!
        return x


net = Baseline()

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
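
The input size ```64 * 4 * 4``` of ```fc1``` follows from the layer configuration: each convolution uses a 5x5 kernel with padding 2 and therefore keeps the spatial size, while each pooling step halves it, so a 32x32 image shrinks to 16x16, 8x8 and finally 4x4. A short sketch that traces these shapes, re-creating the layers outside the class purely for illustration:

```python
import torch

conv1 = torch.nn.Conv2d(3, 16, 5, padding=2)
conv2 = torch.nn.Conv2d(16, 32, 5, padding=2)
conv3 = torch.nn.Conv2d(32, 64, 5, padding=2)
pool = torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

x = torch.randn(1, 3, 32, 32)  # one CIFAR-10-sized input
shapes = []
for conv in (conv1, conv2, conv3):
    # Convolution keeps height and width, pooling halves them.
    x = pool(torch.nn.functional.relu(conv(x)))
    shapes.append(tuple(x.shape))
print(shapes)

# Flatten the 64 feature maps of size 4x4 into one vector per sample.
flat = x.view(-1, 64 * 4 * 4)
print(flat.shape)
```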

Train and store the weights:

In [19]:
#net.load_state_dict(torch.load('baseline.ckpt'))
%time train(net, optimizer, criterion, train_loader)
torch.save(net.state_dict(), 'baseline.ckpt')

epoch 0 -> loss: 2.902
epoch 1 -> loss: 2.726
epoch 2 -> loss: 2.125
epoch 3 -> loss: 2.182
epoch 4 -> loss: 1.649
epoch 5 -> loss: 0.768
epoch 6 -> loss: 1.323
epoch 7 -> loss: 1.586
epoch 8 -> loss: 1.562
epoch 9 -> loss: 0.912
epoch 10 -> loss: 0.303
epoch 11 -> loss: 2.038
epoch 12 -> loss: 0.193
epoch 13 -> loss: 0.541
epoch 14 -> loss: 0.454
CPU times: user 1h 30min 17s, sys: 22.7 s, total: 1h 30min 40s
Wall time: 24min 56s


In [20]:
test(net, test_loader)

0.6890016642276732
