CNN & Transfer Learning

[Download this notebook](10 - Transfer Learning.ipynb)


In this lesson you’ll learn:

  • what is meant by transfer learning.
  • how to load and customize pre-trained models.
  • how to read images from folders into Python.

Last week you developed a CNN that can recognize handwritten digits. Today, we focus on taking advantage of pre-trained neural networks. First, you will train a model that can distinguish between various breeds of dogs and cats. In the exercise, you will then use ResNet to detect pneumonia in X-ray images.

When using transfer learning, you use an already trained model for a novel problem for which the original model was not trained.

The pre-trained model is usually a model that has been trained on a large amount of general data. This allows the model to learn enough general information, which can also be relevant to a very specific problem.

For example, ResNet was trained with data from ImageNet, which does not contain X-ray images. But by combining layers of the ResNet model that have already been trained with new, untrained layers, we can leverage the “knowledge” of ResNet.

By using pre-trained models, we do not have to invest the time and computational resources to train these models ourselves.


The training today will take much longer than before. If you are running this notebook on Google Colab, you can easily use a GPU to speed up the training. To train on a GPU, you simply have to do the following:

At the top of the page, select the following menus:

Runtime > Change runtime type > Hardware Accelerator

There, select GPU.

In case you run this notebook locally (on your own machine), you need to install PyTorch with CUDA support and the right CUDA version for your system in order to be able to use a GPU. This only works if you have a GPU available in your laptop or PC. The notebook will still work even if you do not have access to a GPU; it will just take a little longer.
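
A quick way to check whether PyTorch can actually see a GPU (a minimal check; the training code below selects its device in the same way):

import torch
print(torch.cuda.is_available())  # True if a CUDA-capable GPU can be used
print(torch.device('cuda' if torch.cuda.is_available() else 'cpu'))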


from __future__ import print_function, division
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
from torch import sigmoid
import matplotlib.pyplot as plt
import time
import os
from os.path import exists, isdir
import copy
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import roc_auc_score
from tqdm import tqdm
import sys
if 'google.colab' in sys.modules:
    !pip install rdkit==2022.3.4
    if exists("utils.py") == False:
        !wget https://raw.githubusercontent.com/kochgroup/intro_pharma_ai/main/utils/utils.py
    %run utils.py
else:
    %run ../utils/utils.py
%matplotlib inline
plt.ion()
# Download the data for today, may take a while
if 'google.colab' in sys.modules:
    if isdir("../data")==False:
        !wget https://uni-muenster.sciebo.de/s/TaOR0Lk50rjPHUU/download
        !unzip -q download -d  ../
        !rm download
else:
    import wget
    import zipfile
    if isdir("../data")==False:
        wget.download("https://uni-muenster.sciebo.de/s/TaOR0Lk50rjPHUU/download", "data.zip") # save the archive as data.zip
        with zipfile.ZipFile("data.zip","r") as zip_ref:
            zip_ref.extractall("../")

MNIST is a relatively small dataset and can therefore be loaded into memory all at once. However, as discussed in the lecture, images are usually larger than the ones in the MNIST dataset. To deal with large image datasets (among other things), PyTorch has its own library, torchvision. This library provides important functions that we do not have in “regular” torch.

Now, more than ever, it is important to pay close attention to how the data/images are stored on your device. If you navigate to the folder data/images_animals/, you will see two folders. The first folder, train, contains the training images. The second folder, val, contains the test data. Within these folders there are subfolders named after the labels of the images contained within each folder. E.g., the folder beagle contains only images of beagles.

If a folder structure exists that mimics the one described above, we can read in the data very easily with torchvision. But before we can read in the data, we have to transform the images.

First, the images are too large. Most pre-trained models expect an image size of 224 x 224 pixels, since this is the size of the images in the ImageNet dataset. Also, the images still need to be converted to a tensor. In a last step, we scale the data. This time we do not use the min-max scaler, but normalize the images. To do this, we use the mean and standard deviation of the ImageNet images, because the network was trained with these images.

The function transforms.Compose() works similarly to nn.Sequential: all transformations are applied to each image, one after the other.

data_transforms = transforms.Compose([
        transforms.Resize((224,224)), # reduces the size of the image
        transforms.ToTensor(), # converts the image to a tensor
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) # normalizes the images
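
transforms.Normalize subtracts the channel mean and divides by the channel standard deviation: x_norm = (x - mean) / std. If you ever want to display a normalized tensor again, this step has to be undone. A minimal sketch (the helper name unnormalize is our own, not part of torchvision):

import torch

imagenet_mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
imagenet_std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def unnormalize(img):
    # reverses transforms.Normalize for a (3, H, W) image tensor
    return img * imagenet_std + imagenet_mean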

Now that the transformations are defined, you can create a PyTorch dataset. This time we will use the special class datasets.ImageFolder, which is designed exactly for the folder structure described above. We only need to specify the path to the images and which transformations we want to apply.

train_data = datasets.ImageFolder('../data/images_animals/train',data_transforms)
test_data = datasets.ImageFolder('../data/images_animals/val',data_transforms)
train_data

You can see that we have a total of 5913 images in our training folder. Also listed are the transformations that will be applied.
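
You can also index the dataset directly to inspect a single transformed image. A quick sanity check (the shape follows from the Resize and ToTensor transforms above):

img, label = train_data[0]           # first (image, label) pair
print(img.shape)                     # torch.Size([3, 224, 224])
print(train_data.classes[label])     # the class name belonging to this label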

As a last step, we create the DataLoader. We do this for the test dataset as well, because we can't load all images "into the network" at once, and therefore the test set evaluation must also be done in batches.

torch.manual_seed(1235)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=16,
                                             shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=16,
                                             shuffle=True)

example_batch = datasets.ImageFolder('../data/example_batch',data_transforms)
example_batch = torch.utils.data.DataLoader(example_batch, batch_size=6,shuffle=False)

You don't yet know which and how many different classes we have. We can get this information from the dataset:

class_names = train_data.classes
print(class_names)
print("\nNumber of Classes: ",len(class_names))

# We save a single batch to analyze it better
inputs_example, targets_example = next(iter(example_batch))

In total we have 37 different breeds of dogs and cats. We can also look at the pictures with a custom function.

out = torchvision.utils.make_grid(inputs_example[:6])
imshow(out, title=["birman", "birman", "persian", "persian", "pug", "sphynx"])

ResNet

You now have the data in the correct format. However, before we can start training, we also need to get our model into the correct format. As mentioned earlier, PyTorch provides several models that have already been trained. These can be easily loaded. When loading a model for the first time, the weights still need to be loaded from the internet, which may take some time.

We use ResNet18, since all larger networks would be too slow to train on the university servers.

resnet18 = models.resnet18(pretrained=True)
resnet18

resnet18 gives you an overview of which PyTorch layers are used in which order. Pay special attention to the last layer, named fc. We can select this layer directly with resnet18.fc.

resnet18.fc

This layer is an nn.Linear layer that you should remember from the PyTorch introduction notebook. It has 512 features as input and 1000 as output. These 1000 output neurons correspond to the 1000 different classes in the ImageNet dataset.
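
Instead of hardcoding the 512, you can also read the required input size directly from the existing layer. A small sketch:

print(resnet18.fc.in_features)   # 512: the input size a replacement layer must accept
print(resnet18.fc.out_features)  # 1000: one output neuron per ImageNet class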

In order to use ResNet for our classification task, we need to further prepare the ResNet model. Let's first freeze all layers of ResNet. This means that these layers will not receive weight updates and thus cannot be trained further. We can do this because the model has already been trained. The following code iterates through all the layers and sets requires_grad to False. This lets PyTorch know that no gradients need to be calculated for these layers.

for param in resnet18.parameters():
    param.requires_grad = False
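
You can verify that the freeze worked by counting the parameters that still require gradients. At this point, before the fc layer is replaced, the count should be zero:

n_trainable = sum(p.numel() for p in resnet18.parameters() if p.requires_grad)
print(n_trainable)  # 0: every weight in the network is frozen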

The last thing we need to do is replace the fc layer, since we don't have 1000 classes, but only 37 to predict. Hence, we need a new nn.Linear layer with an input of size 512 and an output of size 37.

torch.manual_seed(1234)
resnet18.fc = nn.Linear(512, 37) #replaced the linear layer

print(resnet18.fc)
list(resnet18.fc.parameters())

You can see that the new fc layer has requires_grad set to True. These weights will be updated during training. So the fc layer is the only layer to be trained in the network.

Training

Now we can start with the training loop. Before we do this, we define the loss function and the optimizer.

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet18.parameters(), lr=0.0001)
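
Passing all parameters to the optimizer is harmless here: parameters with requires_grad set to False receive no gradients and are simply skipped. Equivalently, you could hand the optimizer only the new head; a minor variation, not what the rest of this notebook uses:

optimizer = optim.Adam(resnet18.fc.parameters(), lr=0.0001)  # only the trainable fc layer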

The training loop will look a bit more complex today. This is because the test set needs to be evaluated using minibatches. To still calculate the metrics correctly, we use the variables running_loss and running_corrects to keep track of our predictions and average the performance at the end of the loop. The training process will take quite a long time due to the many calculations, even if only one layer is updated.

You will also see that we added some new code to the training loop.

The following code snippet checks whether a GPU is available. If that is true, all the training will take place on a GPU, otherwise we will train on the CPU.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In order to train on the GPU we need to move the model and the data to the GPU. This is done with: .to(device)

resnet18.to(device)

Lastly, during training, the batches also need to be moved to the GPU:

inputs = inputs.to(device)
targets = targets.to(device)

Putting it all together, the full training loop looks like this:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
resnet18.to(device)
torch.manual_seed(3333)
for epoch in range(3):
    #### Training ####
    resnet18.train()
    running_loss = 0
    running_corrects = 0
    for inputs, targets in tqdm(train_loader):
        inputs = inputs.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()
        output=resnet18(inputs)
        _ , preds = torch.max(output, 1)
        loss = loss_function(output,targets)
        running_loss +=loss.item()
        loss.backward()
        optimizer.step()
        running_corrects +=torch.sum(preds == targets.data).cpu() 
    epoch_loss = running_loss/len(train_loader)    
    epoch_acc = running_corrects.double() / len(train_data)  
    print('Training Loss: {:.4f} Training Acc: {:.4f}'.format(
        epoch_loss, epoch_acc))
    
    #### Evaluation #####
    resnet18.eval()
    running_loss = 0
    running_corrects = 0
    for inputs, targets in tqdm(test_loader):
        inputs = inputs.to(device)
        targets = targets.to(device)
        output=resnet18(inputs)
        _ , preds =torch.max(output, 1)
        loss = loss_function(output,targets)
        running_loss +=loss.item()
        running_corrects +=torch.sum(preds == targets.data).cpu()
    epoch_acc = running_corrects.double() / len(test_data) 
    epoch_loss = running_loss/len(test_loader)    
    print('Test Loss: {:.4f} Test Acc: {:.4f}'.format(
        epoch_loss, epoch_acc))

After three epochs, we already achieve a test accuracy of 0.8. 80% of the images are classified correctly.

To really make sure that the pre-training of the model has made a difference, we train the same model again. This time, however, without loading the pre-trained weights, by setting pretrained=False:

resnet18 = models.resnet18(pretrained=False) #  <- ResNet is loaded without the pre-trained weights

torch.manual_seed(1234)
resnet18.fc = nn.Linear(512, 37)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet18.parameters(), lr=0.0001)
torch.manual_seed(3333)
resnet18.to(device)
for epoch in range(3):
    #### Training ####
    resnet18.train()
    running_loss = 0
    running_corrects = 0
    for inputs, targets in tqdm(train_loader):
        inputs = inputs.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()
        output=resnet18(inputs)
        _ , preds = torch.max(output, 1)
        loss = loss_function(output,targets)
        running_loss +=loss.item()
        loss.backward()
        optimizer.step()
        running_corrects +=torch.sum(preds == targets.data).cpu() 
    epoch_loss = running_loss/len(train_loader)    
    epoch_acc = running_corrects.double() / len(train_data)  
    print('Training Loss: {:.4f} Training Acc: {:.4f}'.format(
        epoch_loss, epoch_acc))
    
    #### Evaluation #####
    resnet18.eval()
    running_loss = 0
    running_corrects = 0
    for inputs, targets in tqdm(test_loader):
        inputs = inputs.to(device)
        targets = targets.to(device)
        output=resnet18(inputs)
        _ , preds =torch.max(output, 1)
        loss = loss_function(output,targets)
        running_loss +=loss.item()
        running_corrects +=torch.sum(preds == targets.data).cpu()
    epoch_acc = running_corrects.double() / len(test_data) 
    epoch_loss = running_loss/len(test_loader)    
    print('Test Loss: {:.4f} Test Acc: {:.4f}'.format(
        epoch_loss, epoch_acc))

After 3 epochs, we are not nearly as accurate as we were with the pretrained model. This is because the pretrained convolutions perform a kind of feature generation/extraction.

We can see this more clearly by looking at the convolution activations. For this we use the example images from the beginning of this notebook.

out = torchvision.utils.make_grid(inputs_example[:6])
imshow(out, title=["birman", "birman", "persian", "persian", "pug", "sphynx"])

First, we reload the pretrained ResNet model. Again, we remove the fc layer, but this time do not replace it with a new linear layer. This gives us direct access to the output of the convolution layer. We call this model resnet_convolutions.

resnet18 = models.resnet18(pretrained=True)
resnet_convolutions = nn.Sequential(*list(resnet18.children())[:-1])
resnet_convolutions.eval()
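
A quick way to confirm what this truncated network returns (a sanity check; the shape follows from ResNet18 ending in an adaptive average pooling layer):

with torch.no_grad():
    out = resnet_convolutions(torch.zeros(1, 3, 224, 224))  # one dummy image
print(out.shape)  # torch.Size([1, 512, 1, 1]): one 512-dimensional encoding per image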

Finally, we pass the six images from before through this special network and save the output (feature_encoding). This output will later serve as the input for the linear layer that makes the final prediction.

feature_encodings=resnet_convolutions(inputs_example)[:6,:,0,0]
feature_encodings

These “encodings” are supposed to be a reduced representation of the original image, a kind of fingerprint. If it is true that the pre-trained convolutions identify features that are relevant for classification, then similar images should also have similar reduced representations/encodings.

For example, the third and fourth images each show a “persian” cat, so the encodings of these images should also be similar. We can use the cosine_similarity to judge how similar two vectors are. The cosine similarity always lies between -1 (very dissimilar) and 1 (very similar).
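
As a reminder, the cosine similarity of two vectors is their dot product divided by the product of their norms. A minimal sketch of the same computation by hand:

import numpy as np

def cosine_sim(a, b):
    # cos(a, b) = (a . b) / (||a|| * ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_sim(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707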

We can calculate the similarity between the third image (persian) and all other images.

cosine_similarity(feature_encodings[2:3].detach(),feature_encodings.detach()).round(3)

The similarity of the third image to itself is of course 1, because it is the same image. But the similarity to the other images is much lower. The most similar is the fourth image, with a cosine similarity of 0.891. This image is also a picture of a persian cat. This indicates that the pre-trained model was already able to recognize certain similarities in the images.

But couldn't the images have been similar even before the convolutions?

That is correct, but we can check that too. In the following cell we calculate the similarity of the original images before the convolutions.

cosine_similarity(inputs_example.flatten(1)[2:3],inputs_example.flatten(1)[0:6]).round(3)

Here we notice that the other persian image (the fourth image) is in fact the most dissimilar, even though both images show the same cat breed. We can conclude that the network can indeed find non-trivial features in images.

Full disclosure: ImageNet, the dataset on which ResNet was originally trained, also includes various cat and dog breeds, among them the breed “persian”. The effect of pretraining will most likely be less pronounced when classifying images that are completely “new to” ResNet.

Finally, we try out how well our network performs when we load the pre-trained model and create our own linear layer. This time, however, we do not freeze the pre-trained convolutional layers, but train them further as well.

resnet18 = models.resnet18(pretrained=True) # pretrained=True: load the pre-trained weights
torch.manual_seed(1234)
resnet18.fc = nn.Linear(512, 37) 
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet18.parameters(), lr=0.0001)
resnet18.to(device)
torch.manual_seed(3333)

for epoch in range(3):
    
    #### Training ####
    resnet18.train()
    running_loss = 0
    running_corrects = 0
    for inputs, targets in tqdm(train_loader):
        inputs = inputs.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()
        output=resnet18(inputs)
        _ , preds = torch.max(output, 1)
        loss = loss_function(output,targets)
        running_loss +=loss.item()
        loss.backward()
        optimizer.step()
        running_corrects +=torch.sum(preds == targets.data).cpu()
    epoch_loss = running_loss/len(train_loader)    
    epoch_acc = running_corrects.double() / len(train_data)  
    print('Training Loss: {:.4f} Training Acc: {:.4f}'.format(
        epoch_loss, epoch_acc))
    
    #### Evaluation #####
    resnet18.eval()
    running_loss = 0
    running_corrects = 0
    for inputs, targets in tqdm(test_loader):
        inputs = inputs.to(device)
        targets = targets.to(device)
        output=resnet18(inputs)
        _ , preds =torch.max(output, 1)
        loss = loss_function(output,targets)
        running_loss +=loss.item()
        running_corrects +=torch.sum(preds == targets.data).cpu()
    epoch_acc = running_corrects.double() / len(test_data) 
    epoch_loss = running_loss/len(test_loader)    
    print('Test Loss: {:.4f} Test Acc: {:.4f}'.format(
        epoch_loss, epoch_acc))

This network leads to the best results. This is due to the weights of the convolutions now being trained further. Thus, the feature generation is also better adapted to our data set.

In practice, different learning rates are often used for the new linear layers and the already trained convolutions. This allows the new linear layer to be trained faster than the convolutions.
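
In PyTorch, this is done with parameter groups: each group passed to the optimizer can carry its own learning rate. A minimal sketch under the setup above (the two learning rate values are only placeholders):

optimizer = optim.Adam([
    # the new, untrained head gets a larger learning rate
    {'params': resnet18.fc.parameters(), 'lr': 1e-3},
    # the pre-trained convolutions get a smaller one
    {'params': [p for n, p in resnet18.named_parameters()
                if not n.startswith('fc.')], 'lr': 1e-5},
])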

Practice Exercise

Please restart the kernel before working on the exercise.

As discussed several times in the lecture, today for the exercise we will use a pre-trained model to detect pneumonia from X-ray images.

To do this, you will need to read in the data correctly, prepare the model, and fill in the for-loop.

from __future__ import print_function, division
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
from torch import sigmoid
import matplotlib.pyplot as plt
import time
import os
from os.path import exists, isdir
import copy
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import roc_auc_score
from tqdm import tqdm
import sys
if 'google.colab' in sys.modules:
    !pip install rdkit==2022.3.4
    !wget https://raw.githubusercontent.com/kochgroup/intro_pharma_ai/main/utils/utils.py
    %run utils.py
else:
    %run ../utils/utils.py

plt.ion()
if 'google.colab' in sys.modules:
    if isdir("../data")==False:
        !wget https://uni-muenster.sciebo.de/s/TaOR0Lk50rjPHUU/download
        !unzip -q download -d  ../
        !rm download
else:
    import wget
    import zipfile
    if isdir("../data")==False:
        wget.download("https://uni-muenster.sciebo.de/s/TaOR0Lk50rjPHUU/download", "data.zip") # save the archive as data.zip
        with zipfile.ZipFile("data.zip","r") as zip_ref:
            zip_ref.extractall("../")

First, navigate to the folder that contains the animal images. There you will find a folder chest_xray. This folder contains subfolders with the respective training and test datasets. First determine which transformations are to be applied to the images.

data_transforms = transforms.Compose([
        transforms.Resize((224,224)), #reduces the size of the image
        transforms.ToTensor(), #converts the image to a tensor
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]) #Normalizes the images

Next, load the appropriate Datasets and DataLoader:

train_data = datasets.ImageFolder('../data/chest_xray/train',data_transforms)
test_data = datasets.ImageFolder('../data/chest_xray/val',data_transforms)
train_data
torch.manual_seed(1235)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=16, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=16, shuffle=True)

Find out how many different classes we have, and remember that this affects the definition of our loss function and network.

class_names = train_data.classes
print(class_names)
print("\nNumber of Classes: ",len(class_names))
inputs_example, targets_example = next(iter(train_loader))

out = torchvision.utils.make_grid(inputs_example[:2])
imshow(out, title=[class_names[x] for x in targets_example[:2]])

First load the pre-trained resnet18.

resnet18 = models.resnet18(____________)

Prevent the ResNet layers from being trained any further:

for param in resnet18.parameters():
    __________ = ____________

Replace the correct layer with a new layer.

torch.manual_seed(1234)
______ = ___________

Define loss function and optimizer. Which loss function should we take for this number of classes?

loss_function = __________
optimizer = optim.Adam(_______________, lr=0.001)

Finally, fill in the training loop. We use type_as(output) to get the correct dtype.
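
To see what type_as does, independent of the exercise: it casts one tensor to the dtype (and device) of another, here the integer labels to the floating point dtype of the model output. A small illustration:

labels = torch.tensor([0, 1, 1])          # int64 class labels
logits = torch.tensor([0.3, 2.1, -0.7])   # float32 model output
print(labels.type_as(logits))             # tensor([0., 1., 1.]): now float32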

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
resnet18.to(device)
torch.manual_seed(3333)
for epoch in range(1):
    
    #### Training ####
    resnet18.train()
    
    # Needed for the Loss and AUC calculation
    running_loss = 0
    pred_ll = []
    targets_ll = []
    
    
    for inputs, targets in tqdm(_______):
        inputs = inputs.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()
        
        #Forward Propagation
        output=resnet18(__________).squeeze()
        loss = loss_function(_______ , _____.type_as(output))
        
        # Saving the Loss and Predictions
        pred_ll.append(sigmoid(output).squeeze().cpu().detach().clone().numpy())
        targets_ll.append(targets.cpu().detach().clone().numpy())
        running_loss +=loss.item()
        
        # Backpropagation
        loss.backward()
        optimizer.step()
         
    epoch_loss = running_loss/len(train_loader)    
    epoch_auc = roc_auc_score(np.concatenate(targets_ll), np.concatenate(pred_ll))
    print('Training Loss: {:.4f} Training AUC: {:.4f}'.format(
        epoch_loss, epoch_auc))
    
    
    #### Evaluation #####
    resnet18.eval()
    
    # Needed for the Loss and AUC calculation
    running_loss = 0
    pred_ll = []
    targets_ll = []
    
    for inputs, targets in tqdm(__________):        
        inputs = inputs.to(device)
        targets = targets.to(device)
        #Forward Propagation
        output=resnet18(_______).squeeze()
        loss = loss_function(______,______.type_as(output))
        
        pred_ll.append(sigmoid(output).cpu().squeeze().detach().clone().numpy())
        targets_ll.append(targets.cpu().detach().clone().numpy())
        running_loss +=loss.item()

    epoch_auc = roc_auc_score(np.concatenate(targets_ll), np.concatenate(pred_ll))
    epoch_loss = running_loss/len(test_loader)    
    print('Test Loss: {:.4f} Test AUC: {:.4f}'.format(
        epoch_loss, epoch_auc))