PyTorch Mastery Series

This is the fourth part of a series that I started as a comprehensive guide to building and training neural networks with PyTorch.

Computer Vision and Convolutional Neural Networks

Computer vision is a branch of computer science that focuses on enabling computers to recognize and analyze visual data such as images and videos.

Some computer vision problems: binary classification, object detection, and segmentation.

As shown here, self-driving cars construct a 3D model of their surroundings by combining images from cameras mounted at different positions on the vehicle.

Video from eight cameras is utilized to construct a three-dimensional vector space of the environment.

In this article, we will cover the following:

Working with a vision dataset using torchvision.datasets

The CNN architecture in PyTorch.

An end-to-end multiclass image classification problem.

Steps for modelling with CNNs in PyTorch

Using PyTorch to create a CNN model, select a loss function and optimizer, train the model, and then evaluate it.

The following depicts how images are encoded. A batch of images has a batch size (the number of images processed together), colour channels (red, green, blue), a height, and a width. We'll use PyTorch to create a deep learning model called a CNN. In the depicted example, the output has three classes: sushi, steak, and pizza.
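As a rough sketch of what these dimensions mean in practice, the snippet below counts the values in one batch. It assumes PyTorch's default NCHW layout ([batch, channels, height, width]) and the 28x28 grayscale images and batch size of 32 used later in this article. `describe_batch` is a hypothetical helper written for illustration, not part of PyTorch.

```python
from math import prod

def describe_batch(shape):
    """Return a small summary of an image batch shape given in NCHW order."""
    batch_size, channels, height, width = shape
    return {
        "images": batch_size,
        "values_per_image": channels * height * width,
        "values_per_batch": prod(shape),
    }

# One batch of 32 grayscale (1-channel) 28x28 images
summary = describe_batch((32, 1, 28, 28))
print(summary)
```

A colour image would use 3 channels instead of 1, which triples the number of values per image.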

What is a Convolutional Neural Network (CNN)?

Convolutional neural networks, often known as ConvNets or CNNs, are deep learning architectures that learn directly from data. CNNs are particularly useful for finding patterns in images to recognize objects, classes, and categories. They can also be effective for classifying audio, time-series, and signal data.

CNN terminologies and their Pytorch equivalents

torchvision: the base domain library for computer vision in PyTorch.
torchvision.datasets: computer vision datasets and data loading functions.
torchvision.models: pretrained computer vision models that you can use for your own problems.
torchvision.transforms: functions for manipulating your vision data so that it is suitable for use with a model.
torch.utils.data.Dataset: the base dataset class for PyTorch.
torch.utils.data.DataLoader: creates a Python iterable over a dataset.

import torch
from torch import nn

import torchvision
from torchvision import datasets
from torchvision import transforms
from torchvision.transforms import ToTensor

import matplotlib.pyplot as plt

# Check versions (yours may differ)
print(torch.__version__)
print(torchvision.__version__)

Getting a dataset

We will use the FashionMNIST dataset from torchvision.datasets.

GitHub - zalandoresearch/fashion-mnist: A MNIST-like fashion product database. Benchmark (github.com)

from torchvision import datasets
train_data = datasets.FashionMNIST(
  root="data", # where to download data
  train=True, # do we want the training dataset
  download=True, # do we want to download
  transform=ToTensor(), # how do we want to transform the data
  target_transform=None # how do we want to transform the labels/targets
)

test_data = datasets.FashionMNIST(
  root="data",
  train=False,
  download=True,
  transform=ToTensor(),
  target_transform=None
)

class_names = train_data.classes # list of class names, indexed by label
print(train_data.class_to_idx)
print(class_names)

Let’s Visualize Our Data

import matplotlib.pyplot as plt
image, label = train_data[0]
plt.imshow(image.squeeze())
plt.title(label)

plt.imshow(image.squeeze(), cmap="gray")
plt.title(class_names[label])
plt.axis(False)

Let’s plot more images to become one 🧘 with the data:

torch.manual_seed(42)
fig = plt.figure(figsize=(9,9))
rows, cols = 4, 4
for i in range(1, rows*cols+1):
  random_idx = torch.randint(0, len(train_data), size=[1]).item()
  img, label = train_data[random_idx]
  fig.add_subplot(rows, cols, i)
  plt.imshow(img.squeeze(), cmap="gray")
  plt.title(class_names[label])
  plt.axis(False)

Now let's get DataLoader ready:

Our data is currently in the form of PyTorch Datasets.

DataLoader turns our dataset into a Python iterable.

Specifically, we want to split our data into batches (or mini-batches). We do this because the hardware may not be able to hold the whole dataset in memory at once. It also gives our neural network more chances to update its parameters per epoch: one update per batch instead of one per pass over the data.

from torch.utils.data import DataLoader

BATCH_SIZE = 32

# Shuffle the training data so the model does not learn the order of the samples
train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              shuffle=True)

test_dataloader = DataLoader(dataset=test_data,
                              batch_size=BATCH_SIZE,
                              shuffle=False)

Let’s check what we have created:

print(f"DataLoaders: {train_dataloader, test_dataloader}")
print(f"Length of train_dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test_dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")
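The batch counts follow from ceiling division of the dataset size by the batch size. The plain-Python sketch below mimics only the slicing part of batching (no shuffling, no tensors), using the FashionMNIST split sizes of 60,000 training and 10,000 test images. `iter_batches` is a hypothetical helper, not a PyTorch function.

```python
from math import ceil

def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

BATCH_SIZE = 32
train_indices = list(range(60_000))  # FashionMNIST training set size
test_indices = list(range(10_000))   # FashionMNIST test set size

train_batches = list(iter_batches(train_indices, BATCH_SIZE))
test_batches = list(iter_batches(test_indices, BATCH_SIZE))

print(len(train_batches), ceil(60_000 / BATCH_SIZE))  # number of training batches
print(len(test_batches), len(test_batches[-1]))       # the last test batch is smaller
```

Because 10,000 is not a multiple of 32, the final test batch holds fewer samples than the others.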

What is inside the training dataloader?

train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape

Model 0: Baseline Model

It's good practice to begin with a baseline model when running a series of machine learning experiments. A baseline is a simple model that you then try to improve on with more complex models and further experiments.

nn.Flatten flattens a contiguous range of dimensions into a single vector. For example, a [1, 28, 28] image becomes a [1, 784] tensor.

flatten_model = nn.Flatten()
x = train_features_batch[0]
output = flatten_model(x)
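To see what flattening does to the values themselves, here is a plain-Python sketch that flattens a nested list shaped like one 1x28x28 image. It is only an illustration of the idea: nn.Flatten by default keeps the first dimension and flattens the rest, while this hypothetical `flatten` helper flattens everything.

```python
def flatten(nested):
    """Recursively flatten nested lists into one flat list of values."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

# A stand-in for one image: 1 channel x 28 rows x 28 columns of zeros
image = [[[0.0 for _ in range(28)] for _ in range(28)] for _ in range(1)]
vector = flatten(image)
print(len(vector))
```

The length of the flat vector (28 * 28) is the number of input features the first linear layer has to accept.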

Let’s create the model:

from torch import nn
class FashionMNISTModelV0(nn.Module):
  def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
    super().__init__()
    self.layer_stack = nn.Sequential(
      nn.Flatten(),
      nn.Linear(in_features=input_shape, out_features=hidden_units),
      nn.Linear(in_features=hidden_units, out_features=output_shape),
    )

  def forward(self, x):
    return self.layer_stack(x)

torch.manual_seed(42)

model_0 = FashionMNISTModelV0(
  input_shape = 784, # 28 * 28
  hidden_units = 10, # unit count in hidden layer
  output_shape = len(class_names) # one for every class
).to("cpu")

dummy_x = torch.rand([1, 1, 28, 28])
model_0(dummy_x)

Now let's set up the loss function, optimizer, and evaluation metric:

Since we are working with multi-class data, our loss function will be nn.CrossEntropyLoss().

Our optimizer will be torch.optim.SGD() (stochastic gradient descent).

Since we are working on a classification problem, let’s use accuracy as our evaluation metric.

import requests
from pathlib import Path

if not Path("helper_functions.py").is_file():
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py","wb") as f:  
    f.write(request.content)

from helper_functions import accuracy_fn

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)
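For intuition about what the loss and the metric measure, here is a plain-Python sketch of cross-entropy for a single sample and of a percentage-style accuracy. The function names and the example logits are made up for illustration. The real nn.CrossEntropyLoss works on batched tensors and averages over the batch by default.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    shifted = [z - max(logits) for z in logits]  # subtract the max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])

def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100 * correct / len(y_true)

logits = [2.0, 1.0, 0.1]                      # made-up scores for 3 classes
loss_right = cross_entropy(logits, target=0)  # true class has the highest score
loss_wrong = cross_entropy(logits, target=2)  # true class has the lowest score
print(loss_right, loss_wrong)
print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))
```

The loss is small when the model puts high probability on the correct class and grows as that probability shrinks, which is what gradient descent pushes against.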

Creating a function to time our experiments

The field of machine learning is highly experimental. Two primary variables that you should frequently monitor are the model's performance, which includes metrics like accuracy and loss, and its execution speed.

from timeit import default_timer as timer
def print_train_time(start:float, end:float, device: torch.device = None):
  total_time = end - start
  print(f"Train time on {device}: {total_time:.3f} seconds")
  return total_time

start_time = timer()
end_time = timer()
print_train_time(start=start_time, end=end_time, device="cpu")

Let's build a training loop and use data batches to train the model.

Loop through epochs.
Loop through training batches, perform training steps, and calculate the train loss per batch.
Loop through testing batches, perform testing steps, and calculate the test loss per batch.
Print out what's happening.
Time it all (for fun).
Note: tqdm is a progress bar library for the console and notebooks. Importing it from tqdm.auto picks a notebook-friendly bar when one is available.

from tqdm.auto import tqdm

torch.manual_seed(42)
train_time_start_on_cpu = timer()

# We will keep the epoch small for faster training time
epochs = 3

# Creating training and testing loop
for epoch in tqdm(range(epochs)):
  print(f"Epoch: {epoch}\n------")
  train_loss = 0
  for batch, (X, y) in enumerate(train_dataloader):
    model_0.train()
    y_pred = model_0(X) # forward pass
    # calculate loss per batch
    loss = loss_fn(y_pred, y)
    train_loss += loss # accumulate train loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if batch % 400 == 0:
      print(f"Looked at {batch * len(X)} / {len(train_dataloader.dataset)} samples.")

  # Divide total train loss by length of train dataloader (average loss per batch)
  train_loss /= len(train_dataloader)

  # Testing
  test_loss, test_acc = 0, 0
  model_0.eval()
  with torch.inference_mode():
    for X_test, y_test in test_dataloader:
      test_pred = model_0(X_test)

      test_loss += loss_fn(test_pred, y_test)

      test_acc += accuracy_fn(y_true=y_test, y_pred=test_pred.argmax(dim=1))

    # Average the test metrics over the number of batches
    test_loss /= len(test_dataloader)
    test_acc /= len(test_dataloader)

  print(f"Train loss: {train_loss:.4f} | Test loss: {test_loss:.4f}, Test acc: {test_acc:.2f}%")

train_time_end_on_cpu = timer()
total_train_time_model_0 = print_train_time(start=train_time_start_on_cpu,
                                            end=train_time_end_on_cpu,
                                            device=str(next(model_0.parameters()).device))
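The same epoch and batch structure can be seen without PyTorch. The sketch below fits a single weight to made-up data for y = 2x using mini-batch gradient descent, with the gradient written out by hand where PyTorch would call loss.backward() and optimizer.step(). All values here (data, learning rate, epoch count) are illustrative assumptions.

```python
# Toy data for y = 2 * x; the single weight w should approach 2
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
batch_size, lr, epochs = 2, 0.01, 50
w = 0.0

for epoch in range(epochs):
    epoch_loss = 0.0
    num_batches = 0
    for start in range(0, len(xs), batch_size):
        xb = xs[start:start + batch_size]
        yb = ys[start:start + batch_size]
        preds = [w * x for x in xb]  # forward pass
        # Mean squared error for this batch
        loss = sum((p - y) ** 2 for p, y in zip(preds, yb)) / len(xb)
        # Gradient of the batch loss with respect to w
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, yb, xb)) / len(xb)
        w -= lr * grad  # optimizer step
        epoch_loss += loss
        num_batches += 1
    epoch_loss /= num_batches  # average loss per batch

print(w, epoch_loss)
```

Note that the weight is updated once per batch, not once per epoch, which is the point made earlier about mini-batches giving more updates per pass over the data.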

Let’s make predictions:

torch.manual_seed(42)

# Device-agnostic setup: use the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"

def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn,
               device: torch.device = device):
  """Returns a dictionary containing the results of model predicting on data_loader."""
  loss, acc = 0, 0
  model.eval()
  with torch.inference_mode():
    for X, y in tqdm(data_loader):
      X, y = X.to(device), y.to(device)
      y_pred = model(X)
      loss += loss_fn(y_pred, y)
      acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))
    # Average the accumulated metrics over the number of batches
    loss /= len(data_loader)
    acc /= len(data_loader)

  return {"model_name": model.__class__.__name__, # only works when model was created with a class
          "model_loss": loss.item(),
          "model_acc": acc}

# model_0 lives on the CPU, so evaluate it there
model_0_results = eval_model(model=model_0,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device="cpu")

model_0_results

Let's check whether a CUDA-capable GPU is available.

This terminal command displays the CUDA GPUs that are available:

!nvidia-smi
torch.cuda.is_available() # returns True if a CUDA GPU can be used, otherwise False

ModelV1: Building a better model with non-linearity

class FashionMNISTModelV1(nn.Module):
  def __init__(self, input_shape: int, hidden_units: int, output_shape:int):
    super().__init__()
    self.layer_stack = nn.Sequential(
      nn.Flatten(), # Flatten inputs into a single vector
      nn.Linear(in_features=input_shape, out_features=hidden_units),
      nn.ReLU(),
      nn.Linear(in_features=hidden_units, out_features=output_shape),
      nn.ReLU()
    )
  
  def forward(self, x:torch.Tensor):
    return self.layer_stack(x)

torch.manual_seed(42)
model_1 = FashionMNISTModelV1(input_shape=784, hidden_units=10, output_shape=len(class_names)).to(device)
next(model_1.parameters()).device
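Why add non-linearity at all? A short plain-Python sketch, using one-dimensional "layers" with made-up weights, suggests the reason: two linear layers stacked directly collapse into a single linear layer, while a ReLU in between breaks that collapse. The helper names are hypothetical.

```python
def linear(x, weight, bias):
    """A one-dimensional linear layer: weight * x + bias."""
    return weight * x + bias

def relu(x):
    """ReLU keeps positive values and replaces negative values with 0."""
    return max(0.0, x)

def two_linear(x):
    """Two stacked linear layers with made-up weights."""
    return linear(linear(x, 2.0, 1.0), -3.0, 0.5)

def collapsed(x):
    """The single equivalent layer: -3 * (2x + 1) + 0.5 = -6x - 2.5."""
    return linear(x, -6.0, -2.5)

def with_relu(x):
    """The same two layers with a ReLU in between."""
    return linear(relu(linear(x, 2.0, 1.0)), -3.0, 0.5)

for x in [-2.0, 0.0, 2.0]:
    print(x, two_linear(x), collapsed(x), with_relu(x))
```

For negative pre-activations the ReLU version departs from the straight line, which is what lets a network with activations model patterns a purely linear stack cannot.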

Set up loss, optimizer and evaluation metrics

from helper_functions import accuracy_fn
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_1.parameters(), lr=0.1)

Functionalizing training and evaluation/testing loops

Let's write a train_step() function for the training loop and a test_step() function for the testing loop.

def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
  """Performs one training epoch with model trying to learn on data_loader."""
  train_loss, train_acc = 0, 0
  model.train()
  for batch, (X, y) in enumerate(data_loader):
    X, y = X.to(device), y.to(device)
    y_pred = model(X)

    # Calculate loss and accuracy per batch
    loss = loss_fn(y_pred, y)
    train_loss += loss
    train_acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1)) # logits to prediction labels

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if batch % 400 == 0:
      print(f"Looked at {batch * len(X)}/{len(data_loader.dataset)} samples.")

  # Divide total train loss and accuracy by length of the dataloader
  train_loss /= len(data_loader)
  train_acc /= len(data_loader)
  print(f"Train loss: {train_loss:.5f} | Train acc: {train_acc:.2f}%")

def test_step(model: torch.nn.Module,
              data_loader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device = device):
  """Performs a testing loop step on model going over data_loader."""
  test_loss, test_acc = 0, 0

  model.eval()

  with torch.inference_mode():
    for X, y in data_loader:
      X, y = X.to(device), y.to(device)

      test_pred = model(X)
      test_loss += loss_fn(test_pred, y)
      test_acc += accuracy_fn(y_true=y, y_pred=test_pred.argmax(dim=1))

    # Adjust metrics and print out
    test_loss /= len(data_loader)
    test_acc /= len(data_loader)
    print(f"Test loss: {test_loss:.5f} | Test acc: {test_acc:.2f}%\n")

torch.manual_seed(42)

from timeit import default_timer as timer
train_time_start_on_gpu = timer()

epochs = 3

for epoch in tqdm(range(epochs)):
  print(f"Epoch: {epoch}\n-------")
  train_step(model=model_1,
             data_loader=train_dataloader,
             loss_fn=loss_fn,
             optimizer=optimizer,
             accuracy_fn=accuracy_fn,
             device=device)

  test_step(model=model_1,
            data_loader=test_dataloader,
            loss_fn=loss_fn,
            accuracy_fn=accuracy_fn,
            device=device)

train_time_end_on_gpu = timer()
total_train_time_model_1 = print_train_time(start=train_time_start_on_gpu,
                                            end=train_time_end_on_gpu,
                                            device=device)

Note: Depending on your hardware and data, your model may sometimes train faster on the CPU than on the GPU.

This could be because the overhead of copying data and models to and from the GPU outweighs the compute gains the GPU provides.
It could also be because the CPU on the hardware you're using is more computationally capable than the GPU.

For more on how to make your models compute faster, see here: https://horace.io/brrr_intro.html

Let’s get the model_1 results dictionary:

model_1_results = eval_model(model = model_1,
                             data_loader = test_dataloader,
                             loss_fn = loss_fn,
                             accuracy_fn = accuracy_fn,
                             device = device)

Let’s build a convolutional neural network (CNN)

CNNs are also known as ConvNets. They are well known for their ability to find patterns in visual data.

The following site is a good way to understand CNNs better:

CNN Explainer
An interactive visualization system designed to help non-experts learn about Convolutional Neural Networks (CNNs). poloclub.github.io

A 60x60 input window is max-pooled to a 30x30 output window.

Let's build the architecture described above in PyTorch:

class FashionMNISTModelV2(nn.Module):
  """Model architecture that replicates the TinyVGG model from the CNN Explainer website."""
  def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
    super().__init__()
    self.conv_block_1 = nn.Sequential(
      nn.Conv2d(in_channels=input_shape,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1),
      nn.ReLU(),
      nn.Conv2d(in_channels=hidden_units, # takes the previous layer's output channels
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1),
      nn.ReLU(),
      nn.MaxPool2d(kernel_size=2)
    )
    self.conv_block_2 = nn.Sequential(
      nn.Conv2d(in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1),
      nn.ReLU(),
      nn.Conv2d(in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1),
      nn.ReLU(),
      nn.MaxPool2d(kernel_size=2)
    )
    self.classifier = nn.Sequential(
      nn.Flatten(),
      # Two 2x2 max pools shrink 28x28 inputs to 7x7 feature maps
      nn.Linear(in_features=hidden_units*7*7,
                out_features=output_shape)
    )

  def forward(self, x):
    x = self.conv_block_1(x)
    x = self.conv_block_2(x)
    x = self.classifier(x)
    return x

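torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1, # input shape is 1 because the images are grayscale (colour images would use 3)
                              hidden_units=10,
                              output_shape=len(class_names)).to(device)

# Step a single image through one convolutional layer.
# test_image is a random stand-in (an assumption; any [C, H, W] tensor works)
test_image = torch.randn(size=(3, 64, 64))
conv_layer = nn.Conv2d(in_channels=3,
                       out_channels=10,
                       kernel_size=(3,3),
                       stride=1,
                       padding=1)

conv_output = conv_layer(test_image.unsqueeze(0)) # add a batch dimension
conv_output.shape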
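The spatial size coming out of a convolution or pooling layer can be worked out by hand with the standard formula floor((size + 2 * padding - kernel_size) / stride) + 1. The sketch below applies it to 28x28 FashionMNIST images passing through two blocks of two 3x3 convolutions (stride 1, padding 1) followed by a 2x2 max pool. `conv_output_size` is a hypothetical helper, and dilation is assumed to be 1.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28  # FashionMNIST images are 28x28
for block in range(2):
    size = conv_output_size(size, kernel_size=3, stride=1, padding=1)  # first conv keeps the size
    size = conv_output_size(size, kernel_size=3, stride=1, padding=1)  # second conv keeps the size
    size = conv_output_size(size, kernel_size=2, stride=2)             # 2x2 max pool halves it
    print(f"after block {block + 1}: {size}x{size}")

hidden_units = 10
flattened_features = hidden_units * size * size
print(flattened_features)
```

The final number is the count of input features the classifier's linear layer needs after flattening, given 10 hidden units.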

Stepping through MaxPool2d()

max_pool_layer = nn.MaxPool2d(kernel_size=2)
test_image_through_conv = conv_layer(test_image.unsqueeze(dim=0))
test_image_through_conv_and_max_pool = max_pool_layer(test_image_through_conv)

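print(f"Test image original shape: {test_image.shape}")
print(f"Test image with batch dimension: {test_image.unsqueeze(0).shape}")
print(f"Shape after conv_layer: {test_image_through_conv.shape}")
print(f"Shape after conv_layer and max_pool_layer: {test_image_through_conv_and_max_pool.shape}")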
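To make the pooling step concrete, here is a plain-Python sketch of 2x2 max pooling with stride 2 on a small made-up grid. It handles a single 2D channel with even height and width only. `max_pool_2x2` is a hypothetical helper, not the PyTorch implementation.

```python
def max_pool_2x2(grid):
    """Apply 2x2 max pooling with stride 2 to a 2D list with even height and width."""
    pooled = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            window = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(max(window))  # keep only the largest value in each window
        pooled.append(row)
    return pooled

grid = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 8],
    [2, 6, 7, 3],
]
pooled = max_pool_2x2(grid)
print(pooled)
```

Each output value summarizes a 2x2 window, so the height and width are both halved while the strongest activations are kept.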

Set up a loss function and optimizer for model_2

from helper_functions import accuracy_fn

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.2)

Training and testing model_2 using our training and test functions

torch.manual_seed(42)
torch.cuda.manual_seed(42)

from timeit import default_timer as timer
train_time_start_model_2 = timer()

epochs = 3
for epoch in tqdm(range(epochs)):
  train_step(model=model_2,
             data_loader=train_dataloader,
             loss_fn=loss_fn,
             optimizer=optimizer,
             accuracy_fn=accuracy_fn,
             device=device)
  test_step(model=model_2,
            data_loader=test_dataloader,
            loss_fn=loss_fn,
            accuracy_fn=accuracy_fn,
            device=device)

train_time_end_model_2 = timer()
total_train_time_model_2 = print_train_time(start=train_time_start_model_2,
                                            end=train_time_end_model_2,
                                            device=device)

model_2_results = eval_model(
       model=model_2,
       data_loader=test_dataloader,
       loss_fn=loss_fn,
       accuracy_fn=accuracy_fn,
       device=device
)

Compare model results and training time

import pandas as pd
compare_results = pd.DataFrame([model_0_results, model_1_results, model_2_results])
compare_results["training_time"] = [total_train_time_model_0,
                                    total_train_time_model_1,
                                    total_train_time_model_2]
print(compare_results)

Make and evaluate random predictions with the best model

def make_predictions(model: torch.nn.Module, data: list, device: torch.device = device):
  pred_probs = []
  model.to(device)
  model.eval()
  with torch.inference_mode():
    for sample in data:
      sample = torch.unsqueeze(sample, dim=0).to(device)
      pred_logit = model(sample)
      pred_prob = torch.softmax(pred_logit.squeeze(), dim=0)
      pred_probs.append(pred_prob.cpu())

  return torch.stack(pred_probs)

import random
random.seed(42)
test_samples = []
test_labels = []
for sample, label in random.sample(list(test_data), k=9):
  test_samples.append(sample)
  test_labels.append(label)

plt.imshow(test_samples[0].squeeze(), cmap="gray")
plt.title(class_names[test_labels[0]])

# Converting prediction probabilities to labels
pred_probs = make_predictions(model=model_2, data=test_samples)
pred_classes = pred_probs.argmax(dim=1)

# Plot predictions
plt.figure(figsize=(9,9))
nrows=3
ncols=3

for i, sample in enumerate(test_samples):
  plt.subplot(nrows, ncols, i+1)
  plt.imshow(sample.squeeze(), cmap="gray")
  pred_label = class_names[pred_classes[i]]
  truth_label = class_names[test_labels[i]]
  title_text = f"Pred: {pred_label}  | Truth : {truth_label}"

  if pred_label == truth_label:
    plt.title(title_text, fontsize=10, c="g")
  else:
    plt.title(title_text, fontsize=10, c="r")
  
  plt.axis(False)

Confusion matrix creation for additional prediction assessment

A confusion matrix is an excellent visual way to evaluate a classification model. We will build one in three steps:

Make predictions on the test dataset with our trained model.

Make a confusion matrix using torchmetrics.ConfusionMatrix.

Plot the confusion matrix using mlxtend.plotting.plot_confusion_matrix().

from tqdm.auto import tqdm

# Make predictions with trained model
y_preds = []
model_2.eval()
with torch.inference_mode():
  for X, y in tqdm(test_dataloader, desc="Making predictions..."):
    # Send data and targets to target device
    X, y = X.to(device), y.to(device)
    y_logits = model_2(X)
    # Turn predictions from logits to prediction probabilities to prediction labels
    y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)
    # Put predictions on CPU for evaluation
    y_preds.append(y_pred.cpu())

# Concatenate list of predictions into a tensor
y_pred_tensor = torch.cat(y_preds)


try:
  import mlxtend, torchmetrics
  assert int(mlxtend.__version__.split(".")[1]) >= 19, "mlxtend version should be 0.19.0 or higher"
except (ImportError, AssertionError):
  !pip install -q torchmetrics -U mlxtend
  import torchmetrics, mlxtend

from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix

confmat = ConfusionMatrix(task="multiclass", num_classes=len(class_names)) # recent torchmetrics versions require task
confmat_tensor = confmat(preds=y_pred_tensor, target=test_data.targets)

fig, ax = plot_confusion_matrix(
  conf_mat=confmat_tensor.numpy(), # matplotlib works with NumPy arrays
  class_names=class_names,
  figsize=(10, 7)
)
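For intuition, a confusion matrix is just a table of counts. The plain-Python sketch below builds one from made-up labels and predictions for three classes. The convention used here, rows for true classes and columns for predicted classes, is an assumption of this sketch, and `confusion_matrix` is a hypothetical helper rather than the torchmetrics implementation.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

y_true = [0, 0, 1, 1, 2, 2, 2]  # made-up labels for 3 classes
y_pred = [0, 1, 1, 1, 2, 0, 2]  # made-up predictions
matrix = confusion_matrix(y_true, y_pred, num_classes=3)
for row in matrix:
    print(row)
```

Correct predictions land on the diagonal, and every off-diagonal count shows which class was mistaken for which.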

Saving and loading our model

from pathlib import Path

MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

MODEL_NAME = "computer_vision_model.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

print(f"Saving the model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_2.state_dict(), f=MODEL_SAVE_PATH)

image_shape = [1, 28, 28]

torch.manual_seed(42)

loaded_model_2 = FashionMNISTModelV2(input_shape=1,
                                     hidden_units=10,
                                     output_shape=len(class_names))
loaded_model_2.load_state_dict(torch.load(f=MODEL_SAVE_PATH))
loaded_model_2.to(device)

Evaluating loaded model

torch.manual_seed(42)

loaded_model_2_results = eval_model(
  model=loaded_model_2,
  data_loader=test_dataloader,
  loss_fn=loss_fn,
  accuracy_fn=accuracy_fn
)

Verifying that the saved and loaded models' results are close to each other

torch.isclose(torch.tensor(model_2_results["model_loss"]),
  torch.tensor(loaded_model_2_results["model_loss"]),
  atol=1e-02)
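As a sketch of what this check computes, torch.isclose treats two values as close when |a - b| <= atol + rtol * |b|. The plain-Python version below follows that rule for single numbers, with the same default tolerances as PyTorch. The loss values are made up for illustration.

```python
def is_close(a, b, atol=1e-8, rtol=1e-5):
    """Closeness check in the style of torch.isclose: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)

saved_loss, loaded_loss = 0.3285, 0.3285  # made-up values for illustration
print(is_close(saved_loss, loaded_loss, atol=1e-2))
print(is_close(0.30, 0.35, atol=1e-2))
```

A tolerance of 1e-02 is fairly loose. If the loaded weights are identical to the saved ones, a much tighter tolerance should also pass.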

Resource

[1] freeCodeCamp.org, (6 October 2022), PyTorch for Deep Learning & Machine Learning - Full Course:

[https://www.youtube.com/watch?v=V_xro1bcAuA&t]

cbarkinozer

Posted by Cahit Barkın Özer
