PyTorch Mastery Series — Part 3

https://cbarkinozer.medium.com/pytorch-ustal%C4%B1k-serisi-b%C3%B6l%C3%BCm-3-4b78206b543f

This is the third part of the series I started as a comprehensive guide to building and training neural networks with PyTorch.

Neural Network Classification

Classification is the process of putting objects into groups or classes based on their similarities and shared attributes.

For example, identifying email spam is a binary classification problem. Categorizing images based on their content is multiclass classification. Assigning multiple tags to a document is known as multi-label classification.
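As a quick illustration of how the targets differ between these problem types, here is a minimal sketch with made-up labels (the tensors below are illustrative, not from the article):

import torch

# Binary classification: one label per sample, either 0 or 1 (e.g. not-spam/spam)
y_binary = torch.tensor([0, 1, 1, 0])

# Multiclass classification: one label per sample, chosen from several classes (e.g. cat=0, dog=1, bird=2)
y_multiclass = torch.tensor([2, 0, 1, 1])

# Multi-label classification: each sample can carry several tags at once (multi-hot vectors)
y_multilabel = torch.tensor([[1, 0, 1],   # tags 0 and 2
                             [0, 1, 0]])  # tag 1 only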

What we will discuss:

Neural Network Classification model architecture.

The input and output shapes of a classification model.

Creating custom data for viewing, fitting, and predicting.

Modelling steps include building a model, specifying a loss function and optimizer, establishing a training loop, and assessing the model.

Saving and loading models.

Using the power of nonlinearity.

Various classification evaluation metrics.

How does classification work?

Figure: an image's input features and output shape in a classification model.

Figure: typical hyperparameters for binary and multiclass classification.

Let's create classification data and get it ready.

import sklearn
from sklearn.datasets import make_circles

n_samples = 1000

X,y = make_circles(n_samples, noise = 0.03, random_state = 42)

print(len(X)) # 1000
print(len(y)) # 1000

print(X[:5])
print(y[:5])

# Make dataframe from circle data

import pandas as pd
circles = pd.DataFrame({"X1":X[:,0],"X2":X[:,1],"label":y})
circles.head(10)
import matplotlib.pyplot as plt
plt.scatter(x=X[:, 0], y=X[:, 1], c=y, cmap=plt.cm.RdYlBu)

Let's convert the data into tensors and generate train and test splits.

import torch
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

torch.manual_seed(42)

from sklearn.model_selection import train_test_split  # splits train and test randomly
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # random_state makes the split reproducible (torch.manual_seed does not affect sklearn)

Let's create a model that distinguishes between blue and red dots. We'll set up a GPU, build a model, specify the loss function and optimizer, and develop a training and testing loop.

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

Now that we've defined device-agnostic code, let's build a model that:

  1. Subclasses ‘nn.Module’ (like almost all models).
  2. Creates 2 ‘nn.Linear()’ layers that are capable of handling the shapes of our data.
  3. Defines a ‘forward()’ method that outlines the forward pass.
  4. Instantiates an instance of our model class and sends it to the target ‘device’.
class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=5)  # takes 2 features and upscales to 5 features
        self.layer_2 = nn.Linear(in_features=5, out_features=1)  # takes 5 features from the previous layer and outputs a single feature (same shape as y)

    def forward(self, x):
        return self.layer_2(self.layer_1(x))  # x -> layer_1 -> layer_2 -> output

model_0 = CircleModelV0().to(device)

The categorization playground below allows you to see how a neural network works:

https://playground.tensorflow.org/

For simpler models like CircleModelV0 above, you can also use ‘nn.Sequential’:

model_0 = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1)
).to(device)

or mix both like this:

class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()
        self.two_linear_layers = nn.Sequential(
            nn.Linear(in_features=2, out_features=5),
            nn.Linear(in_features=5, out_features=1)
        )

    def forward(self, x):
        return self.two_linear_layers(x)

model_0 = CircleModelV0().to(device)

Make predictions:

untrained_preds = model_0(X_test.to(device))

For classification, you might use binary or categorical cross-entropy. For the loss function, we'll use 'torch.nn.BCEWithLogitsLoss()', which combines a sigmoid layer with binary cross-entropy loss.
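The training loop further below relies on a loss function and an optimizer, so let's define them here. A minimal sketch, assuming plain SGD with a learning rate of 0.1:

# BCEWithLogitsLoss combines a sigmoid layer with binary cross-entropy loss,
# so the model can output raw logits directly
loss_fn = nn.BCEWithLogitsLoss()

# Stochastic gradient descent; the learning rate of 0.1 is an assumption for this sketch
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)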

Let us also calculate accuracy, which measures the proportion of predictions the model gets right. Accuracy is calculated using this formula:

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives) * 100

def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()  # count matching predictions
    acc = (correct / len(y_pred)) * 100
    return acc

Our model's outputs will be logits. Logits are the model's raw, unnormalized predictions made before adding any activation function. We may turn these logits into prediction probabilities by sending them through an activation function (such as sigmoid for binary classification or softmax for multiclass classification).

Logits -> prediction probabilities -> prediction labels

The prediction probabilities in our model can then be converted to prediction labels by rounding or using the argmax() method.

model_0.eval()
with torch.inference_mode():
    y_logits = model_0(X_test.to(device))[:5]

y_pred_probs = torch.sigmoid(y_logits)

We need to round our prediction probabilities to turn them into discrete labels:

If y_pred_probs ≥ 0.5, y = 1 (class 1)

If y_pred_probs < 0.5, y = 0 (class 0)

# Find the predicted labels
y_preds = torch.round(y_pred_probs)

# In full (logits -> pred probs -> pred labels)
y_pred_labels = torch.round(torch.sigmoid(model_0(X_test.to(device))[:5]))

# Check for equality
print(torch.eq(y_preds.squeeze(), y_pred_labels.squeeze()))

# Get rid of extra dimensions
y_preds.squeeze()

Building a training and testing loop

torch.manual_seed(42)
torch.cuda.manual_seed(42)

epochs = 100

# Put the data on the target device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

# Training and testing loop
for epoch in range(epochs):
    model_0.train()

    # Forward pass
    y_logits = model_0(X_train).squeeze()
    y_pred = torch.round(torch.sigmoid(y_logits))  # logits -> pred probs -> pred labels

    # Calculate loss/accuracy
    loss = loss_fn(y_logits, y_train)
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)

    optimizer.zero_grad()
    loss.backward()  # backpropagation
    optimizer.step()  # gradient descent

    # Testing
    model_0.eval()
    with torch.inference_mode():
        test_logits = model_0(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))

        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

    if epoch % 10 == 0:
        print(f"Epoch: {epoch}, Loss: {loss:.5f}, Accuracy: {acc:.2f}%, Test Loss: {test_loss:.5f}, Test Accuracy: {test_acc:.2f}%")

After running the code above, we can see that our model isn't learning. To understand why, let's visualize the predictions by importing the plot_decision_boundary() function.

import requests
from pathlib import Path

# Download the helper functions from the course repository if they are not already present
if not Path("helper_functions.py").is_file():
    request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
    with open("helper_functions.py", "wb") as f:
        f.write(request.content)

from helper_functions import plot_predictions, plot_decision_boundary

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_0, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_0, X_test, y_test)

How can we increase the model's performance?

  • Add more layers to increase the model's capacity to learn patterns in the data.
  • Add more hidden units (5 to 10 per layer are usually enough for a problem of this difficulty).
  • Fit (train) for longer.
  • Adjust the activation functions.
  • Change the learning rate (learning does not occur if it is too small, and gradients can explode if it is too large).
  • Change the data via preprocessing (e.g., eliminating NaN values).

Example of improving model performance with hyperparameter adjustment and layer addition:
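A minimal sketch of such an improved model (the class name CircleModelV1, the extra layer, and the 10 hidden units are illustrative assumptions rather than the article's exact code):

class CircleModelV1(nn.Module):
    def __init__(self):
        super().__init__()
        # More hidden units (10 instead of 5) and one extra layer compared to CircleModelV0
        self.layer_1 = nn.Linear(in_features=2, out_features=10)
        self.layer_2 = nn.Linear(in_features=10, out_features=10)
        self.layer_3 = nn.Linear(in_features=10, out_features=1)

    def forward(self, x):
        # Still purely linear: no activation functions between the layers yet
        return self.layer_3(self.layer_2(self.layer_1(x)))

model_1 = CircleModelV1().to(device)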

Experiment tracking tip: make these modifications one at a time so you can determine which change has the greatest positive impact on the outcome.

Despite these adjustments, we are still unable to build an effective classifier.

If we swap the dataset for our earlier regression data, the model has no trouble learning, so the issue lies with the dataset rather than the model setup.

The problem is that our model can only draw straight (linear) decision boundaries, while the circular data requires non-linearity.

Building a model with non-linearity

from torch import nn

class CircleModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=10)
        self.layer_2 = nn.Linear(in_features=10, out_features=10)
        self.layer_3 = nn.Linear(in_features=10, out_features=1)
        self.relu = nn.ReLU()  # non-linear activation
        self.sigmoid = nn.Sigmoid()  # not used in forward(); BCEWithLogitsLoss applies the sigmoid internally

    def forward(self, x):
        return self.layer_3(self.relu(self.layer_2(self.relu(self.layer_1(x)))))

model_2 = CircleModelV2().to(device)

loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model_2.parameters(), lr=0.1)

torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Put the data on the target device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

epochs = 1000

for epoch in range(epochs):
    model_2.train()
    y_logits = model_2(X_train).squeeze()
    y_pred = torch.round(torch.sigmoid(y_logits))

    loss = loss_fn(y_logits, y_train)
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    model_2.eval()
    with torch.inference_mode():
        test_logits = model_2(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))

        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

    if epoch % 100 == 0:
        print(f"Epoch: {epoch}, Loss: {loss:.5f}, Accuracy: {acc:.2f}%, Test Loss: {test_loss:.5f}, Test Accuracy: {test_acc:.2f}%")

More parameters improve learning ability, but they reduce training and inference speed and need more computing power.
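To get a feel for this trade-off, you can count each model's trainable parameters. A quick sketch, assuming model_0 and model_2 are defined as above:

# Count trainable parameters in each model
params_v0 = sum(p.numel() for p in model_0.parameters() if p.requires_grad)  # 2 -> 5 -> 1 network
params_v2 = sum(p.numel() for p in model_2.parameters() if p.requires_grad)  # 2 -> 10 -> 10 -> 1 network
print(f"CircleModelV0 parameters: {params_v0}")
print(f"CircleModelV2 parameters: {params_v2}")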

Let us evaluate the accuracy of our model with the non-linear activation function:

model_2.eval()
with torch.inference_mode():
    y_preds = torch.round(torch.sigmoid(model_2(X_test))).squeeze()

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_2, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_2, X_test, y_test)

The output quality is fairly good, apart from a minor inaccuracy in the lower-left corner of the plot.

Multiclass Classification

Multiclass classification is the task of classifying data into more than two classes.

In multiclass classification, some hyperparameters differ from binary classification, such as the output layer shape, hidden layer activation, output activation, loss function, and optimizer.
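As a rough sketch of how these choices typically differ (common defaults rather than the article's exact table; hidden_units and num_classes below are placeholder values):

from torch import nn

hidden_units, num_classes = 8, 4  # placeholder values for this sketch

# Binary classification: one output logit, sigmoid output activation, binary cross-entropy loss
binary_output_layer = nn.Linear(in_features=hidden_units, out_features=1)
binary_loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

# Multiclass classification: one logit per class, softmax output activation, cross-entropy loss
multiclass_output_layer = nn.Linear(in_features=hidden_units, out_features=num_classes)
multiclass_loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally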

import torch
from torch import nn
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs  # toy dataset
from sklearn.model_selection import train_test_split

NUM_CLASSES = 4
NUM_FEATURES = 2
RANDOM_SEED = 42

# Create multiclass data
X_blob, y_blob = make_blobs(n_samples=1000,
                            n_features=NUM_FEATURES,
                            centers=NUM_CLASSES,
                            cluster_std=1.5,
                            random_state=RANDOM_SEED)

# Turn the data into tensors (CrossEntropyLoss expects integer class labels)
X_blob = torch.from_numpy(X_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.LongTensor)

X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(X_blob, y_blob, test_size=0.2, random_state=RANDOM_SEED)

plt.figure(figsize=(10, 7))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_blob, cmap=plt.cm.RdYlBu)

Building the multiclass classification model

device = "cuda" if torch.cuda.is_available() else "cpu"

class BlobModel(nn.Module):
    def __init__(self, input_features, output_features, hidden_units=8):
        super().__init__()
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(in_features=input_features, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_features)
        )

    def forward(self, x):
        return self.linear_layer_stack(x)

model_3 = BlobModel(input_features=2, output_features=4, hidden_units=8).to(device)

Creating a loss function and optimizer for a multiclass example

loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(params=model_3.parameters(), lr=0.1)

Predicting results

To evaluate, test, and train our model, we must first transform its outputs to prediction probabilities, followed by prediction labels.

model_3.eval()
with torch.inference_mode():
    y_logits = model_3(X_blob_test.to(device))

# Convert the logits to prediction probabilities
y_pred_probs = torch.softmax(y_logits, dim=1)
# Convert our model's prediction probabilities to prediction labels
y_preds = torch.argmax(y_pred_probs, dim=1)

Creating a training and testing loop

torch.manual_seed(42)
torch.cuda.manual_seed(42)
epochs = 100

# Put the data on the target device
X_blob_train, y_blob_train = X_blob_train.to(device), y_blob_train.to(device)
X_blob_test, y_blob_test = X_blob_test.to(device), y_blob_test.to(device)

for epoch in range(epochs):
    model_3.train()

    y_logits = model_3(X_blob_train)
    y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)

    loss = loss_fn(y_logits, y_blob_train)
    acc = accuracy_fn(y_true=y_blob_train, y_pred=y_pred)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    model_3.eval()
    with torch.inference_mode():
        test_logits = model_3(X_blob_test)
        test_preds = torch.softmax(test_logits, dim=1).argmax(dim=1)

        test_loss = loss_fn(test_logits, y_blob_test)
        test_acc = accuracy_fn(y_true=y_blob_test, y_pred=test_preds)

    if epoch % 10 == 0:  # with 100 epochs, print every 10 epochs
        print(f"Epoch: {epoch}, Loss: {loss:.5f}, Accuracy: {acc:.2f}%, Test Loss: {test_loss:.5f}, Test Accuracy: {test_acc:.2f}%")

Evaluating Predictions

model_3.eval()
with torch.inference_mode():
    y_logits = model_3(X_blob_test)

# Logits -> prediction probabilities -> prediction labels
y_pred_probs = torch.softmax(y_logits, dim=1)
y_preds = y_pred_probs.argmax(dim=1)

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_3, X_blob_train, y_blob_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_3, X_blob_test, y_blob_test)

If we removed the ReLU layers from our network, it would still work in this case because our classes are linearly separable; however, if the data were non-linearly separable (like the circle data earlier), it would fail.

Classification Metrics



A variety of metrics can be used to evaluate a model's performance on classification problems. Some of the most widely used are:

1. Accuracy

The accuracy metric measures how many of the model's total predictions are correct. Its formula is (tp + tn) / (tp + tn + fp + fn).

Code:

import torchmetrics
accuracy = torchmetrics.Accuracy(task="binary")  # newer torchmetrics versions require a task argument ("binary", "multiclass", or "multilabel")

2. Precision

The precision metric measures how many of the examples the model predicts as positive are actually positive. Its formula is tp / (tp+fp).

import torchmetrics
precision = torchmetrics.Precision(task="binary")

3. Recall

The recall metric measures how many of the actual positive examples the model correctly predicts as positive. Its formula is tp / (tp+fn).

import torchmetrics
recall = torchmetrics.Recall(task="binary")

4. F1-score

The F1-score metric is the harmonic mean of precision and recall. Its formula is 2 * (precision * recall) / (precision + recall).

import torchmetrics
f1_score = torchmetrics.F1Score(task="binary")

5. Confusion Matrix

The confusion matrix visualizes the relationship between the model's predictions and the actual labels. It can be created using torchmetrics.ConfusionMatrix().

import torchmetrics
confusion_matrix = torchmetrics.ConfusionMatrix(task="binary")

Let’s calculate the accuracy of our model:

!pip install torchmetrics

from torchmetrics import Accuracy

# Our blob dataset has 4 classes, so we use the multiclass task
torchmetric_accuracy = Accuracy(task="multiclass", num_classes=4).to(device)

torchmetric_accuracy(y_preds, y_blob_test)
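The other torchmetrics metrics can be applied to the same multiclass predictions in a similar way. A minimal sketch, assuming y_preds (from the trained model_3) and y_blob_test are on the same device:

from torchmetrics import Precision, Recall, F1Score, ConfusionMatrix

precision = Precision(task="multiclass", num_classes=4, average="macro").to(device)
recall = Recall(task="multiclass", num_classes=4, average="macro").to(device)
f1 = F1Score(task="multiclass", num_classes=4, average="macro").to(device)
conf_mat = ConfusionMatrix(task="multiclass", num_classes=4).to(device)

print(precision(y_preds, y_blob_test))
print(recall(y_preds, y_blob_test))
print(f1(y_preds, y_blob_test))
print(conf_mat(y_preds, y_blob_test))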

Reference

[1] freeCodeCamp.org (October 6, 2022), PyTorch for Deep Learning & Machine Learning — Full Course: https://www.youtube.com/watch?v=V_xro1bcAuA
