Logistic Regression with PyTorch
In this tutorial, we will learn about PyTorch and how to train a Logistic Regression model using the PyTorch library.
INTRODUCTION
Logistic Regression is a popular machine learning algorithm used for binary classification problems. It is a type of regression analysis where the response variable is categorical in nature, with two possible outcomes, commonly represented as “0” and “1”. The main objective of logistic regression is to predict the probability of an event occurring, given a set of independent variables.
With Logistic Regression, you essentially want to separate the two classes with a linear decision boundary (a straight line in two dimensions). There could be cases where separating the classes is not as easy as drawing a line, and that’s where changing the model architecture comes into the picture.
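To make the idea of "predicting a probability" concrete, here is a minimal sketch of what a logistic regression model computes for one sample: a weighted sum of the features passed through the sigmoid function. The weights, bias, and sample values are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and a single two-feature sample
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b      # linear score: 0.8*2.0 - 0.4*1.0 + 0.1 = 1.3
p = sigmoid(z)            # predicted probability of class 1
label = int(p >= 0.5)     # threshold at 0.5 to get a class label

print(f"score={z:.2f}, probability={p:.3f}, label={label}")
```

The set of points where the score is exactly zero (probability 0.5) is the linear decision boundary mentioned above.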
Logistic Regression is used for tasks such as identifying whether a tumor is benign or malignant, predicting whether a customer will churn, and predicting whether it will rain tomorrow.
There are several ways to implement Logistic Regression. For this article, we are going to use PyTorch, a machine learning framework developed by Meta.
PyTorch is a machine learning library for Python that provides high-level APIs for creating and training deep learning models. Its `torch.nn` module offers many predefined building blocks for these models.
MODULES IN PYTORCH
PyTorch modules are pre-defined building blocks for creating deep learning models. A module in PyTorch represents a single operation or layer in a neural network. The torch.nn module in PyTorch provides a number of predefined modules for building neural networks.
Some of the most commonly used PyTorch modules are:
- nn.Linear: This module represents a fully connected layer in a neural network, with a weight matrix and bias vector.
- nn.Conv2d: Represents a 2D convolutional layer, commonly used in image classification problems.
- nn.MaxPool2d: This module represents a 2D max pooling layer, used to reduce the spatial dimensions of a tensor.
- nn.Dropout: For the dropout layer, used to prevent overfitting in neural networks.
- nn.BatchNorm2d: This module is for the batch normalization layer, used to normalize the activations of a neural network.
- nn.ReLU: Rectified Linear Unit (ReLU) activation function, used to introduce non-linearity into a neural network.
- nn.Sigmoid: Signifies a Sigmoid activation function, used in binary classification problems.
- Autograd module: PyTorch uses a technique known as automatic differentiation. A recorder tracks the operations executed during the forward pass and then replays them backward to calculate the gradients. This is particularly effective for neural networks because the gradients of all parameters are derived automatically, so they never have to be worked out by hand.
- Optim module: torch.optim is a module that implements various optimization algorithms used for training neural networks. Most of the commonly used methods are already supported, so there is no need to build them from scratch.
These are just a few examples of the many predefined modules available in PyTorch. To use the layer modules in your deep learning model, simply import them from torch.nn and add them to your custom model definition.
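As a rough illustration of how these pieces fit together, the sketch below stacks two predefined modules, lets autograd compute the gradients, and lets an optimizer from torch.optim apply one update. The batch of data, the layer sizes, and the learning rate are arbitrary choices made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random initial weights reproducible

# Stack predefined modules: a linear layer followed by a sigmoid
model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.BCELoss()

# Hypothetical batch: 4 samples with 3 features each, and binary targets
x = torch.randn(4, 3)
y = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

loss_before = criterion(model(x), y)  # forward pass: autograd records the operations
optimizer.zero_grad()
loss_before.backward()                # backward pass: gradients are stored in .grad
optimizer.step()                      # the optimizer updates the weights

loss_after = criterion(model(x), y)
print(loss_before.item(), loss_after.item())
```

With a small learning rate like this one, the loss after the update should come out slightly lower than before it.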
WHAT MAKES PYTORCH SO POPULAR?
Machine learning practitioners have many frameworks and libraries to choose from in their day-to-day data science workflows, yet a few consistently stand out.
Below are some of the unique points about PyTorch that make it a popular choice among deep learning practitioners and researchers.
- Dynamic Computational Graph: PyTorch builds its computational graph dynamically as the code runs, which allows for more flexibility and easier debugging than the static graphs used by frameworks such as TensorFlow 1.x.
- Easy to Use: PyTorch has a user-friendly API that makes it easy for developers to create and train deep learning models. For example, its DataLoader loads data in batches, which helps when a deep learning dataset is too large to fit into memory at once.
- Native Support for CUDA: PyTorch has built-in support for CUDA, the parallel computing platform and API for Nvidia GPUs, making it easy to run computationally expensive operations on GPUs.
- Fast Model Development: PyTorch provides a high-level API that makes it easy to quickly experiment with different models and model architectures.
- Good Community Support: PyTorch has a growing community of developers and users, with a wealth of resources and tutorials available online.
- Interoperability with Other Tools: PyTorch is designed to work well with other tools, such as TensorBoard, which makes it easy to visualize model performance and training progress. It also interoperates with NumPy, which many data science libraries are built on.
- Transfer Learning Support: PyTorch has built-in support for transfer learning, which allows you to reuse pre-trained models for different tasks, saving time and resources.
- Easy Model Deployment: PyTorch models can be easily deployed on a variety of platforms, including mobile devices and the web, using tools such as TorchScript and ONNX.
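Two of the points above, batch loading with DataLoader and CUDA support, can be sketched in a few lines. The tensors here are random stand-ins for a real dataset, and the batch size of 4 is an arbitrary choice.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical dataset: 10 samples with 4 features each, and binary labels
features = torch.randn(10, 4)
labels = torch.randint(0, 2, (10, 1)).float()
dataset = TensorDataset(features, labels)

# Serve the data in batches of 4 rather than all at once
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_sizes = []
for batch_features, batch_labels in loader:
    # Move each batch to the chosen device just before using it
    batch_features = batch_features.to(device)
    batch_labels = batch_labels.to(device)
    batch_sizes.append(batch_features.shape[0])

print(sorted(batch_sizes, reverse=True))  # [4, 4, 2]
```

The last batch is smaller because 10 samples do not divide evenly into batches of 4.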
IMPLEMENTATION
We will start by installing PyTorch and TorchMetrics if they are not already installed. To install them, run the following command in your Jupyter notebook:
!pip install torch torchmetrics
Now that PyTorch is installed, we will start importing the necessary libraries.
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.impute import KNNImputer
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from torchmetrics.classification import BinaryAccuracy, BinaryConfusionMatrix
from sklearn.metrics import roc_curve, auc
We will work with the Spaceship Titanic Dataset available on Kaggle.
ABOUT THE SPACESHIP TITANIC
In the year 2912, a transmission has been received from four lightyears away, and things aren’t looking good.
The Spaceship Titanic was an interstellar passenger liner launched a month ago. With almost 13,000 passengers on board, the vessel set out on its maiden voyage transporting emigrants from our solar system to three newly habitable exoplanets orbiting nearby stars.
While rounding Alpha Centauri en route to its first destination — the torrid 55 Cancri E — the unwary Spaceship Titanic collided with a spacetime anomaly hidden within a dust cloud. Sadly, it met a similar fate as its namesake from 1000 years before. Though the ship stayed intact, almost half of the passengers were transported to an alternate dimension!
To help rescue crews and retrieve the lost passengers, we are required to predict which passengers were transported by the anomaly using records recovered from the spaceship’s damaged computer system.
Let’s read our data.
train_df = pd.read_csv('train.csv')
This is what the data frame looks like:-
Let’s take a look at the class counts of the target, which is the Transported column.
sns.countplot(x=train_df.Transported)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.title('Target Class Countplot', fontsize=20)
plt.show()
Let’s also take a look at the age distribution for each class in the Target Variable. We will use the Seaborn library to plot a histogram
colors = plt.rcParams["axes.prop_cycle"]()
a = 1  # number of rows in the subplot grid
b = 2  # number of columns in the subplot grid
c = 1  # index of the current subplot
fig = plt.figure()
for i in train_df.Transported.unique():
    color = next(colors)["color"]
    plt.subplot(a, b, c)
    sns.histplot(train_df[train_df.Transported==i]['Age'], color=color)
    plt.gca().spines['top'].set_visible(False)
    plt.gca().spines['right'].set_visible(False)
    plt.title(f'Age Spread where Transported = {i}')
    c = c + 1
plt.suptitle('Age Distribution', fontsize=20, y=1.02)
plt.tight_layout()
plt.show()
We can observe that, in both classes, most passengers are between 20 and 30 years old. Now let’s find out how many missing values are present in our data. We can do this with the isnull function in Pandas, or we can write a little loop that lays out more details.
for i in train_df.columns:
    if train_df[i].isnull().sum() != 0:
        dtype = train_df[i].dtype
        # Share of missing values in this column, as a percentage
        percent = round(train_df[i].isnull().sum() / train_df.shape[0] * 100, 2)
        print(f'{i} : {percent}% missing, Dtype : {dtype}')
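For comparison, the isnull route mentioned above can be written without an explicit loop. This is only a sketch on a tiny made-up data frame standing in for train_df; the column names are borrowed from the dataset but the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df with a few missing values
df = pd.DataFrame({
    "Age": [24.0, np.nan, 31.0, 45.0],
    "VIP": [False, True, None, None],
    "Spa": [0.0, 10.0, 5.0, 2.0],
})

# Fraction of missing values per column, expressed as a percentage
missing_pct = df.isnull().mean() * 100

# Keep only the columns that actually have missing values
missing_pct = missing_pct[missing_pct > 0].round(2)
print(missing_pct)
```

Taking the mean of the boolean mask gives the share of missing rows directly, so no manual division by the row count is needed.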
Since there are some impurities in our data, we will write a function to clean the data and return the data frame
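def clean(df):
    '''
    Takes a dataframe, imputes the missing numeric values, drops unused columns and
    rows with missing categories, encodes the categorical features and returns the dataframe
    '''
    df = df.copy()  # work on a copy so the original dataframe is left untouched
    imputer = KNNImputer(n_neighbors=2)
    encoder = LabelEncoder()
    # Impute missing values in the numeric columns from the 2 nearest neighbours
    num_cols = df.select_dtypes('float64').columns
    df[num_cols] = imputer.fit_transform(df[num_cols])
    df.drop(['PassengerId', 'Cabin', 'Name'], axis=1, inplace=True)
    # Encode the boolean-like columns as integers
    cols = ['CryoSleep', 'VIP', 'Transported']
    for i in cols:
        df[i] = encoder.fit_transform(df[i])
    # Drop rows where the remaining categorical columns are missing
    df = df.dropna(subset=['HomePlanet', 'Destination']).copy()
    obj_cols = list(df.select_dtypes('object'))
    for i in obj_cols:
        df[i] = encoder.fit_transform(df[i])
    return df

# Clean the training data; the following steps use cleaned_train
cleaned_train = clean(train_df)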
The data frame is now shaping up and starting to look like this:-
Since our data has different scales for each column, it is best practice to scale the columns so that the algorithm converges faster.
input_cols = cleaned_train.columns[:-1]
scaler = StandardScaler()
scaler.fit(cleaned_train[input_cols])
cleaned_train[input_cols] = scaler.transform(cleaned_train[input_cols])
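To see what the scaler is doing under the hood, here is a small sketch on a made-up feature matrix. It compares StandardScaler with the same calculation done by hand: subtract each column's mean and divide by its standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix with two columns on very different scales
X = np.array([[20.0, 1000.0],
              [30.0, 3000.0],
              [40.0, 5000.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same result by hand: subtract the column mean, divide by the column std
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # True
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, every column has mean 0 and standard deviation 1, so no single feature dominates the gradient updates just because of its units.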
Next, we will go ahead and separate the data into independent and dependent variables.
y = cleaned_train.Transported.values
X = cleaned_train[input_cols].values
For splitting the data into Training and Validation Set, we will write a nifty little function that returns these sets in the form of tensors
def split(X, y, test_size=0.10):
    '''
    Takes X and y arrays, splits them into training and test set.
    The function then converts all the sets into tensors only to be returned
    by the function.
    '''
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=1, shuffle=True)
    X_train = torch.from_numpy(X_train.astype(np.float32))
    X_test = torch.from_numpy(X_test.astype(np.float32))
    y_train = torch.from_numpy(y_train.astype(np.float32))
    y_test = torch.from_numpy(y_test.astype(np.float32))
    return X_train, X_test, y_train, y_test
#Calling the function we just created to split the data
X_train, X_test, y_train, y_test = split(X,y)
#Reshaping the y_train and y_test
y_train=y_train.view(y_train.shape[0],1)
y_test=y_test.view(y_test.shape[0],1)
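Let’s create our model now that we have the data ready. We are going to write a class that uses two linear layers, a ReLU activation function, and a sigmoid function. Strictly speaking, the hidden layer makes this a small neural network rather than a plain Logistic Regression, which would be a single linear layer followed by a sigmoid.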
class LogisticRegression(torch.nn.Module):
    def __init__(self, no_input_features):
        super(LogisticRegression, self).__init__()
        self.layer1 = torch.nn.Linear(no_input_features, 20)
        self.relu = nn.ReLU()
        self.layer2 = torch.nn.Linear(20, 1)

    def forward(self, x):
        y_predicted = self.relu(self.layer1(x))
        y_predicted = torch.sigmoid(self.layer2(y_predicted))
        return y_predicted
PyTorch allows us to extend its base class, nn.Module, and we are doing exactly that by creating a custom class. The constructor, which is called every time an object of the class is created, defines the layers. The forward function defines how an input is passed through those layers to produce the prediction.
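For comparison, textbook logistic regression has no hidden layer: it is a single linear layer followed by a sigmoid. The sketch below shows that variant. The class name PlainLogisticRegression and the random input are made up for this example and are not used elsewhere in the article.

```python
import torch
import torch.nn as nn

class PlainLogisticRegression(nn.Module):
    """Hypothetical single-layer variant: one linear layer followed by a sigmoid."""
    def __init__(self, n_input_features):
        super().__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        # Linear score squashed into a probability between 0 and 1
        return torch.sigmoid(self.linear(x))

plain_model = PlainLogisticRegression(10)
sample = torch.randn(5, 10)          # 5 made-up samples with 10 features each
probs = plain_model(sample)
n_params = sum(p.numel() for p in plain_model.parameters())

print(probs.shape, n_params)  # torch.Size([5, 1]) 11
```

This variant has only 11 learnable parameters (10 weights and 1 bias), so it can only learn a linear decision boundary, whereas the two-layer class above can bend it.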
Let’s go ahead and use the class to define our model. Feel free to go ahead and play around with the layers of the network to further improve the architecture and performance of the network.
n_features = X_train.shape[1]  # number of input columns (10 here)
model = LogisticRegression(n_features)
#Setting up Loss criteria and the optimizer
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.5, weight_decay=0.1)
def evaluate(preds, target):
    '''
    Function to evaluate the performance of the model.
    Takes model predictions and targets to return the accuracy of the model.
    '''
    metric = BinaryAccuracy()
    accuracy = metric(preds, target)
    return accuracy
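Notice that we are using Binary Cross Entropy Loss as the loss function and Stochastic Gradient Descent as the optimizer. Because the loop passes the whole training set at every step, each update is effectively full-batch gradient descent. We will go ahead and write a training loop where the model will learn the parameters: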
Loss = []
Accuracy = []
number_of_epochs = 500
for epoch in range(number_of_epochs):
    # Forward pass on the full training set
    y_prediction = model(X_train)
    loss = criterion(y_prediction, y_train)
    accuracy = evaluate(y_prediction, y_train)
    # Store a detached copy so the computation graph is not kept in memory
    Loss.append(loss.detach())
    Accuracy.append(accuracy)
    # Backward pass and parameter update
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if (epoch+1) % 100 == 0:
        print(f'Epoch: {epoch+1}, Loss : {loss.item()}, train_acc: {accuracy.item()}')
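If you are curious what the Binary Cross Entropy criterion in the loop actually computes, the sketch below works it out by hand on four made-up predictions and checks the result against nn.BCELoss.

```python
import torch
import torch.nn as nn

# Hypothetical predicted probabilities and true labels for four samples
y_prob = torch.tensor([[0.9], [0.2], [0.7], [0.4]])
y_true = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

# Binary cross-entropy by hand: -mean(y*log(p) + (1-y)*log(1-p))
manual_loss = -(y_true * torch.log(y_prob)
                + (1 - y_true) * torch.log(1 - y_prob)).mean()

# The same quantity from the built-in criterion
builtin_loss = nn.BCELoss()(y_prob, y_true)

print(manual_loss.item(), builtin_loss.item())
```

Confident correct predictions (such as 0.9 for a true label of 1) contribute very little to the loss, while confident wrong ones are penalised heavily.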
So now that the model has learned the parameters, let’s go ahead and visualize the loss and the accuracy over the duration of the epochs with the help of Matplotlib.
plt.plot([i.item() for i in Loss], marker='o', markersize = 6,markerfacecolor = 'red',markeredgecolor = 'red',markevery=100)
plt.title('Logistic Regression Model Loss Plot', fontsize=18)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()
plt.plot([i.item() for i in Accuracy], color='orange', marker='o', markersize = 6,markerfacecolor = 'C0',
markeredgecolor = 'C0',markevery=100)
plt.title('Logistic Regression Model Accuracy Plot', fontsize=18)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.xlabel('Epochs')
plt.ylabel('Training Accuracy')
plt.show()
HURRAY!!! We have successfully trained the Logistic Regression model in PyTorch. Before ending the article, let’s do one more thing and take a look at the various classification metrics by writing a function:-
def plot_classification_metrics(y_pred, y_test):
    """
    This function takes predicted probabilities and true labels to print the Classification metrics
    """
    from torchmetrics.classification import BinaryConfusionMatrix
    bcm = BinaryConfusionMatrix()
    target = y_test.int()
    # Threshold the predicted probabilities at 0.5 to get class labels
    y_pred = (y_pred > 0.5).int()
    # torchmetrics lays the matrix out as [[TN, FP], [FN, TP]]
    matrix = bcm(y_pred, target)
    TN = matrix[0][0].item()
    FP = matrix[0][1].item()
    FN = matrix[1][0].item()
    TP = matrix[1][1].item()
    Accuracy = (TP + TN) / (TP + TN + FP + FN)
    print('== CLASSIFICATION METRICS ==')
    print(f'Accuracy : {round(Accuracy, 4)}')
    Precision = TP / (TP + FP)
    print(f'Precision : {round(Precision, 4)}')
    Recall = TP / (TP + FN)
    print(f'Recall : {round(Recall, 4)}')
    F1_Score = 2 * Precision * Recall / (Precision + Recall)
    print(f'F1 Score : {round(F1_Score, 4)}')
    Classification_Error = (FP + FN) / (TP + FP + FN + TN)
    print(f'Classification Error : {round(Classification_Error, 4)}')
    sensitivity = TP / (TP + FN)
    print(f'Sensitivity / TPR: {round(sensitivity, 4)}')
    specificity = TN / (TN + FP)
    print(f'Specificity / TNR : {round(specificity, 4)}')

#Calling the function on the held-out test set
with torch.no_grad():
    y_preds = model(X_test)
plot_classification_metrics(y_preds, y_test)
Looking at the metrics, the model reaches an accuracy of approximately 75%. There are endless possibilities for what can be done going forward, and I would welcome you to try and explore them if you’d like.
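To make the metric formulas easier to check, here is a small worked example on made-up confusion-matrix counts. The numbers are hypothetical and are not results from the model trained in this article.

```python
# Hypothetical confusion-matrix counts for 100 predictions
TP, FP, FN, TN = 40, 10, 5, 45

accuracy = (TP + TN) / (TP + FP + FN + TN)     # share of all predictions that are correct
precision = TP / (TP + FP)                      # of the predicted positives, how many are real
recall = TP / (TP + FN)                         # of the real positives, how many were found
specificity = TN / (TN + FP)                    # of the real negatives, how many were found
f1_score = 2 * precision * recall / (precision + recall)

print(f"Accuracy    : {accuracy:.4f}")     # 0.8500
print(f"Precision   : {precision:.4f}")    # 0.8000
print(f"Recall      : {recall:.4f}")       # 0.8889
print(f"Specificity : {specificity:.4f}")  # 0.8182
print(f"F1 Score    : {f1_score:.4f}")     # 0.8421
```

Note the parentheses around the denominator of the F1 score: without them, operator precedence would give a different and incorrect value.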
CONCLUSION
- Logistic Regression is a classic Machine Learning algorithm that separates two classes with a linear decision boundary.
- PyTorch is a machine learning library for Python that provides high-level APIs for creating and training deep learning models. It is popular because of its Dynamic Computational Graph, Interoperability with Other Tools, Good Community Support, Fast Model Development, ease of use, and many more features.
- Up next, I will be covering more frameworks that will help with your data science workflows and are easy to implement.