Building a Basic Neural Network from Scratch: A Step-by-Step Guide

Understanding the Fundamentals of Neural Networks through Hands-on Implementation and Visualizations

Daksh Bhatnagar
6 min read · Aug 27, 2023

Introduction

Neural networks have become the bedrock of artificial intelligence, revolutionizing technology and impacting our daily lives. These powerful algorithms emulate human learning and decision-making processes, making them adept at handling complex tasks and delivering accurate predictions. In this tutorial, we will delve into the mechanics of neural networks, unraveling their inner workings and shedding light on the reasons behind their structure and behavior.

Source: geeksforgeeks


Why Learn Neural Networks:

Neural networks have seamlessly integrated into modern technology for a multitude of compelling reasons:

  1. Pattern Recognition: Neural networks excel at identifying patterns in data, enabling them to recognize objects, faces, and sentiments in images and text.
  2. Natural Language Processing: They play a pivotal role in comprehending and generating human language, giving rise to applications like language translation, chatbots, and sentiment analysis.
  3. Recommendation Systems: Neural networks drive personalized recommendations on streaming platforms, e-commerce websites, and social media, enhancing user experiences.
  4. Autonomous Vehicles: They empower self-driving cars to process sensory data and make real-time decisions, thus enhancing safety and navigation.
  5. Medical Diagnostics: Neural networks contribute to medical image analysis, aiding in the early detection of diseases such as cancer through precise image interpretation.
  6. Financial Forecasting: They are utilized for stock market prediction, credit risk assessment, and fraud detection, assisting financial institutions in making informed decisions.

The Impact of Neural Networks:

  • Neural networks are deeply ingrained in our lives, powering the technology we interact with daily. From virtual assistants like Siri and Google Assistant to personalized content recommendations on streaming platforms, these networks underpin our digital experiences.
  • Social media platforms utilize them to curate our news feeds, while they drive advances in healthcare and autonomous vehicles. While building models for your own use case may not require going as deep as this tutorial does, understanding what goes on under the hood can be immensely beneficial.

Now, let’s embark on a journey to explore the intricacies of neural networks together.

Unveiling the Mechanics

First things first, we will be working with a 3-layer neural network that has 10 nodes in each layer. Refer to the diagram below:

Source: Author

The neural network consists of:

  1. Input Layer: 10 nodes that receive the input data.
  2. Hidden Layer: 10 nodes that process information from the input.
  3. Output Layer: 10 nodes that produce intermediate results.
  4. Classification Neuron: 1 neuron connected to the output layer to generate the predictions.

Assuming the data has been cleaned, scaled (so we reach the optimal weights faster), and split into train and test sets, we would at this point have some inputs of shape (x, y). We then need to do the following to train the entire network:

  1. Initialize Parameters: Initially there are no weights or biases, so we generate them randomly in shapes that match the data, which effectively gives us an m x n matrix, m being the number of hidden-layer nodes and n being the number of input-layer nodes. For each weight matrix, we look at the previous and next layers to determine the shape that forward propagation requires.
  2. Forward Propagation: This involves multiplying the input features by the corresponding weights, adding a bias term, and applying a non-linear function. The hidden layers use the ReLU activation function, while the sigmoid function is employed for the final predictions. This non-linearity is what lifts the model beyond simple linear regression and lets it capture intricate data patterns. Each layer undergoes this process, culminating in predictions. Below is what a sigmoid function looks like when visualized. Once we get probabilities from the sigmoid function, we turn them into either 1 or 0, using 0.5 as the threshold (a code sketch of steps 1 and 2 follows the figure).
Source: medium.com
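
Below is a minimal NumPy sketch of steps 1 and 2. This is not the notebook code: for brevity it uses a single hidden layer (the diagram above stacks more layers, but each one repeats the same pattern), and the function names, seed, and layer sizes are illustrative assumptions.

```python
import numpy as np

def init_params(n_input, n_hidden, n_output, seed=42):
    # Weights start as small random values so the neurons begin different;
    # biases can safely start at zero. Shapes follow the m x n convention
    # above: rows = nodes in the current layer, columns = nodes feeding it.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.01, (n_hidden, n_input))   # hidden x input
    b1 = np.zeros((n_hidden, 1))
    W2 = rng.normal(0, 0.01, (n_output, n_hidden))  # output x hidden
    b2 = np.zeros((n_output, 1))
    return W1, b1, W2, b2

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    # X has shape (n_input, n_examples): one column per training example.
    Z1 = W1 @ X + b1                 # linear step into the hidden layer
    A1 = relu(Z1)                    # ReLU activation in the hidden layer
    Z2 = W2 @ A1 + b2                # linear step into the output layer
    A2 = sigmoid(Z2)                 # sigmoid turns scores into probabilities
    preds = (A2 > 0.5).astype(int)   # threshold the probabilities at 0.5
    return Z1, A1, A2, preds
```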

3. Backpropagation: During backpropagation, the chain rule is applied to calculate gradients of the loss with respect to the parameters. These gradients drive updates in the opposite direction to minimize the loss. Backpropagation computes how changes in weights or biases across layers influence the final prediction, which involves finding derivatives of the activation functions to understand the impact of parameter adjustments on the prediction.

  • We first find the difference between the prediction and the ground truth, then take the dot product of that difference with the transposed output of the layer; this gives us the change needed in the weights. This process runs from the predicted (y_hat) value all the way back to the input layer to find the delta change we are after.
  • We then sum the difference and divide by the number of training examples, which gives us the change needed in the biases (a sketch follows this list).
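
Continuing the sketch above, backpropagation for this two-layer version might look like the following; the simplification dZ2 = A2 - Y assumes a sigmoid output paired with a cross-entropy loss.

```python
def backward(X, Y, Z1, A1, A2, W2):
    m = X.shape[1]  # number of training examples
    # Output layer: with sigmoid + cross-entropy, the gradient of the loss
    # with respect to Z2 reduces to (prediction - ground truth).
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m                        # dot with transposed outputs
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m  # summed and averaged
    # Hidden layer: the chain rule carries the error back through W2,
    # masked by the ReLU derivative (1 where Z1 > 0, else 0).
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2
```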

4. Updating Parameters: This is a pretty simple step where we just update the weights and biases: we take the old weights and biases and subtract the learning rate times the derivatives we found during backpropagation.
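
In the same sketch, the update itself is just a few lines; lr is the learning rate discussed next.

```python
def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, lr=0.01):
    # Step each parameter a small distance against its gradient;
    # the learning rate lr controls how small that step is.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
    return W1, b1, W2, b2
```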

The reason we multiply the derivatives by the learning rate is that we don't want the search for the optimal weights to be chaotic, where we end up circling the global minimum for many iterations. Instead, we want to descend slowly toward the global minimum, like in the picture below:

source: ibm.com

Refer to this link to learn more

5. Accuracy and Loss: Typically we also want to compute the loss and the accuracy after each epoch to be sure that what we are doing is actually working, i.e. that the loss decreases and the accuracy increases over time as we progress toward the ideal weights and biases. Note: these weights and biases will be different for each dataset, depending on how the data is distributed.

For a classification problem we can use a cross-entropy loss function, and for a regression problem we can use mean squared error to gauge progress. For accuracy, you can either code it yourself or use the sklearn library. Make sure to print the loss and accuracy every few epochs. We also have to reshape the ground truth so that the shapes match and the accuracy can be calculated.
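
As a sketch, binary cross-entropy and accuracy can be computed as below; the small epsilon is there to avoid log(0), and sklearn's accuracy_score would work just as well for the accuracy.

```python
def cross_entropy_loss(A2, Y):
    # Binary cross-entropy, averaged over the m training examples.
    m = Y.shape[1]
    eps = 1e-8  # guard against log(0)
    return -np.sum(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps)) / m

def accuracy(preds, Y):
    # Y must be reshaped to match preds, e.g. Y.reshape(1, -1).
    return np.mean(preds == Y)
```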

Now that we have understood all the pieces, all we need to do is create a loop that runs these steps over and over until we reach our goal (a sketch follows). You can check out my Jupyter Notebook for the entire code here.
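
Putting the sketches together, the loop might look like this; the epoch count, learning rate, and reporting interval are illustrative, not tuned values.

```python
def train(X, Y, n_hidden=10, epochs=1000, lr=0.01):
    W1, b1, W2, b2 = init_params(X.shape[0], n_hidden, 1)
    for epoch in range(epochs):
        Z1, A1, A2, preds = forward(X, W1, b1, W2, b2)
        dW1, db1, dW2, db2 = backward(X, Y, Z1, A1, A2, W2)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2,
                                       dW1, db1, dW2, db2, lr)
        if epoch % 100 == 0:  # report progress every 100 epochs
            print(f"epoch {epoch}: loss={cross_entropy_loss(A2, Y):.4f}, "
                  f"acc={accuracy(preds, Y):.4f}")
    return W1, b1, W2, b2
```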

CONCLUSION

  • In this guide, we've delved into the core principles of neural networks, uncovering their internal mechanisms and real-world applications. Neural networks' ability to mimic human-like learning and decision-making makes them immensely capable, enabling them to solve intricate problems and deliver precise predictions.
  • We explored the compelling motivations for learning neural networks, spanning pattern recognition, natural language processing, recommendation systems, autonomous vehicles, medical diagnostics, and financial forecasting. These examples underscore the extensive influence neural networks exert on our daily lives and their pivotal role in contemporary technology.
  • The tutorial offered a systematic walkthrough for constructing a rudimentary neural network from scratch, anchored in a 3-layer configuration with 10 nodes per layer as the foundation for understanding the operational framework.
  • The vital stages (initializing parameters, executing forward propagation with activation functions, calculating gradients through backpropagation, and refining parameters iteratively) were presented step by step, highlighting that attaining optimal weights and biases is a balance between convergence speed and stability.

If you found this tutorial helpful and insightful, don’t hesitate to show your appreciation by clapping and following for more enlightening content.

HAPPY LEARNING!!!
