Neural Networks

Introduction

This section aims to detail how neural networks operate within deep learning.

Understanding Neural Networks

Neural networks were first proposed by Warren McCulloch and Walter Pitts. McCulloch, a neurophysiologist, and Pitts, a logician, were working in Chicago when, in 1943, they authored a paper called A Logical Calculus of the Ideas Immanent in Nervous Activity [1]. This paper introduced an idea considered the first step on the path to the perceptron: the McCulloch-Pitts neuron [2]. Using propositional logic, the paper presented a simplified model of how biological neurons might work together in an animal's brain to complete complex tasks. This is considered the first artificial neural network (ANN) architecture.

This early success led to the idea that, before long, humans would be interacting with intelligent machines, just like in The Jetsons [3].

But this was not to be, and the field entered the so-called AI winter, which lasted from the late 1970s to the early 1990s. Some work did continue during this period but, with a few exceptions, there were few advancements.

In simple terms, neural networks are networks of neurons that, given a good training data set, can be used to make predictions or classifications. For example, a high-level view of a neural network containing an input layer, an output layer and two hidden layers is shown below.

The final layer, the output layer, is the prediction layer. The arrows show how each layer is connected and also indicate the direction of travel of the data through the model [4].

A perfectly reasonable question to ask when starting to learn about neural networks is: what is a neural network actually trying to do? To answer this question, we will scale the model shown above back to the following:

This is a simple logistic regression with one variable ($x$). The activation in this system is the sigmoid function, so the system above can be rewritten as the following equation:

$$\text{sigmoid}(w_{1} \times X + B_{0}) = \text{Predicted Probability}$$

In this system, $X$ is the input for the single feature we give to the model to calculate a prediction. Then, $w_{1}$ is the estimated slope parameter of the logistic regression. The $B_{0}$ parameter is the bias, which can be thought of as the intercept term of the regression. A key idea here is that in a neural network, each neuron has its own bias term. As stated above, each neuron has a sigmoid activation function, which is the function that turns the result into a probability. The quantity $w_{1} \times X + B_{0}$ is passed into the sigmoid function, which generates the probability output.
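To make this concrete, here is a minimal sketch of the equation above in Python; the numeric values for $w_{1}$, $B_{0}$ and $X$ are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values: w1 is the estimated slope, B0 is the bias (intercept)
w1, B0 = 0.8, -0.5
X = 2.0  # the single input feature

predicted_probability = sigmoid(w1 * X + B0)
print(predicted_probability)  # ~0.75 for these values
```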

From this, we can see that weights, biases and the activation function are critical parts of a deep learning network. Complex neural networks can have multiple hidden layers, but the principle described above remains the same; the only difference in a larger model is that the number of weights, biases and activation functions increases.
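For scale, a network like the one pictured earlier, with an input layer, two hidden layers and an output layer, can be written down in a few lines. The sketch below is one way of doing this, assuming the TensorFlow/Keras framework; the layer widths and the three-feature input are arbitrary choices for illustration, not values from this project:

```python
import tensorflow as tf

# Input layer -> two hidden layers -> single-probability output layer.
# Every Dense layer bundles its own weights, biases and activation.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),                      # three input features (arbitrary)
    tf.keras.layers.Dense(4, activation="sigmoid"),  # hidden layer 1
    tf.keras.layers.Dense(4, activation="sigmoid"),  # hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output: predicted probability
])
model.summary()  # lists the weight and bias counts per layer
```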

Let us look at a situation where one input is fed into two neurons with a sigmoid activation function. This is shown in the image below:

The notation $W_{1,1}$ indicates the weight on the connection between input 1 and Neuron 1, and $W_{1,2}$ denotes the weight on the connection between input 1 and Neuron 2 [4].

The pre-activation output $Z$ can be calculated for each neuron as follows:

$$Z_{1} = W_{1,1} \times I_{1} + B_{1}, \qquad Z_{2} = W_{1,2} \times I_{1} + B_{2}$$

where $W$ is the weight, $I$ is the input and $B$ is the neuron's bias. We can then apply the activation function to each $Z$, which gives the neuron outputs, or activations, for the current layer.
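Carrying on with that notation, the sketch below computes $Z$ for both neurons and applies the sigmoid to each; the weight and bias values are again made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

I1 = 2.0              # the single input
W11, W12 = 0.8, -0.4  # weights: input 1 -> Neuron 1, input 1 -> Neuron 2
B1, B2 = -0.5, 0.1    # each neuron has its own bias

Z1 = W11 * I1 + B1    # pre-activation of Neuron 1
Z2 = W12 * I1 + B2    # pre-activation of Neuron 2

activations = sigmoid(np.array([Z1, Z2]))  # this layer's outputs
print(activations)
```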

Following this process of repeatedly calculating $Z$ and applying the activation function to it, we can move from one layer to the next; this is known as forward propagation.
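In matrix form, this layer-to-layer step is a repeated multiply, add and activate. A minimal sketch, with randomly chosen layer sizes and weights purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# One weight matrix and bias vector per layer: 3 inputs -> 4 hidden -> 1 output
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases  = [rng.normal(size=4), rng.normal(size=1)]

a = np.array([0.5, -1.2, 3.0])  # the input layer's values
for W, b in zip(weights, biases):
    z = W @ a + b               # calculate Z for the whole layer at once
    a = sigmoid(z)              # apply the activation, move to the next layer

print(a)                        # the output layer's prediction
```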

One key concept of neural networks is backpropagation. This is the process of working the error backwards from the output layer through the network, assigning the correct amount of error to each neuron. This error indicates how changing the weights and biases will affect the cost function.
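To see what this error looks like in the simplest case, consider the single sigmoid neuron from earlier with a cross-entropy cost: the error signal propagated back is the difference between the prediction and the label, and the weight and bias gradients follow from the chain rule. A sketch, with made-up example values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w1, B0 = 0.8, -0.5
X, y = 2.0, 1.0          # one training example, with label 1

# Forward pass
p = sigmoid(w1 * X + B0)  # predicted probability

# Backward pass: for a sigmoid output with cross-entropy cost, dCost/dZ = p - y
error = p - y
grad_w1 = error * X       # how the cost changes with the weight
grad_B0 = error           # how the cost changes with the bias

# One gradient-descent step to reduce the cost
lr = 0.1
w1 -= lr * grad_w1
B0 -= lr * grad_B0
print(w1, B0)
```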

In a neural network, each neuron has its own weights and bias, and each neuron feeds into other neurons in the system. Each neuron can be imagined as a small model, so we have models connected to models. This connectedness, together with the ability to backpropagate errors, gives neural networks their strength.

Building a neural network is an iterative task: the designer will choose the metrics they wish to examine. These could include:

  • Accuracy: the proportion of correctly classified samples out of all samples in the dataset
  • Precision: the proportion of true positives (i.e. correctly identified positive samples) out of all the samples the model identified as positive
  • Recall: the proportion of true positives out of all the actual positive samples
  • F1 score: a summary of a model's predictive performance, calculated as the harmonic mean of precision and recall
  • AUC: the Area Under the (ROC) Curve is a measure of separability, or, to put it another way, a measure of how well a model can distinguish between classes

When evaluating the network, these metrics will always need to be in the designer's mind.
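All of these metrics are available off the shelf, for example in scikit-learn. The sketch below computes each of them for a small set of made-up labels and predictions:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                   # ground-truth labels
y_prob = [0.2, 0.6, 0.8, 0.4, 0.9, 0.1, 0.7, 0.3]   # model output probabilities
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]     # thresholded class labels

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_prob))  # AUC uses the raw probabilities
```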

References

[1] W. S. McCulloch and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943.
[2] History of Information, "McCulloch & Pitts Publish the First Mathematical Model of a Neural Network," [Online]. Available: https://www.historyofinformation.com/detail.php?entryid=782
[3] D. Striga, "14 Times the Jetsons Predicted the Future," 26 June 2016. [Online]. Available: https://screenrant.com/times-the-jetsons-predicted-the-future/#14-robot-servants.
[4] T. Yiu, "Understanding Neural Networks," 2 June 2019. [Online]. Available: https://towardsdatascience.com/understanding-neural-networks-19020b758230.
[5] A. L. Chandra, "McCulloch-Pitts Neuron — Mankind's First Mathematical Model Of A Biological Neuron," 24 July 2018. [Online]. Available: https://towardsdatascience.com/mcculloch-pitts-model-5fdf65ac5dd1.