Artificial Neural Network - Statistics-and-Machine-Learning-with-R/Statistical-Methods-and-Machine-Learning-in-R GitHub Wiki

Artificial Neural Network

Click for R-Script

An Artificial Neural Network (ANN) is a supervised learning system built from a large number of simple elements, called neurons or perceptrons. Each neuron makes simple decisions and feeds those decisions to other neurons, organized in interconnected layers. Together, the neural network can approximate almost any function, and answer practically any question, given enough training samples and computing power.

                                              

Source: https://blogs.oracle.com/bigdata/difference-ai-machine-learning-deep-learning

Artificial Neural Network Concepts:

Here is a glossary of basic terms you should be familiar with before learning the details of neural networks.

Inputs:

Source data fed into the neural network, with the goal of making a decision or prediction about the data. Inputs to a neural network are typically a set of real values; each value is fed into one of the neurons in the input layer.

Training Set:

A set of inputs for which the correct outputs are known, used to train the neural network.

Outputs:

Neural networks generate their predictions in the form of a set of real values or boolean decisions. Each output value is generated by one of the neurons in the output layer.

Neuron/perceptron:

The basic unit of the neural network. Accepts an input and generates a prediction.

Activation Function:

Each neuron accepts part of the input and passes it through the activation function. Common activation functions are sigmoid, tanh and ReLU. Activation functions keep output values within an acceptable range, and their non-linear form is crucial for training the network.
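These three activation functions can be sketched in a few lines (shown here in Python with NumPy, for illustration):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any real input into (-1, 1); zero-centered
    return np.tanh(x)

def relu(x):
    # Passes positive values through unchanged, zeroes out negatives
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # values between 0 and 1
print(tanh(x))     # values between -1 and 1
print(relu(x))     # [0. 0. 2.]
```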

Weight Space:

Each neuron is given a numeric weight. The weights, together with the activation function, define each neuron’s output. Neural networks are trained by fine-tuning weights, to discover the optimal set of weights that generates the most accurate prediction.

There are two methods to initialize the weights: initialize them to zero or assign them randomly.

  • Initializing all weights to 0: every neuron then performs the same computation and receives the same gradient update, so all the neurons in a layer stay identical and the extra depth adds nothing, making the deep net useless.

  • Initializing all weights randomly: the weights are assigned small random values close to 0. Every neuron then performs a slightly different computation, which lets the network learn and gives better accuracy. This is the most commonly used method.
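A quick illustration of the difference (Python/NumPy sketch; the layer sizes and input values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero initialization: every neuron in the layer computes the same output,
# so gradient updates are identical and the neurons never differentiate.
W_zero = np.zeros((3, 4))            # 3 inputs -> 4 hidden neurons

# Small random initialization: each neuron starts slightly different,
# so the neurons can learn to detect different features.
W_rand = rng.normal(0.0, 0.01, (3, 4))

x = np.array([0.5, -1.0, 2.0])
print(x @ W_zero)  # [0. 0. 0. 0.] -- all four neurons agree
print(x @ W_rand)  # four distinct small values
```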

Forward Pass:

The forward pass takes the inputs, passes them through the network and allows each neuron to react to a fraction of the input. Neurons generate their outputs and pass them on to the next layer, until eventually the network generates an output.
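The forward pass through a tiny network with one hidden layer can be sketched as follows (Python/NumPy, with arbitrary random weights for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
x = np.array([0.2, 0.8])             # input layer: two real values
W1 = rng.normal(0.0, 0.5, (2, 3))    # input -> hidden weights
W2 = rng.normal(0.0, 0.5, (3, 1))    # hidden -> output weights

h = sigmoid(x @ W1)                  # each hidden neuron reacts to the input
y = sigmoid(h @ W2)                  # output layer generates the prediction
print(y)                             # a single value in (0, 1)
```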

Error Function:

Defines how far the actual output of the current model is from the correct output. When training the model, the objective is to minimize the error function and bring output as close as possible to the correct value.
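A common choice of error function for real-valued outputs is the mean squared error; a minimal sketch (Python/NumPy, with made-up predictions for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared distance from the correct output
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0])
print(mse(y_true, np.array([0.9, 0.1, 0.8])))  # small error, ~0.02
print(mse(y_true, np.array([0.1, 0.9, 0.2])))  # large error, ~0.75
```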

Backpropagation:

In order to discover the optimal weights for the neurons, we perform a backward pass, moving back from the network’s prediction to the neurons that generated that prediction. This is called backpropagation. Backpropagation uses the derivatives of the activation functions in each successive layer, via the chain rule, to find weights that bring the loss function to a minimum, which will generate the best prediction. The iterative weight-update procedure is a mathematical process called gradient descent.
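For a single sigmoid neuron with a squared-error loss, the whole forward-pass / backward-pass / gradient-descent cycle fits in a few lines (Python sketch; the input, target and learning rate are arbitrary illustration values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One sigmoid neuron trained by gradient descent on a squared error.
x, target = 1.5, 1.0
w, lr = 0.0, 1.0

for step in range(200):
    y = sigmoid(w * x)                  # forward pass
    # Backward pass (chain rule): with E = (y - target)^2,
    # dE/dw = dE/dy * dy/dz * dz/dw = 2(y - target) * y(1 - y) * x
    grad = 2 * (y - target) * y * (1 - y) * x
    w -= lr * grad                      # move the weight downhill on the error

print(sigmoid(w * x))                   # close to the target of 1.0
```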

Hyperparameter:

A hyperparameter is a setting that affects the structure or operation of the neural network. In real deep learning projects, tuning hyperparameters is the primary way to build a network that provides accurate predictions for a certain problem. Common hyperparameters include the number of hidden layers, the activation function, and how many times (epochs) training should be repeated.

What is a Perceptron?

A perceptron is a binary classification algorithm modelled after the functioning of a biological neuron. While a single perceptron has a simple structure and can only separate linearly separable classes, networks of perceptrons have the ability to learn and solve very complex problems.

                                              

Source: https://missinglink.ai/guides/neural-network-concepts/complete-guide-artificial-neural-networks/

What is Multilayer Perceptron?

A multilayer perceptron (MLP) is a group of perceptrons, organized in multiple layers, that can accurately answer complex questions. Each perceptron in the first layer (on the left) sends signals to all the perceptrons in the second layer, and so on. An MLP contains an input layer, at least one hidden layer, and an output layer.

                                              

Source: https://missinglink.ai/guides/neural-network-concepts/complete-guide-artificial-neural-networks/

The Perceptron Learning Process

                                    

Source: https://missinglink.ai/guides/neural-network-concepts/complete-guide-artificial-neural-networks/

The perceptron learns as follows:

  1. Takes the inputs which are fed into the perceptrons in the input layer, multiplies them by their weights, and computes the sum.
  2. Adds the number one, multiplied by a “bias weight”. This is a technical step that makes it possible to move the output function of each perceptron (the activation function) up, down, left and right on the number graph.
  3. Feeds the sum through the activation function—in a simple perceptron system, the activation function is a step function.
  4. The result of the step function is the output.
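The four steps above can be sketched as a single function (Python; the weights and bias weight here are arbitrary values chosen so that the perceptron computes a logical OR):

```python
import numpy as np

def step(z):
    # Step activation: outputs 1 if the sum is positive, otherwise 0
    return 1 if z > 0 else 0

def perceptron(inputs, weights, bias_weight):
    z = np.dot(inputs, weights)   # 1. multiply inputs by weights, sum them
    z += 1 * bias_weight          # 2. add the number one times the bias weight
    return step(z)                # 3.-4. feed the sum through the step function

w = np.array([0.6, 0.6])
print(perceptron(np.array([1.0, 0.0]), w, -0.5))  # 1
print(perceptron(np.array([0.0, 0.0]), w, -0.5))  # 0
```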

What is a Neural Network Activation Function?

An activation function is a mathematical equation that determines the output of each element (perceptron or neuron) in the neural network. It takes in the input from each neuron and transforms it into an output, usually between 0 and 1 or between -1 and 1. Classic activation functions used in neural networks include the step function (which produces a binary output), sigmoid and tanh. Newer activation functions, intended to improve computational efficiency, include ReLU and Swish.

                                        

Source: https://missinglink.ai/guides/neural-network-concepts/complete-guide-artificial-neural-networks/

Role of the Activation Function

In a neural network, inputs, which are typically real values, are fed into the neurons in the network. Each neuron has a weight, and the inputs are multiplied by the weight and fed into the activation function.

                                              

Source: https://missinglink.ai/guides/neural-network-concepts/complete-guide-artificial-neural-networks/

Each neuron’s output is the input of the neurons in the next layer of the network, and so the inputs cascade through multiple activation functions until eventually, the output layer generates a prediction. Neural networks rely on nonlinear activation functions—the derivative of the activation function helps the network learn through the backpropagation process.

Neural Network Bias

In artificial neural networks, the word bias has two meanings:

  • It can mean a bias neuron, which is part of the structure of the neural network
  • It can mean bias as a statistical concept, which reflects how well the network is able to generate predictions based on the training samples you provide.

**The bias neuron:** In each layer of the neural network, a bias neuron is added, which simply stores a value of 1. The bias neuron makes it possible to move the activation function left, right, up, or down on the number graph. Without a bias neuron, each neuron takes the input and multiplies it by its weight, without adding anything to the activation equation. This means, for example, that it is not possible to input a value of zero and generate an output of two. In many cases it’s necessary to shift the entire activation function to the left or right, upwards or downwards, to generate the required output values; the bias neuron makes this possible.
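A minimal demonstration of why the bias matters (Python sketch; the weight and bias values are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.0          # input value of zero
w = 2.0          # the neuron's weight

# Without a bias, an input of zero always yields sigmoid(0) = 0.5,
# no matter what the weight is:
print(sigmoid(w * x))          # 0.5

# The bias neuron (a constant 1, times a bias weight) shifts the
# activation function, so the same input can give a different output:
b = 3.0
print(sigmoid(w * x + 1 * b))  # ~0.95
```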

                           

Source: https://missinglink.ai/guides/neural-network-concepts/complete-guide-artificial-neural-networks/

Overfitting and Underfitting in Neural Networks

**Overfitting** happens when the neural network is good at learning its training set, but is not able to generalize its predictions to additional, unseen examples. This is characterized by low bias and high variance. **Underfitting** happens when the neural network is not able to accurately predict even for the training set, let alone for the validation set. This is characterized by high bias and low variance.

Methods to Avoid Overfitting

Here are a few common methods to avoid overfitting in neural networks:

  • Retraining neural networks—running the same model on the same training set but with different initial weights, and selecting the network with the best performance.
  • Early stopping—training the network, monitoring the error on the validation set after each iteration, and stopping training when the network starts to overfit the data.
  • Regularization—adding a term to the error function equation, intended to decrease the weights and biases, smooth outputs and make the network less likely to overfit.
  • Tuning performance ratio—similar to regularization, but using a parameter that defines by how much the network should be regularized.
  • Dropout—randomly “kill” a certain percentage of neurons in every training iteration. This ensures some information learned is randomly removed, reducing the risk of overfitting.
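Of these, dropout is the easiest to show in isolation; here is a sketch of the common “inverted dropout” variant (Python/NumPy; the dropout rate of 0.5 is an arbitrary illustration value):

```python
import numpy as np

def dropout(activations, rate, rng):
    # Randomly "kill" a fraction of the activations in this iteration,
    # so the network cannot rely on any single neuron.
    keep = rng.random(activations.shape) >= rate
    # Inverted dropout: scale the survivors so the expected total is unchanged
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(10)                  # pretend hidden-layer activations
print(dropout(h, 0.5, rng))      # roughly half the activations are zeroed
```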

Methods to Avoid Underfitting

Here are a few common methods to avoid underfitting in a neural network:

  • Adding neuron layers or inputs—adding layers, or increasing the number of inputs and neurons in each layer, can generate more complex predictions and improve the fit of the model.
  • Adding more training samples or improving quality—the more training samples you feed into the network, and the better they represent the variance in the real population, the better the network will perform.
  • Decreasing regularization parameter—regularization can be overdone. By using a regularization performance parameter, you can learn the optimal degree of regularization, which can help the model better fit the data.

The Difference Between Model Parameter and Hyperparameter

  • A model parameter is internal to the network and is learned during training; the learned parameters are what a production deep learning model uses to make predictions. The objective of training is to learn the values of the model parameters.
  • A hyperparameter is an external parameter set by the operator of the neural network. For example, the number of iterations of training, the number of hidden layers, or the activation function. Different values of hyperparameters can have a major impact on the performance of the network.

List of Common Hyperparameters

                                       

Source: https://missinglink.ai/guides/neural-network-concepts/complete-guide-artificial-neural-networks/

Source (Content & Images): https://missinglink.ai/guides/neural-network-concepts/complete-guide-artificial-neural-networks/

Click for R-Script