# TensorFlow

# Introduction

This section of the documentation provides an overview of the `TensorFlow` system, which was developed by the Google Brain team, initially for internal use within Google.

`TensorFlow` has a flexible architecture and is one of the most popular open source machine learning libraries for Python. It was released in 2015 under the Apache 2.0 licence. Since its release, it has evolved into a full ecosystem of tools used for model development and deployment. Over that time, a number of APIs have also been developed with the specific aim of handling tasks such as data ingestion, transformation, feature engineering and model development.

One such API is the `Keras` API. We will get to this later.

As the name suggests, `TensorFlow` is built on tensors. `TensorFlow` is imported into Python as follows: `import tensorflow as tf`. Please note, the `as tf` alias is a convention, and is not required.

A tensor is a multidimensional array with a uniform type. The list of data types available can be found at [1]. To keep things relatively simple, a tensor can be thought of as something similar to a `numpy.array`. There are a number of basic types of tensors:

| Rank | Type |
|---|---|
| 0 | Scalar |
| 1 | Vector |
| 2 | Matrix |
| 3 | Cube |
| 4+ | n-dimensional tensor |

- The scalar contains a single value, and as such, has no axes. A rank-0 tensor could be constructed as: `rank0Tensor = tf.constant(1)`. By default, this will be an `int32` data type.
- The vector tensor is a list of values, and as such, has 1 axis. The vector tensor, or rank-1 tensor, can be constructed as: `rank1Tensor = tf.constant([5.6, 3.0, 9.0])`.
- The matrix, or rank-2 tensor, has 2 axes and can be constructed as: `rank2Tensor = tf.constant([[6.0, 2.0], [1.0, 5.0], [9.0, 1.0]])`.
- A rank-3 tensor, or a cube tensor, has 3 axes and can be constructed as:

```
import tensorflow as tf

# Three 2x5 matrices stacked along the first axis, giving shape (3, 2, 5)
rank3Tensor = tf.constant([
    [[0, 1, 2, 3, 4],
     [5, 6, 7, 8, 9]],
    [[10, 11, 12, 13, 14],
     [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29]],
])
```
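As a rough check of the ranks listed above, the sketch below builds one tensor of each rank and prints its number of axes, shape and data type. The loop and the printed layout are illustrative choices, not part of the original notes.

```python
import tensorflow as tf

# One tensor for each of the ranks 0 to 3 described above
rank0Tensor = tf.constant(1)
rank1Tensor = tf.constant([5.6, 3.0, 9.0])
rank2Tensor = tf.constant([[6.0, 2.0], [1.0, 5.0], [9.0, 1.0]])
rank3Tensor = tf.constant([
    [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
    [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24], [25, 26, 27, 28, 29]],
])

tensors = [rank0Tensor, rank1Tensor, rank2Tensor, rank3Tensor]
for tensor in tensors:
    # shape.rank is the number of axes; as_list() gives the size of each axis
    print("rank:", tensor.shape.rank,
          "shape:", tensor.shape.as_list(),
          "dtype:", tensor.dtype.name)
```

Integer inputs default to `int32` and Python floats to `float32`, which is why the vector and matrix report a different data type from the scalar and the cube.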

The rank 0, rank 1 and rank 2 tensors can be visualised as:

The cube tensor can be visually represented as [2]:

Of course, higher rank tensors are not easily displayed. However, we can give some details on how they are constructed, using the rank-4 tensor as an example. One of the more important pieces of information to grasp is the axes. If we create a rank-4 tensor as follows:

```
rank4Tensor = tf.zeros([7,3,2,5])
```

The shape `[7, 3, 2, 5]` gives four axes: axis 0 has 7 elements, axis 1 has 3, axis 2 has 2 and axis 3 has 5.

Tensors will typically contain floats or integers; however, it is not uncommon to see tensors containing complex numbers or strings.
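The axes and data types described above can be inspected directly. This is a minimal sketch; the string and complex values are made-up examples chosen only to show the resulting data types.

```python
import tensorflow as tf

rank4Tensor = tf.zeros([7, 3, 2, 5])

# Number of axes and the size of each axis
print("rank:", rank4Tensor.shape.rank)
print("shape:", rank4Tensor.shape.as_list())
print("first axis:", rank4Tensor.shape[0], "last axis:", rank4Tensor.shape[-1])

# Total number of elements is the product of the axis sizes (7 * 3 * 2 * 5)
print("elements:", int(tf.size(rank4Tensor)))

# tf.zeros defaults to float32; strings and complex numbers get their own dtypes
string_tensor = tf.constant(["tensor", "flow"])
complex_tensor = tf.constant([1 + 2j, 3 - 4j])
print(rank4Tensor.dtype.name, string_tensor.dtype.name, complex_tensor.dtype.name)
```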

# Keras

`Keras`, again, is an open source deep learning framework. It is also referred to as a high-level API. When the term "high level" is used, it implies that at a lower level there is a body of code, functions, classes and so on, which actually executes the computations required to generate a model. This lower-level code can be provided by `TensorFlow`, `Theano` or the Microsoft Cognitive Toolkit (`CNTK`) [3]. So, essentially, `Keras` provides an easier way to access the lower-level functionality.

## Keras API

The `Keras` functionality can be imported into Python as follows:

```
import tensorflow
from tensorflow import keras
```

The `Keras` API provides a number of model classes, namely:

- The `Model` class
- The `Sequential` class

### The Sequential Class

The `Sequential` model is a basic model which consists of a sequence of layers, one after the other. Once the `Keras` API has been imported into Python, the `Sequential` class can be accessed as follows:

```
from tensorflow.keras import Sequential
```

When using the `Sequential` model within `Keras`, there are a number of metrics we can use. A metric is a function that is used to evaluate the performance of a model. Metrics are not used to train the model, but they determine how its performance is judged, and as such, care needs to be taken in their selection.
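To make the idea of a sequence of layers and a metric concrete, here is a minimal sketch of a `Sequential` model. The input size, layer widths, optimizer, loss and metric are hypothetical choices for a small binary classifier, not values taken from this project.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical model: 4 input features, one hidden layer, one output node
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# The metric is reported during training and evaluation;
# the loss is what the optimizer actually minimises.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.summary()
```

The layers are applied in the order they appear in the list, which is what makes the model sequential.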

#### Activation Functions

The activation function is one of the most important parts of a deep learning model. It determines the output of a model, the model's accuracy, and how computationally efficient it is [5].

The simplest activation function is a linear activation function, where no transform is applied to the input. A network built only from linear activations is very easy to train, but it cannot learn complex relationships. Linear activation functions are commonly found in the output layer of networks that predict a quantity, for example, in regression problems [4]. Elsewhere in the network, non-linear activation functions are preferred, because they allow nodes to learn more complex features in the input data. Three common non-linear activation functions are the `sigmoid` function, the hyperbolic tangent (`tanh`) function and the `relu` activation function. More on these later.

What is the purpose of activation functions? Activation functions allow a model to capture two kinds of effect:

- Interaction effects
- Non-linear effects

An interaction effect is where one variable, `var1`, affects a prediction differently depending on the value of another variable, `var2`. As an example, let's look at hypertension. An indicator of hypertension is a person's body weight. However, someone may have a body weight which, taken alone, would suggest they are likely to have hypertension, but they could be relatively tall, in which case their body weight would be normal. So in reality, a person's height also has an impact on their risk of hypertension, and we would have to say that weight and height together affect the risk of hypertension.

Non-linearities can be seen if one plots the prediction on one axis and a variable on the other: the resulting relationship is not a straight line.

The `sigmoid` function is probably one of the best known activation functions, and can be seen in the plot below:

The `sigmoid` function is a mathematical function which has an 'S'-shaped curve, which is also known as the sigmoid curve. The sigmoid function is monotonic, meaning that it is either entirely non-increasing or entirely non-decreasing (the sigmoid is increasing everywhere). A monotonic function's first derivative does not change sign [6]. Another characteristic of the `sigmoid` function is that it is bounded by a pair of horizontal asymptotes, at 0 and 1, as $x \rightarrow \pm\infty$. The equation for the `sigmoid` function is:

$$s(x) = \frac{1}{1+e^{-x}}$$
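A minimal sketch in plain Python, using the standard definition $s(x) = 1/(1+e^{-x})$, shows the behaviour described above: the output rises monotonically and stays between 0 and 1. The sample inputs are arbitrary, and this simple form is only intended for moderate values of `x`.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Outputs approach 0 for large negative inputs and 1 for large positive inputs
for x in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.4f}")
```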

The `sigmoid` function is good to use in models where one needs a probability as the output, since probabilities lie between 0 and 1.

A generalisation of the `sigmoid` function is the `softmax` function, which is an activation function used in multi-class classification systems. The `softmax` activation function is, again, a mathematical function that transforms a vector of numbers (int, float etc.) into a vector of probabilities that sum to 1. The probability assigned to each value reflects its size relative to the other values in the vector [7]. The `softmax` activation function is best used in the output layer of a neural network that predicts a multinomial probability distribution. It can be used in hidden layers, but that is less common. The `softmax` activation will output one value for each node in the output layer.

When carrying out a multi-class analysis, the target or response variable which holds the class labels must be encoded. In other words, if they are not already, the labels must be converted to integers representing each class from 0 to N-1, where N is the number of classes. If there is categorical data, this needs to be one-hot encoded.
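The sketch below illustrates both ideas in plain Python: a `softmax` that turns a vector of scores into probabilities, and the label encoding described above. The class names and scores are made up for illustration, and the helper names `softmax` and `one_hot` are assumptions, not functions from this project.

```python
import math


def softmax(values):
    """Map a list of scores to probabilities that sum to 1."""
    largest = max(values)
    # Subtracting the maximum avoids overflow and does not change the result
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, num_classes):
    """Return a vector with 1.0 at the label index and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# Hypothetical class labels, converted to integers 0..N-1 and then one-hot encoded
labels = ["cat", "dog", "bird", "dog"]
classes = sorted(set(labels))
label_to_int = {name: index for index, name in enumerate(classes)}
encoded = [label_to_int[name] for name in labels]
one_hot_labels = [one_hot(index, len(classes)) for index in encoded]

probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities, sum(probabilities))
print(encoded)
print(one_hot_labels)
```

Larger scores receive larger probabilities, and there is one output per class, matching one node per class in the output layer.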

The `tanh` activation function is similar to the `sigmoid` function, but can offer better performance. The range of the `tanh` function is from $-1$ to $1$. The `tanh` function (shown below) has a similar shape to the `sigmoid` function.

The advantage of this type of function is that the inputs will be mapped better. For example, a strongly negative input will be mapped close to $-1$, inputs near zero will be mapped near zero, and strongly positive inputs will be mapped close to $1$, in line with the positive portion of the `tanh` graph.
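The similarity between the two functions can be made explicit: `tanh` is a rescaled and shifted sigmoid, $\tanh(x) = 2\,s(2x) - 1$. The sketch below checks this identity numerically in plain Python; the helper name `tanh_from_sigmoid` and the sample inputs are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_sigmoid(x: float) -> float:
    """tanh written as a rescaled sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


# Negative inputs map towards -1, inputs near zero stay near zero,
# and positive inputs map towards +1
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"x={x:+.1f}  tanh={math.tanh(x):+.4f}  sigmoid={sigmoid(x):.4f}")
```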

Another activation function is the `relu` function, often stylised as ReLU. This stands for Rectified Linear Unit, which is a piecewise linear function. The `relu` function and its derivative are defined as:

$$f(x) = \begin{cases} x, & \text{if}\ x > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$f'(x) = \begin{cases} 1, & \text{if}\ x > 0 \\ 0, & \text{if}\ x < 0 \end{cases}$$

where $x$ is the input to the neuron.
This type of function is also known as a ramp function, and for the electronics engineers out there, it can be thought of as similar to a half-wave rectifier. In very simple terms, we can say that the `relu` activation function is 0 for all negative inputs, but the output equals the input for all positive inputs. This is shown in the figure below:

The `relu` function is, it seems, quite a simple function, which raises the question of why it is so popular in DL models, and why such a nearly linear function is good at accounting for non-linearities.
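One way to see part of the answer is to combine more than one `relu` unit. In the sketch below, which is an illustration under assumed weights rather than a trained model, two units with weights of +1 and -1 add up to the absolute value function, which is clearly not a straight line.

```python
def relu(x: float) -> float:
    """Rectified Linear Unit: 0 for negative inputs, x otherwise."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of relu for x != 0: 1 on the positive side, 0 on the negative side."""
    return 1.0 if x > 0 else 0.0


def two_unit_network(x: float) -> float:
    """Sum of two relu units with hypothetical weights +1 and -1, which equals |x|."""
    return relu(x) + relu(-x)


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  relu={relu(x):.1f}  grad={relu_grad(x):.1f}  two units={two_unit_network(x):.1f}")
```

Each unit is linear on either side of zero, but the bend at zero lets sums of units approximate curved relationships.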

https://www.kaggle.com/code/dansbecker/rectified-linear-units-relu-in-deep-learning/notebook

# Sources

[1] https://www.tensorflow.org/api_docs/python/tf/dtypes

[2] https://www.tensorflow.org/guide/tensor

[3] https://learning.oreilly.com/library/view/tensorflow-2-pocket/9781492089179/ch01.html#keras_api

[4] https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/

[5] https://towardsdatascience.com/why-rectified-linear-unit-relu-in-deep-learning-and-the-best-practice-to-use-it-with-tensorflow-e9880933b7ef

[6] https://mathworld.wolfram.com/MonotonicFunction.html

[7] https://machinelearningmastery.com/softmax-activation-function-with-python/