Tensorflow - ofithcheallaigh/masters_project GitHub Wiki

Introduction

This section of the documentation will provide an overview of the TensorFlow system, which was developed by the Google Brain team, initially for internal use within Google.

TensorFlow has a flexible architecture and has become one of the most popular open-source Python machine learning libraries. It was released in 2015 under the Apache 2.0 licence. In the years since its release, it has evolved into a full ecosystem of tools for model development and deployment. Over that time, a number of APIs have also been developed with the specific aim of handling tasks such as data ingestion, transformation, feature engineering and model development.

One such API is the Keras API. We will get to this later.

As the name suggests, TensorFlow is built on tensors. TensorFlow is imported into Python as follows: import tensorflow as tf. Please note that the as tf part is a convention and is not required.

A tensor is a multidimensional array with a uniform type. The list of available data types can be found at [1]. To keep things relatively simple, a tensor can be thought of as something similar to a numpy.array. There are a number of basic types of tensors:

| Rank | Type   |
|------|--------|
| 0    | Scalar |
| 1    | Vector |
| 2    | Matrix |
| 3    | Cube   |
| 4    | n      |
  • The scalar contains a single value, and as such, has no axes. A rank 0 tensor could be constructed as: rank0Tensor = tf.constant(1). By default, this will be an int32 data type.
  • The vector tensor is a list of values, and as such, has one axis. The vector tensor, or rank-1 tensor, can be constructed as: rank1Tensor = tf.constant([5.6, 3.0, 9.0]).
  • The matrix, or rank-2 tensor, has 2 axes and can be constructed as: rank2Tensor = tf.constant([[6.0, 2.0], [1.0, 5.0], [9.0, 1.0]]).
  • A rank-3 tensor, or a cube tensor, can be constructed as:
rank3Tensor = tf.constant([
           [[0, 1, 2, 3, 4],
            [5, 6, 7, 8, 9]],
           [[10, 11, 12, 13, 14],
            [15, 16, 17, 18, 19]],
           [[20, 21, 22, 23, 24],
            [25, 26, 27, 28, 29]],])

The rank 0, rank 1 and rank 2 tensors can be visualised as:

The cube tensor can be visually represented as [2]:

Of course, higher-rank tensors are not easily displayed; however, we can give some details on how they are constructed, using the rank-4 tensor as an example. One of the more important concepts to grasp is the axes. If we create a rank-4 tensor as follows:

rank4Tensor = tf.zeros([7,3,2,5])

The axes will be: axis 0 with length 7, axis 1 with length 3, axis 2 with length 2 and the last axis with length 5. By convention, the first axis is often the batch axis and the last axis holds the features [2].
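
As a minimal sketch (assuming TensorFlow has been imported as tf), the rank, shape and axis lengths of this tensor can be inspected directly:

import tensorflow as tf

rank4Tensor = tf.zeros([7, 3, 2, 5])

print("Data type:", rank4Tensor.dtype)                             # float32 (the default for tf.zeros)
print("Number of axes (rank):", rank4Tensor.ndim)                  # 4
print("Shape:", rank4Tensor.shape)                                 # (7, 3, 2, 5)
print("Elements along axis 0:", rank4Tensor.shape[0])              # 7
print("Elements along the last axis:", rank4Tensor.shape[-1])      # 5
print("Total number of elements:", tf.size(rank4Tensor).numpy())   # 7 * 3 * 2 * 5 = 210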

Tensors will typically contain floats or integers; however, it is not uncommon to see tensors containing complex numbers or strings.
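
As a brief illustration of these less common data types (the values below are arbitrary examples, not taken from the project), complex and string tensors can be created by passing the appropriate values or an explicit dtype:

import tensorflow as tf

complexTensor = tf.constant([1 + 2j, 3 - 4j])                  # Python complex values give a tf.complex128 tensor
stringTensor = tf.constant(["red", "green", "blue"])           # a rank-1 tensor of tf.string values
floatTensor = tf.constant([5.6, 3.0, 9.0], dtype=tf.float64)   # explicitly requesting 64-bit floats

print(complexTensor.dtype, stringTensor.dtype, floatTensor.dtype)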

Keras

Keras, again, is an open-source deep learning framework. It is also referred to as a high-level API. When the term "high level" is used, it implies that at a lower level there is a body of code, functions, classes and so on which actually executes the computations required to generate a model. This lower level of code includes TensorFlow, Theano and the Microsoft Cognitive Toolkit (CNTK) [3]. So, essentially, Keras provides an easier way to access the lower-level functionality.

Keras API

The Keras functionality can be imported into Python as follows:

import tensorflow
from tensorflow import keras

The Keras API provides a number of model classes, namely:

  • The Model class
  • The Sequential class
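
As a minimal, illustrative sketch (the layer sizes and activations here are arbitrary assumptions), the Model class is typically used with the functional API, where the connections between layers are made explicitly, while the Sequential class, covered next, simply stacks layers one after the other:

import tensorflow as tf
from tensorflow import keras

# Model class (functional API): layers are connected explicitly from inputs to outputs
inputs = keras.Input(shape=(4,))
hidden = keras.layers.Dense(8, activation="relu")(inputs)
outputs = keras.layers.Dense(1, activation="sigmoid")(hidden)
functionalModel = keras.Model(inputs=inputs, outputs=outputs)

# Sequential class: the same network expressed as a simple stack of layers
sequentialModel = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])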

The Sequential Class

The Sequential model is a basic model which consists of a sequence of layers, one after the other. Once the Keras API has been imported into Python, the Sequential class can be accessed as follows:

from tensorflow.keras import Sequential

When using the Sequential model within Keras, there are a number of metrics we can use. A metric is a parameter, or a function, that is used to evaluate the performance of a model. The choice of metric does not change how the weights are updated (that is the role of the loss function), but it does shape how a model's performance is judged and compared, and as such, care needs to be taken in their selection.
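
As a hedged sketch of how metrics are attached to a Sequential model (the layer sizes, optimiser and loss below are assumptions for illustration, not choices taken from the project):

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Metrics are passed to compile(); they are reported during training and
# evaluation, while the loss is what actually drives the weight updates.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy", keras.metrics.AUC(name="auc")])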

Activation Functions

The activation function is one of the most important parts of a deep learning model. It is what determines the output of a model and the model's accuracy, as well as how computationally efficient it is [5].

The simplest activation function is a linear activation function. A linear activation function is one where no transform is applied to the input. This type of system is very easy to train, but it cannot learn complex relationships. Linear activation functions are commonly found in the output layer of networks that predict a quantity, for example, in regression problems [4]. For this reason, non-linear activation functions are preferred elsewhere, because they allow nodes to learn more complex features in the input data. Common non-linear activation functions include the sigmoid function, the hyperbolic tangent (tanh) function and the relu activation function. More on these later.

What, then, is the purpose of activation functions? Activation functions allow a network to capture two things:

  1. Interaction effects
  2. Non-linear effects

An interaction effect is where the effect of one variable, var1, on a prediction depends on another variable, var2. As an example, consider hypertension. One indicator of hypertension risk is a person's bodyweight. Someone may have a bodyweight which, on its own, would suggest they are likely to have hypertension, but they could be relatively tall, in which case their bodyweight would be normal. So, in reality, a person's height also has an impact on their risk of hypertension; in other words, weight and height interact in determining the risk of hypertension.

Non-linearities can be seen if one plots the prediction on one axis and a single variable on the other: if the relationship between the two is not a straight line, the effect of that variable is non-linear.

The sigmoid function is probably one of the best known activation functions, and can be seen in the plot below:

The sigmoid function is a mathematical function which has an 'S'-shaped curve, also known as the sigmoid curve. The sigmoid function is monotonic, meaning that it is either entirely non-increasing or entirely non-decreasing; a monotonic function's first derivative does not change sign [6]. Another characteristic of the sigmoid function is that it is constrained by a pair of horizontal asymptotes as $x \rightarrow \pm\infty$. The equation for the sigmoid function is:

$$s(x) = \frac{1}{1+e^{-x}}$$

The sigmoid function is a good choice in models where one needs a probability as the output, since probabilities exist between 0 and 1. A generalisation of the sigmoid function is the softmax function, which is an activation function used in multi-class classification systems. The softmax activation function is, again, a mathematical function, one that transforms a vector of numbers (int, float etc.) into a vector of probabilities, where the probability assigned to each value reflects its relative size within the vector [7]. The softmax activation function is best used in the output layer of a neural network that predicts a multinomial probability distribution. It can be used on hidden layers, but that is less common. The softmax activation will output one value for each node in the output layer. When carrying out a multi-class analysis, the target or response variable which holds the class labels must be encoded. In other words, if they are not already, the labels must be converted to integers representing each class, from 0 to N-1, where N is the number of classes. If there is categorical data, this needs to be one-hot encoded.
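
As a small sketch (using NumPy for the functions themselves, with arbitrary example scores), the sigmoid squashes a single score into the range (0, 1), while the softmax turns a vector of scores into probabilities that sum to 1; the Keras utility to_categorical can then be used for the one-hot encoding mentioned above:

import numpy as np
from tensorflow.keras.utils import to_categorical

def sigmoid(x):
    # s(x) = 1 / (1 + e^(-x)): the output always lies between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

def softmax(scores):
    # Subtracting the maximum score is a standard trick for numerical stability
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

print(sigmoid(0.0))                          # 0.5
print(softmax(np.array([2.0, 1.0, 0.1])))    # roughly [0.66 0.24 0.10], which sums to 1

# Integer class labels 0..N-1 one-hot encoded for a three-class problem
print(to_categorical([0, 2, 1], num_classes=3))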

The tanh activation function is similar to the sigmoid function, but can offer better performance. The range of the tanh function is $(-1, 1)$. The tanh function (shown below) has a similar shape to the sigmoid function.

The advantage of this type of function is that the inputs are mapped over a wider, zero-centred range. For example, a negative input will be mapped strongly negative, inputs near zero will be mapped near zero, and positive inputs will be mapped strongly positive, in line with the positive portion of the tanh graph.
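
A brief sketch (with hand-picked example inputs) of this mapping, using NumPy's tanh:

import numpy as np

x = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
print(np.tanh(x))   # approximately [-0.995 -0.100  0.000  0.100  0.995]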

Another activation function is the relu function, often stylised as ReLU. This stands for Rectified Linear Unit, which is a piecewise linear function. The relu function and its derivative are defined as:

$$f(x) = \begin{cases} x, & \text{if}\ x>0 \\ 0, & \text{otherwise} \end{cases}$$

$$f^{'}(x) = \begin{cases} 1, & \text{if}\ x>0 \\ 0, & \text{if}\ x<0 \end{cases}$$

where $x$ is the input to the neuron. This type of function is also known as a ramp function, and for the electronics engineers out there, it can be thought of as similar to a half-wave rectifier. In very simple terms, we can say that the relu activation function outputs 0 for all negative inputs, while the output equals the input for all positive inputs. This is shown in the figure below:

The relu function is, it seems, quite a simple function, which raises the question of why it is so popular in DL models, and why such a nearly linear function is good at accounting for non-linearities.
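
As a minimal sketch (assuming TensorFlow is available), relu simply clips negative inputs to zero; the string "relu" used when defining a Dense layer refers to this same function:

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tf.keras.activations.relu(x).numpy())   # [0.  0.  0.  0.5 2. ]

# Equivalent definition by hand: f(x) = max(0, x)
print(tf.maximum(0.0, x).numpy())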

Further discussion on this point can be found at: https://www.kaggle.com/code/dansbecker/rectified-linear-units-relu-in-deep-learning/notebook

Sources

[1] https://www.tensorflow.org/api_docs/python/tf/dtypes
[2] https://www.tensorflow.org/guide/tensor
[3] https://learning.oreilly.com/library/view/tensorflow-2-pocket/9781492089179/ch01.html#keras_api
[4] https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
[5] https://towardsdatascience.com/why-rectified-linear-unit-relu-in-deep-learning-and-the-best-practice-to-use-it-with-tensorflow-e9880933b7ef
[6] https://mathworld.wolfram.com/MonotonicFunction.html
[7] https://machinelearningmastery.com/softmax-activation-function-with-python/