Tensorflow
Introduction
This section of the documentation will provide an overview of the TensorFlow
system, which was developed by the Google Brain team, initially for internal use within Google.
TensorFlow has a flexible architecture and has become one of the most popular open-source Python machine learning libraries. It was released in 2015 under the Apache licence. Over the years since its release, it has evolved into a full ecosystem of tools used for model development and deployment. Over that time, a number of APIs have also been developed with the specific aim of handling tasks such as data ingestion, transformation, feature engineering and model development. One such API is the Keras API. We will get to this later.
As the name suggests, TensorFlow is built on tensors. TensorFlow is imported into Python as follows: `import tensorflow as tf`. Please note, the `as tf` is a convention, and is not required.
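As a quick sketch (assuming a standard TensorFlow 2.x installation), the import and a basic sanity check might look like this:

```python
# Import TensorFlow; "tf" is the conventional alias, but any name (or none) works
import tensorflow as tf

print(tf.__version__)  # prints the installed TensorFlow version, e.g. "2.x.y"
```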
A tensor is a multidimensional array with a uniform type. The list of data types available can be found at [1]. To keep things relatively simple, a tensor can be thought of as something similar to a `numpy.array`. There are a number of basic types of tensors:
| Rank | Type |
|---|---|
| 0 | Scalar |
| 1 | Vector |
| 2 | Matrix |
| 3 | Cube |
| 4 | n |
- The scalar contains a single value, and as such, has no axes. A rank-0 tensor can be constructed as: `rank0Tensor = tf.constant(1)`. By default, this will be an `int32` data type.
- The vector tensor is a list of values, and as such, has one axis. The vector tensor, or rank-1 tensor, can be constructed as: `rank1Tensor = tf.constant([5.6, 3.0, 9.0])`.
- The matrix, or rank-2 tensor, has two axes and can be constructed as: `rank2Tensor = tf.constant([[6.0, 2.0], [1.0, 5.0], [9.0, 1.0]])`.
- A rank-3 tensor, or a cube tensor, can be constructed as:
```python
rank3Tensor = tf.constant([
    [[0, 1, 2, 3, 4],
     [5, 6, 7, 8, 9]],
    [[10, 11, 12, 13, 14],
     [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29]]])
```
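As a small illustrative sketch (not part of the original text), the rank, shape and data type of the tensors constructed above can be inspected as follows:

```python
import tensorflow as tf

rank0Tensor = tf.constant(1)
rank1Tensor = tf.constant([5.6, 3.0, 9.0])
rank2Tensor = tf.constant([[6.0, 2.0], [1.0, 5.0], [9.0, 1.0]])

print(rank0Tensor.ndim, rank0Tensor.shape, rank0Tensor.dtype)  # 0 () <dtype: 'int32'>
print(rank1Tensor.ndim, rank1Tensor.shape, rank1Tensor.dtype)  # 1 (3,) <dtype: 'float32'>
print(rank2Tensor.ndim, rank2Tensor.shape, rank2Tensor.dtype)  # 2 (3, 2) <dtype: 'float32'>
print(tf.rank(rank2Tensor))  # tf.Tensor(2, shape=(), dtype=int32)
```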
The rank-0, rank-1 and rank-2 tensors, and the cube tensor, can be visualised as shown in the TensorFlow guide [2].
Of course, higher-rank tensors are not easily displayed; however, we can give some details on how they are constructed, using a rank-4 tensor as an example. One of the more important bits of information to grasp is the axes. If we create a rank-4 tensor as follows:

```python
rank4Tensor = tf.zeros([7, 3, 2, 5])
```

the axes will be: axis 0 of size 7, axis 1 of size 3, axis 2 of size 2, and axis 3 (the last axis) of size 5.
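To make the axes concrete, here is a short sketch (an illustration, not from the original page) checking the rank-4 tensor's shape and total size:

```python
import tensorflow as tf

rank4Tensor = tf.zeros([7, 3, 2, 5])

print(rank4Tensor.ndim)            # 4 axes
print(rank4Tensor.shape)           # (7, 3, 2, 5) -- the size of each axis
print(int(tf.size(rank4Tensor)))   # 210 elements in total (7 * 3 * 2 * 5)
```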
Tensors will typically contain floats or integers; however, it is not uncommon to see tensors containing complex numbers or strings.
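For example (a small sketch, assuming eager execution in TensorFlow 2.x), complex and string tensors can be created in the same way; the dtype is inferred from the Python values unless specified:

```python
import tensorflow as tf

float_tensor   = tf.constant([1.5, 2.5])          # dtype inferred as float32
int_tensor     = tf.constant([1, 2, 3])           # dtype inferred as int32
complex_tensor = tf.constant([1 + 2j, 3 - 4j])    # dtype inferred as complex128
string_tensor  = tf.constant(["hello", "world"])  # dtype is tf.string

print(complex_tensor.dtype, string_tensor.dtype)
```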
Keras
Keras, again, is an open-source deep learning framework. It is also referred to as a high-level API. When the term "high level" is used, it implies that at a lower level there is a body of code, functions, classes and so on, which actually executes the computations required to generate a model. This lower level of code includes TensorFlow, Theano and the Microsoft Cognitive Toolkit (CNTK) [3]. So, essentially, Keras provides an easier way to access the lower-level functionality.
Keras API
The Keras functionality can be imported into Python as follows:

```python
import tensorflow
from tensorflow import keras
```
The Keras API provides a number of model classes, namely:

- The `Model` class
- The `Sequential` class
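As a hedged sketch of the difference (the layer sizes here are arbitrary, not from the original text): the `Sequential` class stacks layers one after another, while the `Model` class (the functional API) wires inputs to outputs explicitly.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential: a plain stack of layers, one after the other
sequential_model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),
])

# Model (functional API): define the graph explicitly from inputs to outputs
inputs = keras.Input(shape=(4,))
x = layers.Dense(16, activation="relu")(inputs)
outputs = layers.Dense(3, activation="softmax")(x)
functional_model = keras.Model(inputs=inputs, outputs=outputs)
```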
The Sequential Class
The `Sequential` model is a basic model which consists of a sequence of layers, one after the other. Once the Keras API has been imported into Python, the `Sequential` class can be accessed as follows:

```python
from keras import Sequential
```
When using the `Sequential` model within Keras, there are a number of metrics we can use. A metric is a parameter, or a function, that can be used to evaluate the performance of a model. Metrics can have an impact on the performance of the models, and as such, care needs to be taken in their selection.
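For illustration (a sketch with hypothetical layer sizes, not taken from the project), metrics are passed to `compile()` alongside the loss and optimiser:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),
])

# The metric(s) chosen here affect what is reported during training and evaluation;
# the loss is what is actually optimised.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```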
Activation Functions
The activation function is one of the most important parts of a deep learning model. It is what will determine the output of a model and the model's accuracy, as well as how computationally efficient it is [5].
The simplest activation function is a linear activation function. A linear activation function is one where no transform is applied to the input. This type of system is very easy to train, but it cannot learn complex systems. Linear activation functions will commonly be found in the output layer of networks that predict a quantity, for example, in regression problems [4]. For this reason, non-linear activation functions are preferred, because they allow nodes to learn more complex features in the input data. Two common non-linear activation functions are the `sigmoid` function and the hyperbolic tangent (`tanh`) function, as well as the `relu` activation function. More on these later.
So, what is the purpose of activation functions? Activation functions will impact two things:
- Interaction effects
- Non-linear effects
An interaction effect is where the effect of one variable, `var1`, on a prediction depends on another variable, `var2`. As an example, let's look at hypertension. An indicator of hypertension is a person's bodyweight. But someone may have a bodyweight which, on its own, would suggest they are likely to have hypertension, while also being relatively tall, in which case their bodyweight would be normal. So in reality, a person's height also has an impact on their risk of hypertension, and we would have to say that weight and height together impact the risk of hypertension.
Non-linearities can be seen if one plots the prediction on one axis and a single input variable on the other: if the resulting relationship is not a straight line, it is non-linear.
The `sigmoid` function is probably one of the best-known activation functions, and can be seen in the plot below. The `sigmoid` function is a mathematical function which has an 'S'-shaped curve, also known as the sigmoid curve. The sigmoid function is monotonic, meaning that it is either entirely non-increasing or non-decreasing; a monotonic function's first derivative does not change sign [6]. Another characteristic of the `sigmoid` function is that it is constrained by a pair of horizontal asymptotes as $x \rightarrow \pm\infty$. The equation for the `sigmoid` function is:
$$s(x) = \frac{1}{1+e^{-x}}$$
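A minimal numerical sketch of this function (using NumPy for illustration; TensorFlow provides the same operation as `tf.math.sigmoid`):

```python
import numpy as np

def sigmoid(x):
    # s(x) = 1 / (1 + e^(-x)); squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(-5.0))  # ~0.0067 (large negative inputs approach 0)
print(sigmoid(0.0))   # 0.5
print(sigmoid(5.0))   # ~0.9933 (large positive inputs approach 1)
```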
The `sigmoid` function is good to use in models where one needs a probability as the output, since probabilities exist between 0 and 1. There is a version of the `sigmoid` function called the `softmax` function, which is an activation function used in multi-class classification systems. The `softmax` activation function is, again, a mathematical function that transforms a vector of numbers (int, float, etc.) into a vector of probabilities, where the probability of each value is representative of the relative size of the values in the vector [7]. The `softmax` activation function is best used in the output layer of a neural network that predicts a multinomial probability. It can be used on hidden layers, but that is less common. The `softmax` activation will output one value for each node in the output layer. When carrying out a multi-class analysis, the target or response variable which holds the class labels must be encoded. In other words, if they are not already, the labels must be converted to an integer representing each class, from 0 to N-1, where N is the number of classes. If there is categorical data, it needs to be one-hot encoded.
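A short sketch of the softmax transform and of integer/one-hot label encoding (the values here are illustrative only):

```python
import tensorflow as tf

logits = tf.constant([2.0, 1.0, 0.1])
probs = tf.nn.softmax(logits)              # ~[0.659, 0.242, 0.099]
print(probs, float(tf.reduce_sum(probs)))  # the probabilities sum to 1.0

# Class labels encoded as integers 0..N-1 can be one-hot encoded if needed
labels = tf.constant([0, 2, 1])
one_hot = tf.one_hot(labels, depth=3)      # shape (3, 3)
print(one_hot)
```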
The `tanh` activation function is similar to the `sigmoid` function, but can offer better performance. The range of the `tanh` function is $(-1, 1)$. The `tanh` function (shown below) is a similar shape to the `sigmoid` function.
The advantage of this type of function is that inputs are mapped more usefully: a negative input will be mapped strongly negative, inputs near zero will be mapped near zero, and positive inputs will be mapped strongly positive, in line with the positive portion of the `tanh` graph.
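A small sketch showing this mapping (illustrative values, computed with NumPy; TensorFlow exposes the same function as `tf.math.tanh`):

```python
import numpy as np

for x in (-3.0, -0.1, 0.0, 0.1, 3.0):
    # tanh squashes inputs into (-1, 1), keeping the sign of the input
    print(x, np.tanh(x))
# -3.0 -> ~-0.995, -0.1 -> ~-0.0997, 0.0 -> 0.0, 0.1 -> ~0.0997, 3.0 -> ~0.995
```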
Another activation function is the `relu` function, often stylised as ReLU. This stands for Rectified Linear Unit, which is a piecewise linear function. The `relu` function is defined as:
$$f(x) = \begin{cases} x, & \text{if}\ x > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$f'(x) = \begin{cases} 1, & \text{if}\ x > 0 \\ 0, & \text{if}\ x < 0 \end{cases}$$
where $x$ is the input to the neuron.
This type of function is also known as a ramp function, and for the electronics engineers out there, it can be thought of as similar to a half-wave rectifier. In very simple terms, we can say that the `relu` activation function is 0 for all negative inputs, while the output equals the input for all positive inputs. This is shown in the figure below:
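A minimal sketch of the ReLU behaviour described above (TensorFlow also provides this directly as `tf.nn.relu`):

```python
import numpy as np

def relu(x):
    # 0 for negative inputs, the input itself for positive inputs
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))  # [0.  0.  0.  0.5 2. ]
```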
The `relu` function is, it seems, quite a simple function, which raises the question of why it is so popular in DL models, and why such a nearly linear function is good at accounting for non-linearities (see https://www.kaggle.com/code/dansbecker/rectified-linear-units-relu-in-deep-learning/notebook).
Sources
[1] https://www.tensorflow.org/api_docs/python/tf/dtypes
[2] https://www.tensorflow.org/guide/tensor
[3] https://learning.oreilly.com/library/view/tensorflow-2-pocket/9781492089179/ch01.html#keras_api
[4] https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
[5] https://towardsdatascience.com/why-rectified-linear-unit-relu-in-deep-learning-and-the-best-practice-to-use-it-with-tensorflow-e9880933b7ef
[6] https://mathworld.wolfram.com/MonotonicFunction.html
[7] https://machinelearningmastery.com/softmax-activation-function-with-python/