Data Analysis: Neural Networks - ofithcheallaigh/masters_project GitHub Wiki

Introduction

This section of the wiki will detail the work carried out investigating neural networks.

A number of neural networks were investigated using Google Colab. This section will focus more on the various network configurations rather than on reading in the data and so on.

The investigation into the capabilities of neural networks started with the Keras Model class, but this was a short-lived investigation, which quickly changed to using the Sequential class.
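As a minimal illustration of the difference between the two approaches (the layer sizes here simply mirror the models described below, and n_features is a placeholder for the real feature count), the same network can be written with either API:

```python
# Minimal sketch contrasting the Keras Model (functional) API with the
# Sequential API; n_features is a placeholder for the real feature count.
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

n_features = 4  # illustrative value

# Model class: layers are wired together explicitly
inputs = Input(shape=(n_features,))
x = Dense(2, activation='relu')(inputs)
outputs = Dense(10, activation='softmax')(x)
functional_model = Model(inputs=inputs, outputs=outputs)

# Sequential class: layers are simply stacked in order
sequential_model = Sequential([
    Input(shape=(n_features,)),
    Dense(2, activation='relu'),
    Dense(10, activation='softmax'),
])
```

The Sequential class suits the networks investigated here, which are simple linear stacks of layers where the explicit wiring of the Model class adds no benefit.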

Hyperparameters were tuned with a view to finding the best configuration that would work well for deployment on a constrained device.

Hyperparameter Tuning

Two models were set up, with slightly different settings. The first had two Dense layers: the first layer used the relu activation function and had 2 neurons, while the second used the softmax activation function and had 10 neurons. The optimiser was adam and the metrics were set for accuracy.

This can be seen below:

# Model defined and compiled
model_1 = Sequential()

model_1.add(Dense(2, activation='relu',kernel_initializer='he_normal', input_shape=(n_features,)))
model_1.add(Dense(10, activation='softmax'))

model_1.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) # Compile

# Print a summary of the model's architecture
# model_1.summary()

The second model used three Dense layers. The first two used relu as the activation function and had 2 and 20 neurons respectively. The final layer had 10 neurons and used the softmax activation function. Again, the optimiser was adam and the metrics were set for accuracy:

# Model defined and compiled
model_2 = Sequential()

model_2.add(Dense(2, activation='relu',kernel_initializer='he_normal', input_shape=(n_features,)))
model_2.add(Dense(20, activation='relu', kernel_initializer='he_normal'))
model_2.add(Dense(10, activation='softmax'))

model_2.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) # Compile

# Print a summary of the model's architecture
# model_2.summary()

For the analysis of these models, the accuracy and loss were recorded as the number of epochs and the batch size were varied.
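This tuning loop can be sketched as follows; the grids, synthetic data, and variable names are illustrative rather than taken from the project code:

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Synthetic stand-in data for illustration; in the project this would be
# the training split of one of the sensor datasets.
rng = np.random.default_rng(0)
n_features = 2
X_train = rng.normal(size=(200, n_features)).astype('float32')
y_train = rng.integers(0, 10, size=200)

def build_model():
    # Fresh copy of the two-layer model for each configuration
    model = Sequential([
        Input(shape=(n_features,)),
        Dense(2, activation='relu', kernel_initializer='he_normal'),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Record final accuracy and loss for each (epochs, batch size) pair;
# the grids here are shortened for illustration.
results = []
for epochs in [5, 10]:
    for batch_size in [64, 128]:
        history = build_model().fit(X_train, y_train, epochs=epochs,
                                    batch_size=batch_size, verbose=0)
        results.append((epochs, batch_size,
                        history.history['accuracy'][-1],
                        history.history['loss'][-1]))
```

Rebuilding the model inside the loop matters: reusing one model would carry trained weights from one configuration into the next and distort the comparison.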

The results of this for the closed door dataset can be seen below:

The results of this for the display stand dataset can be seen below:

The results of this for the large bin dataset can be seen below:

The results of this for the storage box dataset can be seen below:

From this analysis, we can say that around 40 to 50 epochs, with a batch size of 128, produced good accuracy scores.

With the analysis complete for the individual datasets, it was time to analyse all the datasets brought together. The same process as before was followed: the number of epochs and the batch size were varied:
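A minimal sketch of pooling the datasets, assuming each has already been loaded into feature and label arrays (the toy arrays below are stand-ins for the real data):

```python
import numpy as np

# Toy stand-ins for the four per-object datasets; in the project these
# would be the feature/label arrays loaded for each object.
X_door,  y_door  = np.zeros((4, 2)), np.full(4, 0)
X_stand, y_stand = np.zeros((4, 2)), np.full(4, 1)
X_bin,   y_bin   = np.zeros((4, 2)), np.full(4, 2)
X_box,   y_box   = np.zeros((4, 2)), np.full(4, 3)

# Stack the features and labels into single arrays
X_all = np.concatenate([X_door, X_stand, X_bin, X_box], axis=0)
y_all = np.concatenate([y_door, y_stand, y_bin, y_box], axis=0)

# Shuffle so the classes are mixed before the train/test split
rng = np.random.default_rng(42)
idx = rng.permutation(len(X_all))
X_all, y_all = X_all[idx], y_all[idx]
```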

As this analysis progressed, it became clear that the system was not performing as well as it did on the individual datasets. Some runs were deliberately omitted to confirm this theory. The analysis shows that the results were not as consistent as in the individual dataset analysis. For example, there are instances where the two-layer model's accuracy and loss scores differed considerably from the three-layer model's. An example can be seen below:

The loss scores also remained high when compared to previous results.

This prompted an investigation of a different optimiser, specifically the SGD optimiser. This analysis involved not only a change of optimiser, but also a change in the learning rate, and a change in the number of neurons in the input layer, from 2 to 10.
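A hedged sketch of this variant, with the optimiser swapped to SGD, an explicit learning rate, and the input layer widened from 2 to 10 neurons (n_features is a placeholder for the real feature count):

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD

n_features = 2  # placeholder for the real feature count

# SGD with an explicit learning rate (0.1 here; 0.01 is compared below)
opt_sgd = SGD(learning_rate=0.1)

model_sgd = Sequential([
    Input(shape=(n_features,)),
    Dense(10, activation='relu', kernel_initializer='he_normal'),  # widened from 2 to 10
    Dense(20, activation='relu', kernel_initializer='he_normal'),
    Dense(10, activation='softmax'),
])
model_sgd.compile(optimizer=opt_sgd,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
```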

Investigation of learning rate

The initial model had the following settings, with the associated accuracy and loss results shown below:

Initially, the accuracy and loss results seem very good; however, there is a lot of variation in the results when they are plotted. For example, taking the settings of 250 epochs with a batch size of 64, we get the following accuracy plots:

The results for a learning rate of 0.01 are shown below:

Again, the accuracy scores are all very good. But what about the stability of the accuracy plot? Below is the plot for 250 epochs, with a batch size of 64, but this time with a learning rate of 0.01:

This analysis indicates two things worth noting: first, the accuracy results are better when the number of neurons in the input layer is increased (in this case, from 2 to 10); second, the stability of that accuracy appears to be better with a learning rate of 0.01, compared to a learning rate of 0.1.
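The stability plots discussed above can be produced from the History object returned by model.fit; a minimal matplotlib sketch, using toy stand-in values in place of a real training run:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, safe outside notebooks
import matplotlib.pyplot as plt

# Toy stand-in for history.history as returned by model.fit
history = {'accuracy':     [0.20, 0.45, 0.65, 0.78, 0.84],
           'val_accuracy': [0.18, 0.40, 0.58, 0.70, 0.74]}

# Plot training and validation accuracy per epoch
plt.plot(history['accuracy'], label='train')
plt.plot(history['val_accuracy'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.savefig('accuracy_plot.png')
```

Epoch-to-epoch jitter in these curves is what is meant by "stability" here: a lower learning rate takes smaller weight updates, so the curve tends to oscillate less.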

Two neuron input layer with a learning rate of 0.01

An increased number of neurons in the input layer makes the model more complex. The data which would be used for a navigation system has 2 channels, so it is worth investigating the performance of the model with two neurons in the input layer and a learning rate of 0.01.

Since the three-layer model has proven to generate better results overall, this analysis will focus on it.

The initial results are shown below:

This analysis shows that the accuracy with 2 neurons in the input layer is not as good as with 10. As shown below, increasing the number of epochs produces better accuracy:

To complete the analysis, adding an extra hidden layer was investigated. The code for this is shown below:

opt_sgd_2 = SGD(learning_rate=0.01) # Changed from 0.1

model_sgd_2 = Sequential()
model_sgd_2.add(Dense(2, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model_sgd_2.add(Dense(20, activation='relu', kernel_initializer='he_normal'))
model_sgd_2.add(Dense(20, activation='relu', kernel_initializer='he_normal'))
model_sgd_2.add(Dense(10, activation='softmax'))
# compile the model
model_sgd_2.compile(optimizer=opt_sgd_2, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history_sgd_2 = model_sgd_2.fit(X_train, y_train, epochs=epoch_num, batch_size=batch_num, validation_data=(X_test, y_test))

Where epoch_num and batch_num hold the epoch and batch-size values which produced the best results.

Using the knowledge gained from the steps detailed above, a final set of parameters was identified. These parameters are shown below:

These are the parameters which will go forward to generate the model for deployment. This model produces the following results:

Accuracy:

Loss:

Metrics:
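Per-class metrics of this kind can be produced with scikit-learn; a minimal sketch using toy stand-in labels in place of the real test labels and model predictions:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Toy stand-in labels; in the project y_test would be the test split and
# y_pred = np.argmax(model.predict(X_test), axis=1)
y_test = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
print(cm)
print(classification_report(y_test, y_pred))  # per-class precision/recall/F1
```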

With the models trained, the following code will produce the .h file which will be used on the constrained device:

# Convert the model to the TensorFlow Lite format without quantization
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(model_sgd_2)
tflite_model = converter.convert()

# Save the model to disk
with open("object_detect.tflite", "wb") as f:
    f.write(tflite_model)

import os
basic_model_size = os.path.getsize("object_detect.tflite")
print("Model is %d bytes" % basic_model_size)

!echo "const unsigned char model[] = {" > /content/object_detect.h
!cat object_detect.tflite | xxd -i      >> /content/object_detect.h
!echo "};"                              >> /content/object_detect.h

import os
model_h_size = os.path.getsize("object_detect.h")
print(f"Header file, object_detect.h, is {model_h_size:,} bytes.")
print("\nOpen the side panel (refresh if needed). Double click object_detect.h to download the file.")

This code is taken from the TensorFlow Lite tutorials.