Deep_Learning 5 - SaranAkkiraju/Python_and_Deep_Learning_Programming_ICP GitHub Wiki

Objective

  1. Save the model and use the saved model to predict on new text data
  2. Apply GridSearchCV on the source code provided in the class
  3. Apply the code to the spam data set available in the source code (text classification on the spam.csv data set)

Importing Libraries: Keras, Sklearn, and NumPy.

C1

Data Reading

C2

  • Reading the CSV as a pandas data frame.
  • Keeping only the two necessary columns of the data frame.
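The two bullets above can be sketched as follows. This is a minimal illustration; the column names `text` and `sentiment` and the CSV contents are assumptions standing in for the real file:

```python
import io
import pandas as pd

# Stand-in for the real CSV file (column names and rows are assumed here)
csv_data = io.StringIO(
    "id,text,sentiment,source\n"
    "1,RT great debate tonight,Positive,twitter\n"
    "2,what a terrible answer,Negative,twitter\n"
)

df = pd.read_csv(csv_data)        # read the CSV as a pandas data frame
df = df[["text", "sentiment"]]    # keep only the two necessary columns
print(df.columns.tolist())
```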

Data Preprocessing

C3

  • Replacing the special characters and "rt" in the text data with empty strings
  • The maximum number of features taken is 2000.
  • Converting the text data to lower case
  • Used the Tokenizer API and applied its fit_on_texts and texts_to_sequences methods on the text data.
  • The embedding dimension is 128 and the number of LSTM units is 196
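The preprocessing steps above can be sketched without Keras. The tiny tokenizer below only mimics what `Tokenizer.fit_on_texts` / `texts_to_sequences` do (building a frequency-ranked word index capped at `max_features`, then mapping each text to integer ids); the sample texts are made up:

```python
import re
from collections import Counter

texts = ["RT This movie is GREAT!", "This movie is terrible..."]

# Lower-case, strip special characters, and drop the "rt" retweet marker
cleaned = []
for t in texts:
    t = t.lower()
    t = re.sub(r"[^a-z0-9\s]", "", t)   # replace special characters
    t = re.sub(r"\brt\b", "", t)        # remove the "rt" token
    cleaned.append(t.strip())

max_features = 2000  # cap on vocabulary size, as in the wiki

# fit_on_texts equivalent: index words by frequency (1-based, most frequent first)
counts = Counter(w for t in cleaned for w in t.split())
word_index = {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_features))}

# texts_to_sequences equivalent: map each text to its list of word ids
sequences = [[word_index[w] for w in t.split() if w in word_index] for t in cleaned]
print(sequences)
```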

Model creation

C4

  • Initialized the Sequential model
  • Added the embedding layer; dropout is 0.2 and the output activation is softmax
  • The loss used is categorical_crossentropy, the optimizer is Adam, and the metric is accuracy.
  • Converted the categorical Y data to numerical format and split it into train and test data
  • Initialized the TensorBoard callback to record the accuracy and loss graphs.
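The label-handling bullet above can be sketched as follows, with `pd.get_dummies` doing the categorical-to-numerical conversion and scikit-learn's `train_test_split` doing the split. The toy data frame and split ratio are assumptions for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the data (texts and labels are fabricated)
df = pd.DataFrame({
    "text": ["good", "bad", "fine", "awful", "great", "poor"],
    "sentiment": ["Positive", "Negative", "Neutral",
                  "Negative", "Positive", "Negative"],
})

# Convert the categorical Y data into numerical one-hot columns
y = pd.get_dummies(df["sentiment"]).values

# Split into train and test data
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], y, test_size=0.33, random_state=42)

print(y.shape)  # one row per sample, one column per class
```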

Model execution

C5 The batch size used is 32; called model fit on the train data with 7 epochs and the TensorBoard callback, then performed evaluate to calculate the accuracy and loss values. O1
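The creation and execution steps can be sketched end to end as below. This is a minimal, untuned illustration: the vocabulary size, embedding dimension, and LSTM width follow the numbers quoted above, but the padded sequence length, class count, and training data are random stand-ins, and only 1 epoch is run here instead of the wiki's 7:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, SpatialDropout1D

max_features, embed_dim, lstm_out, seq_len = 2000, 128, 196, 28

# Model creation: embedding -> dropout 0.2 -> LSTM -> softmax output
model = Sequential([
    Embedding(max_features, embed_dim),
    SpatialDropout1D(0.2),
    LSTM(lstm_out, dropout=0.2),
    Dense(3, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Random stand-in data: 32 padded sequences, 3 one-hot classes
X = np.random.randint(1, max_features, size=(32, seq_len))
y = np.eye(3)[np.random.randint(0, 3, size=32)]

model.fit(X, y, batch_size=32, epochs=1, verbose=0)
loss, acc = model.evaluate(X, y, verbose=0)
```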

Grid Search CV

C6

  • Used GridSearchCV to find the best hyperparameters for training the model.
  • The batch sizes tried are 32 and 64, and the numbers of epochs are 1 and 2.
  • Initialized the grid search model with the above parameters and fitted it on the train data.
  • The best parameters chosen are batch_size = 64 and epochs = 2
  • Then computed the accuracy for the tuned parameters
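In the class code, the Keras model is wrapped in a scikit-learn-compatible estimator before searching. To keep this sketch self-contained without TensorFlow, an `MLPClassifier` stands in for that wrapper (its `batch_size` and `max_iter` parameters play the role of the wiki's batch_size and epochs grid); the data is generated, so only the GridSearchCV mechanics match the wiki:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Fabricated stand-in data for the search
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Grid over batch size and training length, mirroring the wiki's grid
param_grid = {"batch_size": [32, 64], "max_iter": [50, 100]}

grid = GridSearchCV(MLPClassifier(random_state=42), param_grid, cv=3)
grid.fit(X, y)

print(grid.best_params_)   # best hyperparameter combination found
print(grid.best_score_)    # cross-validated accuracy for those parameters
```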

Save and Load Model

C7 C8

  • Using pickle, saved the trained model as a .pkl file.
  • Loaded the .pkl file and performed evaluation with it O2
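The save/load step can be sketched with the pickle pattern the wiki describes. A small fitted scikit-learn model stands in for the trained Keras model here (Keras models are more commonly saved with `model.save`, so treat this purely as an illustration of the pickle round trip):

```python
import pickle
from sklearn.linear_model import LogisticRegression

# Tiny fabricated training set
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = LogisticRegression().fit(X, y)

# Save the trained model as a .pkl file
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Load the .pkl file and evaluate with it
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded.score(X, y))
```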

Predicting the sentiment value on the new test string.

C9 Taking the string and converting it into a pandas column. Performing the same preprocessing as before: converting the text to lower case and removing the special characters. Then converting the preprocessed text into numerical format with the Tokenizer API (fit_on_texts and texts_to_sequences methods). The predicted sentiment value is negative.
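The paragraph above can be sketched in pure Python. Here `word_index` is a fabricated stand-in for the fitted tokenizer's vocabulary and `seq_len` for the padded length used at training time; the final `model.predict` call is left as a comment since the model itself is not rebuilt here:

```python
import re

# Stand-in vocabulary from the fitted tokenizer (values are assumptions)
word_index = {"movie": 1, "is": 2, "a": 3, "bad": 4, "good": 5}
seq_len = 28  # padded sequence length used at training time

new_text = "A lot of good things are happening. We are respected again!"

# Same preprocessing as training: lower-case, strip special characters
t = re.sub(r"[^a-z0-9\s]", "", new_text.lower())

# texts_to_sequences equivalent: map known words to ids, skip unknown ones
seq = [word_index[w] for w in t.split() if w in word_index]

# pad_sequences equivalent: left-pad with zeros to the training length
padded = [0] * (seq_len - len(seq)) + seq
# `padded` would now be fed to model.predict to get the sentiment
```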

Bonus points: TensorBoard loss and accuracy

O5

Performing on SPAM data set

Data Reading

C10

  • Reading the CSV as a pandas data frame.
  • Keeping only the two necessary columns of the data frame.

Data Preprocessing

C11

  • Converting the text data to lower case
  • Replacing the special characters in the text data with empty strings
  • The maximum number of features taken is 2000.
  • Used the Tokenizer API and applied its fit_on_texts and texts_to_sequences methods on the text data.
  • The embedding dimension is 128 and the number of LSTM units is 196

Model Creation and execution

C12

  • Initialized the Sequential model
  • Added the embedding layer; dropout is 0.2 and the output activation is softmax
  • The loss used is categorical_crossentropy, the optimizer is Adam, and the metric is accuracy.
  • Converted the categorical Y data to numerical format and split it into train and test data.
  • The batch size used is 32
  • Called model fit on the train data; the number of epochs used is 2

Model Evaluation

C13

  • Performed evaluate to calculate the accuracy and loss values O5