Deep_Learning 5 - SaranAkkiraju/Python_and_Deep_Learning_Programming_ICP GitHub Wiki

Objective

  1. Save the model and use the saved model to predict on new text data
  2. Apply GridSearchCV on the source code provided in the class
  3. Apply the code to the spam data set available in the source code (text classification on the spam.csv data set)

Importing Libraries: Keras, Sklearn, and NumPy.

C1

Data Reading

C2

  • Reading the CSV as a pandas data frame.
  • Keeping only the two necessary columns of the data frame.
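The two bullets above can be sketched as follows. This is a minimal illustration; the column names `text` and `sentiment` and the CSV contents are assumptions standing in for the real file:

```python
import io
import pandas as pd

# Stand-in for the real CSV file (column names and rows are assumed here)
csv_data = io.StringIO(
    "id,text,sentiment,source\n"
    "1,RT great debate tonight,Positive,twitter\n"
    "2,what a terrible answer,Negative,twitter\n"
)

df = pd.read_csv(csv_data)        # read the CSV as a pandas data frame
df = df[["text", "sentiment"]]    # keep only the two necessary columns
print(df.columns.tolist())
```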

Data Preprocessing

C3

  • Replacing the special characters and "rt" in the text data with empty strings
  • The maximum number of features taken is 2000.
  • Converting the text data to lower case
  • Used the Tokenizer API and applied its fit_on_texts and texts_to_sequences methods on the text data.
  • The embedding dimension is 128 and the number of LSTM units is 196
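The preprocessing steps above can be sketched without Keras. The tiny tokenizer below only mimics what `Tokenizer.fit_on_texts` / `texts_to_sequences` do (building a frequency-ranked word index capped at `max_features`, then mapping each text to integer ids); the sample texts are made up:

```python
import re
from collections import Counter

texts = ["RT This movie is GREAT!", "This movie is terrible..."]

# Lower-case, strip special characters, and drop the "rt" retweet marker
cleaned = []
for t in texts:
    t = t.lower()
    t = re.sub(r"[^a-z0-9\s]", "", t)   # replace special characters
    t = re.sub(r"\brt\b", "", t)        # remove the "rt" token
    cleaned.append(t.strip())

max_features = 2000  # cap on vocabulary size, as in the wiki

# fit_on_texts equivalent: index words by frequency (1-based, most frequent first)
counts = Counter(w for t in cleaned for w in t.split())
word_index = {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_features))}

# texts_to_sequences equivalent: map each text to its list of word ids
sequences = [[word_index[w] for w in t.split() if w in word_index] for t in cleaned]
print(sequences)
```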

Model creation

C4

  • Initialized the Sequential model
  • Added the embedding layer; dropout is 0.2 and the output activation is softmax
  • The loss used is categorical_crossentropy, the optimizer is Adam, and the metric is accuracy.
  • Converted the categorical Y data to numerical format and split it into train and test data
  • Initialized the TensorBoard callback to record the accuracy and loss graphs.
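The label-handling bullet above can be sketched as follows, with `pd.get_dummies` doing the categorical-to-numerical conversion and scikit-learn's `train_test_split` doing the split. The toy data frame and split ratio are assumptions for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the data (texts and labels are fabricated)
df = pd.DataFrame({
    "text": ["good", "bad", "fine", "awful", "great", "poor"],
    "sentiment": ["Positive", "Negative", "Neutral",
                  "Negative", "Positive", "Negative"],
})

# Convert the categorical Y data into numerical one-hot columns
y = pd.get_dummies(df["sentiment"]).values

# Split into train and test data
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], y, test_size=0.33, random_state=42)

print(y.shape)  # one row per sample, one column per class
```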

Model execution

C5 The batch size used is 32; called model fit on the train data with 7 epochs and the TensorBoard callback, then performed evaluate to calculate the accuracy and loss values. O1
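The creation and execution steps can be sketched end to end as below. This is a minimal, untuned illustration: the vocabulary size, embedding dimension, and LSTM width follow the numbers quoted above, but the padded sequence length, class count, and training data are random stand-ins, and only 1 epoch is run here instead of the wiki's 7:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, SpatialDropout1D

max_features, embed_dim, lstm_out, seq_len = 2000, 128, 196, 28

# Model creation: embedding -> dropout 0.2 -> LSTM -> softmax output
model = Sequential([
    Embedding(max_features, embed_dim),
    SpatialDropout1D(0.2),
    LSTM(lstm_out, dropout=0.2),
    Dense(3, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Random stand-in data: 32 padded sequences, 3 one-hot classes
X = np.random.randint(1, max_features, size=(32, seq_len))
y = np.eye(3)[np.random.randint(0, 3, size=32)]

model.fit(X, y, batch_size=32, epochs=1, verbose=0)
loss, acc = model.evaluate(X, y, verbose=0)
```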

Grid Search CV

C6

  • Used GridSearchCV to find the best hyperparameters for training the model.
  • The batch sizes tried are 32 and 64, and the numbers of epochs are 1 and 2.
  • Initialized the grid search model with the above parameters and fitted it on the train data.
  • The best parameters chosen are batch_size = 64 and epochs = 2
  • Then computed the accuracy for the tuned parameters
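In the class code, the Keras model is wrapped in a scikit-learn-compatible estimator before searching. To keep this sketch self-contained without TensorFlow, an `MLPClassifier` stands in for that wrapper (its `batch_size` and `max_iter` parameters play the role of the wiki's batch_size and epochs grid); the data is generated, so only the GridSearchCV mechanics match the wiki:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Fabricated stand-in data for the search
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Grid over batch size and training length, mirroring the wiki's grid
param_grid = {"batch_size": [32, 64], "max_iter": [50, 100]}

grid = GridSearchCV(MLPClassifier(random_state=42), param_grid, cv=3)
grid.fit(X, y)

print(grid.best_params_)   # best hyperparameter combination found
print(grid.best_score_)    # cross-validated accuracy for those parameters
```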

Save and Load Model

C7 C8

  • Using pickle, saved the trained model as a .pkl file.
  • Loaded the .pkl file and performed evaluation with it O2
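The save/load step can be sketched with the pickle pattern the wiki describes. A small fitted scikit-learn model stands in for the trained Keras model here (Keras models are more commonly saved with `model.save`, so treat this purely as an illustration of the pickle round trip):

```python
import pickle
from sklearn.linear_model import LogisticRegression

# Tiny fabricated training set
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = LogisticRegression().fit(X, y)

# Save the trained model as a .pkl file
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Load the .pkl file and evaluate with it
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded.score(X, y))
```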

Predicting the sentiment value on the new test string.

C9 Taking the string and converting it into a pandas column. Performing the same preprocessing as before: converting the text to lower case and removing the special characters. Then converting the preprocessed text into numerical format with the Tokenizer API (fit_on_texts and texts_to_sequences methods). The predicted sentiment value is negative.
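The paragraph above can be sketched in pure Python. Here `word_index` is a fabricated stand-in for the fitted tokenizer's vocabulary and `seq_len` for the padded length used at training time; the final `model.predict` call is left as a comment since the model itself is not rebuilt here:

```python
import re

# Stand-in vocabulary from the fitted tokenizer (values are assumptions)
word_index = {"movie": 1, "is": 2, "a": 3, "bad": 4, "good": 5}
seq_len = 28  # padded sequence length used at training time

new_text = "A lot of good things are happening. We are respected again!"

# Same preprocessing as training: lower-case, strip special characters
t = re.sub(r"[^a-z0-9\s]", "", new_text.lower())

# texts_to_sequences equivalent: map known words to ids, skip unknown ones
seq = [word_index[w] for w in t.split() if w in word_index]

# pad_sequences equivalent: left-pad with zeros to the training length
padded = [0] * (seq_len - len(seq)) + seq
# `padded` would now be fed to model.predict to get the sentiment
```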

Bonus points: TensorBoard loss and accuracy

O5

Performing on SPAM data set

Data Reading

C10

  • Reading the CSV as a pandas data frame.
  • Keeping only the two necessary columns of the data frame.

Data Preprocessing

C11

  • Converting the text data to lower case
  • Replacing the special characters in the text data with empty strings
  • The maximum number of features taken is 2000.
  • Used the Tokenizer API and applied its fit_on_texts and texts_to_sequences methods on the text data.
  • The embedding dimension is 128 and the number of LSTM units is 196

Model Creation and execution

C12

  • Initialized the Sequential model
  • Added the embedding layer; dropout is 0.2 and the output activation is softmax
  • The loss used is categorical_crossentropy, the optimizer is Adam, and the metric is accuracy.
  • Converted the categorical Y data to numerical format and split it into train and test data.
  • The batch size used is 32
  • Called model fit on the train data; the number of epochs used is 2

Model Evaluation

C13

  • Performed evaluate to calculate the accuracy and loss values O5