Deep_Learning 5 - SaranAkkiraju/Python_and_Deep_Learning_Programming_ICP GitHub Wiki
Objective
- Save the model and use the saved model to predict on new text data
- Apply GridSearchCV on the source code provided in the class
- Apply the code to the spam data set available in the source code (text classification on the spam.csv data set)
Importing Libraries: Keras, scikit-learn, and NumPy.
Data Reading
- Reading the CSV as a pandas data frame.
- Keeping only the two necessary columns in the data frame.
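The reading step can be sketched as below. A small in-memory CSV stands in for the course file, and the column names ("text", "sentiment") are assumptions based on the class source code:

```python
import io
import pandas as pd

# Stand-in for the course CSV; the real file and its column names
# ("text", "sentiment") are assumptions from the class code.
csv_data = io.StringIO(
    "id,text,sentiment,extra\n"
    "1,I love this,Positive,x\n"
    "2,I hate this,Negative,y\n"
)

df = pd.read_csv(csv_data)

# Keep only the two columns the model needs.
df = df[["text", "sentiment"]]
print(df.columns.tolist())
```

With the real data set, `pd.read_csv("Sentiment.csv")` replaces the in-memory buffer.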
Data Preprocessing
- Replacing the special characters and the "rt" marker in the text data with empty space
- The maximum number of features is set to 2000.
- Converting the text data to lower case
- Used the Tokenizer API and applied its fit_on_texts and texts_to_sequences methods to the text data.
- The embedding dimension is 128 and the LSTM layer has 196 units.
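The cleaning part of the preprocessing above can be sketched with plain regular expressions (the class code then feeds the cleaned text to the Keras Tokenizer's fit_on_texts and texts_to_sequences; the exact regex here is an assumption):

```python
import re

def clean_text(s):
    # Lower-case, drop the retweet marker "rt", and replace special
    # characters with spaces, mirroring the preprocessing described above.
    s = s.lower()
    s = re.sub(r"\brt\b", " ", s)        # remove standalone "rt" tokens
    s = re.sub(r"[^a-z0-9\s]", " ", s)   # replace special characters
    return re.sub(r"\s+", " ", s).strip()

print(clean_text("RT @user: Great news!!!"))  # → "user great news"
```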
Model creation
- Initialized a Sequential model
- Added the embedding layer; dropout is 0.2 and the output layer uses softmax activation
- The loss used is categorical_crossentropy, the optimizer is Adam, and the metric is accuracy.
- Converted the categorical Y data to numerical format and split it into train and test data
- Initialized the TensorBoard callback to log the accuracy and loss graphs.
Model execution
The batch size used is 32; model.fit is called on the training data for 7 epochs with the TensorBoard callback, and evaluate is then run to calculate the accuracy and loss values.
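The model-creation and execution steps above can be sketched end to end. The layer sizes come from the write-up; the sequence length and the tiny synthetic data are assumptions so the sketch runs on its own (real training uses the tokenized tweets, batch_size 32, 7 epochs, and a TensorBoard callback):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SpatialDropout1D, LSTM, Dense

max_features = 2000   # vocabulary size from the preprocessing step
embed_dim = 128       # embedding dimension
lstm_out = 196        # LSTM units
seq_len = 10          # padded sequence length (assumption for this sketch)

model = Sequential([
    Embedding(max_features, embed_dim),
    SpatialDropout1D(0.2),           # dropout of 0.2 as described above
    LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2),
    Dense(2, activation="softmax"),  # softmax output layer
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Tiny synthetic batch so the sketch runs quickly end to end.
X = np.random.randint(1, max_features, size=(8, seq_len))
y = np.eye(2)[np.random.randint(0, 2, size=8)]
model.fit(X, y, batch_size=32, epochs=1, verbose=0)
loss, acc = model.evaluate(X, y, verbose=0)
```

In the class code, `callbacks=[TensorBoard(log_dir=...)]` is passed to `model.fit` to produce the graphs shown later.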
Grid Search CV
- Used GridSearchCV to find the best hyperparameters for training the model.
- The batch sizes tried are 32 and 64; the numbers of epochs tried are 1 and 2.
- Initialized the grid search model with the above parameters and fitted it on the training data.
- The best parameters chosen are batch_size 64 and epochs 2.
- Then computed the accuracy for the tuned parameters.
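The GridSearchCV mechanics can be sketched as below. The class code wraps the Keras model with a KerasClassifier; here a scikit-learn LogisticRegression is used as a stand-in estimator (so the sketch is self-contained), with a parameter grid playing the same role as the batch_size/epochs grid above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the tokenized training set.
np.random.seed(0)
X = np.random.rand(40, 3)
y = np.array([0, 1] * 20)

# Grid of candidate hyperparameters, analogous to
# {"batch_size": [32, 64], "epochs": [1, 2]} in the class code.
param_grid = {"C": [0.1, 1.0], "max_iter": [100, 200]}
grid = GridSearchCV(LogisticRegression(), param_grid, cv=3)
grid.fit(X, y)

print(grid.best_params_)  # best hyperparameter combination
print(grid.best_score_)   # mean cross-validated accuracy for it
```

`best_params_` and `best_score_` are how the tuned parameters and their accuracy are read off after the search.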
Save and Load Model
- Using pickle, saved the trained model as a .pkl file.
- Loaded the .pkl file and ran evaluate on it
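The save/load pattern can be sketched as below. A small scikit-learn model stands in for the trained network so the sketch is self-contained (pickling a Keras model directly can be fragile; Keras's own `model.save` is the more common choice):

```python
import os
import pickle
import tempfile
import numpy as np
from sklearn.linear_model import LogisticRegression

# Train a small stand-in model.
np.random.seed(0)
X = np.random.rand(20, 3)
y = np.array([0, 1] * 10)
clf = LogisticRegression().fit(X, y)

# Save the trained model as a .pkl file.
path = os.path.join(tempfile.gettempdir(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f)

# Load the .pkl file and use the restored model exactly like the original.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print((loaded.predict(X) == clf.predict(X)).all())  # → True
```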
Predicting the sentiment value on a new test string
Took the string and converted it into a pandas column. Performed the same preprocessing: converting the text to lower case and removing the special characters. Converted the preprocessed text into numerical format using the Tokenizer API's fit_on_texts and texts_to_sequences methods. The predicted sentiment value is negative.
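The new-text pipeline can be sketched without the trained model. The hand-rolled `word_index` below is a hypothetical stand-in for the fitted Keras Tokenizer's vocabulary; in the class code the sequence is then padded and passed to `model.predict`:

```python
import re

# Hypothetical vocabulary standing in for the fitted Tokenizer's word_index.
word_index = {"movie": 1, "was": 2, "terrible": 3, "great": 4}

def preprocess(text):
    # Same cleaning as training: lower-case and drop special characters.
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()

def texts_to_sequence(tokens, max_len=5):
    # Map known words to their indices, left-padding with zeros the way
    # Keras pad_sequences does by default.
    seq = [word_index[t] for t in tokens if t in word_index]
    return [0] * (max_len - len(seq)) + seq

tokens = preprocess("The movie was TERRIBLE!!!")
print(texts_to_sequence(tokens))  # → [0, 0, 1, 2, 3]
```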
Bonus points: TensorBoard loss and accuracy graphs
Performing on SPAM data set
Data Reading
- Reading the CSV as a pandas data frame.
- Keeping only the two necessary columns in the data frame.
Data Preprocessing
- Converting the text data to lower case
- Replacing the special characters in the text data with empty space
- The maximum number of features is set to 2000.
- Used the Tokenizer API and applied its fit_on_texts and texts_to_sequences methods to the text data.
- The embedding dimension is 128 and the LSTM layer has 196 units.
Model Creation and execution
- Initialized a Sequential model
- Added the embedding layer; dropout is 0.2 and the output layer uses softmax activation
- The loss used is categorical_crossentropy, the optimizer is Adam, and the metric is accuracy.
- Converted the categorical Y data to numerical format and split it into train and test data.
- The batch size used is 32; model.fit is called on the training data for 2 epochs.
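The label-conversion step for the spam data can be sketched as below: the "ham"/"spam" strings become one-hot rows suitable for categorical_crossentropy (the label values are the standard ones in spam.csv):

```python
import pandas as pd

# Map the categorical labels to one-hot numeric columns.
labels = pd.Series(["ham", "spam", "ham"])
y = pd.get_dummies(labels).astype(int)

print(list(y.columns))      # → ['ham', 'spam']
print(y.values.tolist())    # → [[1, 0], [0, 1], [1, 0]]
```

The resulting two-column matrix matches the two-unit softmax output layer above.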
Model Evaluation
- Performed evaluate to calculate the accuracy and loss values