DL_ICP 5 - Saiaishwaryapuppala/CSEE5590_python_Icp GitHub Wiki
Python and Deep Learning: Special Topics
Rajeshwari Sai Aishwarya Puppala
Student ID: 16298162
Class ID: 35
Deep Learning In-Class Programming: 5
Objectives
1. Save the model and use the saved model to predict on new text data (e.g., “A lot of good things are happening. We are respected again throughout the world, and that's a great thing.@realDonaldTrump”)
2. Apply GridSearchCV to the source code provided in class
3. Apply the code to the spam data set available in the source code (text classification on the spam.csv data set)
Import Data
- Import the required packages.
- Import the sentiment dataset and load all of the train and test data.
- Convert the text to lowercase and remove any characters other than letters and numbers.
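The cleaning step above can be sketched with the standard re module. This is a minimal illustration rather than the class code: the function name clean_text is hypothetical, and keeping whitespace (so that words stay separated for tokenization) is an assumption.

```python
import re

def clean_text(text):
    """Lowercase the text and drop every character that is not a letter, digit, or whitespace."""
    lowered = text.lower()
    # Assumption: whitespace is kept so the words remain separable for tokenization.
    return re.sub(r"[^a-z0-9\s]", "", lowered)

print(clean_text("A lot of GOOD things are happening!"))
# a lot of good things are happening
```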
Tokenization
- Take the text and split it into word tokens.
- We use fit_on_texts, which builds a dictionary that maps each word to an integer index.
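To make the word dictionary concrete, here is a plain-Python sketch of the idea behind fitting a tokenizer and converting texts to index sequences. It is a simplified stand-in, not the Keras implementation: the helper names build_word_index and to_sequences are hypothetical, and the two sample sentences are made up for illustration.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer, giving the most frequent words the smallest indices."""
    counts = Counter(word for text in texts for word in text.split())
    # Indices start at 1; ties in frequency keep first-seen order.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequences(texts, word_index, max_features):
    """Replace words by their indices, keeping only indices below max_features."""
    return [
        [word_index[w] for w in text.split()
         if w in word_index and word_index[w] < max_features]
        for text in texts
    ]

texts = ["good movie good", "bad movie"]  # made-up sample data
word_index = build_word_index(texts)
sequences = to_sequences(texts, word_index, max_features=2000)
print(word_index)  # {'good': 1, 'movie': 2, 'bad': 3}
print(sequences)   # [[1, 2, 1], [3, 2]]
```

With max features = 2000, only the most frequent words survive the conversion; rarer words are simply dropped from each sequence.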
Model
- Create a model with an embedding layer (embedding dimension = 128, max features = 2000), with its input length taken from the input shape
- Add an LSTM layer with dropout = 0.2 and recurrent dropout = 0.2
- Add a dense layer with the softmax activation function (this is the output layer)
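The three layers listed above could be assembled in Keras roughly as follows. This is a sketch under stated assumptions, not the class code: the function name build_model, the number of LSTM units (196), the loss, and the optimizer are not given in this report and are chosen here only for illustration.

```python
def build_model(input_length, max_features=2000, embed_dim=128,
                lstm_units=196, num_classes=3):
    """Embedding -> LSTM -> softmax Dense, following the three bullets above."""
    # Imported lazily so that defining the function does not itself require TensorFlow.
    from tensorflow.keras import Input
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    model = Sequential([
        Input(shape=(input_length,)),          # one integer index per token
        Embedding(max_features, embed_dim),    # 2000-word vocabulary, 128-dim vectors
        LSTM(lstm_units, dropout=0.2, recurrent_dropout=0.2),  # lstm_units is an assumption
        Dense(num_classes, activation="softmax"),  # one probability per sentiment class
    ])
    # Assumption: categorical cross-entropy with Adam; labels must then be one-hot encoded.
    model.compile(loss="categorical_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model
```

A call such as build_model(input_length=28) would return a compiled model whose output has three units, one per sentiment class; the value 28 is only a placeholder for the length of the padded input sequences.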
Encoding
Because the sentiment column is categorical, it has to be encoded, so a label encoder is used.
Encoded Values
- 0: negative
- 1: neutral
- 2: positive
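The mapping above is what results from numbering the distinct labels in sorted order, which is how a label encoder typically assigns codes. The sketch below reproduces that behaviour in plain Python; the helper name fit_label_encoder and the capitalised label spellings are assumptions made for illustration.

```python
def fit_label_encoder(labels):
    """Assign 0, 1, 2, ... to the distinct labels in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}

sentiments = ["Positive", "Negative", "Neutral", "Negative"]  # made-up sample column
mapping = fit_label_encoder(sentiments)
codes = [mapping[s] for s in sentiments]
print(mapping)  # {'Negative': 0, 'Neutral': 1, 'Positive': 2}
print(codes)    # [2, 0, 1, 0]
```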
Accuracy and Loss
The model reaches an accuracy of 68.1% with a loss of 0.793.
Save and Load Model and Prediction on text
Save the model to disk under the name final.DLICP5, then load the model.
Encoded
- 0: negative
- 1: neutral
- 2: positive
With the model loaded, predict which class the text belongs to. The prediction places the text in the neutral class.
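A minimal sketch of the save, load, and decode steps, assuming a trained Keras model. The helper names and the file name sentiment_model.h5 are placeholders rather than the names used in this report, and the probabilities in the example call are made up to show the decoding only.

```python
CLASS_NAMES = ["negative", "neutral", "positive"]  # order follows the encoded values 0, 1, 2

def save_and_reload(model, path="sentiment_model.h5"):
    """Write a trained Keras model to disk and read it back."""
    # Lazy import; the path is a placeholder, not the file name used in the report.
    from tensorflow.keras.models import load_model
    model.save(path)
    return load_model(path)

def decode_prediction(probabilities):
    """Return the class name with the highest predicted probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[best]

# Made-up softmax output for a single text, used only to show the decoding step.
print(decode_prediction([0.2, 0.7, 0.1]))  # neutral
```

In practice the new text would first go through the same cleaning and tokenization as the training data, and the first row of the loaded model's predict output would be passed to decode_prediction.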
GridSearchCV
- It lets us supply several candidate values for each hyperparameter
- It trains and evaluates the model for every combination using cross-validation and reports the best-performing hyperparameters
- The best accuracy is 68%, obtained with batch size = 64 and epochs = 2
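The search itself amounts to trying every combination in a parameter grid and keeping the one with the highest score. The sketch below shows that loop in plain Python; it is not scikit-learn's implementation, and toy_score is a hypothetical stand-in for "train the model and return its validation accuracy", constructed so that the example has a clear winner.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(batch_size, epochs):
    # Hypothetical placeholder, not a real training run or a measured result.
    return -abs(batch_size - 64) / 100 - abs(epochs - 2) / 10

param_grid = {"batch_size": [32, 64, 128], "epochs": [1, 2, 3]}
best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)  # {'batch_size': 64, 'epochs': 2}
```

GridSearchCV additionally splits the training data into folds and averages the score over them for each combination, which is why a full search takes many times longer than a single training run.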
Model 2
Load the spam data set and build the same model as above.
The model reaches an accuracy of 98% with a loss of 0.91.