DL_ICP 5 - Saiaishwaryapuppala/CSEE5590_python_Icp GitHub Wiki

Python and Deep Learning: Special Topics

Rajeshwari Sai Aishwarya Puppala

Student ID: 16298162

Class ID: 35

Deep Learning In-Class Programming: 5

Objectives

1. Save the model and use the saved model to predict on new text data (e.g., “A lot of good things are happening. We are respected again throughout the world, and that's a great thing.@realDonaldTrump”)

2. Apply GridSearchCV to the source code provided in class

3. Apply the code to the spam data set available in the source code (text classification on the spam.csv data set)

Import Data

Import the required packages
Import the sentiment dataset and load all of the train and test data
Convert the text to lowercase and remove every character other than letters and numbers
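The cleaning step can be sketched as follows. This is a minimal illustration, not the code used in class: the function name clean_text and the sample sentence handling are assumptions, and whitespace is kept (also an assumption) so that the words can still be split apart later.

```python
import re

def clean_text(text: str) -> str:
    """Lowercase the text and drop every character that is not a letter, digit, or whitespace."""
    lowered = text.lower()
    # Assumption: whitespace is preserved so tokenization can still split on it.
    return re.sub(r"[^a-z0-9\s]", "", lowered)

sample = "A lot of good things are happening. We are respected again!"
cleaned = clean_text(sample)
print(cleaned)
```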

Tokenization

Take the text and tokenize it into words
We use fit_on_texts, which builds a dictionary that maps each word to a number
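To make the word-to-number dictionary concrete, here is a simplified pure-Python sketch of the idea. It is not the Keras Tokenizer itself: build_word_index and texts_to_sequences are hypothetical helpers written for illustration, and they assume that more frequent words get smaller numbers, starting at 1.

```python
from collections import Counter

def build_word_index(texts, max_features=2000):
    """Map each of the most frequent words to an integer, starting at 1."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    # most_common orders by count (descending); ties keep first-seen order.
    ranked = counts.most_common(max_features)
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

def texts_to_sequences(texts, word_index):
    """Replace every known word by its number; unknown words are skipped."""
    return [[word_index[w] for w in text.split() if w in word_index]
            for text in texts]

texts = ["good things are happening", "good things"]
word_index = build_word_index(texts)
sequences = texts_to_sequences(texts, word_index)
print(word_index)
print(sequences)
```

The resulting integer sequences are what an embedding layer consumes after they are padded to a common length.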

Model

Create a model with an embedding layer with embed dimension=128, max features=2000, and the input shape
Add an LSTM layer with dropout=0.2 and recurrent dropout=0.2
Add a dense layer with the activation function "softmax" (this is the output layer)
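The softmax activation in the output layer turns the raw scores of the dense layer into class probabilities that sum to 1. The sketch below shows only that computation in plain Python; the input scores are made-up numbers for illustration, not values produced by the model.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes (illustrative only).
probs = softmax([1.0, 2.0, 0.5])
print(probs)
```

The class with the largest raw score also receives the largest probability, which is why the predicted class can be read off with an argmax.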

Encoding

As the sentiment column is categorical, it has to be encoded, so a label encoder is used

Encoded Values

The encoded values are:
0 - negative
1 - neutral
2 - positive
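The mapping above follows from sorting the class names alphabetically, which is how scikit-learn's LabelEncoder assigns its codes. The following pure-Python sketch reproduces that behaviour; fit_label_encoder is a hypothetical helper and the label list is a made-up example, not the actual dataset column.

```python
def fit_label_encoder(labels):
    """Assign an integer code to each distinct label, in sorted order."""
    classes = sorted(set(labels))
    return {label: code for code, label in enumerate(classes)}

# Made-up sample of sentiment labels (illustrative only).
labels = ["positive", "negative", "neutral", "negative"]
mapping = fit_label_encoder(labels)
encoded = [mapping[label] for label in labels]
print(mapping)
print(encoded)
```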

Accuracy and Loss

The model reaches an accuracy of 68.1% and a loss of 0.793 (reported as 79.3%)

Save and Load Model and Prediction on text

Now save the model to disk with the name final.DLICP5 and load the model

Encoded

The encoded values are:
0 - negative
1 - neutral
2 - positive

With the model loaded, predict which class the text belongs to. From the output, we can say that it belongs to the neutral class
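Reading the class from a prediction amounts to taking the index of the largest probability and looking it up in the encoding. The sketch below assumes the model returns one probability per class in the order 0, 1, 2; the probabilities shown are hypothetical and are not the output of the saved model.

```python
CLASS_NAMES = ["negative", "neutral", "positive"]  # index = encoded value

def decode_prediction(probabilities):
    """Return the class name whose probability is the largest."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[best]

# Hypothetical probabilities (illustrative only, not the real model output).
example_probs = [0.30, 0.45, 0.25]
predicted = decode_prediction(example_probs)
print(predicted)
```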

GridSearchCV

It lets us supply several candidate values for each hyperparameter
It tries the parameter combinations automatically, selects the best hyperparameters, and reports the best result
The best accuracy is 68% with batch size=64, epochs=2
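The core of a grid search is an exhaustive loop over every combination of candidate values. The sketch below shows only that loop and leaves out the cross-validation that GridSearchCV also performs. The grid values, the grid_search helper, and especially toy_score are assumptions for illustration: toy_score is an arbitrary stand-in for training and evaluating the model, so its "best" result says nothing about the real model.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(batch_size, epochs):
    # Arbitrary stand-in for "train the model and return its accuracy".
    return epochs / batch_size

# Hypothetical candidate values (illustrative only).
param_grid = {"batch_size": [32, 64], "epochs": [1, 2]}
best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

With two candidate values for each of two hyperparameters, four combinations are evaluated; the cost grows multiplicatively with every hyperparameter added to the grid.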

Model2

Load the spam data set and create the same model as above

The model reaches an accuracy of 98% and a loss of 0.91 (reported as 91%)