Wiki Report for ICP 12 - NagaSurendraBethapudi/Python-ICP GitHub Wiki

Video Link : https://drive.google.com/file/d/1LRY7MhK9RDE9SnNWr2z43kVGTvKmIKDH/view?usp=sharing


Question 1 :

Save the model and use the saved model to predict on new text data (e.g., “A lot of good things are happening. We are respected again throughout the world, and that's a great thing.@realDonaldTrump”)

Explanation :

  1. Imported the libraries and data, and pre-processed the text data.
  2. Printed some keywords from the positive and negative tweets.
  3. Saved the model as 'twitter_model.h5'.
  4. Reloaded the model:
     model = load_model('twitter_model.h5')
  5. Used the reloaded model on the new tweet; it predicted the tweet as positive.
  6. Computed the prediction accuracy separately for positive and negative tweets.
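
The text-preparation and label-decoding parts of the prediction step can be sketched without Keras. This is a minimal illustration, not the code used in this report: `clean_text`, `decode_prediction`, and the label order `("negative", "positive")` are assumptions, and `example_probs` is a made-up stand-in for the probabilities that `model.predict` would return for one padded tweet.

```python
import re

# Hypothetical label order; it must match the column order used when the
# training labels were one-hot encoded.
LABELS = ("negative", "positive")


def clean_text(text):
    """Lowercase the text and drop everything except letters, digits and whitespace."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower())


def decode_prediction(probs, labels=LABELS):
    """Return the label whose predicted probability is highest."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best_index]


tweet = "A lot of good things are happening. We are respected again."
cleaned = clean_text(tweet)

# Stand-in for model.predict(padded_sequence)[0]; these numbers are made up.
example_probs = [0.2, 0.8]
label = decode_prediction(example_probs)
```

In a real run, the cleaned text would also need to go through the same fitted tokenizer and be padded to the training input length before it is passed to the reloaded model.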


Question 2:

Apply GridSearchCV on the source code provided in the class

Explanation :

Applied grid search and found the best parameters.

Passed those best parameters back to the model to find its accuracy, loss, and score.
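
The idea behind the search can be sketched in plain Python. This is only an illustration under stated assumptions: the report does not say which hyperparameters were searched, so `batch_size` and `epochs` are guesses, and `toy_score` is a hypothetical stand-in for "train the model and return its validation accuracy". GridSearchCV additionally cross-validates every combination, which this sketch leaves out.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Score every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(batch_size, epochs):
    """Hypothetical scoring function; a real one would train and evaluate the model."""
    return 1.0 - abs(batch_size - 32) / 100 - abs(epochs - 2) / 10


# Assumed search space, for illustration only.
param_grid = {"batch_size": [16, 32, 64], "epochs": [1, 2, 3]}
best_params, best_score = grid_search(toy_score, param_grid)
```

The returned `best_params` dictionary is what would then be passed back into training to report the final accuracy and loss.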


Question 3 : Apply the code to the spam data set available in the source code (text classification on the spam.csv data set)

Explanation :

  1. Imported the libraries and data, and pre-processed the text.

  2. Printed some keywords from the spam and ham mails.

  3. Built the model:

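from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

embed_dim = 128  # size of each word-embedding vector
lstm_out = 196   # number of LSTM units

def createmodel():
    # max_fatures and X are defined during pre-processing (not shown here)
    model = Sequential()
    model.add(Embedding(max_fatures, embed_dim, input_length=X.shape[1]))  # embedding layer
    model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))  # LSTM layer with dropout
    model.add(Dense(2, activation='softmax'))  # output layer: two classes (spam, ham)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model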
  4. Computed the accuracy and loss of the model.

  5. Computed the prediction accuracy separately for spam and ham mails.
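
The per-class accuracy in the last step can be sketched as follows. This is a minimal illustration rather than the code used in the report: `per_class_accuracy` is a hypothetical helper, and the labels are made up. It assumes the true and predicted classes have already been decoded to label strings.

```python
def per_class_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of samples with that true label predicted correctly}."""
    totals, correct = {}, {}
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == guess:
            correct[truth] = correct.get(truth, 0) + 1
    return {label: correct.get(label, 0) / totals[label] for label in totals}


# Made-up labels, purely for illustration.
y_true = ["ham", "ham", "ham", "ham", "spam", "spam"]
y_pred = ["ham", "ham", "ham", "spam", "spam", "ham"]
accuracy = per_class_accuracy(y_true, y_pred)
```

Reporting the two classes separately matters when they are unequal in size, because a high overall accuracy can hide poor performance on the smaller class.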


Learnings :

  1. Learned about Recurrent Neural Networks (RNNs) and their variants, including Long Short-Term Memory (LSTM), and about different loss functions.
  2. Learned about regularization techniques and their losses.
  3. Learned about hyperparameter tuning.

Challenges:

Computing the separate prediction accuracies for positive and negative tweets.