Deep_Learning_3 - SaranAkkiraju/Python_and_Deep_Learning_Programming_ICP GitHub Wiki

Objective:

  • In the code provided there are three mistakes that stop the code from running successfully; find those mistakes and explain why they need to be corrected so that the code runs.
  • Add an embedding layer to the model. Did you experience any improvement?
  • Apply the code to the 20_newsgroups dataset we worked with in the previous classes: `from sklearn.datasets import fetch_20newsgroups` and `newsgroups_train = fetch_20newsgroups(subset='train', shuffle=True, categories=categories)`.

The 3 mistakes in the code given are:

  • The input dimension of the first layer should be vocab_size, so that it matches the size of the tokenized input vectors.
  • The output layer should have 3 neurons, one for each class (positive, negative, neutral) in the target column of the dataset.
  • The output layer activation function should be softmax, since it is the appropriate choice for multi-class classification. A corrected model definition is sketched below.
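
For reference, a minimal sketch of the corrected model, assuming the text was vectorized with Keras' Tokenizer into a fixed-size matrix of width vocab_size; the actual code is in the screenshots below, and the hidden-layer size here is a placeholder.

```python
# Minimal sketch of the corrected model (not the original screenshot code);
# imports may live under tensorflow.keras depending on the installed version.
from keras.models import Sequential
from keras.layers import Dense

vocab_size = 2000  # assumed size of the tokenizer vocabulary

model = Sequential()
# Fix 1: input_dim must equal vocab_size, the width of the tokenized input matrix
model.add(Dense(300, input_dim=vocab_size, activation='relu'))  # hidden size is a placeholder
# Fix 2: 3 output neurons, one per class (positive, negative, neutral)
# Fix 3: softmax activation for multi-class classification
model.add(Dense(3, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```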

Importing libraries

C1

Reading data

C2 O1

Data Preprocessing

C3

Modelling with an added Embedding layer

  • Limited the dataset to 20,000 records.
  • Tokenized the data and converted the text into matrix form.
  • Used LabelEncoder to convert the text labels to digits (fit and transform).
  • Split the data into train and test sets, with 25% used as test data.
  • Used a deep learning Sequential model with two layers (see the sketch after this list).
  • The 1st layer is an Embedding layer, which learns a dense vector for each word and captures semantic relationships within the data; it is trained jointly with the rest of the model, similar in spirit to word2vec rather than being the word2vec algorithm itself.
  • 2nd layer: 300 neurons with relu activation.
  • Output layer: 20 neurons with softmax activation.
  • Number of epochs is 5, batch size is 256, and the loss function is sparse_categorical_crossentropy.
  • Accuracy is 62%.
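
A minimal sketch of the steps listed above; the actual implementation is in the C4 screenshot below. The dataframe and column names, the embedding dimension, the sequence length, and the Flatten layer connecting the Embedding output to the Dense layers are assumptions, and the Embedding layer is fed padded integer sequences (which is what it expects).

```python
# Sketch of the embedding-layer pipeline described above (assumptions noted inline);
# imports may live under tensorflow.keras depending on the installed version.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

df = pd.read_csv('data.csv')        # placeholder for the dataset read earlier (C2)
texts = df['text'].values           # placeholder column names
labels = df['label'].values

max_words = 20000                   # vocabulary size
max_len = 100                       # assumed maximum sequence length

# Tokenize the text and convert it to padded integer sequences
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=max_len)

# Label-encode the target column (fit and transform)
y = LabelEncoder().fit_transform(labels)

# Train/test split with 25% of the data held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# Sequential model: Embedding -> Dense(300, relu) -> Dense(20, softmax)
model = Sequential()
model.add(Embedding(max_words, 50, input_length=max_len))  # embedding dimension 50 is an assumption
model.add(Flatten())                                       # flatten so the Dense layers receive a 2D tensor
model.add(Dense(300, activation='relu'))
model.add(Dense(20, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=256)
```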

C4 O4

Loss

C5

O2

Accuracy

C6 O3
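
For reference, a minimal sketch of how the loss and accuracy curves above (C5/O2 and C6/O3) can be plotted from the Keras training history; it assumes the `history` object returned by `model.fit` with validation data, and the metric key may be `accuracy` rather than `acc` depending on the Keras version.

```python
# Sketch of plotting the training curves from the History object returned by model.fit
import matplotlib.pyplot as plt

# Loss curves (as in C5 / O2)
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()

# Accuracy curves (as in C6 / O3); the key may be 'accuracy' in newer Keras versions
plt.plot(history.history['acc'], label='train accuracy')
plt.plot(history.history['val_acc'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```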

Reading the 20_newsgroups dataset

C7

  • Used logic similar to the above (a sketch follows the output below).
    O4
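
A minimal sketch of loading the 20_newsgroups data and feeding it into the same pipeline as above; the actual code is in the C7 screenshot, and passing `categories` (as in the objective) is optional here.

```python
# Sketch of loading 20_newsgroups and reusing the pipeline above
from sklearn.datasets import fetch_20newsgroups

newsgroups_train = fetch_20newsgroups(subset='train', shuffle=True)

texts = newsgroups_train.data      # list of raw newsgroup posts
labels = newsgroups_train.target   # integer labels for the 20 categories

# From here the same steps as above apply: tokenize and pad the texts, split into
# train and test sets, and train the Embedding -> Dense(300, relu) -> Dense(20, softmax) model.
```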