Deep_Learning_3 - SaranAkkiraju/Python_and_Deep_Learning_Programming_ICP GitHub Wiki
Objective:
- In the code provided there are three mistakes which stop the code from running successfully; find those mistakes and explain why they need to be corrected in order for the code to run.
- Add embedding layer to the model, did you experience any improvement?
- Apply the code on the 20_newsgroup dataset we worked with in the previous classes:

```python
from sklearn.datasets import fetch_20newsgroups

newsgroups_train = fetch_20newsgroups(subset='train', shuffle=True, categories=categories,)
```
The 3 mistakes in the given code are:
- The input dimension of the first layer should be vocab_size, so that it matches the size of the tokenized vocabulary being fed into the model.
- The output layer should have 3 neurons, one for each class (positive, negative, neutral) in the target column of the dataset.
- The output layer activation function should be softmax, which is the standard choice for multi-class classification because it produces a probability distribution over the classes.
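A minimal sketch of the corrected model (the layer width and vocab_size here are assumptions, since the original broken code is not reproduced on this page):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

vocab_size = 2000  # assumed; must match the tokenizer's vocabulary size

model = Sequential()
# Fix 1: the input dimension must equal vocab_size so the first weight
# matrix lines up with the tokenized input vectors
model.add(Input(shape=(vocab_size,)))
model.add(Dense(300, activation='relu'))
# Fix 2 + 3: 3 output neurons (positive / negative / neutral),
# with softmax so the outputs form a probability distribution
model.add(Dense(3, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
```

With sparse_categorical_crossentropy the labels can stay as integer class ids (0, 1, 2) rather than one-hot vectors.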
Importing libraries
Reading data
Data Preprocessing
Modelling with an added embedding layer
- Compressed the dataset to 20,000 records.
- Tokenized the data and converted the text into matrix form.
- Used LabelEncoder to convert the text labels to integers, then fit and transformed the data.
- Split the data into train and test sets, holding out 25% as test data.
- Used a deep learning Sequential model with 2 layers.
- 1st layer: an Embedding layer, which learns dense word vectors that capture semantic relationships within the data (similar in spirit to word2vec embeddings, but trained jointly with the rest of the model).
- 2nd layer: 300 neurons with relu activation.
- Output layer: 20 neurons with softmax activation.
- Number of epochs is 5, batch size is 256, and the loss function is sparse_categorical_crossentropy.
- Accuracy is 62%.
Loss
Accuracy
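The steps above can be sketched end to end as below; the toy texts, vocabulary limit, sequence length, and embedding size are assumptions standing in for the actual 20,000-record dataset:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# toy stand-in data; the wiki uses the compressed 20,000-record dataset
texts = ["good movie", "bad plot", "average acting", "great film"]
labels = ["pos", "neg", "neutral", "pos"]

max_words, max_len = 2000, 100  # assumed vocabulary and sequence limits

# tokenize and pad the text into fixed-length integer sequences
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=max_len)

# label-encode the text labels into integers
y = LabelEncoder().fit_transform(labels)

# 25% of the data held out as the test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = Sequential()
model.add(Embedding(max_words, 50))        # learned embedding layer
model.add(GlobalAveragePooling1D())        # collapse the sequence dimension
model.add(Dense(300, activation='relu'))   # 2nd layer: 300 neurons, relu
model.add(Dense(20, activation='softmax')) # 20 output classes, as in the wiki
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
# training would then be: model.fit(X_train, y_train, epochs=5, batch_size=256)
```

Note that the Embedding layer expects padded integer sequences, not the texts_to_matrix output used by the non-embedding model.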
Reading the News20 group dataset
- Applied the same logic as above to the 20_newsgroup data.
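Loading the data for that step might look like the sketch below (passing no categories argument keeps all 20 classes; the train split's text and integer targets then feed the same tokenize/pad/fit pipeline):

```python
from sklearn.datasets import fetch_20newsgroups

# fetch the train split; omitting `categories` keeps all 20 newsgroups
newsgroups_train = fetch_20newsgroups(subset='train', shuffle=True)

texts = newsgroups_train.data     # raw article text
labels = newsgroups_train.target  # integer labels 0..19
# labels are already integers, so LabelEncoder is not strictly needed here
```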