Python Lab Assignment-3

Team ID: 12
Team Member 1: Vineetha Gummadi, Class ID: 10
Team Member 2: Amulya Kasaraneni, Class ID: 14

Objective:

  • To implement logistic regression using TensorFlow and observe the loss & accuracy while changing the hyperparameters
  • To implement word embeddings and observe the average loss while changing the hyperparameters
  • To plot the computation graph in TensorBoard

Task 1: Logistic Regression - TensorFlow:

Code Snippet:

Approach:

We used a CSV version of the Iris dataset restricted to the two labels ‘Iris-setosa’ and ‘Iris-versicolor’, with four features per sample. We first defined placeholders and fed the data at session run time, with weights of dimension [4, 1]. Since there are only two classes, this is binary logistic regression: the sigmoid function maps the predicted values to probabilities, and the cost is sigmoid cross-entropy. To minimize the loss we used GradientDescentOptimizer. We split the data into training and test sets, and computed the loss and accuracy on the test data at each iteration.
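
A minimal sketch of this setup, assuming TensorFlow 1.x (on TF 2.x, use `tensorflow.compat.v1` with eager execution disabled); the CSV file name, column layout, and 80/20 split are illustrative assumptions, not the exact code used for the results below:

```python
# Minimal sketch of the binary logistic regression described above,
# assuming TensorFlow 1.x. The CSV path and split are illustrative.
import numpy as np
import tensorflow as tf

# Hypothetical CSV: four feature columns, species name in the last column.
data = np.genfromtxt("iris_binary.csv", delimiter=",", dtype=str, skip_header=1)
features = data[:, :4].astype(np.float32)
labels = (data[:, 4] == "Iris-versicolor").astype(np.float32).reshape(-1, 1)

# Shuffle before splitting so both classes appear in each segment.
perm = np.random.RandomState(0).permutation(len(features))
features, labels = features[perm], labels[perm]
split = int(0.8 * len(features))
X_train, X_test = features[:split], features[split:]
y_train, y_test = labels[:split], labels[split:]

# Placeholders are fed at session run time.
X = tf.placeholder(tf.float32, [None, 4], name="features")
y = tf.placeholder(tf.float32, [None, 1], name="labels")

# Weights with [4, 1] dimension, one per feature, plus a bias.
W = tf.Variable(tf.zeros([4, 1]), name="weights")
b = tf.Variable(tf.zeros([1]), name="bias")

# Sigmoid maps the linear output to a probability; the matching
# cost function is sigmoid cross-entropy.
logits = tf.matmul(X, W) + b
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.3).minimize(loss)

prediction = tf.cast(tf.sigmoid(logits) > 0.5, tf.float32)
accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, y), tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(1, 251):
        _, l, train_acc = sess.run([train_op, loss, accuracy],
                                   feed_dict={X: X_train, y: y_train})
        test_acc = sess.run(accuracy, feed_dict={X: X_test, y: y_test})
        print("epoch: %d loss: %f train_acc: %f test_acc: %f"
              % (epoch, l, train_acc, test_acc))
```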

Output for set: Learning rate: 0.3, batch size = 50

Graph: number of iterations vs Accuracy:

Observations:

Hyperparameters Sets:

Set 1:
Learning Rate: 0.01, Number of iterations = 250
Outputs: 
epoch:  245 loss: 0.803493 train_acc: 0.187500 test_acc: 0.150000
epoch:  246 loss: 0.788997 train_acc: 0.187500 test_acc: 0.150000
epoch:  247 loss: 0.759049 train_acc: 0.187500 test_acc: 0.150000
epoch:  248 loss: 0.759992 train_acc: 0.187500 test_acc: 0.150000
epoch:  249 loss: 0.746343 train_acc: 0.187500 test_acc: 0.150000
epoch:  250 loss: 0.768922 train_acc: 0.187500 test_acc: 0.150000

Set 2: 
Learning Rate: 0.5, Number of iterations = 250
epoch:  250 loss: 0.046807 train_acc: 1.000000 test_acc: 1.000000

Set 3:
Learning Rate: 0.2
epoch:  250 loss: 0.114095 train_acc: 1.000000 test_acc: 1.000000

Set 4:
Learning Rate: 0.1
epoch:  250 loss: 0.216514 train_acc: 1.000000 test_acc: 1.000000

We observed that increasing the learning rate decreased the loss and increased the accuracy: with a learning rate of 0.01 the model had not converged after 250 iterations (test accuracy 0.15), while learning rates of 0.1, 0.2, and 0.5 all reached 100% train and test accuracy, with higher rates finishing at lower loss.

Tensor Graph:
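
The graph for this run was rendered in TensorBoard. A minimal sketch of how such a graph is exported, assuming TensorFlow 1.x (the log directory name is illustrative):

```python
# Write the current graph so TensorBoard can render it
# (TensorFlow 1.x; the log directory is an arbitrary choice).
import tensorflow as tf

writer = tf.summary.FileWriter("./logs/task1", tf.get_default_graph())
writer.close()
```

The graph can then be viewed with `tensorboard --logdir ./logs/task1`.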

Task 2: Word Embeddings

Code Snippet:

Approach:

We used the English Wikipedia dump enwik9 (http://mattmahoney.net/dc/enwik9.zip), which is 1,000,000,000 bytes; in the code we considered only the first 10,000 bytes. We then measured the average loss while varying the number of steps, learning rate, window size, embedding size, vocabulary size, and number of iterations.
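
A minimal skip-gram sketch of this approach, assuming TensorFlow 1.x and the NCE loss used by the standard word2vec tutorial; the hyperparameter values and the random stand-in batches are illustrative assumptions, not the exact settings of any set below:

```python
# Minimal skip-gram word-embedding sketch, assuming TensorFlow 1.x
# and NCE loss; all hyperparameter values are illustrative.
import math
import numpy as np
import tensorflow as tf

vocabulary_size = 5000   # varied across the sets below (e.g. 5000 vs. 10000)
embedding_size = 128
window_size = 2          # skip window used when building real (center, context) pairs
num_sampled = 64         # negative samples for NCE
batch_size = 128

train_inputs = tf.placeholder(tf.int32, [batch_size])
train_labels = tf.placeholder(tf.int32, [batch_size, 1])

# One embedding vector per vocabulary word; the lookup picks the
# vectors for the current batch of center words.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# NCE replaces the full softmax over the vocabulary with a cheap
# binary task against num_sampled sampled negative words.
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                   labels=train_labels, inputs=embed,
                   num_sampled=num_sampled, num_classes=vocabulary_size))
optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    average_loss = 0.0
    for step in range(4001):
        # Stand-in for real skip-gram pairs built from the enwik9 text
        # with the given window size; random ids keep the sketch runnable.
        batch_inputs = np.random.randint(
            vocabulary_size, size=batch_size).astype(np.int32)
        batch_labels = np.random.randint(
            vocabulary_size, size=(batch_size, 1)).astype(np.int32)
        _, l = sess.run([optimizer, loss],
                        feed_dict={train_inputs: batch_inputs,
                                   train_labels: batch_labels})
        average_loss += l
        if step % 1000 == 0:
            if step > 0:
                average_loss /= 1000
            print("Average loss at step ", step, ": ", average_loss)
            average_loss = 0.0
```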

Observations:

Hyperparameters Sets:

Set 1: 

Average loss at step  100 :  36.610012084960935
Average loss at step  200 :  88.56226031494141
Average loss at step  300 :  124.65224085998535
Average loss at step  400 :  163.37515657806395

Set 2: 

Average loss at step  100 :  14.58709525680542
Average loss at step  200 :  16.726716236114502
Average loss at step  300 :  30.201695640563965
Average loss at step  400 :  30.05541326522827

Set 3: 

Average loss at step  0 :  160.82130432128906
Average loss at step  1000 :  82.98428983032703
Average loss at step  2000 :  42.32238863015175
Average loss at step  3000 :  27.4516791870594
Average loss at step  4000 :  26.50844745385647

Set 4: 

Average loss at step  0 :  214.52487182617188
Average loss at step  1000 :  117.42454052257538
Average loss at step  2000 :  67.43449500823021
Average loss at step  3000 :  43.06332559394836

We observed that the loss increases with the vocabulary size: with a vocabulary size of 10,000 the loss is 43.06, whereas with the same set of hyperparameters and a vocabulary size of 5,000 the loss is 26.5. Increasing the learning rate or the window size also increased the loss.

Tensor Graph:

Datasets:

http://mattmahoney.net/dc/enwik9.zip

Conclusion:

Therefore, we observed the outputs of both the logistic regression and word embedding models while changing their hyperparameters, and plotted the computation graphs in TensorBoard.

References:

https://www.kaggle.com/autuanliuyc/logistic-regression-with-tensorflow

href="http://mattmahoney.net/dc/textdata.html

https://stackoverflow.com
