ICP9 - GeoSnipes/Big-Data GitHub Wiki

Sub-Team Members

5-2 15 Naga Venkata Satya Pranoop Mutha

5-2 23 Geovanni West


ICP 9


Objective:

Write a TensorFlow program for the following task: implement a CNN model for the given MNIST data.

  1. Train the CNN model using different optimizers and compare their results.
  2. Change the configuration of the CNN model. Use filter sizes 16, 36 and 128. Report the results.
  3. Show the TensorBoard graphs of the tasks.

Convolution Neural Networks:

Convolutional Neural Networks use three basic ideas: local receptive fields, shared weights, and pooling.

  • Receptive Field:

It is defined as the region in the input space that a particular CNN feature is looking at (i.e. being affected by). A receptive field can be fully described by its center location and its size. Convolution layers contain several different feature maps. Here we compare two configurations: the first uses 32 feature maps and the other uses 16.

32 feature maps:

16 feature maps and defining strides:
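A minimal sketch of how such convolution layers can be declared in TensorFlow 1.x; the variable names and 5x5 kernel size are illustrative assumptions, not copied from the ICP source:

```python
import tensorflow as tf

# Input images reshaped to 28x28x1 (MNIST is grayscale)
x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])

# First convolution layer with 32 feature maps (5x5 kernels, stride 1)
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))
h_conv1 = tf.nn.relu(
    tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)

# The 16-feature-map variant only changes the number of output channels
W_conv1_small = tf.Variable(tf.truncated_normal([5, 5, 1, 16], stddev=0.1))
```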

Shared Weights and Biases:

These define the kernel or filter. They help greatly reduce the number of parameters involved. We initialize the weights with a small amount of noise for symmetry breaking and to prevent zero gradients.
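For example, the initialization can be wrapped in small helper functions; this is a sketch, and `weight_variable` / `bias_variable` are illustrative names rather than the ICP source's own:

```python
import tensorflow as tf

def weight_variable(shape):
    # Random noise breaks symmetry between units in the same layer
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    # A slightly positive bias helps avoid dead ReLU units (zero gradients)
    return tf.Variable(tf.constant(0.1, shape=shape))

# One shared 5x5 kernel per feature map: only 5*5*1*32 weights plus 32 biases
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
```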

Pooling Layers:

Pooling layers are usually used immediately after convolution layers. These layers simplify the information in the output from the convolution layers: each feature map output by the convolution layer is condensed into a smaller feature map.
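A minimal sketch of 2x2 max pooling in TensorFlow 1.x, assuming `h_conv1` is the output of the first convolution layer defined above:

```python
# 2x2 max pooling halves the spatial resolution of each feature map
h_pool1 = tf.nn.max_pool(h_conv1,
                         ksize=[1, 2, 2, 1],
                         strides=[1, 2, 2, 1],
                         padding='SAME')
# 28x28 feature maps become 14x14 condensed feature maps
```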

Dropout:

Dropout is a form of regularization used to avoid overfitting. It can be viewed as a form of ensemble learning: each training step effectively samples a different thinned network, which is trained using backpropagation as usual.
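A sketch of how dropout is typically applied before the readout layer in TensorFlow 1.x; the fully connected activation `h_fc1` and the placeholder name are assumptions for illustration:

```python
# Probability of keeping a unit; fed as e.g. 0.5 during training and 1.0 at test time
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
```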

Source Code Explanation:

The workflow goes as follows (a condensed sketch of these steps appears after the list):

  1. Step 1: Learn about the MNIST dataset
  2. Step 2: Read, load, and parse the dataset
  3. Step 3: Build the CNN model
  4. Step 4: Train the model
  5. Step 5: Evaluate the model
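A compressed sketch of that pipeline in TensorFlow 1.x, using the tutorial-style `input_data` MNIST loader; the layer shapes, names, and hyperparameters here are illustrative assumptions, and a single softmax layer stands in for the full CNN described above:

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Step 2: read, load, parse dataset (one-hot labels)
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# Step 3: build a minimal model
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, W) + b

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Step 4: train the model in mini-batches
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(50)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    # Step 5: evaluate the model on the test split
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_: mnist.test.labels}))
```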

After evaluating the model, the results and summary are as follows:

When comparing the two optimizers, the Adam method gives the better accuracy. We can also see that as the number of features increases, the accuracy also increases. AdaGrad (adaptive gradient) allows the learning rate to adapt per parameter: it performs larger updates for infrequent parameters and smaller updates for frequent ones. Because of this it is well suited for sparse data (e.g. NLP or image recognition). Another advantage is that it largely eliminates the need to tune the learning rate. Each parameter has its own learning rate, and due to the peculiarities of the algorithm that learning rate is monotonically decreasing. This causes the biggest problem: at some point the learning rate becomes so small that the system stops learning, which helps explain why Adam performs better here.
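In practice the optimizer comparison amounts to swapping a single line when building the training op; a brief sketch (the learning rates shown are illustrative, not the values used in the ICP source):

```python
# Adam: adaptive per-parameter learning rates with momentum-style averaging
train_step_adam = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# AdaGrad: per-parameter learning rates that decay monotonically over training
train_step_adagrad = tf.train.AdagradOptimizer(0.01).minimize(cross_entropy)
```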

TensorBoard Diagrams:

1. Cross Entropy - Adam (32 features):

2. Gradient Descent - Adam (32 features):

3. Cross Entropy - AdaGrad (32 features):

4. Gradient Descent - AdaGrad (32 features):

5. Cross Entropy - Adam (16 features):

6. Gradient Descent - Adam (16 features):

7. Cross Entropy - AdaGrad (16 features):

8. Gradient Descent - AdaGrad (16 features):
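These plots come from scalar summaries written during training; a minimal sketch of how such summaries can be recorded in TensorFlow 1.x (the log directory and tensor names are illustrative assumptions):

```python
# Record the cross-entropy loss so TensorBoard can plot it per training step
tf.summary.scalar('cross_entropy', cross_entropy)
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('logs/adam_32', tf.get_default_graph())

# Inside the training loop: evaluate the merged summary and write it out
# summary, _ = sess.run([merged, train_step], feed_dict=...)
# writer.add_summary(summary, step)
```

Running `tensorboard --logdir logs` then renders curves like the ones shown above.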

Source Code Link: https://github.com/GeoSnipes/Big-Data/tree/master/ICP/icp9/Source/MNIST_CNN/MNIST_CNN

