Frequently Asked Questions on R

Installation Questions:

What is the difference between R & RStudio?

R is a language and environment for statistical computing, whereas RStudio is an IDE (integrated development environment) used to write and run code in R.

Do I need to download both R & RStudio to write codes?

Yes. Without R, RStudio will not work: RStudio is only an application for creating and running R projects, and it needs an R installation underneath. Here you can find details on how to download R & RStudio.

Normalization Questions:

What Is Data Normalization, and Why Do We Need It?

The process of rescaling and standardizing data is called “data normalization.” It is a pre-processing step used to eliminate redundancy: the same information often arrives in different formats or on very different scales. In these cases you rescale the values to fit into a particular range, which also gives distance-based and gradient-based methods better convergence.
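For instance, the two most common rescalings are a few lines each in R; a minimal sketch (the use of the built-in iris data is purely illustrative):

```r
# Min-max scaling: rescale a numeric vector to the [0, 1] range
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Z-score standardization: zero mean, unit standard deviation
z_score <- function(x) (x - mean(x)) / sd(x)

# Apply min-max scaling to every numeric column of the built-in iris data
iris_scaled <- as.data.frame(lapply(iris[, 1:4], min_max))
summary(iris_scaled)  # every column now runs from 0 to 1
```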

Clustering Questions:

What are the advantages and disadvantages of using K-means?

The K-means clustering algorithm offers the following advantages:

  • It is relatively efficient: its time complexity is roughly O(tkn) for t iterations, k clusters, and n points.
  • It is guaranteed to terminate, although often at a local optimum.
  • Techniques such as simulated annealing or genetic algorithms may be used to search for the global optimum.

The K-means clustering algorithm has the following disadvantages:

  • It requires the number of clusters (k) to be specified in advance (see the R sketch after this list).
  • It cannot handle noisy data and outliers well.
  • It is not suitable for identifying clusters with non-convex shapes.
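As a minimal usage sketch in base R (`stats::kmeans`; using iris here is purely illustrative):

```r
set.seed(42)                             # K-means starts from random centroids
km <- kmeans(iris[, 1:4], centers = 3)   # k must be supplied up front

km$centers                          # coordinates of the 3 centroids
table(km$cluster, iris$Species)     # compare clusters with the known species
```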

K-means: Can the centroids be incorrect even if there is convergence? How?

Yes, absolutely. K-means finishes once the centroids stop moving between iterations, i.e., once it has converged, even if the resulting centroids are a poor fit. This can happen for several reasons.

One common reason is that the value chosen for k was not ideal. A value that is too low forces a single centroid to cover data points from more than one true cluster, while a value that is too high splits a single true cluster across several centroids.

Incorrect assumptions about the data can also be to blame. One such assumption is that the data always forms spherical clusters; in reality, clusters can take many shapes, such as a ring or a long rectangular strip. Another is that clusters are all the same size and density, whereas some clusters may be much denser or much larger than others.

Despite these possible failure modes, K-means tries to mitigate bad runs by repeating the entire process several times from different random starts and keeping the best result. This helps with unlucky initializations, but it does not fix every issue, such as a wrong choice of k.
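In R, `kmeans()` exposes this restart strategy through its `nstart` argument, and a common heuristic for choosing k is the “elbow” plot of the total within-cluster sum of squares. A sketch, again on the illustrative iris data:

```r
set.seed(42)

# nstart = 25 runs K-means from 25 random starts and keeps the best result
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# Elbow method: total within-cluster sum of squares for k = 1..10
wss <- sapply(1:10, function(k)
  kmeans(iris[, 1:4], centers = k, nstart = 25)$tot.withinss)
plot(1:10, wss, type = "b", xlab = "k",
     ylab = "Total within-cluster sum of squares")
# Pick the "elbow": the k beyond which adding clusters barely reduces wss
```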

What is the difference between hierarchical and K-means clustering?

  • Hierarchical clustering can’t handle big data well, but K-means can: the time complexity of K-means is linear, i.e. O(n), while that of hierarchical clustering is quadratic, i.e. O(n²).
  • In K-means clustering, since we start with a random choice of centroids, the results of running the algorithm multiple times may differ. Results are reproducible in hierarchical clustering.
  • K-means works well when the clusters are hyper-spherical (like a circle in 2D or a sphere in 3D).
  • K-means requires prior knowledge of k, i.e. the number of clusters you want to divide your data into. In hierarchical clustering, you can instead stop at whatever number of clusters you find appropriate by interpreting the dendrogram (see the sketch after this list).
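A short hierarchical-clustering sketch in base R (iris is again illustrative); note that the number of clusters is chosen only after inspecting the dendrogram, via `cutree()`:

```r
d  <- dist(iris[, 1:4])          # pairwise Euclidean distances: O(n^2) work
hc <- hclust(d, method = "complete")

plot(hc, labels = FALSE)         # inspect the dendrogram first...
clusters <- cutree(hc, k = 3)    # ...then cut it at a height giving 3 clusters
table(clusters, iris$Species)
```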

Source for clustering questions: https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/

Artificial Neural Networks Questions:

What Is the Role of Activation Functions in a Neural Network?

At the most basic level, an activation function decides whether a neuron should fire or not. It takes the weighted sum of the inputs plus a bias and maps it to the neuron’s output. The step function, sigmoid, ReLU, tanh, and softmax are examples of activation functions.
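Most of these are one-liners in R; a minimal sketch (the toy vector z stands in for a neuron’s weighted sum plus bias):

```r
step_fn <- function(x) ifelse(x > 0, 1, 0)     # step function
sigmoid <- function(x) 1 / (1 + exp(-x))       # squashes input to (0, 1)
relu    <- function(x) pmax(0, x)              # rectified linear unit
softmax <- function(x) exp(x) / sum(exp(x))    # maps a vector to probabilities
# tanh() is already built into base R

z <- c(-2, 0, 3)   # illustrative pre-activations (weighted sums + bias)
sigmoid(z)
relu(z)
softmax(z)
```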

What Is the Cost Function?

Also referred to as “loss” or “error,” a cost function is a measure of how well your model performs. It is used to compute the error of the output layer during backpropagation; that error is then pushed backward through the network and used to update the weights during training.
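For example, mean squared error is a common cost function for regression-style outputs; a minimal sketch (the function name and the numbers are illustrative):

```r
# Mean squared error between predictions y_hat and targets y
mse <- function(y_hat, y) mean((y_hat - y)^2)

mse(c(0.9, 0.2, 0.8), c(1, 0, 1))   # small value -> predictions fit well
```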

What Do You Understand by Backpropagation?

Backpropagation is the technique used to train a network and improve its performance. It propagates the error from the output layer backward through the network, computing how much each weight contributed to that error, and then updates the weights to reduce it.
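As a minimal sketch of the idea, here is one sigmoid neuron trained by repeatedly backpropagating the gradient of a squared-error loss (all names, values, and the learning rate are illustrative):

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

set.seed(1)
x <- c(0.5, -1.2)                 # one training example with two features
y <- 1                            # its target output
w <- rnorm(2, sd = 0.1)           # small random weights
b <- 0
lr <- 0.5                         # learning rate

for (i in 1:100) {
  y_hat <- sigmoid(sum(w * x) + b)       # forward pass
  # Backward pass: gradient of (y_hat - y)^2 w.r.t. w and b, using the
  # sigmoid derivative y_hat * (1 - y_hat)
  delta <- 2 * (y_hat - y) * y_hat * (1 - y_hat)
  w <- w - lr * delta * x                # update weights to reduce the error
  b <- b - lr * delta                    # update bias likewise
}
y_hat                                    # now close to the target of 1
```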


What Is the Difference Between a Feedforward Neural Network and Recurrent Neural Network?

In a feedforward neural network, signals travel in one direction, from input to output. There are no feedback loops; the network considers only the current input and cannot memorize previous inputs.

In a recurrent neural network, signals travel in both directions, creating loops in the network. It combines the current input with previously received inputs when generating a layer’s output, and it can memorize past data thanks to this internal memory.

What Will Happen If the Learning Rate Is Set Too Low or Too High?

When the learning rate is too low, training progresses very slowly because only minimal updates are made to the weights. It takes many updates to reach the minimum.

If the learning rate is set too high, the drastic weight updates cause undesirable behavior in the loss function. The model may fail to converge (the loss keeps oscillating without settling at a minimum) or even diverge (the loss grows without bound).
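Both failure modes are easy to see with plain gradient descent on the one-dimensional loss f(w) = w², whose gradient is 2w and whose minimum is at w = 0; the three rates below are illustrative:

```r
# Take `steps` gradient-descent steps on f(w) = w^2, starting from w = 5
descend <- function(lr, steps = 20, w = 5) {
  for (i in 1:steps) w <- w - lr * 2 * w
  w
}

descend(0.001)  # too low: after 20 steps w has barely moved from 5
descend(0.4)    # reasonable: w is essentially at the minimum 0
descend(1.1)    # too high: each update overshoots and w diverges
```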

What Are Dropout and Batch Normalization?

Dropout is a technique of randomly dropping hidden and visible units of a network during training to prevent overfitting (typically dropping 20 percent of the nodes). It roughly doubles the number of iterations needed for the network to converge.

Batch normalization is a technique for improving the performance and stability of neural networks by normalizing the inputs of each layer so that the activations have a mean of zero and a standard deviation of one.
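Both ideas reduce to a few lines on a toy activation matrix; a conceptual sketch (real frameworks apply these per layer during training, and batch norm additionally learns a scale and shift):

```r
set.seed(7)
a <- matrix(rnorm(20, mean = 3, sd = 2), nrow = 5)  # toy batch of activations

# Dropout: zero each unit independently with probability 0.2, then rescale
# by the keep probability so the expected activation stays the same
keep <- matrix(rbinom(length(a), 1, 0.8), nrow = nrow(a))
a_dropped <- a * keep / 0.8

# Batch normalization (without the learned scale/shift): per-unit zero mean
# and unit standard deviation across the batch
a_bn <- scale(a)
round(colMeans(a_bn), 10)   # ~0 for every unit
apply(a_bn, 2, sd)          # ~1 for every unit
```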

How Are Weights Initialized in a Network?

There are two methods here: we can either initialize the weights to zero or assign them randomly.

Initializing all weights to 0: This makes your model equivalent to a linear model. Every neuron in every layer performs the same operation and produces the same output, and that symmetry is never broken during training, making the deep net useless.

Initializing all weights randomly: Here the weights are assigned randomly, initialized very close to 0. Because every neuron then performs a different computation, the model can learn effectively; this is the most commonly used method.
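A sketch of the two schemes for a single weight matrix (the layer sizes are illustrative):

```r
n_in <- 4; n_out <- 3   # a toy layer: 4 inputs feeding 3 neurons

# Zero initialization: every neuron gets identical weights, so all neurons
# compute the same thing and stay identical during training
W_zero <- matrix(0, nrow = n_in, ncol = n_out)

# Random initialization close to 0: small Gaussian values break the symmetry,
# so each neuron learns a different computation
set.seed(1)
W_rand <- matrix(rnorm(n_in * n_out, sd = 0.01), nrow = n_in, ncol = n_out)
```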

Source for ANN questions: https://www.simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-interview-questions