M2 ICP 7
Submitted By:
Pavankumar Manchala
Class Id: 16
Tasks:
- Perform Naive Bayes classification on the adult dataset.
Naive Bayes is a classifier based on Bayes' theorem. It computes the posterior probability of each class given the observed features, treating the features as conditionally independent given the class. The class with the maximum posterior probability is chosen as the target class.
Output:
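As a rough illustration of the maximum-posterior rule described above, here is a minimal pure-Python sketch for categorical features. It is not the Spark code from the repository linked at the end of this page; the toy rows, the feature choice, and the function names `train_naive_bayes` and `predict_naive_bayes` are assumptions made for this example, and Laplace smoothing is added so unseen values do not produce zero probabilities.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict_naive_bayes(model, row):
    """Return the class with the maximum (log) posterior for one row."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing of the per-class likelihood
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_class, best_score = label, score
    return best_class

# Hypothetical toy rows: (education, marital status) -> income bracket
rows = [("Bachelors", "Married"), ("HS-grad", "Single"),
        ("Bachelors", "Single"), ("HS-grad", "Married"),
        ("Masters", "Married"), ("HS-grad", "Single")]
labels = [">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K"]

model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("Masters", "Married"))
print(prediction)
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once there are more than a handful of features.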
- Perform Decision tree classification on the adult dataset.
A decision tree classifies a record by applying a sequence of conditional tests. Each internal node holds a condition on a feature. Evaluation starts at the root node and follows the matching branches until it reaches a leaf node, which gives the predicted class.
Output:
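The root-to-leaf traversal described above can be sketched in a few lines of plain Python. The tree below is written by hand rather than learned from data, and its features and thresholds (`age`, `hours_per_week`, 30, 40) are hypothetical values chosen only to show the mechanics; it is not the model produced by the Spark job.

```python
# Internal node: (feature_name, threshold, left_subtree, right_subtree)
# Leaf node: a plain string holding the predicted class.
# Features and thresholds are hypothetical, not learned from the dataset.
tree = ("age", 30,
        "<=50K",
        ("hours_per_week", 40,
         "<=50K",
         ">50K"))

def classify(node, record):
    """Walk from the root to a leaf, testing one condition per internal node."""
    while not isinstance(node, str):
        feature, threshold, left, right = node
        # Go left when the condition holds, right otherwise
        node = left if record[feature] <= threshold else right
    return node

young = classify(tree, {"age": 25, "hours_per_week": 60})
busy = classify(tree, {"age": 45, "hours_per_week": 50})
print(young, busy)
```

A training algorithm would choose the features and thresholds automatically, typically by picking the split that best separates the classes at each node.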
- Perform Random forest classification on the adult dataset.
A random forest combines multiple decision trees. The final output is computed from the outputs of all the trees: the majority vote for classification, or the average of the outputs for regression.
Output:
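To make the combination step concrete, the sketch below aggregates the outputs of a few stand-in trees. The three one-rule "trees" and the sample record are hypothetical, and the sketch leaves out how a real random forest trains each tree on a random sample of rows and features; it only shows majority voting and averaging.

```python
from collections import Counter
from statistics import mean

def majority_vote(predictions):
    """Classification: return the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Regression: return the mean of the trees' numeric outputs."""
    return mean(predictions)

# Three hypothetical one-rule trees standing in for trained decision trees
trees = [
    lambda r: ">50K" if r["age"] > 30 else "<=50K",
    lambda r: ">50K" if r["hours_per_week"] > 45 else "<=50K",
    lambda r: ">50K" if r["education_years"] > 12 else "<=50K",
]

record = {"age": 40, "hours_per_week": 38, "education_years": 16}
votes = [tree(record) for tree in trees]
forest_class = majority_vote(votes)

# For regression, numeric tree outputs are averaged instead
forest_value = average([10.0, 12.0, 14.0])
print(votes, forest_class, forest_value)
```

Using an odd number of trees avoids ties in a two-class vote.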
- Perform K-means clustering on the diabetic dataset.
K-means is a clustering model that assigns each point to the cluster whose centroid is nearest, based on a distance measure (e.g. Manhattan or Euclidean; standard K-means uses Euclidean distance).
Output:
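The assign-then-update loop behind K-means can be sketched as follows. This is a minimal pure-Python version with Euclidean distance, not the Spark implementation used for this task; the six 2-D points and the two starting centroids are made up for the example, and fixed starting centroids are used so the result is deterministic.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    assignments = [nearest_centroid(p, centroids) for p in points]
    return centroids, assignments

# Hypothetical 2-D points forming two obvious groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, assignments)
```

In practice the starting centroids are usually chosen at random or with a seeding scheme, so results can differ between runs unless a seed is fixed.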
- Perform Linear regression on the car dataset.
Linear regression fits a linear model to the training dataset and uses it to predict output values for the test dataset. The fitted model is a straight line (or, with several features, a hyperplane) through the data.
Output:
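For a single feature, fitting the line reduces to the ordinary least squares formulas for slope and intercept. The sketch below applies them to a tiny made-up training set and then predicts for two unseen inputs; the numbers and the helper name `fit_line` are assumptions for illustration and have nothing to do with the car dataset itself.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data lying exactly on y = 2x + 1
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(train_x, train_y)

# Predict for test inputs that were not part of the training set
test_x = [5.0, 6.0]
predicted = [slope * x + intercept for x in test_x]
print(slope, intercept, predicted)
```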
- Perform Logistic regression on the car dataset.
Logistic regression is a regression model that is used for classification. It passes a linear combination of the features through the sigmoid function, so the fitted curve outputs values between 0 and 1, and most predictions lie near 0 or 1. Thresholding this value turns it into a class label.
Output:
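The link between the sigmoid curve and the class decision can be shown with a short sketch. The weights and bias below are hypothetical values picked by hand rather than learned from the car dataset, and the 0.5 threshold is a common default, not something stated in the original write-up.

```python
import math

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, features):
    """Sigmoid of the linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_class(weights, bias, features, threshold=0.5):
    """Label 1 when the predicted probability reaches the threshold, else 0."""
    return 1 if predict_probability(weights, bias, features) >= threshold else 0

# Hypothetical model parameters (not learned from any dataset)
weights, bias = [2.0, -1.0], -0.5

p_high = predict_probability(weights, bias, [3.0, 1.0])  # linear score 4.5
p_low = predict_probability(weights, bias, [0.0, 2.0])   # linear score -2.5
print(p_high, p_low)
```

Because the sigmoid saturates quickly, inputs with a large positive or negative linear score map to probabilities close to 1 or 0, which is why the predictions cluster near the two ends of the range.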
SourceCode: https://github.com/PavankumarManchala/BigDataProgrammingICPs/tree/master/Spark/ICP_7/SourceCode
Outputs: https://github.com/PavankumarManchala/BigDataProgrammingICPs/tree/master/Spark/ICP_7/Documentation
M2 ICP7 Video Explanation: https://github.com/PavankumarManchala/BigDataProgrammingICPs/blob/master/Spark/ICP_7/ICP7.mp4
Drive link for video explanation: https://drive.google.com/open?id=1zPA-aZbEsXBgRxI9fOm4NdAOW3Pfr8cw
All ICPs videos: https://drive.google.com/open?id=1racqWkfI10T-CpLYEDYCvJRSRhhLGsWL