Lab assignment 3 - naveenanallamotu/Big-Data-Analytics-Lab-Assignments GitHub Wiki

The project computes k-means clustering for a given number of clusters.

$$K-means: k-means clustering aims to partition n observations into k clusters, where each observation belongs to the cluster with the nearest mean. That mean (the centroid) serves as the prototype of the cluster.

Input: The given dataset in a text file

Output: Clusters formed based on the distance from the centroid.

Within Set Sum of Squared Errors = 23.02821428571435

Clustering on training data: ([0.0,0.0,0.0],0) ([0.1,0.1,0.1],0) ([0.2,0.2,0.2],0) ([9.0,9.0,9.0],1) ([9.1,9.1,9.1],1) ([9.2,9.2,9.2],1) ([0.8,0.8,0.8],0) ([0.2,0.2,0.2],0) ([0.4,0.4,0.4],0) ([0.5,0.5,0.5],0) ([6.0,6.0,6.0],1)
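
The assignment step and the Within Set Sum of Squared Errors reported above can be illustrated with a minimal plain-Python sketch. It is not the code used for this lab: the function names `kmeans` and `wssse`, the iteration cap, and the choice of the first and fourth points as initial centroids are assumptions made for illustration. The points are the ones listed in the output above.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, initial_centroids, max_iter=20):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-dimension mean of the members.
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def wssse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[label])
               for p, label in zip(points, labels))


# Points taken from the clustering output above (each value repeated 3 times).
values = [0.0, 0.1, 0.2, 9.0, 9.1, 9.2, 0.8, 0.2, 0.4, 0.5, 6.0]
points = [[v, v, v] for v in values]

# Assumption: seed with the first and fourth points; real runs often seed randomly.
centroids, labels = kmeans(points, [points[0], points[3]])
for p, label in zip(points, labels):
    print((p, label))
print("Within Set Sum of Squared Errors =", wssse(points, centroids, labels))
```

With this seeding, the sketch assigns the points to the same two clusters as the output above and gives a sum of squared errors of approximately 23.028. A different seeding or a different value of k may give different clusters.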

$$Linear regression: It models the relationship between a dependent variable and one or more independent variables.

Input: A .data file containing training data, such as the animals' movements and eating habits

Output: The kind of activity the chimpanzees are doing at a specific time.

training Mean Squared Error = 7.451165849960048

test Mean Squared Error = 7.448194517034248
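
To show how a training and a test Mean Squared Error like the ones above are obtained, here is a minimal sketch of a one-feature least-squares fit in plain Python. It is an illustration only: the closed-form fit, the function names `fit_line` and `mean_squared_error`, and the small data set are assumptions, not the model or the .data file used in this lab.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between predictions and labels."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data, for illustration only (not from the lab's .data file).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.0]
test_x = [5.0, 6.0]
test_y = [11.1, 12.8]

slope, intercept = fit_line(train_x, train_y)
print("training Mean Squared Error =",
      mean_squared_error(train_x, train_y, slope, intercept))
print("test Mean Squared Error =",
      mean_squared_error(test_x, test_y, slope, intercept))
```

The model is fitted on the training split only. The test error is computed on held-out points, so a test error close to the training error, as in the output above, suggests the model is not overfitting.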

$$Video annotation: It splits the video into a set of frames and, after analyzing them, groups the frames into clusters based on the training data.

Input: A video in .mkv format

Output: The set of frames and the main frames
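
One simple way to pick main frames out of a frame sequence is to keep a frame whenever it differs enough from the last main frame. The sketch below illustrates that idea only and is not the method used in this lab, which groups frames using trained data. Frames are represented as flat lists of grayscale pixel values, and the name `select_main_frames` and the threshold are assumptions. Decoding the .mkv file into frames is left out, since it needs a video library.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_main_frames(frames, threshold):
    """Return indices of frames that differ from the previous main frame
    by more than `threshold` (the first frame is always a main frame)."""
    if not frames:
        return []
    main = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[main[-1]]) > threshold:
            main.append(i)
    return main


# Hypothetical 4-pixel grayscale frames, for illustration only.
frames = [
    [0, 0, 0, 0],
    [1, 0, 1, 0],          # nearly identical to frame 0
    [200, 200, 200, 200],  # scene change
    [201, 199, 200, 200],  # nearly identical to frame 2
    [50, 50, 50, 50],      # another scene change
]
print(select_main_frames(frames, threshold=30))  # prints [0, 2, 4]
```

A lower threshold keeps more main frames, and a higher one keeps fewer.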