ICP3 - GeoSnipes/Big-Data GitHub Wiki

Sub-Team Members (Class ID):

  • 5-2 15: Naga Venkata Satya Pranoop Mutha
  • 5-2 23: Geovanni West

This ICP introduces the concepts of linear regression, supervised and unsupervised learning, and data clustering. We take the 3D Road Network data set and fit a linear regression model to it. We then compare the training mean squared error with the test mean squared error.

Linear Regression:

Input Data:

First, we load and parse the data.
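The original loading code is not reproduced on this page, so the following is only a minimal sketch of what parsing comma-separated rows into (label, features) pairs could look like. The column order, the choice of the last column as the label, the sample rows, and the names `parse_line` and `load_points` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Made-up rows for illustration only; they are NOT taken from the real data set.
# Assumed column order: id, longitude, latitude, altitude.
SAMPLE = """\
1,9.0,56.0,10.0
2,9.5,56.5,12.0
"""


def parse_line(line: str) -> Tuple[float, List[float]]:
    """Split one comma-separated row into (label, features)."""
    values = [float(v) for v in line.strip().split(",")]
    # Assumption: the last column is the label, the remaining columns are features.
    return values[-1], values[:-1]


def load_points(text: str) -> List[Tuple[float, List[float]]]:
    """Parse every non-empty line of the input text."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


points = load_points(SAMPLE)
print(points[0])
```

Which column serves as the label changes the scale of the resulting error values, so it is worth stating explicitly next to the results.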

Then we build the model
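As a rough illustration of the model-building step, here is a sketch of linear regression fitted by batch gradient descent in plain Python. It is not the code used in this ICP: the function names, step size, iteration count, the absence of an intercept term, and the toy data are all assumptions.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear prediction without an intercept term (an assumption of this sketch)."""
    return sum(w * x for w, x in zip(weights, features))


def train_linear_regression(
    features: List[List[float]],
    labels: List[float],
    step_size: float = 0.1,
    iterations: int = 2000,
) -> List[float]:
    """Minimise the mean squared error with batch gradient descent."""
    n = len(features)
    weights = [0.0] * len(features[0])
    for _ in range(iterations):
        gradient = [0.0] * len(weights)
        for x, y in zip(features, labels):
            error = predict(weights, x) - y
            for j, xj in enumerate(x):
                gradient[j] += 2.0 * error * xj / n
        weights = [w - step_size * g for w, g in zip(weights, gradient)]
    return weights


# Toy data generated from y = 2*x1 + 3*x2 (made up for illustration).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
weights = train_linear_regression(X, y)
print(weights)
```

Gradient descent is sensitive to feature scale. Features with very large magnitudes usually need scaling or a much smaller step size for the fit to converge.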

Now we evaluate the training mean squared error and the test mean squared error.
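The evaluation step can be sketched as follows. The 80/20 split, the toy data, and the stand-in `model` function are assumptions for illustration, not the values or code used in this ICP.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], float]  # (features, label)


def mean_squared_error(
    model: Callable[[Sequence[float]], float], data: List[Example]
) -> float:
    """Average of the squared differences between prediction and label."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Made-up data following y = 2*x + 1.
data: List[Example] = [([float(i)], 2.0 * i + 1.0) for i in range(10)]
random.Random(0).shuffle(data)

# Assumed 80/20 split between training and test data.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]


def model(features: Sequence[float]) -> float:
    """Hypothetical fitted model that misses the intercept, so every error is 1."""
    return 2.0 * features[0]


train_mse = mean_squared_error(model, train)
test_mse = mean_squared_error(model, test)
print("Training Mean Squared Error =", train_mse)
print("Test Mean Squared Error =", test_mse)
```

Comparing the two values is a quick check for overfitting: a test error far above the training error suggests the model does not generalise well.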

Next, we save the result to a file and load it back.
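A minimal sketch of the save-and-load round trip, assuming the result to persist is a list of fitted weights and that JSON is an acceptable format. The file name and helper names are hypothetical.

```python
import json
import os
import tempfile
from typing import List


def save_weights(weights: List[float], path: str) -> None:
    """Write the weights to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"weights": weights}, f)


def load_weights(path: str) -> List[float]:
    """Read the weights back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["weights"]


weights = [0.5, -1.25]  # made-up values for illustration
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "linear_model.json")  # hypothetical file name
    save_weights(weights, path)
    restored = load_weights(path)

print(restored)
```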

Results

  • Training Mean Squared Error = 1.95062302120918E16
  • Test Mean Squared Error = 1.9596847563029088E16

K-Means Clustering

Case 1: K = 3

Source Code:

Outlier Point:

  • Within Set Sum of Squared Errors = 8.58246791862488E14
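To make the Within Set Sum of Squared Errors (WSSSE) figure concrete, here is a small plain-Python sketch of K-means (Lloyd's algorithm) followed by the WSSSE computation. It is not the source code from this ICP. The toy points and the evenly spaced initial centers are assumptions chosen to keep the example reproducible; real implementations typically use random or k-means++ initialisation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(point: Point, centers: List[Point]) -> int:
    """Index of the center nearest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[Point]:
    # Assumption: deterministic, evenly spaced initial centers.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centers)].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers


def wssse(points: List[Point], centers: List[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(squared_distance(p, centers[closest(p, centers)]) for p in points)


# Made-up 2D points forming three tight pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
centers = kmeans(points, k=3)
print("Within Set Sum of Squared Errors =", wssse(points, centers))
```

WSSSE measures how tightly points sit around their assigned centers, so lower values indicate more compact clusters for a given K.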

Case 2: K = 4

Source Code:

Outlier Point 1:

Outlier Point 2:

  • Within Set Sum of Squared Errors = 2.031178299721398E14
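How the outlier points above were identified is not stated on this page. One common approach, sketched here purely as an assumption, is to flag the point that lies farthest from its nearest cluster center. The points and centers below are made up, and `farthest_from_center` is a hypothetical helper.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def farthest_from_center(
    points: List[Point], centers: List[Point]
) -> Tuple[Optional[Point], float]:
    """Return the point with the largest distance to its nearest center."""
    best_point: Optional[Point] = None
    best_dist = -1.0
    for p in points:
        d = min(math.dist(p, c) for c in centers)
        if d > best_dist:
            best_point, best_dist = p, d
    return best_point, best_dist


# Made-up points and centers for illustration.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (3.0, 4.0)]
centers = [(0.0, 0.5), (10.0, 10.5)]

outlier, distance = farthest_from_center(points, centers)
print("Outlier candidate:", outlier, "distance:", distance)
```

A single far-away point can dominate the WSSSE, which is one reason to inspect outliers before comparing values across different K.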