Module 2: ICP #6 - VidyullathaKaza/BigData_Programming_Spring2020 GitHub Wiki

Aim: Analyze a distributed collection of data using GraphFrames and GraphX algorithms.

Procedure:

Importing the data and displaying the schema:

We use three datasets. We import each one and display its schema.

Triangle count:

Triangle counting is a graph algorithm, often used in community detection, that determines the number of triangles passing through each node in the graph. A triangle is a set of three nodes in which each node has a relationship to the other two.
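To make the definition concrete, here is a minimal sketch in plain Python (not the GraphFrames call used in this ICP) that counts the triangles passing through each node. The function name `triangle_counts` and the small edge list are illustrative assumptions; the graph is treated as undirected.

```python
from itertools import combinations


def triangle_counts(edges):
    """Return {node: number of triangles through it} for an undirected graph."""
    # Build an undirected adjacency map; self-loops cannot form triangles.
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    counts = {}
    for node, nbrs in neighbors.items():
        # A triangle through `node` is a pair of its neighbors
        # that are also connected to each other.
        counts[node] = sum(
            1 for a, b in combinations(sorted(nbrs), 2) if b in neighbors[a]
        )
    return counts


# Hypothetical toy graph: a-b-c form a triangle, d hangs off c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
counts = triangle_counts(edges)
print(counts)
```

In this toy graph, nodes a, b and c each lie on one triangle, while d lies on none.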

Shortest Path:
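As a rough illustration of what a shortest-path computation returns, the sketch below uses breadth-first search in plain Python to find the minimum number of hops from one source node to every reachable node. It is not the GraphFrames command from this ICP; the function name `hop_distances`, the directed toy edges, and the use of unweighted hop counts are assumptions made for the example.

```python
from collections import deque


def hop_distances(edges, source):
    """Return {node: fewest hops from source}, following edge direction."""
    # Directed adjacency list; make sure every node has an entry.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])

    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            # The first visit in BFS order is along a shortest path.
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    return distances


# Hypothetical toy graph with directed edges.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
from_a = hop_distances(edges, "a")
print(from_a)
```

Nodes that cannot be reached from the source are simply absent from the result.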

Page Rank:

PageRank is an algorithm that measures the transitive influence or connectivity of nodes. It can be computed by either iteratively distributing one node’s rank (originally based on degree) over its neighbors or by randomly traversing the graph and counting the frequency of hitting each node during these walks.
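The iterative variant described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the GraphFrames call used in this ICP: the function name `pagerank`, the damping factor of 0.85, the uniform starting ranks, and the even redistribution of rank from nodes without outgoing edges are all choices made for the example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return {node: rank} after a fixed number of power-iteration steps."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    # Assumption: start from a uniform distribution over all nodes.
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_ranks = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A node with no outgoing edges spreads its rank over all nodes.
            targets = out_links[node] or nodes
            share = damping * ranks[node] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical toy graph: a and b both link to c, and c links back to a.
edges = [("a", "c"), ("b", "c"), ("c", "a")]
ranks = pagerank(edges)
print(ranks)
```

In this toy graph c ends up with the highest rank because two nodes link to it, and b ends up with the lowest because nothing links to b.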

Saving the Graphs:

We use the following command for saving the graphs.