ICP 13 - manaswinivedula/Big-Data-Programming GitHub Wiki

1. Reading the CSV files and creating the data frames

Vertices and edges are created from the data frames, and a graph is built from those vertices and edges.

The following is the source code for creating and displaying the vertices, edges, and graph.
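In Spark this step would typically read each file with `spark.read.csv(path, header=True)` and pass the two data frames to `GraphFrame(vertices, edges)`. GraphFrames expects an `id` column on the vertex data frame and `src`/`dst` columns on the edge data frame. The plain-Python sketch below mirrors that shape so it runs without Spark. The inline CSV contents and the `name`/`relationship` columns are hypothetical stand-ins, since the actual ICP input files are not shown here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV contents; the real ICP input files are not reproduced here.
VERTICES_CSV = """id,name
a,Alice
b,Bob
c,Carol
d,Dave
"""
EDGES_CSV = """src,dst,relationship
a,b,friend
b,c,follow
c,a,friend
c,d,follow
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def build_graph(vertex_rows, edge_rows):
    """Return the vertex ids and a directed adjacency list built from the src/dst columns."""
    vertices = [row["id"] for row in vertex_rows]
    adjacency = defaultdict(list)
    for row in edge_rows:
        adjacency[row["src"]].append(row["dst"])
    return vertices, dict(adjacency)

vertices, adjacency = build_graph(read_rows(VERTICES_CSV), read_rows(EDGES_CSV))
print(vertices)   # ['a', 'b', 'c', 'd']
print(adjacency)  # {'a': ['b'], 'b': ['c'], 'c': ['a', 'd']}
```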

2. Calculating the triangle count

First, the set of neighbors is computed for each vertex. Then, for each edge, the intersection of the two endpoints' neighbor sets is computed and its size is sent to both vertices. Finally, the counts are summed at each vertex and divided by two, since each triangle is counted twice. The code below uses `triangleCount.run()`, which removes any duplicate edges and self-loops before counting.
The following is the source code for the triangle count.
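The three steps described above can be sketched in plain Python as follows. This is an illustration of the neighbor-set intersection method on a small hypothetical edge list, not the GraphFrames implementation itself; like `triangleCount.run()`, it treats edges as undirected and drops duplicates and self-loops first.

```python
from collections import defaultdict

def triangle_count(edges):
    """Count the triangles through each vertex using neighbor-set intersection."""
    # Treat edges as undirected and drop self-loops and duplicate edges.
    canonical = {tuple(sorted((u, v))) for u, v in edges if u != v}
    # Step 1: compute the set of neighbors for each vertex.
    neighbors = defaultdict(set)
    for u, v in canonical:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Step 2: for each edge, intersect the endpoints' neighbor sets
    # and send the size of the intersection to both vertices.
    counts = defaultdict(int)
    for u, v in canonical:
        shared = len(neighbors[u] & neighbors[v])
        counts[u] += shared
        counts[v] += shared
    # Step 3: each triangle reaches a vertex through two of its edges, so halve the sum.
    return {vertex: counts[vertex] // 2 for vertex in neighbors}

# ("b", "a") duplicates ("a", "b") and ("d", "d") is a self-loop; both are ignored.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("b", "a"), ("d", "d")]
result = triangle_count(edges)
print(sorted(result.items()))  # [('a', 1), ('b', 1), ('c', 1), ('d', 0)]
```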

3. Shortest path

Here the shortest path is calculated using Dijkstra's algorithm. For each vertex id, the output shows the shortest-path weight (distance) to the destination vertex.
The following is the source code.
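GraphFrames' built-in `shortestPaths` returns hop counts to a set of landmark vertices rather than weighted distances, so a weighted Dijkstra is usually written separately. The sketch below is one way to do it in plain Python, assuming a hypothetical weighted edge list of `(src, dst, weight)` tuples. To get the distance from each vertex to a single destination, it runs Dijkstra from the destination over the reversed edges.

```python
import heapq
from collections import defaultdict

def dijkstra(graph, source):
    """Return the smallest total edge weight from source to every reachable vertex."""
    distances = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, vertex = heapq.heappop(heap)
        if dist > distances[vertex]:
            continue  # stale heap entry: a shorter path was already found
        for neighbor, weight in graph.get(vertex, []):
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distances

def distances_to(edges, destination):
    """Shortest distance from every vertex to destination, via Dijkstra on reversed edges."""
    reversed_graph = defaultdict(list)
    for src, dst, weight in edges:
        reversed_graph[dst].append((src, weight))
    return dijkstra(reversed_graph, destination)

edges = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2), ("b", "d", 5), ("c", "d", 8)]
print(sorted(distances_to(edges, "d").items()))
# [('a', 8), ('b', 5), ('c', 7), ('d', 0)]
```

Reversing the edges keeps edge direction meaningful: the distance from `a` to `d` follows `a -> c -> b -> d` (1 + 2 + 5 = 8) rather than the direct but heavier `a -> b -> d` (4 + 5 = 9).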

4. PageRank

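PageRank estimates the importance of a vertex (originally a website) by counting the number and quality of the links pointing to it. The resulting score can be read as the probability that a random surfer following links ends up on that vertex.
The following is the source code for computing the PageRank of the vertices and the weights of the edges.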
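In GraphFrames this is a call such as `g.pageRank(resetProbability=0.15, maxIter=10)`, which returns vertices with a `pagerank` column and edges with a `weight` column. The power-iteration sketch below shows the idea in plain Python on a hypothetical edge list. Each edge weight is `1 / out-degree` of its source, and the damping factor 0.85 corresponds to a reset probability of 0.15. This variant normalizes ranks to sum to 1; Spark's implementations may scale the scores differently.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; returns (vertex ranks, edge weights)."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    # Each edge's weight is 1 / out-degree of its source vertex.
    weights = {(src, dst): 1.0 / len(out_links[src]) for src, dst in edges}
    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no out-links is spread evenly over all vertices.
        dangling = sum(ranks[v] for v in vertices if not out_links[v])
        base = (1 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in vertices}
        for src in vertices:
            for dst in out_links[src]:
                new_ranks[dst] += damping * ranks[src] * weights[(src, dst)]
        ranks = new_ranks
    return ranks, weights

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
ranks, weights = pagerank(edges)
print(weights)  # {('a', 'b'): 0.5, ('b', 'c'): 1.0, ('c', 'a'): 1.0, ('a', 'c'): 0.5}
# c ranks highest (it receives links from both a and b), then a, then b.
```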

5. Saving the generated graph
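With GraphFrames, a graph is usually saved by writing its two data frames, for example `g.vertices.write.parquet(path)` and `g.edges.write.parquet(path)`, and rebuilt later with `GraphFrame(spark.read.parquet(...), spark.read.parquet(...))`. The plain-Python sketch below shows the same round trip with CSV files; the file names `vertices.csv` and `edges.csv` and the sample rows are hypothetical.

```python
import csv
import os
import tempfile

def save_graph(vertex_rows, edge_rows, directory):
    """Write vertex and edge rows (lists of dicts) to two CSV files in directory."""
    os.makedirs(directory, exist_ok=True)
    for file_name, rows in (("vertices.csv", vertex_rows), ("edges.csv", edge_rows)):
        # Use the keys of the first row as the CSV header.
        fieldnames = list(rows[0]) if rows else []
        with open(os.path.join(directory, file_name), "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

def load_rows(path):
    """Read a CSV file back into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))

vertex_rows = [{"id": "a", "name": "Alice"}, {"id": "b", "name": "Bob"}]
edge_rows = [{"src": "a", "dst": "b", "relationship": "friend"}]

with tempfile.TemporaryDirectory() as out_dir:
    save_graph(vertex_rows, edge_rows, out_dir)
    reloaded_vertices = load_rows(os.path.join(out_dir, "vertices.csv"))
    reloaded_edges = load_rows(os.path.join(out_dir, "edges.csv"))

print(reloaded_edges)  # [{'src': 'a', 'dst': 'b', 'relationship': 'friend'}]
```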

6. Label propagation algorithm

Label propagation is a semi-supervised machine learning algorithm used to label an unlabeled dataset. Initially, only part of the data is labeled; the remaining data points acquire labels from the labeled ones as labels spread through the graph, which divides the points into groups.
The following is the source code.
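The seeded variant described above can be sketched in plain Python: seed labels stay fixed, and every other vertex repeatedly adopts the most frequent label among its already-labeled neighbors until nothing changes. The graph, the seed labels, and the tie-breaking rule (smallest label wins) are assumptions for illustration. Note that GraphFrames' `labelPropagation(maxIter=...)` works differently: every vertex starts with its own label, and the algorithm is used for community detection.

```python
from collections import Counter, defaultdict

def propagate_labels(edges, seed_labels, max_iter=10):
    """Spread fixed seed labels to unlabeled vertices by neighbor majority vote."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    labels = dict(seed_labels)
    for _ in range(max_iter):
        updates = {}
        for vertex in sorted(neighbors):
            if vertex in seed_labels:
                continue  # seed labels never change
            votes = Counter(labels[n] for n in neighbors[vertex] if n in labels)
            if votes:
                # Most frequent neighbor label; ties go to the smallest label.
                best = min(votes, key=lambda label: (-votes[label], label))
                if labels.get(vertex) != best:
                    updates[vertex] = best
        if not updates:
            break  # converged: no vertex changed its label
        labels.update(updates)  # synchronous update after each full pass
    return labels

# Two triangles joined by the bridge c-d, with one labeled vertex in each.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "d")]
labels = propagate_labels(edges, {"a": "X", "f": "Y"})
print(sorted(labels.items()))
# [('a', 'X'), ('b', 'X'), ('c', 'X'), ('d', 'Y'), ('e', 'Y'), ('f', 'Y')]
```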

7. BFS algorithm

Breadth-first search (BFS) finds the shortest path, measured in number of edges, between two nodes in a graph. It uses a queue data structure and checks whether each adjacent node has already been visited.
The following is the source code for BFS.
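In GraphFrames the search is a call such as `g.bfs(fromExpr, toExpr, maxPathLength=10)`. The plain-Python sketch below shows the underlying queue-based search on a hypothetical directed adjacency list. A `parents` map doubles as the visited set, and the path is rebuilt from it once the goal is reached.

```python
from collections import deque

def bfs_path(adjacency, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # also records which vertices have been visited
    queue = deque([start])
    while queue:
        vertex = queue.popleft()  # FIFO order explores the graph level by level
        for neighbor in adjacency.get(vertex, []):
            if neighbor in parents:
                continue  # already visited
            parents[neighbor] = vertex
            if neighbor == goal:
                # Walk the parent links back to start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"], "e": ["f"]}
print(bfs_path(adjacency, "a", "f"))  # ['a', 'b', 'd', 'f']
print(bfs_path(adjacency, "f", "a"))  # None (edges are directed)
```

Using a stack instead of the queue would turn this into depth-first search, which does not guarantee the shortest path.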