ICP 13 - manaswinivedula/Big-Data-Programming GitHub Wiki
Spark and GraphX frameworks
Initial configurations
- Initially, the following configurations are added to the build.sbt file
Task
1. Reading the CSV files and creating the DataFrames
- The following is the source code for creating the DataFrames and displaying them
- The following is the output
Creating vertices and edges from the DataFrames and creating a graph from those vertices and edges
- The following is the source code for creating and displaying the vertices, edges, and graph
- The following is the output
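The screenshots with the actual Spark code are not reproduced here. The following is a minimal, Spark-free Python sketch of the same two steps. It parses CSV text into row records and then turns them into a vertex table and an adjacency list. The sample data and the helper names read_table and build_graph are hypothetical. Only the column names id, src and dst follow the GraphFrames convention for vertex and edge DataFrames.

```python
import csv
import io

# Hypothetical sample data: the real CSV files and their columns are not shown on this page.
# The column names "id", "src" and "dst" follow the GraphFrames naming convention.
VERTICES_CSV = """id,name
1,Alice
2,Bob
3,Carol
"""

EDGES_CSV = """src,dst,relationship
1,2,friend
2,3,follows
3,1,follows
"""


def read_table(text):
    """Parse CSV text that has a header row into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def build_graph(vertex_rows, edge_rows):
    """Index vertices by id and build a directed adjacency list from the edge rows."""
    vertices = {row["id"]: row for row in vertex_rows}
    adjacency = {vertex_id: [] for vertex_id in vertices}
    for row in edge_rows:
        # Reject edges whose endpoints are missing from the vertex table.
        if row["src"] not in vertices or row["dst"] not in vertices:
            raise ValueError(f"edge refers to an unknown vertex: {row}")
        adjacency[row["src"]].append(row["dst"])
    return vertices, adjacency


vertices, adjacency = build_graph(read_table(VERTICES_CSV), read_table(EDGES_CSV))
```

In the Spark version, the vertex and edge DataFrames passed to the GraphFrame constructor play the same role as these two structures.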
2. Calculating the triangle count
- First, the set of neighbors is computed for each vertex. Then, for each edge, the intersection of the neighbor sets of its two endpoints is computed, and the size of that intersection is sent to both vertices. Finally, the counts received at each vertex are summed and divided by two, since each triangle is counted twice. In the code below, triangleCount.run() is used, which removes any duplicate edges and self-loops before counting.
- The following is the source code of the triangle count
- The following is the output
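Here is a minimal Python sketch of the neighbor-set intersection steps described above. It assumes the graph is given as a plain list of (src, dst) pairs. The sketch first canonicalizes the edges: it treats them as undirected and drops self-loops and duplicates. It then counts triangles per vertex. triangle_count is a hypothetical helper, not the GraphFrames API.

```python
from collections import defaultdict


def triangle_count(edges):
    """Per-vertex triangle counts using neighbor-set intersection."""
    # Canonicalize: undirected (sorted endpoints), no self-loops, no duplicate edges.
    canonical = {tuple(sorted((u, v))) for u, v in edges if u != v}
    neighbors = defaultdict(set)
    for u, v in canonical:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Every vertex that appears in the input gets a count, even if it has no triangles.
    counts = {v: 0 for edge in edges for v in edge}
    for u, v in canonical:
        # Each common neighbor closes a triangle with the edge (u, v).
        shared = len(neighbors[u] & neighbors[v])
        counts[u] += shared
        counts[v] += shared
    # Each triangle reaches a vertex through two of its edges, so halve the totals.
    return {v: c // 2 for v, c in counts.items()}


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("b", "a"), ("d", "d")]
print(triangle_count(edges))  # {'a': 1, 'b': 1, 'c': 1, 'd': 0}
```

The duplicate edge ("b", "a") and the self-loop ("d", "d") do not change the result, because canonicalization removes them before any counting happens.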
3. Shortest path
- Here the shortest path is calculated using Dijkstra's algorithm. For each vertex id, the output shows the weight of the shortest path from that vertex to the destination.
- The following is the source code
- The following is the output
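For reference, here is a minimal Python sketch of Dijkstra's algorithm that computes the weighted distance from every vertex to one destination. To do this, it runs Dijkstra from the destination over the reversed edges. It assumes non-negative weights and edges given as hypothetical (src, dst, weight) triples. As far as we know, the built-in shortestPaths in GraphX and GraphFrames counts hops rather than summing weights, so a weighted variant like this one is needed when edge weights matter.

```python
import heapq
from collections import defaultdict


def distances_to(edges, destination):
    """Shortest weighted distance from every vertex that can reach `destination`."""
    # Reverse the edges so one Dijkstra run from the destination covers all sources.
    reverse = defaultdict(list)
    for src, dst, weight in edges:
        reverse[dst].append((src, weight))
    dist = {destination: 0}
    heap = [(0, destination)]  # min-heap of (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter distance was already found
        for v, w in reverse[u]:
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist
```

Vertices that cannot reach the destination simply do not appear in the returned dict.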
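4. PageRank
- PageRank measures the importance of each vertex (originally, each website) by counting the number and quality of the links pointing to it. The resulting score can be read as the probability that a random surfer ends up on that vertex.
- The following is the source code that computes the PageRank of the vertices and the weights of the edges
- The following is the output of the vertices' PageRank values
- The following is the output of the edge weights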
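This is a minimal power-iteration sketch of PageRank in Python. It assumes a directed edge list and a reset probability of 0.15, a common default. The ranks here are normalized to sum to 1, whereas Spark's implementations may scale them differently. The edge_weights helper also assumes that each edge's weight is 1 / out-degree of its source vertex, which as far as we know matches how GraphX weights edges internally. Both function names are hypothetical.

```python
from collections import Counter


def pagerank(edges, reset_probability=0.15, max_iter=100, tol=1e-10):
    """Iterative PageRank over a directed edge list; the ranks sum to 1."""
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    damping = 1.0 - reset_probability
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices without out-links is spread evenly over all vertices.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (reset_probability + damping * dangling) / n for v in vertices}
        for src in vertices:
            targets = out_links[src]
            for dst in targets:
                # Each vertex shares its current rank equally among its out-links.
                new_rank[dst] += damping * rank[src] / len(targets)
        change = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if change < tol:
            break
    return rank


def edge_weights(edges):
    """Assumed edge weight: 1 / out-degree of the source vertex."""
    out_degree = Counter(src for src, _ in edges)
    return {(src, dst): 1.0 / out_degree[src] for src, dst in edges}
```

With edges b→a, c→a and a→b, vertex a receives links from two vertices and ends up with the highest rank, while c, which nobody links to, keeps only the reset share.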
5. Saving the generated graph
- The following is the source code for saving the graph.
- The following is the output
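A GraphFrames graph is backed by a vertex DataFrame and an edge DataFrame, so a common approach is to save the two tables separately. The following Python sketch shows that idea without Spark, using plain CSV files. The file names vertices.csv and edges.csv and the helpers save_graph and load_graph are hypothetical. Note that CSV round-trips every value as a string.

```python
import csv
import os
import tempfile


def _write_rows(path, rows):
    """Write a non-empty list of dicts to CSV, using the first row's keys as the header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def _read_rows(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def save_graph(directory, vertices, edges):
    """Persist a graph as two tables (hypothetical file names)."""
    os.makedirs(directory, exist_ok=True)
    _write_rows(os.path.join(directory, "vertices.csv"), vertices)
    _write_rows(os.path.join(directory, "edges.csv"), edges)


def load_graph(directory):
    """Read both tables back; every value comes back as a string."""
    return (_read_rows(os.path.join(directory, "vertices.csv")),
            _read_rows(os.path.join(directory, "edges.csv")))


vertices = [{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]
edges = [{"src": "1", "dst": "2", "relationship": "friend"}]
with tempfile.TemporaryDirectory() as tmp:
    save_graph(os.path.join(tmp, "graph"), vertices, edges)
    loaded_vertices, loaded_edges = load_graph(os.path.join(tmp, "graph"))
```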
Bonus
1. Label Propagation Algorithm
- Label propagation is a semi-supervised machine learning algorithm used to label an unlabeled dataset. Initially, only part of the data is labeled. The remaining data points take their labels from the labeled ones, which divides the data into groups.
- The following is the source code
- The following is the output.
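The label propagation in GraphX and GraphFrames is the community-detection variant. Every vertex starts with its own label, and at each step it adopts the most frequent label among its neighbors. Below is a minimal synchronous Python sketch of that variant, assuming an undirected adjacency dict. The fixed max_iter mirrors the maxIter setting that GraphFrames' labelPropagation takes. Breaking ties by the smallest label is our own choice, made to keep the result deterministic.

```python
from collections import Counter


def label_propagation(adjacency, max_iter=5):
    """Community detection by label propagation on an undirected adjacency dict."""
    # Every vertex starts in its own community, labeled with its own id.
    labels = {v: v for v in adjacency}
    for _ in range(max_iter):
        new_labels = {}
        for v, neighbors in adjacency.items():
            if not neighbors:
                new_labels[v] = labels[v]  # isolated vertices keep their label
                continue
            counts = Counter(labels[n] for n in neighbors)
            top = max(counts.values())
            # Adopt the most frequent neighbor label; ties go to the smallest label.
            new_labels[v] = min(label for label, c in counts.items() if c == top)
        if new_labels == labels:
            break  # converged before reaching max_iter
        labels = new_labels
    return labels


graph = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5, 6], 5: [4, 6], 6: [4, 5]}
communities = label_propagation(graph)
```

On these two separate triangles, each triangle settles on a single shared label within a few iterations, giving one community per triangle.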
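2. BFS algorithm
- The BFS (breadth-first search) algorithm finds the shortest path, measured in number of edges, between two nodes in a graph. It uses a queue data structure. Starting from the source, it visits nodes level by level, checks whether each adjacent node has already been visited, and enqueues only the unvisited ones.
- The following is the source code for BFS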
- The following is the output
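Here is a minimal Python sketch of BFS with a FIFO queue (collections.deque). It assumes a directed adjacency dict and returns the path with the fewest edges from start to goal, or None if goal is unreachable. bfs_path is a hypothetical helper, not the GraphFrames bfs API.

```python
from collections import deque


def bfs_path(adjacency, start, goal):
    """Shortest path (fewest edges) from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])  # FIFO queue: nodes are expanded level by level
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor in parent:
                continue  # already visited
            parent[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

Because the queue expands all nodes at distance k before any node at distance k + 1, the first time the goal is reached is along a shortest path. A stack would give depth-first search instead, which carries no such guarantee.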