Graph Frames 1 - praveenpoluri/Big-Data-Programing GitHub Wiki

Aim:

Create a GraphFrame from DataFrames built on CSV data, define its vertices and edges, and find the vertex degree, in-degree, and out-degree.

Introduction:

About GraphFrames:

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. It provides high-level APIs in Scala, Java, and Python. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.

What is a GraphFrame?

GraphX is to RDDs as GraphFrames are to DataFrames. GraphFrames represent graphs: vertices (e.g., users) and edges (e.g., relationships between users). If you are familiar with GraphX, then GraphFrames will be easy to learn. The key difference is that GraphFrames are based upon Spark DataFrames, rather than RDDs. GraphFrames also provide powerful tools for running queries and standard graph algorithms. With GraphFrames, you can easily search for patterns within graphs, find important vertices, and more. Refer to the User Guide for a full list of queries and algorithms.

GraphFrames make it easy to express queries over graphs. Since GraphFrame vertices and edges are stored as DataFrames, many queries are just DataFrame (or SQL) queries.

Examples of GraphFrame queries:

Example: How many users in our social network have “age” > 35? We can query the vertices DataFrame:

g.vertices.filter("age > 35")

Example: How many users have at least 2 followers? We can combine the built-in inDegrees method with a DataFrame query:

g.inDegrees.filter("inDegree >= 2")
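To make concrete what these two queries compute, here is a plain-Python sketch that mirrors them without Spark. The toy vertices and edges are made up for illustration (they are not from this page), and an edge (src, dst) is assumed to mean "src follows dst".

```python
from collections import Counter

# Hypothetical toy graph: each vertex has an "id" and an "age";
# each edge is a (src, dst) pair, read here as "src follows dst".
vertices = [
    {"id": "a", "name": "Alice", "age": 34},
    {"id": "b", "name": "Bob", "age": 36},
    {"id": "c", "name": "Charlie", "age": 40},
]
edges = [("a", "b"), ("c", "b"), ("b", "c")]

# Mirrors g.vertices.filter("age > 35"): keep vertices whose age exceeds 35.
older_than_35 = [v for v in vertices if v["age"] > 35]

# Mirrors g.inDegrees: count incoming edges per destination vertex.
in_degree = Counter(dst for _, dst in edges)

# Mirrors g.inDegrees.filter("inDegree >= 2"): vertices with at least 2 followers.
at_least_two_followers = {vid: n for vid, n in in_degree.items() if n >= 2}

print(len(older_than_35), at_least_two_followers)
```

In GraphFrames the same results come back as DataFrames, so answering "how many" would additionally need a row count on the filtered result.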

Tools:

  • PyCharm
  • Python
  • Spark
  • GraphX
  • GraphFrames

Implementation of GraphFrames:

  • Importing all the required dependencies for GraphFrames and DataFrames.

  • Creating the SparkContext.

  • Importing the CSV files, building DataFrames from them, and creating a temporary table view.

  • Viewing the created DataFrames.

  • Concatenating chunks into a list and converting it to a DataFrame.

  • Removing duplicates.

  • Naming the columns of the output DataFrame, creating the vertices, and showing some of them.

  • Creating the edges and showing them.

  • Finding the vertex in-degree.

  • Finding the vertex out-degree.

  • Applying motif finding.
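The steps above could be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not the code used on this page: the function names, the CSV paths passed in by the caller, and the column-name parameters (id_col, src_col, dst_col) are hypothetical placeholders. It assumes the pyspark and graphframes Python packages are installed and that the GraphFrames jar is available to Spark. The "concatenating chunks" step is left out because its details are not described here.

```python
def make_spark(app_name="GraphFramesDemo"):
    """Create (or reuse) a SparkSession; its SparkContext is spark.sparkContext."""
    from pyspark.sql import SparkSession  # imported lazily: needs pyspark installed
    return SparkSession.builder.appName(app_name).getOrCreate()


def load_csv(spark, path, view_name):
    """Read a CSV with a header row into a DataFrame and register a temp view."""
    df = spark.read.csv(path, header=True, inferSchema=True)
    df.createOrReplaceTempView(view_name)
    return df


def build_graph(vertices_df, edges_df, id_col, src_col, dst_col):
    """Build a GraphFrame; the *_col arguments are hypothetical column names."""
    from graphframes import GraphFrame  # needs the graphframes package and jar

    # GraphFrames expects vertices to have "id" and edges to have "src" and "dst".
    vertices = vertices_df.withColumnRenamed(id_col, "id").dropDuplicates(["id"])
    edges = (edges_df
             .withColumnRenamed(src_col, "src")
             .withColumnRenamed(dst_col, "dst"))
    return GraphFrame(vertices, edges)


def show_degrees_and_motifs(g, n=5):
    """Print a few vertices, edges, degree tables, and motif matches."""
    g.vertices.show(n)
    g.edges.show(n)
    g.degrees.show(n)      # in-degree plus out-degree per vertex
    g.inDegrees.show(n)    # number of incoming edges per vertex
    g.outDegrees.show(n)   # number of outgoing edges per vertex
    # Motif: pairs of vertices connected in both directions.
    g.find("(a)-[e1]->(b); (b)-[e2]->(a)").show(n)
```

A caller would chain these: create the session with make_spark(), call load_csv() once per file, pass the two DataFrames and the relevant column names to build_graph(), and hand the result to show_degrees_and_motifs().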

Limitations:

  • APIs are available only in Scala, Java, and Python.
  • Motifs are not allowed to contain edges without any named elements: "()-[]->()".
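The motif restriction can be illustrated with a small plain-Python check. This is only a sketch of the rule as stated above, not GraphFrames' own parser: the helper names are hypothetical, it handles edge terms of the form (v)-[e]->(v) only, and it does not cover any other motif rules.

```python
import re

# One edge term: optional "!" negation, then (vertex)-[edge]->(vertex),
# where each name may be empty. Mirrors the stated rule only.
_TERM = re.compile(r"^!?\((\w*)\)-\[(\w*)\]->\((\w*)\)$")


def term_has_named_element(term):
    """Return True if the edge term names its source, edge, or destination."""
    match = _TERM.match(term.strip())
    if match is None:
        raise ValueError(f"not a single edge term: {term!r}")
    # groups() holds the three names; an empty string means "unnamed".
    return any(match.groups())


def motif_is_allowed(motif):
    """Check every ';'-separated edge term of a motif against the rule."""
    return all(term_has_named_element(t) for t in motif.split(";") if t.strip())


print(motif_is_allowed("(a)-[]->()"))   # one named vertex
print(motif_is_allowed("()-[]->()"))    # nothing named
```

Under this rule, naming any one of the three elements in each edge term is enough; a term with nothing named carries no information to return in the result.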

Conclusion:

Applied GraphFrames to DataFrames: created vertices and edges, found the vertex degree, in-degree, and out-degree, and applied motif finding.
