GraphX in Spark - awantik/spark GitHub Wiki

###What is a graph?

Graphs are a powerful way to represent information.
A graph here is not a line chart but a network of interconnected entities.
Graphs can represent both tangible and abstract things.
For example, in a social network each person is connected to others by some relationship, such as friendship.
Imagine each person in the social network as a vertex of the graph (in GraphX the vertices are stored in an RDD).
DAG - Directed Acyclic Graph

Nodes / Vertices - The things being modelled, such as people or places.
Edges - The lines that connect the nodes/vertices.
Weights - A value on an edge that expresses its strength.
Directed - Edges have a direction, from a source to a destination.
Undirected - Edges carry no direction information, e.g. friendship.
Cyclic - Following the edges can lead back to the starting node.
Acyclic - Following the edges can never lead back to the starting node.
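The terminology above can be made concrete with a small sketch in plain Python (no Spark involved). The edge list, the weights, and the `has_cycle` helper are invented for illustration only.

```python
from collections import defaultdict

# Each edge is (source, destination, weight); names and weights are made up.
edges = [
    ("alice", "bob", 0.9),
    ("bob", "carol", 0.4),
    ("carol", "alice", 0.7),  # closes the loop alice -> bob -> carol -> alice
]

def has_cycle(edge_list):
    """Return True if the directed graph contains a cycle (depth-first search)."""
    adjacency = defaultdict(list)
    vertices = set()
    for src, dst, _weight in edge_list:
        adjacency[src].append(dst)
        vertices.update((src, dst))

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {v: WHITE for v in vertices}

    def visit(v):
        colour[v] = GREY
        for nxt in adjacency[v]:
            if colour[nxt] == GREY:  # back edge: we returned to the current path
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in sorted(vertices))

print(has_cycle(edges))      # cyclic: the third edge leads back to alice
print(has_cycle(edges[:2]))  # acyclic (a DAG) once that edge is dropped
```

Dropping the single edge from carol back to alice is enough to turn the cyclic graph into a DAG.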

###Basics of GraphX

Built around graph theory
Provides a Spark API for graphs, e.g. web graphs and social networks.
Provides a Spark API for graph-parallel computation, e.g. PageRank and recommendation.
GraphX extends the Spark RDD abstraction with the Resilient Distributed Property Graph: a directed multigraph with properties attached to each vertex and edge.
GraphX supports fundamental operators such as subgraph, joinVertices, and mapReduceTriplets (superseded by aggregateMessages in later Spark releases).
The GraphX library also includes a collection of algorithms for graph analytics.
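To give a feel for what two of these operators do, here is a rough plain-Python sketch on in-memory data. GraphX itself exposes a Scala API on distributed RDDs; the functions `subgraph` and `aggregate_messages` below are hypothetical stand-ins that only mimic the idea, not the real GraphX signatures, and the sample vertices and edges are invented.

```python
# Vertex properties keyed by vertex id, and edges as (src, dst, property) tuples.
vertices = {1: "alice", 2: "bob", 3: "carol"}
edges = [(1, 2, "friend"), (2, 3, "follows"), (3, 1, "friend"), (1, 3, "follows")]

def subgraph(vertices, edges, edge_pred, vertex_pred):
    """Keep vertices passing vertex_pred, and edges passing edge_pred
    whose two endpoints both survive."""
    kept_vertices = {vid: prop for vid, prop in vertices.items() if vertex_pred(vid, prop)}
    kept_edges = [
        (src, dst, prop)
        for src, dst, prop in edges
        if src in kept_vertices and dst in kept_vertices and edge_pred(src, dst, prop)
    ]
    return kept_vertices, kept_edges

def aggregate_messages(edges, send, merge):
    """send(edge) yields (target_id, message) pairs; merge combines messages per target."""
    inbox = {}
    for edge in edges:
        for target, message in send(edge):
            inbox[target] = merge(inbox[target], message) if target in inbox else message
    return inbox

# Restrict the graph to "friend" edges, keeping every vertex.
friend_vertices, friend_edges = subgraph(
    vertices, edges,
    edge_pred=lambda src, dst, prop: prop == "friend",
    vertex_pred=lambda vid, prop: True,
)

# In-degree: every edge sends the message 1 to its destination; messages are summed.
in_degree = aggregate_messages(edges, send=lambda e: [(e[1], 1)], merge=lambda a, b: a + b)

print(friend_edges)
print(in_degree)
```

The send/merge split is the part worth noticing: each edge works only with its own data, and the per-vertex merge is associative, which is what lets a real system distribute the work.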

###Data Parallel

Systems such as Hadoop and Spark break the entire data set into blocks, and the computation runs in parallel on all the blocks.
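A minimal sketch of the data-parallel idea, in plain Python: the data is cut into blocks, each block is processed independently, and the partial results are combined. The thread pool merely stands in for cluster workers, and `split_into_blocks` and `block_sum` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(records, block_size):
    """Cut the data into consecutive blocks of at most block_size records."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def block_sum(block):
    """Work done independently on one block."""
    return sum(block)

records = list(range(1, 11))  # the numbers 1..10
blocks = split_into_blocks(records, block_size=4)

# Each block is handed to a worker; pool.map preserves the block order.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(block_sum, blocks))

total = sum(partial_sums)  # combine the per-block results
print(blocks, partial_sums, total)
```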

###Graph Parallel Computation

Applications such as social networking have driven the development of numerous new graph-parallel systems.

Raw data -> Create the initial graph (ETL) -> Slice the graph into a subgraph -> Compute on the nodes/vertices (e.g. PageRank) -> Analyze (e.g. use Hive to find the top users) -> Repeat from the ETL stage
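The stages of this pipeline can be sketched end to end in plain Python on a toy data set. Everything here is an assumption for illustration: the raw lines are invented, `pagerank` is a basic power-iteration version rather than the GraphX implementation, and a simple sort replaces the Hive query in the analysis step.

```python
raw_lines = [
    "alice,bob,follows",
    "bob,carol,follows",
    "carol,alice,follows",
    "dave,alice,follows",
    "dave,carol,blocks",
]

# 1. ETL: parse the raw text into (src, dst, relation) edges.
edges = [tuple(line.split(",")) for line in raw_lines]

# 2. Slice: keep only the "follows" subgraph.
follows = [(src, dst) for src, dst, relation in edges if relation == "follows"]

# 3. Compute: basic PageRank by power iteration.
def pagerank(edge_list, damping=0.85, iterations=50):
    nodes = sorted({v for edge in edge_list for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edge_list:
        out_links[src].append(dst)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / len(nodes) for v in nodes}
        for src in nodes:
            # A vertex without out-links spreads its rank over all vertices.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

ranks = pagerank(follows)

# 4. Analyze: pick the two highest-ranked users.
top_users = sorted(ranks, key=ranks.get, reverse=True)[:2]
print(top_users)
```

In a real deployment each stage would run on distributed data, and the loop back to ETL would pick up newly arrived raw data.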

**The vision of the GraphX project is to unify data-parallel and graph-parallel computation in a single API.**

GraphX API

The Property Graph

Directed multigraph - A directed graph that may contain multiple parallel edges sharing the same source and destination vertex.
Each vertex is keyed by a unique 64-bit long identifier.
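The two properties above (parallel edges and 64-bit vertex keys) can be illustrated with a small in-memory model in plain Python. This is only a sketch: the `Edge` class, its field names, and `check_vertex_id` are hypothetical and are not the GraphX types.

```python
from dataclasses import dataclass

MAX_VERTEX_ID = 2**63 - 1  # largest value of a signed 64-bit long

@dataclass(frozen=True)
class Edge:
    src: int   # source vertex id
    dst: int   # destination vertex id
    attr: str  # property attached to the edge

def check_vertex_id(vid):
    """Reject ids that would not fit in a signed 64-bit long."""
    if not -2**63 <= vid <= MAX_VERTEX_ID:
        raise ValueError(f"vertex id {vid} does not fit in 64 bits")
    return vid

# Vertex properties keyed by id; using a dict keeps the ids unique.
vertices = {check_vertex_id(1): "alice", check_vertex_id(2): "bob"}

# A list rather than a set of (src, dst) pairs, so parallel edges can coexist.
edges = [
    Edge(1, 2, "friend"),
    Edge(1, 2, "colleague"),  # parallel edge: same source and destination
    Edge(2, 1, "friend"),     # opposite direction, so not parallel to the two above
]

parallel = [e for e in edges if (e.src, e.dst) == (1, 2)]
print(len(parallel))
```

Storing edges as a plain list is what makes this a multigraph: two relationships between the same pair of people are kept as two separate edges, each with its own property.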
