GraphX Computation - Jeffrey511/Jeffrey-Yu GitHub Wiki

| Case | Data Source | Link | Filename | Renamed File |
|---|---|---|---|---|
| 1. PageRank | YouTube | https://snap.stanford.edu/data/com-Youtube.html | com-youtube.ungraph.txt | page-rank-yt-data.txt |
| 2. Connected Components | LiveJournal | https://snap.stanford.edu/data/com-LiveJournal.html | com-lj.ungraph.txt | connected-components-lj-data.txt |
| 3. Triangle Count | Facebook | https://snap.stanford.edu/data/egonets-Facebook.html | facebook_combined.txt | triangle-count-fb-data.txt |

First, we run PageRank on the YouTube online social-network dataset. This dataset includes ground-truth community information: essentially user-defined groups that other users can join.

```scala
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import java.util.Calendar

// Load the edge list as a graph (each line: srcId dstId)
val graph = GraphLoader.edgeListFile(sc, "file:///root/page-rank-yt-data.txt")

// Basic graph statistics
val vertexCount = graph.numVertices
val vertices = graph.vertices
vertices.count()

val edgeCount = graph.numEdges
val edges = graph.edges
edges.count()

// Triplets pair each edge with its source and destination vertex attributes
val triplets = graph.triplets
triplets.count()
triplets.take(5)

// Degree distributions
val inDegrees = graph.inDegrees
inDegrees.collect()

val outDegrees = graph.outDegrees
outDegrees.collect()

val degrees = graph.degrees
degrees.collect()

// Static PageRank: run a fixed number of iterations (10)
val staticPageRank = graph.staticPageRank(10)
staticPageRank.vertices.collect()

// Dynamic PageRank: iterate until per-vertex changes fall below the
// tolerance (0.001); the Calendar calls bracket the run to measure its time
Calendar.getInstance().getTime()
val pageRank = graph.pageRank(0.001).vertices
Calendar.getInstance().getTime()

// Print the top five vertices by rank. Without an explicit Ordering,
// top(5) would sort the (vertexId, rank) tuples by vertex ID instead.
println(pageRank.top(5)(Ordering.by(_._2)).mkString("\n"))
```
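To make concrete what `pageRank(0.001)` converges to, here is a minimal, self-contained PageRank iteration in plain Scala on a tiny hand-built graph. This is an illustrative sketch, not GraphX's implementation: the example graph, the 0.15/0.85 reset and damping constants (GraphX's defaults), and the convergence test are all assumptions for demonstration.

```scala
// PR(v) = 0.15 + 0.85 * sum over in-neighbors u of PR(u) / outDegree(u)
object PageRankSketch {
  def pageRank(edges: Seq[(Int, Int)], tol: Double = 0.001): Map[Int, Double] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    val outDeg   = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    val inNbrs   = edges.groupBy(_._2).map { case (v, es) => v -> es.map(_._1) }

    var ranks = vertices.map(v => v -> 1.0).toMap
    var delta = Double.MaxValue
    while (delta > tol) {
      // Each vertex receives contributions from its in-neighbors
      val next = vertices.map { v =>
        val contrib = inNbrs.getOrElse(v, Nil).map(u => ranks(u) / outDeg(u)).sum
        v -> (0.15 + 0.85 * contrib)
      }.toMap
      delta = vertices.map(v => math.abs(next(v) - ranks(v))).max
      ranks = next
    }
    ranks
  }

  def main(args: Array[String]): Unit = {
    // A 3-vertex cycle: by symmetry every vertex converges to rank 1.0
    val ranks = pageRank(Seq((1, 2), (2, 3), (3, 1)))
    ranks.toSeq.sortBy(_._1).foreach { case (v, r) => println(f"$v -> $r%.3f") }
  }
}
```

Note that, like GraphX, this formulation does not normalize ranks to sum to 1; scores are relative importance weights.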

Next, let's look at the code for running Connected Components on the LiveJournal social-network data. The dataset covers users who registered on the site and have individual and group blog posts; the site also lets users declare other users as friends.

```scala
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import java.util.Calendar

val graph = GraphLoader.edgeListFile(sc, "data/connected-components-lj-data.txt")

// The Calendar calls bracket the computation to measure its running time
Calendar.getInstance().getTime()
val cc = graph.connectedComponents()
Calendar.getInstance().getTime()

// Each vertex is labeled with the smallest vertex ID in its component
cc.vertices.collect()

println(cc.vertices.take(5).mkString("\n"))

// stronglyConnectedComponents requires an iteration limit
// (10 is an arbitrary choice here)
val scc = graph.stronglyConnectedComponents(numIter = 10)
scc.vertices.collect()
```
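The labeling convention used above can be illustrated with a small self-contained sketch. This is not GraphX's Pregel-based implementation; it is a naive label-propagation loop over an assumed toy edge list, showing how every vertex ends up labeled with the smallest vertex ID reachable from it (edges treated as undirected):

```scala
object CCSketch {
  def connectedComponents(edges: Seq[(Long, Long)]): Map[Long, Long] = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    // Start with each vertex labeled by its own ID
    var labels = vertices.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      changed = false
      // Propagate the minimum label across every edge until stable
      for ((s, d) <- edges) {
        val m = math.min(labels(s), labels(d))
        if (labels(s) != m || labels(d) != m) {
          labels = labels + (s -> m) + (d -> m)
          changed = true
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    // Two components: {1,2,3} labeled 1, and {10,11} labeled 10
    val cc = connectedComponents(Seq((1L, 2L), (2L, 3L), (10L, 11L)))
    println(cc.toSeq.sortBy(_._1).mkString("\n"))
  }
}
```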

Finally, here is the Spark program (again in Scala) that computes Triangle Counting on the Facebook social-circles data. The dataset consists of friend lists from Facebook, covering user profiles, circles, and ego networks.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val graph = GraphLoader.edgeListFile(sc, "file:///triangle-count-fb-data.txt")

println("Number of vertices : " + graph.vertices.count())
println("Number of edges : " + graph.edges.count())

// Note: foreach runs on the executors, so this output appears in the
// executor logs rather than on the driver console
graph.vertices.foreach(v => println(v))

// Count, for each vertex, the number of triangles passing through it
val tc = graph.triangleCount()

tc.vertices.collect

println("tc: " + tc.vertices.take(5).mkString("\n"))

// Rank vertices by their triangle count (Ordering.by(_._2) sorts by count
// rather than by vertex ID)
println("Triangle counts: " +
  graph.connectedComponents.triangleCount().vertices
    .top(5)(Ordering.by(_._2)).mkString("\n"))

// Each triangle is counted once at each of its three vertices,
// so the number of distinct triangles in the graph is sum / 3
val sum = tc.vertices.map(a => a._2).reduce((a, b) => a + b)
```
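The per-vertex counts and the divide-by-three aggregation can be sketched without Spark. The following is an illustrative neighbor-set-intersection version on an assumed toy graph, not GraphX's partitioned implementation:

```scala
object TriangleSketch {
  def triangleCounts(edges: Seq[(Int, Int)]): Map[Int, Int] = {
    // Build an undirected adjacency map, dropping self-loops
    val adj = edges
      .filter { case (s, d) => s != d }
      .flatMap { case (s, d) => Seq(s -> d, d -> s) }
      .groupBy(_._1)
      .map { case (v, ps) => v -> ps.map(_._2).toSet }

    adj.map { case (v, nbrs) =>
      // Every pair of adjacent neighbors closes a triangle at v;
      // each such triangle is seen twice (once per neighbor), so halve it
      val twice = nbrs.toSeq.map(u => (nbrs & adj(u)).size).sum
      v -> twice / 2
    }
  }

  def main(args: Array[String]): Unit = {
    // A triangle 1-2-3 plus a dangling edge 3-4
    val tc = triangleCounts(Seq((1, 2), (2, 3), (3, 1), (3, 4)))
    println(tc.toSeq.sortBy(_._1).mkString("\n")) // vertex 4 counts 0 triangles
    println("total triangles: " + tc.values.sum / 3) // 1
  }
}
```

Summing the per-vertex counts gives 3 for this graph (one triangle seen from each of its three corners), which is why the wiki's final `sum` must be divided by 3 to get the number of distinct triangles.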
