Module 2: ICP #5 - VidyullathaKaza/BigData_Programming_Spring2020 GitHub Wiki

Configuration steps followed:

1. Download the first three packages as JAR files from the link below.

https://spark-packages.org/package/graphframes/graphframes

Then copy them into your Spark jars folder.

2. Run the command below for Scala:

spark-shell --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11

3. Run the command below for PySpark:

pyspark --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11

The Scala version is printed in the startup banner when you run the spark-shell command.
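The Scala version matters because the `s_2.11` suffix of the graphframes package name has to match the Scala binary version that Spark was built with. As a minimal sketch (the object and helper names are hypothetical, not part of the original steps), the suffix can be derived from the running Scala version like this:

```scala
object ScalaVersionCheck {
  // Hypothetical helper: reduce a full version such as "2.11.12"
  // to its binary version "2.11".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Standard-library call that returns the running Scala version.
    val full = scala.util.Properties.versionNumberString
    println(s"Scala $full: pick a graphframes build ending in s_${binaryVersion(full)}")
  }
}
```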

4. I updated my build.sbt file with the Scala version and the graphframes dependency.

5. I loaded the given data, created data frames, and executed a few commands:
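The original build.sbt is not shown, so the following is only a sketch of what such a file might contain. The project name, the Scala patch version, and the Spark version are assumptions. The graphframes coordinates mirror the spark-shell command above, and the resolver is needed because graphframes is published on the Spark Packages repository.

```scala
// build.sbt (sketch; name and versions are assumptions, adjust to your setup)
name := "graphframes-icp"          // hypothetical project name

scalaVersion := "2.11.12"          // must match the s_2.11 suffix below

// graphframes is hosted on the Spark Packages repository
resolvers += "Spark Packages" at "https://repos.spark-packages.org/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"    % "2.3.4",   // assumed Spark version
  "org.apache.spark" %% "spark-graphx" % "2.3.4",
  "graphframes"      %  "graphframes"  % "0.6.0-spark2.3-s_2.11"
)
```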

Task

1. Import the data set as a CSV file and create data frames directly on import, then create a graph from the resulting data frames.

2. Concatenate chunks into a list and convert to a DataFrame.

3. Remove duplicates.

4. Create vertices, name columns, and output DataFrames.

5. Compute vertex in-degree and out-degree.

6. Apply motif finding.
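The tasks above can be sketched in Scala roughly as follows. This is not the code used in the ICP. The object name, the CSV path passed as the first argument, and the input column names `from` and `to` are all hypothetical, because the data set is not described here. Task 2 is left out because the source does not say how the data was chunked. The column names `id`, `src`, and `dst` are the ones GraphFrames expects.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.graphframes.GraphFrame

object GraphFrameSketch {

  // Task 1: read the CSV straight into a DataFrame (a header row is assumed).
  def loadCsv(spark: SparkSession, path: String): DataFrame =
    spark.read.option("header", "true").csv(path)

  // Tasks 3-4: drop duplicate rows, name the columns, and build the graph.
  // "from" and "to" are hypothetical input column names.
  def buildGraph(raw: DataFrame): GraphFrame = {
    val edges = raw
      .select(col("from").as("src"), col("to").as("dst"))
      .dropDuplicates()
    // Every endpoint of an edge becomes a vertex with a unique "id".
    val vertices = edges.select(col("src").as("id"))
      .union(edges.select(col("dst").as("id")))
      .distinct()
    GraphFrame(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val path = args.headOption.getOrElse(sys.error("usage: GraphFrameSketch <edges.csv>"))
    val spark = SparkSession.builder()
      .appName("GraphFrameSketch")
      .master("local[*]")
      .getOrCreate()

    val g = buildGraph(loadCsv(spark, path))

    // Task 4: output the vertex and edge DataFrames.
    g.vertices.show()
    g.edges.show()

    // Task 5: in-degree and out-degree per vertex.
    g.inDegrees.show()
    g.outDegrees.show()

    // Task 6: motif finding, here pairs of vertices linked in both directions.
    g.find("(a)-[e1]->(b); (b)-[e2]->(a)").show()

    spark.stop()
  }
}
```

The motif string is only one possible pattern. Any pattern in the GraphFrames motif syntax can be passed to `find` in the same way.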

Outputs:

Bonus

My Learning Outcomes

In this ICP I learned a new concept in Spark, GraphFrames, along with the basic commands for creating data frames.