Module 2: ICP #5 - VidyullathaKaza/BigData_Programming_Spring2020 GitHub Wiki
Configuration steps followed:
1. Download the first three package JARs from the link below.
https://spark-packages.org/package/graphframes/graphframes
Then copy them into your Spark jars folder.
2. Run the following command for Scala:
spark-shell --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11
3. Run the following command for PySpark:
pyspark --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11
The Scala version is printed in the banner when you run the spark-shell command.
4. I updated my build.sbt file with the Scala version and the GraphFrames dependency.
5. Loaded the given data, created DataFrames, and executed a few commands:
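The package coordinates in the commands above encode the versions they were built for: in a version such as 0.7.0-spark2.4-s_2.11, "spark2.4" is the targeted Spark major.minor version and "s_2.11" is the Scala binary version, which is why the Scala version printed by spark-shell matters. As a small illustration, the sketch below splits a coordinate and checks it against the versions reported by spark-shell. The helper names (`parse_graphframes_coordinate`, `matches_installation`) are hypothetical and not part of Spark or GraphFrames:

```python
import re
from typing import NamedTuple


class GraphFramesCoordinate(NamedTuple):
    """Versions encoded in a GraphFrames spark-packages coordinate."""
    graphframes: str
    spark: str
    scala: str


# Matches e.g. "graphframes:graphframes:0.7.0-spark2.4-s_2.11".
_COORDINATE = re.compile(
    r"^graphframes:graphframes:"
    r"(?P<gf>\d+(?:\.\d+)*)-spark(?P<spark>\d+\.\d+)-s_(?P<scala>\d+\.\d+)$"
)


def parse_graphframes_coordinate(coordinate: str) -> GraphFramesCoordinate:
    """Hypothetical helper: split a coordinate into GraphFrames, Spark and Scala versions."""
    match = _COORDINATE.match(coordinate.strip())
    if match is None:
        raise ValueError(f"not a GraphFrames package coordinate: {coordinate!r}")
    return GraphFramesCoordinate(match["gf"], match["spark"], match["scala"])


def _major_minor(version: str) -> str:
    """Reduce a full version such as "2.11.12" to "2.11"."""
    return ".".join(version.split(".")[:2])


def matches_installation(coordinate: str, spark_version: str, scala_version: str) -> bool:
    """Hypothetical helper: True if the coordinate targets these Spark and Scala versions.

    Only major.minor is compared, since that is all the coordinate records.
    """
    parsed = parse_graphframes_coordinate(coordinate)
    return (parsed.spark == _major_minor(spark_version)
            and parsed.scala == _major_minor(scala_version))
```

For instance, `matches_installation("graphframes:graphframes:0.7.0-spark2.4-s_2.11", "2.4.5", "2.11.12")` returns `True`. The same coordinate checked against a Spark 2.3.x installation returns `False`.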
Task
1. Import the dataset as a CSV file, create DataFrames directly on import, and then create a graph from those DataFrames.
2. Concatenate the chunks into a list and convert it to a DataFrame.
3. Remove duplicates.
4. Create vertices, name the columns, and output the DataFrames.
5. Compute vertex in-degree and out-degree.
6. Apply motif finding.
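The tasks above form one pipeline. In PySpark the corresponding calls would be along the lines of `spark.read.csv(path, header=True, inferSchema=True)`, `dropDuplicates()`, and `withColumnRenamed(...)`. These give the vertices an `id` column and the edges `src`/`dst` columns, which are the names GraphFrames expects. The remaining steps are `GraphFrame(vertices, edges)`, `g.inDegrees` / `g.outDegrees`, and `g.find("(a)-[e]->(b); (b)-[e2]->(a)")`. Those calls need a running Spark session with the GraphFrames package loaded. The sketch below therefore reproduces the same steps in plain Python on a small edge list, so the logic of each step can be checked without a cluster. The sample data, chunk size, and all function names are illustrative assumptions, not the original ICP's dataset or code:

```python
import csv
import io
from collections import Counter
from itertools import islice

# Hypothetical sample data; the ICP's real dataset is not reproduced here.
SAMPLE_CSV = """src,dst
A,B
B,A
B,C
A,B
C,D
D,C
C,A
"""


def read_in_chunks(text, chunk_size=3):
    """Tasks 1-2: read CSV rows in fixed-size chunks, then concatenate them into one list."""
    reader = csv.DictReader(io.StringIO(text))
    chunks = []
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        chunks.append(chunk)
    return [row for chunk in chunks for row in chunk]


def remove_duplicates(rows):
    """Task 3: keep the first occurrence of each (src, dst) pair, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        pair = (row["src"], row["dst"])
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


def build_graph(edge_pairs):
    """Task 4: vertices get an "id" column; edges get "src" and "dst" columns."""
    ids = sorted({vertex for pair in edge_pairs for vertex in pair})
    vertices = [{"id": vertex} for vertex in ids]
    edges = [{"src": src, "dst": dst} for src, dst in edge_pairs]
    return vertices, edges


def in_out_degrees(edges):
    """Task 5: count edges arriving at each dst (in-degree) and leaving each src (out-degree).

    As with GraphFrames' inDegrees/outDegrees, a vertex with no such edges is absent.
    """
    in_degree = Counter(edge["dst"] for edge in edges)
    out_degree = Counter(edge["src"] for edge in edges)
    return dict(in_degree), dict(out_degree)


def reciprocal_pairs(edges):
    """Task 6: plain-Python analogue of the motif "(a)-[e]->(b); (b)-[e2]->(a)".

    Returns every ordered pair (a, b) that has edges in both directions.
    """
    edge_set = {(edge["src"], edge["dst"]) for edge in edges}
    return sorted((a, b) for a, b in edge_set if (b, a) in edge_set)
```

On the sample data the duplicate A,B row is dropped, leaving six edges over vertices A to D. The motif then matches A and B, and C and D, each reported in both orientations, because the pattern can bind a and b either way round.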
Outputs:
Bonus
My Learning Outcomes
In this ICP I learned a new concept in Spark, GraphFrames, as well as basic commands for creating DataFrames.