ICP 8 - Gnkhakimova/CS5590-BigData GitHub Wiki

ICP 8

Source Code

Installation / IntelliJ / Spark

Tasks

  1. Install IntelliJ and Apache Spark
  2. Perform word count, secondary sorting and char count using Scala

Configuration

  • Linux Mint
  • IntelliJ
  • Apache Spark

Features

In ICP 8 we used the IntelliJ IDE to complete the tasks, which required implementing Word Count, Secondary Sorting and Char Count.

Word Count

For the word count example, I read the input file through the SparkContext into an RDD. Each line is split by space and every word is mapped to a (word, 1) pair. The reduce step then sums these pairs per word, which gives the number of occurrences of each word.
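The steps described here can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the object name `WordCountSketch`, the helper `countWords`, and the in-memory sample lines are assumptions made so the example runs without an input file. With a real file, `sc.textFile(path)` would take the place of `sc.parallelize(lines)`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  // Hypothetical helper: counts how often each word occurs in the given lines.
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)          // stand-in for sc.textFile(path)
      .flatMap(_.split(" "))       // split every line by space
      .filter(_.nonEmpty)          // drop empty tokens caused by repeated spaces
      .map(word => (word, 1))      // map step: one (word, 1) pair per word
      .reduceByKey(_ + _)          // reduce step: sum the pairs per word
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside the IDE without a cluster
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = countWords(sc, Seq("to be or", "not to be"))
      counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
    } finally sc.stop()
  }
}
```

To write the result to an output directory instead of printing it, the RDD can be saved with `saveAsTextFile` before collecting.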

Input file:

Output file:

Secondary Sorting

For secondary sorting, I read the input file through the SparkContext. Each line is split by comma and mapped to a pair of two values. The pairs are then grouped by key and mapped to produce the final output.
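A possible shape for this pipeline is sketched below. The input format is an assumption: each line is taken to be `key,value` with an integer value, and the final map step is assumed to sort the values inside each group. The names `SecondarySortSketch` and `sortWithinKeys` are hypothetical and do not come from the repository.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SecondarySortSketch {
  // Hypothetical helper: groups values by key and sorts the values of each key.
  def sortWithinKeys(sc: SparkContext, lines: Seq[String]): List[(String, List[Int])] =
    sc.parallelize(lines)                                    // stand-in for sc.textFile(path)
      .map(_.split(","))                                     // separate each line by comma
      .map(cols => (cols(0).trim, cols(1).trim.toInt))       // map the two values to (key, value)
      .groupByKey()                                          // collect all values of one key
      .mapValues(_.toList.sorted)                            // order the values within each key
      .sortByKey()                                           // order the keys themselves
      .collect()
      .toList

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SecondarySortSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sorted = sortWithinKeys(sc, Seq("b,3", "a,2", "b,1", "a,1"))
      sorted.foreach { case (key, values) => println(s"$key\t${values.mkString(",")}") }
    } finally sc.stop()
  }
}
```

Sorting inside `mapValues` keeps the sketch short, but it holds all values of one key in memory, so it only suits inputs where each group is small.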

Input file:

Output files:

Bonus - Char count

This performs the same task as word count but counts characters: the input is split by character instead of by space.
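The only change compared to word count is the split step, as the following sketch shows. It assumes that every character in a line is counted, including spaces. `CharCountSketch` and `countChars` are hypothetical names used for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CharCountSketch {
  // Hypothetical helper: counts how often each character occurs in the given lines.
  def countChars(sc: SparkContext, lines: Seq[String]): Map[Char, Int] =
    sc.parallelize(lines)          // stand-in for sc.textFile(path)
      .flatMap(line => line.toList) // split every line into its characters
      .map(ch => (ch, 1))          // map step: one (char, 1) pair per character
      .reduceByKey(_ + _)          // reduce step: sum the pairs per character
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CharCountSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = countChars(sc, Seq("abba", "cab"))
      counts.toSeq.sortBy(_._1).foreach { case (ch, n) => println(s"$ch\t$n") }
    } finally sc.stop()
  }
}
```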

Output file:

Limitations

  • Had to add the package dependencies to build.sbt
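For illustration, a minimal build.sbt that pulls in Spark might look like the fragment below. The project name and all version numbers are assumptions: Spark 2.2.0 is taken from the programming guide linked in the references, and Scala 2.11 is the Scala line that Spark release was built for. The actual file in the repository may differ.

```scala
// build.sbt (illustrative; name and versions are assumptions)
name := "icp8"

version := "0.1"

// Spark 2.2.0 artifacts are published for Scala 2.11
scalaVersion := "2.11.12"

// %% appends the Scala binary version, resolving to spark-core_2.11
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
```

After editing the file, IntelliJ needs to reimport the sbt project before the `org.apache.spark` imports resolve.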

References

  1. https://data-flair.training/blogs/scala-spark-shell-commands/
  2. https://spark.apache.org/docs/2.2.0/programming-guide.html