ICP 8 - Gnkhakimova/CS5590-BigData GitHub Wiki
ICP 8
Installation / IntelliJ / Spark
Tasks
- Install IntelliJ and Apache Spark
- Perform word count, secondary sorting and char count using Scala
Configuration
- Linux Mint
- IntelliJ
- Apache Spark
Features
In ICP 8 we used the IntelliJ IDE to complete the tasks, which were to perform Word Count, Secondary Sorting and Char Count.
Word Count
For the word count example I am reading the input file through the SparkContext, splitting each line by space and mapping each word to a key-value pair. After that I am performing the reduce step, which sums the pairs to count how many times each word occurs.
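The steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code used in the ICP: the object name `WordCount`, the helper `countWords`, the `local[*]` master and the paths `input.txt` and `output` are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WordCount {
  // Split each line on spaces, pair every word with 1, then sum the 1s per word.
  def countWords(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(_.split(" "))      // one record per word
      .filter(_.nonEmpty)         // drop empty strings left by repeated spaces
      .map(word => (word, 1))     // map step: (word, 1)
      .reduceByKey(_ + _)         // reduce step: add the counts for each word

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside the IDE process (assumed setup).
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.textFile("input.txt")        // hypothetical input path
      countWords(lines).saveAsTextFile("output")  // hypothetical output directory; must not exist yet
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the transformation in its own function makes it possible to try it on a small in-memory collection via `sc.parallelize` before pointing it at a real file.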
Input file:
Output file:
Secondary Sorting
For secondary sorting I am reading the file through the SparkContext, splitting each line by comma and mapping the two values to a key-value pair. After that we need to group the pairs by key and map each group to the output.
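A possible shape for this pipeline is sketched below. The input format is an assumption (one `key,value` pair per line, both treated as strings), as are the names `SecondarySort` and `sortWithinKeys` and the file paths. Sorting the values inside the final map step is also an assumption about how the secondary ordering is produced.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SecondarySort {
  // Group values by key, sort the values inside each group, then order the keys.
  def sortWithinKeys(lines: RDD[String]): RDD[(String, List[String])] =
    lines
      .map(_.split(","))                                // separate fields by comma
      .filter(_.length >= 2)                            // skip malformed lines
      .map(fields => (fields(0).trim, fields(1).trim))  // (key, value)
      .groupByKey()                                     // collect all values of a key
      .mapValues(_.toList.sorted)                       // secondary order: values, compared as strings
      .sortByKey()                                      // primary order: keys

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SecondarySort").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.textFile("input.csv")                   // hypothetical input path
      sortWithinKeys(lines).saveAsTextFile("sorted-output")  // hypothetical output directory
    } finally {
      sc.stop()
    }
  }
}
```

Note that `groupByKey` holds all values of one key in memory at once, which is fine for small lab inputs but can become a problem for very large groups. Because values are compared as strings here, `"10"` sorts before `"9"`; numeric data would need to be parsed first.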
Input file:
Output files:
Bonus - Char count
This performs the same task as word count but counts characters, so the input is split by character instead of by space.
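As a rough sketch, the only change from word count is the splitting step. The names `CharCount` and `countChars` and the paths are hypothetical, and this version counts every character, including spaces, which may differ from the original solution.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CharCount {
  // Break each line into characters, pair every character with 1, then sum per character.
  def countChars(lines: RDD[String]): RDD[(Char, Int)] =
    lines
      .flatMap(_.toList)     // one record per character (spaces included)
      .map(c => (c, 1))      // map step: (char, 1)
      .reduceByKey(_ + _)    // reduce step: add the counts for each character

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CharCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.textFile("input.txt")             // hypothetical input path
      countChars(lines).saveAsTextFile("char-output")  // hypothetical output directory
    } finally {
      sc.stop()
    }
  }
}
```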
Output file:
Limitations
- Had to add the required packages to build.sbt
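For reference, a build definition that pulls in Spark might look like the fragment below. This is only an illustration: the project name and the Scala and Spark versions are assumptions, not the ones used in this ICP, and the page does not say which packages were actually added.

```scala
// build.sbt: hypothetical minimal build definition; name and versions are assumptions.
name := "icp8"

version := "0.1"

// The Scala version must match a Scala build of the Spark artifact below.
scalaVersion := "2.12.18"

// %% appends the Scala binary version (here _2.12) to the artifact name.
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.0"
```

After editing build.sbt, IntelliJ usually needs the sbt project to be reloaded before the `org.apache.spark` imports resolve.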