Lab 1 Spark Sentence Count - meetsriharsha/RTBDA_5543 GitHub Wiki

Welcome to the RTBDA_5543 wiki!

08/31/2016

The main goal of this lab is to count how many times each sentence occurs in a given text file and then to sort these sentences.

I've created a text file with some sentences (the same sentence can appear more than once). Next, I developed a Scala application that takes this text file as input and processes it using Map-Reduce.

The processing steps include splitting the text file into an RDD of sentences and then counting the number of times each sentence occurs. The sentences are then sorted alphabetically and stored in an output directory.
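
The steps above can be sketched roughly as follows. This is a minimal illustration, not the lab's actual source: the object name `SentenceCount`, the helper `splitSentences`, the rule for splitting sentences (whitespace after `.`, `!` or `?`), the `local[*]` master, and the paths `input.txt` and `output` are all assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SentenceCount {
  // Hypothetical helper: split one line of text into sentences at
  // whitespace that follows '.', '!' or '?'. The real lab may split differently.
  def splitSentences(line: String): Seq[String] =
    line.split("(?<=[.!?])\\s+").map(_.trim).filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    // Assumed configuration: run locally using all available cores.
    val conf = new SparkConf().setAppName("SentenceCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("input.txt")   // assumed input path
      .flatMap(splitSentences)              // RDD of individual sentences
      .map(sentence => (sentence, 1))       // map step: pair each sentence with 1
      .reduceByKey(_ + _)                   // reduce step: sum the counts per sentence
      .sortByKey()                          // sort alphabetically by sentence

    counts.saveAsTextFile("output")         // assumed output directory
    sc.stop()
  }
}
```

With this sketch, `saveAsTextFile` writes the results as part files inside the output directory, one `(sentence,count)` pair per line, and it fails if that directory already exists, so the directory has to be removed between runs.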

Output:

IntelliJ Output

Output Files