ICP2 - GeoSnipes/Big-Data GitHub Wiki

Sub-Team Members (Class ID): 5-2 15 Naga Venkata Satya Pranoop Mutha; 5-2 23 Geovanni West

Spark Transformations and Actions:

This lab assignment uses the following transformations and actions.

map(func) : Return a new distributed dataset formed by passing each element of the source through a function func.
flatMap(func) : Similar to map, but each input item can be mapped to 0 or more output items (so func should return a Seq rather than a single item).
groupByKey([numTasks]) : When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs. Note: If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using reduceByKey or aggregateByKey will yield much better performance. Note: By default, the level of parallelism in the output depends on the number of partitions of the parent RDD. You can pass an optional numTasks argument to set a different number of tasks.
sortByKey([ascending], [numTasks]) : When called on a dataset of (K, V) pairs where K implements Ordered, returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the boolean ascending argument.
saveAsTextFile(path) : Write the elements of the dataset as a text file (or set of text files) in a given directory in the local filesystem, HDFS or any other Hadoop-supported file system. Spark will call toString on each element to convert it to a line of text in the file.
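
The definitions above can be tried on ordinary Python lists before any Spark code is involved. The sketch below uses no Spark API; it only mimics what each transformation does to the data. The sample lines and all variable names are made up for illustration.

```python
from collections import defaultdict

# Made-up sample input: each string stands for one line of a text file.
lines = ["kangaroo koala", "emu"]

# map: exactly one output element per input element (here, one list per line).
mapped = [line.split(" ") for line in lines]

# flatMap: each input yields zero or more outputs, flattened into one sequence.
flat_mapped = [word for line in lines for word in line.split(" ")]

# groupByKey: build (key, value) pairs, then collect the values sharing a key.
pairs = [(word[0], word) for word in flat_mapped]
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# sortByKey: order the (key, values) pairs by key, ascending by default.
sorted_groups = sorted(grouped.items())

print(mapped)
print(flat_mapped)
print(sorted_groups)
```

The difference between `mapped` (a list of lists) and `flat_mapped` (a single list of words) is the reason flatMap, not map, is the usual choice for splitting lines into words.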

With these operations in hand, we know how to produce the required output. We now try them on an arbitrary paragraph of input text.

Input:

Source Code:

Steps to reproduce the output:

1. First, we load the text file into a variable named file.
1. Then we split the text on blank spaces so that each word becomes a separate element.
1. Then we apply a flatMap transformation so that the individual words form a single flattened dataset.
1. Then we apply a map transformation that pairs each word with its first letter as the key, e.g. (K, Kangaroo), (K, Kingfisher).
1. Then we apply the groupByKey transformation, which collects the words that share a key into a list, e.g. (K, [Kangaroo, Kingfisher]).
1. Finally, we write this result to a text file using the saveAsTextFile action.
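
The steps above can be sketched locally in plain Python. This is not the lab's actual source code and does not call Spark; it is a minimal illustration of the same data flow, assuming whitespace-separated words. The function names, the sample lines, and the output file name `part-00000` are hypothetical, and the final sort is added only to make the output order predictable.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_words_by_first_letter(lines):
    """Return (first_letter, [words]) pairs, sorted by letter."""
    # Steps 2-3 (split + flatMap): split every line on whitespace and flatten.
    # str.split() with no argument also drops empty strings, so word[0] is safe.
    words = [word for line in lines for word in line.split()]
    # Step 4 (map): key each word by its first letter.
    pairs = [(word[0], word) for word in words]
    # Step 5 (groupByKey): gather the words that share a key into a list.
    groups = defaultdict(list)
    for letter, word in pairs:
        groups[letter].append(word)
    # Sorting is an addition here, used only for a predictable output order.
    return sorted(groups.items())


def save_as_text(records, out_dir):
    """Step 6: write str(record) on its own line, in the spirit of saveAsTextFile."""
    out_path = Path(out_dir) / "part-00000"  # hypothetical file name
    out_path.write_text("".join(f"{record}\n" for record in records))
    return out_path


# Step 1 is replaced by made-up in-memory lines instead of a loaded text file.
sample_lines = ["Kangaroo Kingfisher", "Emu Koala"]
result = group_words_by_first_letter(sample_lines)

with tempfile.TemporaryDirectory() as tmp_dir:
    written = save_as_text(result, tmp_dir).read_text()

print(written)
```

In a real Spark job, each commented step would be a call on an RDD rather than a list comprehension, and the grouping would run across partitions instead of in a single dictionary.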

Output