Module 2: ICP #1
Team: 12
Professor: Yugyung Lee
Name: Sneha Mishra
Class ID: 11
Email: [email protected]
MyGitHub
Technical Partner:
Name: Aditya Soman
Class ID: 19
Email: [email protected]
GitHub
Objective
Introduction to Apache Spark, a unified analytics engine for big data processing with built-in modules for streaming, SQL, machine learning, and graph processing.
Features
- Create a Map-Reduce program to perform matrix multiplication
- Create a Map-Reduce program to perform the merge-sort algorithm in Spark
- Perform breadth-first search on a graph using Map-Reduce in Apache Spark
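To make the first feature concrete, here is a rough, single-machine sketch of matrix multiplication phrased as a map phase followed by a reduce phase. It uses plain Python rather than Spark, and the function name `mapreduce_matmul` and the sample matrices are assumptions made up for illustration. It is also a simplification: the map step here can see both matrices at once, whereas a distributed job would typically first join the entries of the two matrices on their shared index.

```python
from collections import defaultdict


def mapreduce_matmul(a, b):
    """Multiply matrices a (m x n) and b (n x p), given as lists of rows."""
    m, n, p = len(a), len(b), len(b[0])

    # Map phase: emit ((i, k), a[i][j] * b[j][k]) for every contributing pair.
    emitted = (
        ((i, k), a[i][j] * b[j][k])
        for i in range(m)
        for j in range(n)
        for k in range(p)
    )

    # Shuffle + reduce phase: sum the partial products that share a key (i, k).
    sums = defaultdict(int)
    for key, partial in emitted:
        sums[key] += partial

    # Reassemble the keyed results into a dense matrix.
    return [[sums[(i, k)] for k in range(p)] for i in range(m)]


# Hypothetical 2 x 2 inputs, not taken from the ICP.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
product = mapreduce_matmul(a, b)
print(product)  # [[19, 22], [43, 50]]
```

Keying each partial product by its output cell `(i, k)` is what lets the reduce step run independently per cell, which is the property a Spark `reduceByKey` would exploit.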
Steps:
Step 1: Spark Installation



Step 2: Testing the Application after installation
Testing Code:

Input given:

Output:


Step 3: Implement the given Use Cases
Use Case 1: Implement Spark Transformations and Spark Actions
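In Spark, transformations such as `map` and `filter` are lazy and only describe a computation, while actions such as `collect` trigger it. The sketch below is a plain-Python analogy of that behaviour using lazy iterators; it does not use Spark, it is not the code from this ICP, and the names `square` and `evaluated` are made up for illustration.

```python
# Plain-Python analogy for Spark's lazy transformations and eager actions.
# No Spark installation is assumed; all names here are illustrative.

evaluated = []  # records which elements were actually processed


def square(x):
    evaluated.append(x)  # side effect so we can observe when work happens
    return x * x


numbers = [1, 2, 3, 4, 5]

# "Transformations": map and filter return lazy iterators, so nothing runs yet.
# The analogous Spark RDD calls would be rdd.map(...) and rdd.filter(...).
squares = map(square, numbers)
even_squares = filter(lambda v: v % 2 == 0, squares)

work_done_before_action = len(evaluated)  # no element has been touched yet

# "Action": materialising the result forces the whole pipeline to execute.
# The analogous Spark RDD call would be rdd.collect().
result = list(even_squares)

print(work_done_before_action)  # 0
print(result)                   # [4, 16]
print(len(evaluated))           # 5
```

The counter shows that chaining transformations costs nothing until an action asks for a result, which is why Spark can plan a whole pipeline before running it.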
Transformation Code:


WordCount Code:
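As a rough illustration of what a word count does, the following plain-Python sketch follows the `flatMap`, `map`, `reduceByKey` shape that a Spark word count typically takes. It is a stand-in under stated assumptions, not the code used in this ICP: it runs without Spark, and the function name `word_count` and the sample input are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Count words using the flatMap -> map -> reduceByKey shape of a Spark job."""
    # flatMap: split every line into words and flatten them into one stream.
    words = chain.from_iterable(line.split() for line in lines)
    # map: pair each word with an initial count of 1.
    pairs = ((word, 1) for word in words)
    # reduceByKey: sum the counts that share the same word.
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


sample = ["spark is fast", "spark is unified"]  # made-up input
counts = word_count(sample)
print(counts)  # {'spark': 2, 'is': 2, 'fast': 1, 'unified': 1}
```

In an actual Spark job the same three steps would run on an RDD read from a file, with the per-key summation distributed across partitions.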

Transformation Output:

WordCount Output:

Split Folder Structure:


Output:
