Module 2: ICP #1 - SnehaMishra28/BigData_Programming_Summer2018 GitHub Wiki

Team: 12
Professor: Yugyung Lee

Name: Sneha Mishra
Class ID: 11
Email: [email protected]
MyGitHub

Technical Partner:
Name: Aditya Soman
Class ID: 19
Email: [email protected]
GitHub

Objective

An introduction to Apache Spark, a unified analytics engine for big data processing with built-in modules for streaming, SQL, machine learning, and graph processing.

Features

  1. Create a MapReduce program that performs matrix multiplication
  2. Create a MapReduce program that implements the merge sort algorithm in Spark
  3. Perform breadth-first search on a graph using MapReduce in Apache Spark
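
For the merge sort feature, the following is a minimal sketch of the idea in plain Python rather than Spark, and it is not the code submitted for this ICP. The function names `merge` and `merge_sort_mapreduce` are hypothetical. The map step turns each value into a one-element sorted run, and the reduce step folds the runs together with a two-way merge.

```python
from functools import reduce


def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One of the two lists is exhausted; append what is left of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort_mapreduce(values):
    # Map step: each value becomes a sorted run of length one.
    runs = [[value] for value in values]
    # Reduce step: fold all runs together with the two-way merge.
    return reduce(merge, runs, [])


sorted_values = merge_sort_mapreduce([5, 2, 9, 1, 5, 6])
print(sorted_values)
```

Because `merge` is associative, a distributed engine is free to combine runs in any grouping, for example partition by partition. The left-to-right fold used here is the simplest form and is not the most efficient one.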

Steps:

Step 1: Spark Installation

Step 2: Testing the Application after installation

Testing Code:

Input given:

Output:

Step 3: Implement the given Use Cases

UseCase 1: Implement Spark Transformations and Spark Actions

Transformation Code:

WordCount Code:
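
As a minimal sketch of the word-count logic, the example below uses plain Python as a stand-in for Spark and is not the code submitted for this ICP. The function name `word_count` and the sample lines are assumptions made for illustration. The comments name the Spark RDD operations (`flatMap`, `map`, `reduceByKey`) that each phase roughly corresponds to.

```python
from collections import defaultdict


def word_count(lines):
    # Roughly flatMap: split every line into individual words.
    words = (word for line in lines for word in line.lower().split())
    # Roughly map: pair every word with the count 1.
    pairs = ((word, 1) for word in words)
    # Roughly reduceByKey: sum the counts that share the same word.
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


# Hypothetical sample input, not the input file used in the lab.
sample_lines = ["spark makes big data simple", "big data needs spark"]
counts = word_count(sample_lines)
print(counts)
```

The first two phases are generators, so nothing is computed until the final loop consumes them. This loosely mirrors how Spark transformations stay lazy until an action runs.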

Transformation Output:

WordCount Output:

Split Folder Structure:

Output:

UseCase 2: Matrix Multiplication
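
One common way to express matrix multiplication in map-reduce style is sketched below in plain Python. It is not Spark code and not the solution submitted here. The helper names `to_entries` and `matmul_mapreduce` are hypothetical, and matrices are assumed to be given as `(row, column, value)` entries. Entries of A and B are keyed by their shared index j, joined, and the resulting products are summed per output cell (i, k).

```python
from collections import defaultdict


def to_entries(matrix):
    """Flatten a list-of-lists matrix into (row, column, value) entries."""
    return [(r, c, v) for r, row in enumerate(matrix) for c, v in enumerate(row)]


def matmul_mapreduce(a_entries, b_entries):
    # Map step: key every entry by the shared index j
    # (the column of A and the row of B).
    a_by_j = defaultdict(list)
    for i, j, value in a_entries:
        a_by_j[j].append((i, value))
    b_by_j = defaultdict(list)
    for j, k, value in b_entries:
        b_by_j[j].append((k, value))

    # Join step: pair entries that share j and emit ((i, k), a * b).
    partial_products = []
    for j, a_items in a_by_j.items():
        for i, a in a_items:
            for k, b in b_by_j.get(j, []):
                partial_products.append(((i, k), a * b))

    # Reduce step: sum the partial products for each output cell (i, k).
    result = defaultdict(int)
    for cell, product in partial_products:
        result[cell] += product
    return dict(result)


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product = matmul_mapreduce(to_entries(A), to_entries(B))
print(product)
```

In Spark, the same shape would typically be a join of two keyed RDDs followed by `reduceByKey`. The entry-based representation also suits sparse matrices, since zero entries can simply be left out.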

UseCase Bonus: Breadth First Search
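
Breadth-first search can be phrased as repeated map-reduce rounds. The sketch below shows this in plain Python; it is not Spark code and not the submitted solution. The function name `bfs_mapreduce` and the sample graph are assumptions, and every node is assumed to appear as a key in the adjacency dictionary. In each round, every reached node proposes `distance + 1` to its neighbours (map), and each node keeps the smallest proposal (reduce). The loop stops when a round changes nothing.

```python
import math


def bfs_mapreduce(graph, source):
    # Every node starts unreachable except the source.
    distances = {node: math.inf for node in graph}
    distances[source] = 0

    while True:
        # Map step: each node re-emits its own distance, and each reached
        # node proposes distance + 1 to all of its neighbours.
        emitted = []
        for node, neighbours in graph.items():
            emitted.append((node, distances[node]))
            if distances[node] != math.inf:
                for neighbour in neighbours:
                    emitted.append((neighbour, distances[node] + 1))

        # Reduce step: keep the minimum distance proposed for each node.
        updated = {}
        for node, distance in emitted:
            updated[node] = min(distance, updated.get(node, math.inf))

        # Stop once a full round leaves every distance unchanged.
        if updated == distances:
            return distances
        distances = updated


# Hypothetical sample graph as an adjacency list; F has no path from A.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
    "F": ["A"],
}
levels = bfs_mapreduce(graph, "A")
print(levels)
```

Each round extends the search frontier by one level, so the number of rounds grows with the depth of the graph as seen from the source.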

References:

  1. https://hackernoon.com/pycharm-and-apache-spark-on-mac-os-x-990af6dc6f38
  2. https://umkc.app.box.com/s/0o6u9qhe8y3q49slzlcyl2cw0jnwchzh