Spark ICP3 - neerajpadarthi/Big-Data-Programming GitHub Wiki

Name: Neeraj Padarthi

Class ID: 19

Spark ICP : 3

Topic: Working with DataFrames and SQL

Import the dataset and create data frames directly on import.

Save data to file

Checking for Duplicate Records
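
One way to express the duplicate check as a query is to group on every column and keep the groups that occur more than once. The sketch below is only an illustration: it swaps in SQLite from Python's standard library so that it runs without a Spark session, and the `survey` table, its columns, and its rows are hypothetical. In Spark, the same `SELECT` text could be passed to `spark.sql(...)` after registering the DataFrame with `createOrReplaceTempView("survey")`.

```python
import sqlite3

# Hypothetical rows (age, country, treatment); the second and fourth
# rows are identical on purpose so the query has something to find.
rows = [
    (37, "United States", "Yes"),
    (44, "Canada", "No"),
    (32, "Canada", "No"),
    (44, "Canada", "No"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (age INTEGER, country TEXT, treatment TEXT)")
conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", rows)

# Group on all columns; any group with more than one member is a duplicate.
duplicates = conn.execute(
    """
    SELECT age, country, treatment, COUNT(*) AS copies
    FROM survey
    GROUP BY age, country, treatment
    HAVING COUNT(*) > 1
    """
).fetchall()
conn.close()

print(duplicates)
```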

Apply Union operation on the datasets
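
A small sketch of the union step, assuming two hypothetical tables `part_a` and `part_b` with the same columns. SQLite stands in for Spark SQL here so the example is runnable on its own. It contrasts `UNION ALL`, which keeps repeated rows (the behaviour of Spark's `DataFrame.union`), with `UNION`, which removes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part_a (age INTEGER, treatment TEXT)")
conn.execute("CREATE TABLE part_b (age INTEGER, treatment TEXT)")
# Hypothetical data; the row (41, "No") appears in both tables.
conn.executemany("INSERT INTO part_a VALUES (?, ?)", [(29, "Yes"), (41, "No")])
conn.executemany("INSERT INTO part_b VALUES (?, ?)", [(41, "No"), (35, "Yes")])

# UNION ALL concatenates the two inputs and keeps the shared row twice.
union_all = conn.execute(
    "SELECT age, treatment FROM part_a "
    "UNION ALL SELECT age, treatment FROM part_b"
).fetchall()

# UNION also removes duplicate rows from the combined result.
union_distinct = conn.execute(
    "SELECT age, treatment FROM part_a "
    "UNION SELECT age, treatment FROM part_b ORDER BY age"
).fetchall()
conn.close()

print(len(union_all), union_distinct)
```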

Use a GroupBy query based on treatment
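
The grouping step can be sketched as a count of rows per `treatment` value. The table name, the other column, and the data are hypothetical, and SQLite is used in place of Spark only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (age INTEGER, treatment TEXT)")
# Hypothetical answers: three "Yes" rows and two "No" rows.
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [(25, "Yes"), (31, "No"), (40, "Yes"), (28, "Yes"), (52, "No")],
)

# One output row per distinct treatment value, with its row count.
counts = conn.execute(
    """
    SELECT treatment, COUNT(*) AS respondents
    FROM survey
    GROUP BY treatment
    ORDER BY treatment
    """
).fetchall()
conn.close()

print(counts)
```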

Joining Data Sets
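
As an illustration of the join, the sketch below matches two hypothetical tables, `people` and `answers`, on a shared `id` column. It assumes an inner join, so ids present in only one table are dropped. SQLite stands in for Spark SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, country TEXT)")
conn.execute("CREATE TABLE answers (id INTEGER, treatment TEXT)")
# Hypothetical data: id 3 has no answer, id 4 has no person record.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Canada"), (2, "India"), (3, "Germany")],
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?)",
    [(1, "Yes"), (2, "No"), (4, "Yes")],
)

# Inner join: only ids found in both tables appear in the result.
joined = conn.execute(
    """
    SELECT p.id, p.country, a.treatment
    FROM people AS p
    JOIN answers AS a ON p.id = a.id
    ORDER BY p.id
    """
).fetchall()
conn.close()

print(joined)
```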

Using Aggregate Functions
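
A minimal sketch of aggregate functions over a numeric column, assuming a hypothetical `age` column. `MIN`, `MAX`, and `AVG` are written the same way in Spark SQL; SQLite is used here only so the query can run without a cluster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (age INTEGER)")
# Hypothetical ages.
conn.executemany("INSERT INTO survey VALUES (?)", [(20,), (30,), (40,)])

# Aggregates collapse the whole table into a single summary row.
youngest, oldest, mean_age = conn.execute(
    "SELECT MIN(age), MAX(age), AVG(age) FROM survey"
).fetchone()
conn.close()

print(youngest, oldest, mean_age)
```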

Write a query to fetch the 13th row in the dataset
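
A table has no inherent row order, so "the 13th row" only has a meaning relative to some ordering. One possible approach, sketched below, numbers the rows with `ROW_NUMBER()` over an explicit `ORDER BY` and then filters on that number. The `id` ordering column and the data are assumptions for illustration, and SQLite (version 3.25 or later, for window function support) stands in for Spark SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (id INTEGER, age INTEGER)")
# Hypothetical data: 15 rows with id 1..15 and age = 20 + id.
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [(i, 20 + i) for i in range(1, 16)],
)

# Number the rows by id, then keep the one numbered 13.
thirteenth = conn.execute(
    """
    SELECT id, age
    FROM (
        SELECT id, age, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM survey
    ) AS numbered
    WHERE rn = 13
    """
).fetchone()
conn.close()

print(thirteenth)
```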

Bonus - Using ParseLine Function
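
A parse-line helper is typically a plain function that turns one line of raw text into a tuple of typed fields, which can then be mapped over an RDD. The version below is a sketch under assumptions: the name `parse_line`, the two-column layout (age, treatment), and the sample lines are hypothetical and not taken from the lab's dataset.

```python
def parse_line(line):
    """Split one comma-separated line into an (age, treatment) tuple."""
    fields = line.strip().split(",")
    age = int(fields[0])           # first field is assumed to be numeric
    treatment = fields[1].strip()  # second field is kept as text
    return (age, treatment)


# Hypothetical input lines, as they might be read from a text file.
sample_lines = ["37,Yes", "44, No", "32,No\n"]
parsed = [parse_line(line) for line in sample_lines]

print(parsed)
```

In Spark the same function could be applied with something like `sc.textFile(path).map(parse_line)`. A plain `split(",")` does not handle quoted fields that contain commas, so Python's `csv` module would be a safer choice for such data.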