Spark ICP3 - neerajpadarthi/Big-Data-Programming GitHub Wiki
Name: Neeraj Padarthi
Class ID: 19
Spark ICP: 3
Topic: Working with DataFrames and SQL
Import the dataset and create data frames directly on import.

Save data to file

Checking for Duplicate Records


Apply Union operation on the datasets

Use a GroupBy query based on treatment

Joining Data Sets

Using Aggregate Function

Write a query to fetch the 13th row in the dataset


Bonus - Using ParseLine Function
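A sketch of what a parse-line function could look like, written in plain Python so it runs without a cluster. The field layout (age first, treatment third) is an assumption, and the function name `parse_line` follows Python naming rather than the original `ParseLine`.

```python
def parse_line(line):
    """Turn one comma-separated text line into a (treatment, age) pair.

    Assumed layout: Age,Gender,treatment. A plain split does not handle
    quoted fields that contain commas.
    """
    fields = line.split(",")
    age = int(fields[0])
    treatment = fields[2].strip()
    return (treatment, age)


# Stand-in lines, as they would appear in a headerless text file.
lines = ["37,Female,Yes", "44,Male,No"]
pairs = [parse_line(line) for line in lines]
print(pairs)
```

In Spark, a function like this is typically passed to `map` on an RDD of text lines, for example `sc.textFile(path).map(parse_line)`, where `sc` is the SparkContext and `path` is the input file.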
