ICP 10 - Gnkhakimova/CS5590-BigData GitHub Wiki
ICP 10
Data Frame and SQL
Task 1
- Import the dataset and create data frames directly on import
- Save data to file
Saved file
- Check for duplicate records in the dataset
- Apply Union operation on the dataset and order the output by Country Name alphabetically
- Use a GroupBy query based on treatment
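The query steps in Task 1 (duplicate check, union ordered by country, group by treatment) can be sketched in plain SQL. This page does not show the code or name the engine, so the sketch below is only an illustration: it uses Python's built-in `sqlite3` as a stand-in and a tiny made-up CSV sample. The table name `survey`, the `Age` column, and every row are hypothetical; only `Country` and `treatment` come from the task wording.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the real dataset file.
CSV_TEXT = """Country,Age,treatment
United States,37,Yes
Canada,32,No
United Kingdom,31,Yes
Canada,32,No
Australia,29,No
"""

# "Import": read the CSV with its header and load it into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (Country TEXT, Age INTEGER, treatment TEXT)")
reader = csv.DictReader(io.StringIO(CSV_TEXT))
conn.executemany(
    "INSERT INTO survey VALUES (:Country, :Age, :treatment)",
    list(reader),
)

# Duplicate check: group on every column and keep groups seen more than once.
duplicates = conn.execute("""
    SELECT Country, Age, treatment, COUNT(*) AS copies
    FROM survey
    GROUP BY Country, Age, treatment
    HAVING COUNT(*) > 1
""").fetchall()

# Union of two slices of the data, ordered alphabetically by country.
# Note: SQL UNION also drops duplicate rows; UNION ALL would keep them.
union_rows = conn.execute("""
    SELECT Country, Age, treatment FROM survey WHERE Age < 32
    UNION
    SELECT Country, Age, treatment FROM survey WHERE Age >= 32
    ORDER BY Country
""").fetchall()

# Group by treatment and count the rows in each group.
by_treatment = conn.execute("""
    SELECT treatment, COUNT(*) AS respondents
    FROM survey
    GROUP BY treatment
    ORDER BY treatment
""").fetchall()

print(duplicates)
print(union_rows)
print(by_treatment)
```

If the lab is done in PySpark (an assumption, since the page does not say), the usual counterparts are `spark.read.csv(path, header=True, inferSchema=True)` for the import, `df.write.csv(path)` for the save, and `df.createOrReplaceTempView("survey")` followed by `spark.sql(...)` for queries like the ones above.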
Task 2
- Apply the basic queries related to Joins and aggregate functions (at least 2)
- Write a query to fetch the 13th row in the dataset.
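As a rough illustration of Task 2, the sketch below runs one join with aggregates, one plain aggregate query, and a 13th-row lookup. It again uses Python's built-in `sqlite3` as a stand-in engine, which is an assumption rather than the setup used for this lab. The tables `survey` and `region`, the `id`, `Age`, and `Continent` columns, and all rows are synthetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (id INTEGER, Country TEXT, Age INTEGER, treatment TEXT)"
)
conn.execute("CREATE TABLE region (Country TEXT, Continent TEXT)")

# 15 synthetic rows so that a 13th row exists.
countries = ["Canada", "India", "Germany"]
survey_rows = [
    (i, countries[i % 3], 20 + i, "Yes" if i % 2 == 0 else "No")
    for i in range(1, 16)
]
conn.executemany("INSERT INTO survey VALUES (?, ?, ?, ?)", survey_rows)
conn.executemany(
    "INSERT INTO region VALUES (?, ?)",
    [("Canada", "North America"), ("India", "Asia"), ("Germany", "Europe")],
)

# Query 1: inner join plus COUNT and AVG per continent.
per_continent = conn.execute("""
    SELECT r.Continent, COUNT(*) AS respondents, AVG(s.Age) AS avg_age
    FROM survey AS s
    JOIN region AS r ON s.Country = r.Country
    GROUP BY r.Continent
    ORDER BY r.Continent
""").fetchall()

# Query 2: MIN and MAX age per treatment group.
age_range = conn.execute("""
    SELECT treatment, MIN(Age) AS youngest, MAX(Age) AS oldest
    FROM survey
    GROUP BY treatment
    ORDER BY treatment
""").fetchall()

# 13th row: tables have no inherent order, so sort on an explicit key
# (the hypothetical id column), skip 12 rows, and take the next one.
thirteenth = conn.execute("""
    SELECT id, Country, Age, treatment
    FROM survey
    ORDER BY id
    LIMIT 1 OFFSET 12
""").fetchone()

print(per_continent)
print(age_range)
print(thirteenth)
```

The "13th row" only has a stable meaning once an ordering is fixed, which is why the sketch sorts on `id`. On engines without `OFFSET` support, a `ROW_NUMBER()` window function filtered to 13 is a common alternative.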
Bonus
- Write a parseLine method to split the comma-delimited row and create a Data frame.
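A minimal sketch of such a `parseLine` function follows, assuming a three-field row of country, age, and treatment. The field layout, the `Respondent` record type, and the sample lines are hypothetical, not taken from the actual dataset.

```python
from collections import namedtuple

# Hypothetical record layout; the real dataset has its own columns.
Respondent = namedtuple("Respondent", ["country", "age", "treatment"])


def parseLine(line):
    """Split one comma-delimited row into a typed record."""
    fields = line.strip().split(",")
    return Respondent(
        country=fields[0].strip(),
        age=int(fields[1]),          # convert the numeric field explicitly
        treatment=fields[2].strip(),
    )


# Made-up input lines, without a header row.
lines = ["Canada,32,No", "India, 28 ,Yes\n"]
records = [parseLine(line) for line in lines]

for record in records:
    print(record.country, record.age, record.treatment)
```

A plain `split(",")` breaks on quoted fields that contain commas; Python's `csv` module handles those cases. In PySpark, a function like this is commonly mapped over the lines of a text file (`sc.textFile(path).map(parseLine)`) and the result passed to `spark.createDataFrame`. That step is left out here because it needs a running Spark session.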