ICP 10 - Gnkhakimova/CS5590-BigData GitHub Wiki

ICP 10

Data Frame and SQL

Source Code

Task 1

  • Import the dataset and create data frames directly on import

  • Save data to file

    Saved file
  • Check for duplicate records in the dataset

  • Apply a union operation on the dataset and order the output alphabetically by country name

  • Use a GROUP BY query based on treatment
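
The original screenshots for Task 1 are not available here, and the page does not say which engine the code used. As a rough sketch of the five steps above, the example below uses Python's built-in `csv` and `sqlite3` modules as a stand-in for a data frame and SQL engine. The sample rows, the `Age` column, and the table names `survey`, `part1`, and `part2` are all made up for illustration; only `Country` and `treatment` are taken from the task descriptions.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; the real dataset is not shown on this page.
SAMPLE_CSV = """Country,Age,treatment
United States,37,Yes
Canada,32,No
United Kingdom,31,Yes
Canada,32,No
Australia,29,No
"""
COLUMNS = ["Country", "Age", "treatment"]


def load_rows(text):
    """Import step: parse CSV text into a list of dicts keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def make_table(conn, name, rows):
    """Create table `name` (a trusted, hard-coded identifier) and fill it."""
    conn.execute(f"CREATE TABLE {name} (Country TEXT, Age INTEGER, treatment TEXT)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (:Country, :Age, :treatment)", rows
    )


def save_rows(rows):
    """Save step: write rows as CSV text (in memory here, instead of a file)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = load_rows(SAMPLE_CSV)
conn = sqlite3.connect(":memory:")
make_table(conn, "survey", rows)
saved = save_rows(rows)

# Duplicate check: group on every column and keep groups seen more than once.
duplicates = conn.execute(
    "SELECT Country, Age, treatment, COUNT(*) FROM survey "
    "GROUP BY Country, Age, treatment HAVING COUNT(*) > 1"
).fetchall()

# Union: split the data into two tables, union them, and sort by Country.
make_table(conn, "part1", rows[:3])
make_table(conn, "part2", rows[3:])
unioned = conn.execute(
    "SELECT Country, Age, treatment FROM part1 "
    "UNION SELECT Country, Age, treatment FROM part2 "
    "ORDER BY Country"
).fetchall()

# Group by treatment and count the rows in each group.
by_treatment = conn.execute(
    "SELECT treatment, COUNT(*) FROM survey GROUP BY treatment ORDER BY treatment"
).fetchall()
```

Note that SQL `UNION` drops duplicate rows, so the repeated Canada row appears once in `unioned`; `UNION ALL` would keep both copies.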

Task 2

  • Apply basic queries that use joins and aggregate functions (at least 2)


  • Write a query to fetch the 13th row in the dataset.
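
The two Task 2 items can be sketched in the same spirit, again using `sqlite3` as a stand-in because the original code is not shown. Everything in the data is hypothetical: the generated `survey` rows, the `Age` column, and the `regions` lookup table exist only to give the join something to match against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (Country TEXT, Age INTEGER, treatment TEXT)")
conn.execute("CREATE TABLE regions (Country TEXT, Region TEXT)")

# Hypothetical data: 15 generated survey rows and a small lookup table.
countries = ["United States", "Canada", "United Kingdom"]
survey_rows = [
    (countries[i % 3], 20 + i, "Yes" if i % 2 == 0 else "No")
    for i in range(15)
]
conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", survey_rows)
conn.executemany(
    "INSERT INTO regions VALUES (?, ?)",
    [
        ("United States", "North America"),
        ("Canada", "North America"),
        ("United Kingdom", "Europe"),
    ],
)

# Query 1: inner join plus two aggregates (COUNT and AVG) per region.
per_region = conn.execute(
    "SELECT r.Region, COUNT(*), AVG(s.Age) "
    "FROM survey s JOIN regions r ON s.Country = r.Country "
    "GROUP BY r.Region ORDER BY r.Region"
).fetchall()

# Query 2: aggregates over the whole table.
age_range = conn.execute("SELECT MIN(Age), MAX(Age) FROM survey").fetchone()

# 13th row: SQL tables have no inherent order, so sort explicitly.
# SQLite's rowid follows insertion order here; skip 12 rows, take 1.
thirteenth = conn.execute(
    "SELECT Country, Age, treatment FROM survey "
    "ORDER BY rowid LIMIT 1 OFFSET 12"
).fetchone()
```

The "13th row" is only well defined once an ordering is fixed. This sketch uses insertion order, which is an assumption about what the task intends.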

Bonus

  • Write a parseLine method that splits a comma-delimited row, and use it to create a data frame.
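
One possible shape for the bonus is sketched below in plain Python. The `Record` type, its three fields, and the sample lines are assumptions made for illustration, and the column-oriented dictionary is only a minimal stand-in for a real data frame.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical row schema; the real column list is not shown on this page."""
    country: str
    age: int
    treatment: str


def parseLine(line: str) -> Record:
    """Split one comma-delimited row and convert each field to its type."""
    fields = line.strip().split(",")
    return Record(country=fields[0], age=int(fields[1]), treatment=fields[2])


# Made-up input lines standing in for rows of the dataset.
lines = ["United States,37,Yes\n", "Canada,32,No\n"]
records = [parseLine(line) for line in lines]

# Column-oriented view of the parsed rows, one list per column.
frame = {
    name: [getattr(record, name) for record in records]
    for name in Record._fields
}
```

A plain `split(",")` breaks on fields that contain quoted commas; for such data, the standard `csv` module is the safer choice.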