ICP10: Data Frame and SQL - gabriellawillis/BigData GitHub Wiki

Lesson Overview:

  • Data frames
  • Construction of data frames
  • Spark SQL
  • Transformations
  • Laziness
  • Actions
  • Basic commands on data frames
  • Basic SQL commands on data frames

Source code can be found here: SourceCode

Part – 1

1. Import the dataset and create data frames directly on import.

2. Save the data to a file.

3. Check for duplicate records in the dataset.

4. Apply a union operation on the dataset and order the output alphabetically by CountryName.

5. Use a GroupBy query based on treatment.

Part – 2

1. Apply basic queries that use joins and aggregate functions (at least 2).

2. Write a query to fetch the 13th row in the dataset.
