ICP10: Data Frame and SQL - gabriellawillis/BigData GitHub Wiki
Lesson Overview:
- Data frames
- Construction of Data Frames
- SparkSQL
- Transformation
- Laziness
- Actions
- Basic commands on Data Frames
- Basic SQL commands on Data Frames
Source code can be found here: SourceCode
Part – 1
1. Import the dataset and create data frames directly on import.
2. Save the data to a file.
3. Check for duplicate records in the dataset.
4. Apply a Union operation on the dataset and order the output by CountryName alphabetically.
5. Use a GroupBy query based on treatment.
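The five steps above can be sketched in plain SQL. This is a minimal illustration, not the ICP solution: it uses Python's built-in `sqlite3` as a stand-in for SparkSQL so it runs without a Spark installation. The table name `survey`, the columns `Country`, `Age`, and `treatment`, and every row are assumptions made up for the example. The query strings are standard SQL, so with a data frame registered as a temporary view they should carry over to SparkSQL with little or no change.

```python
import csv
import io
import sqlite3

# Toy rows invented for illustration; they are not taken from the ICP dataset.
RAW_CSV = """Country,Age,treatment
United States,31,Yes
Canada,27,No
United States,31,Yes
Australia,40,No
Canada,35,Yes
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (Country TEXT, Age INTEGER, treatment TEXT)")

# Step 1: load the CSV rows straight into a table on import.
rows = [(r["Country"], int(r["Age"]), r["treatment"])
        for r in csv.DictReader(io.StringIO(RAW_CSV))]
conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", rows)

# Step 2: save the table back out as CSV text (a file handle works the same way).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Country", "Age", "treatment"])
writer.writerows(conn.execute("SELECT Country, Age, treatment FROM survey"))
saved_csv = out.getvalue()

# Step 3: any row that occurs more than once is a duplicate.
duplicates = conn.execute(
    """SELECT Country, Age, treatment, COUNT(*) AS n
       FROM survey
       GROUP BY Country, Age, treatment
       HAVING COUNT(*) > 1"""
).fetchall()

# Step 4: union two slices of the data, ordered alphabetically by country.
union_rows = conn.execute(
    """SELECT Country, Age, treatment FROM survey WHERE Age < 35
       UNION ALL
       SELECT Country, Age, treatment FROM survey WHERE Age >= 35
       ORDER BY Country"""
).fetchall()

# Step 5: count rows per treatment value.
by_treatment = conn.execute(
    "SELECT treatment, COUNT(*) FROM survey GROUP BY treatment ORDER BY treatment"
).fetchall()

print(duplicates)
print([r[0] for r in union_rows])
print(by_treatment)
```

One design point worth noting: `UNION ALL` keeps duplicate rows, which matches the behavior of the Spark data frame `union` method, while plain `UNION` in SQL removes them.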
Part – 2
1. Apply the basic queries related to joins and aggregate functions (at least 2).
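As a rough sketch of what such queries might look like, the example below runs two joins and one aggregate query. It again uses Python's built-in `sqlite3` as a stand-in for SparkSQL. The `regions` lookup table is hypothetical, and the `survey` columns and all rows are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (Country TEXT, Age INTEGER, treatment TEXT);
    CREATE TABLE regions (Country TEXT, Region TEXT);  -- hypothetical lookup table
    INSERT INTO survey VALUES
        ('United States', 31, 'Yes'),
        ('Canada', 27, 'No'),
        ('Australia', 40, 'No'),
        ('Canada', 35, 'Yes');
    INSERT INTO regions VALUES
        ('United States', 'Americas'),
        ('Canada', 'Americas');
""")

# Inner join: keeps only survey rows whose country appears in the lookup table.
inner = conn.execute(
    """SELECT s.Country, r.Region
       FROM survey AS s
       JOIN regions AS r ON s.Country = r.Country
       ORDER BY s.Country, s.Age"""
).fetchall()

# Left join: keeps every survey row; unmatched countries get NULL (None) as Region.
left = conn.execute(
    """SELECT s.Country, r.Region
       FROM survey AS s
       LEFT JOIN regions AS r ON s.Country = r.Country
       ORDER BY s.Country, s.Age"""
).fetchall()

# Aggregates: average and maximum age per treatment group.
stats = conn.execute(
    """SELECT treatment, AVG(Age), MAX(Age)
       FROM survey
       GROUP BY treatment
       ORDER BY treatment"""
).fetchall()

print(inner)
print(left)
print(stats)
```

Comparing `inner` with `left` shows the practical difference between the two joins: the Australia row is dropped by the inner join and kept, with a missing region, by the left join.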