SPARK ICP 3 - Apoorvag2597/BDP_Revised GitHub Wiki

Name - Apoorva Geetanjali Avadhanula

Class ID - 34

Part 1- To perform the following operations on the given dataset

Import the dataset and create data frames directly on import.

Save data to file.

Check for Duplicate records in the dataset.

Apply Union operation on the dataset and order the output by CountryName alphabetically.

Use a GroupBy query based on treatment.

Tools used - IntelliJ (Scala)

Approach used-

Schema-

1. We first import the dataset and create a DataFrame directly from the imported data.

2. We save the data to a file.
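The import and save steps can be sketched as follows. This is a minimal illustration, not the code used for this ICP: the file paths `survey.csv` and `output/survey_saved` are hypothetical, and the `header` and `inferSchema` options assume the CSV has a header row.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ImportAndSave {
  // Read a CSV file with a header row into a DataFrame, inferring column types.
  def load(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  // Write the DataFrame out as CSV, replacing any previous output directory.
  def save(df: DataFrame, path: String): Unit =
    df.write.mode("overwrite").option("header", "true").csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ImportAndSave").getOrCreate()
    val df = load(spark, "survey.csv")   // hypothetical input path
    df.printSchema()                     // shows the inferred schema
    save(df, "output/survey_saved")      // hypothetical output directory
    spark.stop()
  }
}
```

Spark writes the output as a directory of part files rather than a single CSV file.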

3. We check whether the dataset contains any duplicate records. As there are no duplicates present, the output is an empty list.
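One way to run such a duplicate check is to group on every column and keep the groups that occur more than once. The sketch below assumes this approach; the sample rows and the column names `Country`, `Age` and `treatment` are hypothetical stand-ins for the real dataset.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DuplicateCheck {
  // Group on every column; a group with count > 1 is a record that occurs more than once.
  // Assumes the DataFrame has no column that is itself named "count".
  def duplicates(df: DataFrame): DataFrame =
    df.groupBy(df.columns.map(col): _*)
      .count()
      .filter(col("count") > 1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("DuplicateCheck").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the real dataset.
    val df = Seq(("Canada", 30, "Yes"), ("India", 25, "No")).toDF("Country", "Age", "treatment")

    // No row repeats here, so the result is empty.
    duplicates(df).show()
    spark.stop()
  }
}
```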

Union Function- For this, we first create two DataFrames from the dataset, limited to 8 and 10 rows respectively.

Output-

We then perform the union operation on these two DataFrames and order the output alphabetically by country name.

Output-

Output ordered by country name-
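The union and ordering steps described above can be sketched like this. The column name `Country` and the generated sample rows are assumptions made for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionExample {
  // Take the first 8 and the first 10 rows, union them, and sort by the country column.
  // union keeps duplicate rows (like SQL UNION ALL), so the result has 18 rows
  // whenever the input has at least 10 rows.
  def unionOrdered(df: DataFrame, countryCol: String): DataFrame = {
    val first = df.limit(8)
    val second = df.limit(10)
    first.union(second).orderBy(countryCol)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("UnionExample").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with 12 rows.
    val df = (1 to 12).map(i => (s"Country$i", 20 + i)).toDF("Country", "Age")
    unionOrdered(df, "Country").show(20)
    spark.stop()
  }
}
```

If repeated rows are not wanted in the result, `distinct()` can be applied after the union.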

GroupBy Query based on treatment-

Output-
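A GroupBy query on treatment might look like the sketch below. The wiki does not say which aggregation was applied to the groups, so counting rows per treatment value is an assumption, as are the sample rows.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GroupByTreatment {
  // Count how many rows fall under each distinct value of the treatment column.
  def countByTreatment(df: DataFrame): DataFrame =
    df.groupBy("treatment").count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("GroupByTreatment").getOrCreate()
    import spark.implicits._

    // Hypothetical sample values for the treatment column.
    val df = Seq("Yes", "No", "Yes").toDF("treatment")
    countByTreatment(df).show()
    spark.stop()
  }
}
```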

Part 2- To perform the following additional tasks on the dataset

Apply the basic queries related to Joins and aggregate functions (at least 2)

Write a query to fetch the 13th row in the dataset.

Join Function - This is used to join two DataFrames. A join needs two sets of data that share a common column. Here, we join two sets named data and info.

Output-

Inner Join-

Output-
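An inner join on a shared column can be sketched as follows. The names `data` and `info` come from the text above, but their contents and the join key `Country` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinExample {
  // Inner join: keep only the rows whose key value appears in both DataFrames.
  // Passing the key as Seq(key) yields a single key column in the result.
  def innerJoin(data: DataFrame, info: DataFrame, key: String): DataFrame =
    data.join(info, Seq(key), "inner")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("JoinExample").getOrCreate()
    import spark.implicits._

    // Hypothetical contents for the two sets.
    val data = Seq(("Canada", 30), ("India", 25)).toDF("Country", "Age")
    val info = Seq(("Canada", "Yes"), ("Brazil", "No")).toDF("Country", "treatment")

    // Only Canada appears on both sides, so only that row survives the join.
    innerJoin(data, info, "Country").show()
    spark.stop()
  }
}
```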

1b. Aggregate of Max and Average of the Age column - Aggregate functions reduce a column to a single value, for example sum, count, maximum, minimum and average. Here we calculate the maximum and the average of the Age column.

Maximum of Age-

Average of Age-
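Both aggregates can be computed in one pass, as in this sketch. It assumes `Age` is an integer column with at least one non-null value; the sample ages are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, max}

object AgeAggregates {
  // Returns (maximum age, average age).
  // Assumes Age is an integer column and the DataFrame is not empty.
  def maxAndAvgAge(df: DataFrame): (Int, Double) = {
    val row = df.agg(max("Age"), avg("Age")).head()
    (row.getInt(0), row.getDouble(1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("AgeAggregates").getOrCreate()
    import spark.implicits._

    // Hypothetical sample ages.
    val df = Seq(20, 30, 40).toDF("Age")
    val (maxAge, avgAge) = maxAndAvgAge(df)
    println(s"Maximum of Age: $maxAge, Average of Age: $avgAge")
    spark.stop()
  }
}
```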

Query to fetch the 13th row in the dataset - .last returns the last element of a collection, so applying it to the first 13 rows of the dataset gives the 13th row.

Output -
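One way to fetch the 13th row is to take the first 13 rows and call `.last` on them. The helper name `nthRow` and the sample data below are hypothetical, and the sketch assumes the rows are read back in their original order.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object ThirteenthRow {
  // Take the first n rows and return the last of them, which is the n-th row.
  // Returns None when n is not positive or the DataFrame has fewer than n rows.
  def nthRow(df: DataFrame, n: Int): Option[Row] =
    if (n <= 0) None
    else {
      val rows = df.take(n)
      if (rows.length == n) Some(rows.last) else None
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ThirteenthRow").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with 15 rows.
    val df = (1 to 15).toDF("id")
    println(nthRow(df, 13))
    spark.stop()
  }
}
```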

Bonus Point - A parseLine method is used to split each comma-delimited row and create a DataFrame. The parseLine method takes three columns of the dataset from the comma-delimited fields. The rows are then passed through a map transformation and converted with toDF so they can be displayed as a DataFrame.

Output-
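A parseLine approach along these lines is sketched below. The wiki does not say which three columns were used, so the `SurveyRow` case class, its fields and the sample lines are hypothetical. The split is naive and would break on quoted fields that contain commas.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical three-column record; the original column choice is not stated.
case class SurveyRow(country: String, age: Int, treatment: String)

object ParseLineExample {
  // Split one comma-delimited line and keep its first three fields.
  def parseLine(line: String): SurveyRow = {
    val fields = line.split(",")
    SurveyRow(fields(0).trim, fields(1).trim.toInt, fields(2).trim)
  }

  // Apply parseLine with a map transformation, then convert the result with toDF.
  def toDataFrame(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.sparkContext.parallelize(lines).map(parseLine).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ParseLineExample").getOrCreate()
    // Hypothetical input lines without a header row.
    toDataFrame(spark, Seq("Canada,30,Yes", "India,25,No")).show()
    spark.stop()
  }
}
```

With a real file, `spark.sparkContext.textFile(path)` would replace `parallelize(lines)`, and a header line would need to be filtered out before parsing.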
