Lab Assignment 2 - sirisha1206/Spark GitHub Wiki
Name: Naga Sirisha Sunkara
Class ID: 21
Team ID: 5
Technical partner details:
Name: Vinay Santhosham
Class ID: 17
Objective:
Task 1: Perform queries on a Hive table using the given dataset.
Task 2: Perform queries in Solr using the given dataset.
To perform these tasks, we chose the super heroes dataset.
Task 1:
Creation of the table:
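A minimal sketch of this step, using Python's built-in `sqlite3` module as a runnable stand-in for Hive. The table name `heroes`, its columns, and the three CSV rows are hypothetical, not taken from the lab. In Hive the same step would typically be a `CREATE TABLE ... ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` statement followed by `LOAD DATA INPATH`.

```python
import csv
import io
import sqlite3

# Hypothetical CSV excerpt; column names and rows are assumptions for this sketch.
CSV_TEXT = """id,name,gender,eye_color,race,hair_color,height,publisher,weight
0,Hero A,Male,brown,Human,brown,185.0,Marvel Comics,95.0
1,Hero B,Female,blue,Mutant,black,170.0,DC Comics,60.0
2,Hero C,Male,brown,Human,brown,191.0,Marvel Comics,88.0
"""


def create_heroes_table(csv_text):
    """Create an in-memory 'heroes' table and load the CSV rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE heroes (id INTEGER, name TEXT, gender TEXT, eye_color TEXT, "
        "race TEXT, hair_color TEXT, height REAL, publisher TEXT, weight REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # Convert the numeric columns explicitly; csv yields every field as a string.
    rows = [
        (int(r["id"]), r["name"], r["gender"], r["eye_color"], r["race"],
         r["hair_color"], float(r["height"]), r["publisher"], float(r["weight"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO heroes VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


conn = create_heroes_table(CSV_TEXT)
row_count = conn.execute("SELECT COUNT(*) FROM heroes").fetchone()[0]
print(row_count)  # → 3
```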
Query 1: To get the number of super heroes in each gender category
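A minimal sketch of Query 1, again using `sqlite3` in place of Hive. The `heroes` table and its rows are hypothetical; the `GROUP BY gender` clause is written the same way in HiveQL.

```python
import sqlite3

# Hypothetical rows containing only the columns this query needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (name TEXT, gender TEXT)")
conn.executemany("INSERT INTO heroes VALUES (?, ?)", [
    ("Hero A", "Male"), ("Hero B", "Female"), ("Hero C", "Male"),
])

# One output row per gender value with its count; ORDER BY only fixes the output order.
counts = conn.execute(
    "SELECT gender, COUNT(*) FROM heroes GROUP BY gender ORDER BY gender"
).fetchall()
print(counts)  # → [('Female', 1), ('Male', 2)]
```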
Query 2: To get the average and maximum height and weight in the dataset
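Query 2 needs only aggregate functions over the whole table. This sketch (SQLite stand-in, hypothetical rows) uses `AVG` and `MAX`, which HiveQL also provides under the same names.

```python
import sqlite3

# Hypothetical heights and weights; not values from the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (name TEXT, height REAL, weight REAL)")
conn.executemany("INSERT INTO heroes VALUES (?, ?, ?)", [
    ("Hero A", 185.0, 95.0), ("Hero B", 170.0, 60.0), ("Hero C", 191.0, 88.0),
])

# A single row holding all four aggregates.
stats = conn.execute(
    "SELECT AVG(height), MAX(height), AVG(weight), MAX(weight) FROM heroes"
).fetchone()
print(stats)  # → (182.0, 191.0, 81.0, 95.0)
```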
Query 3: To get the number of super heroes with a weight greater than 90
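Filtering before counting, sketched with SQLite and hypothetical rows. The `WHERE weight > 90` predicate would read the same in HiveQL, assuming `weight` is stored as a numeric column.

```python
import sqlite3

# Hypothetical weights; two of them exceed 90.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (name TEXT, weight REAL)")
conn.executemany("INSERT INTO heroes VALUES (?, ?)", [
    ("Hero A", 95.0), ("Hero B", 60.0), ("Hero C", 88.0), ("Hero D", 120.0),
])

# WHERE is applied before COUNT, so only the matching rows are counted.
heavy_count = conn.execute(
    "SELECT COUNT(*) FROM heroes WHERE weight > 90"
).fetchone()[0]
print(heavy_count)  # → 2
```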
Query 4: To list the names and heights of super heroes in descending order of height
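A sketch of the ordered listing (SQLite stand-in, hypothetical rows). In Hive, `ORDER BY` produces a total ordering of the output, whereas `SORT BY` only sorts rows within each reducer, so `ORDER BY` is the clause that matches this query.

```python
import sqlite3

# Hypothetical names and heights.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (name TEXT, height REAL)")
conn.executemany("INSERT INTO heroes VALUES (?, ?)", [
    ("Hero A", 185.0), ("Hero B", 170.0), ("Hero C", 191.0),
])

# DESC puts the tallest hero first.
by_height = conn.execute(
    "SELECT name, height FROM heroes ORDER BY height DESC"
).fetchall()
print(by_height)  # → [('Hero C', 191.0), ('Hero A', 185.0), ('Hero B', 170.0)]
```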
Query 5: To get the distinct publishers in the super heroes dataset
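`SELECT DISTINCT` removes duplicate publisher values. This sketch uses SQLite and hypothetical rows; the trailing `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical rows with a repeated publisher.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (name TEXT, publisher TEXT)")
conn.executemany("INSERT INTO heroes VALUES (?, ?)", [
    ("Hero A", "Marvel Comics"), ("Hero B", "DC Comics"), ("Hero C", "Marvel Comics"),
])

# Each publisher appears once in the result.
publishers = [row[0] for row in conn.execute(
    "SELECT DISTINCT publisher FROM heroes ORDER BY publisher"
)]
print(publishers)  # → ['DC Comics', 'Marvel Comics']
```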
Query 6: To get each name along with the first five characters (positions 1 to 5) of its publisher
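Both Hive's `substr(str, start, len)` and SQLite's `substr` count positions from 1, so positions 1 to 5 correspond to `substr(publisher, 1, 5)`. A sketch with SQLite and hypothetical rows:

```python
import sqlite3

# Hypothetical names and publishers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (name TEXT, publisher TEXT)")
conn.executemany("INSERT INTO heroes VALUES (?, ?)", [
    ("Hero A", "Marvel Comics"), ("Hero B", "DC Comics"),
])

# Start at position 1 and take 5 characters.
prefixes = conn.execute(
    "SELECT name, substr(publisher, 1, 5) FROM heroes ORDER BY name"
).fetchall()
print(prefixes)  # → [('Hero A', 'Marve'), ('Hero B', 'DC Co')]
```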
Query 7: To get the id, name, and publisher, ordered by name
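A sketch of Query 7 with SQLite and hypothetical rows. The rows are inserted out of order so that the effect of `ORDER BY name` is visible.

```python
import sqlite3

# Hypothetical rows, deliberately not inserted in name order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (id INTEGER, name TEXT, publisher TEXT)")
conn.executemany("INSERT INTO heroes VALUES (?, ?, ?)", [
    (2, "Hero C", "Marvel Comics"), (0, "Hero A", "Marvel Comics"), (1, "Hero B", "DC Comics"),
])

# Ascending order is the default for ORDER BY.
ordered = conn.execute(
    "SELECT id, name, publisher FROM heroes ORDER BY name"
).fetchall()
print(ordered)
```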
Query 8: To list the super heroes whose hair color and eye color are both brown
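The two conditions are combined with `AND`. This sketch uses SQLite and hypothetical rows. String comparison with `=` is case-sensitive in Hive and under SQLite's default collation, so the literal `'brown'` is assumed to match the casing used in the dataset.

```python
import sqlite3

# Hypothetical rows: only Hero A and Hero D have brown hair and brown eyes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (name TEXT, hair_color TEXT, eye_color TEXT)")
conn.executemany("INSERT INTO heroes VALUES (?, ?, ?)", [
    ("Hero A", "brown", "brown"), ("Hero B", "black", "blue"),
    ("Hero C", "brown", "blue"), ("Hero D", "brown", "brown"),
])

# Both predicates must hold for a row to be returned.
brown_both = [row[0] for row in conn.execute(
    "SELECT name FROM heroes WHERE hair_color = 'brown' AND eye_color = 'brown' "
    "ORDER BY name"
)]
print(brown_both)  # → ['Hero A', 'Hero D']
```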
Query 9: To get the sum of weights grouped by eye color
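A sketch of the grouped sum, using SQLite as a stand-in for Hive and hypothetical rows. `SUM` with `GROUP BY` has the same form in HiveQL.

```python
import sqlite3

# Hypothetical eye colors and weights.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (name TEXT, eye_color TEXT, weight REAL)")
conn.executemany("INSERT INTO heroes VALUES (?, ?, ?)", [
    ("Hero A", "brown", 95.0), ("Hero B", "blue", 60.0), ("Hero C", "brown", 88.0),
])

# Weights are summed separately for each eye color.
weight_by_eye = conn.execute(
    "SELECT eye_color, SUM(weight) FROM heroes GROUP BY eye_color ORDER BY eye_color"
).fetchall()
print(weight_by_eye)  # → [('blue', 60.0), ('brown', 183.0)]
```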
Query 10: To get each super hero's name along with its gender and race concatenated
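This sketch uses SQLite's `||` operator for concatenation; in HiveQL the equivalent would be `concat(gender, ' ', race)`. The single-space separator, the table, and the rows are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical names, genders, and races.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (name TEXT, gender TEXT, race TEXT)")
conn.executemany("INSERT INTO heroes VALUES (?, ?, ?)", [
    ("Hero A", "Male", "Human"), ("Hero B", "Female", "Mutant"),
])

# '||' joins strings in SQLite; a space is placed between the two values.
combined = conn.execute(
    "SELECT name, gender || ' ' || race FROM heroes ORDER BY name"
).fetchall()
print(combined)  # → [('Hero A', 'Male Human'), ('Hero B', 'Female Mutant')]
```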
Task 2:
Steps to be followed before loading the dataset: