Lab Assignment 2

Name: Naga Sirisha Sunkara

Class ID: 21

Team ID: 5

Technical partner's details:

Name: Vinay Santhosham

Class ID: 17

Objective:

Task 1: Perform queries on a Hive table with the given dataset

Task 2: Perform queries on Solr with the given dataset

To perform these tasks, we chose the superheroes dataset.

Task 1:

Creation of table:

Query 1: To get the number of superheroes of each gender
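
As a rough illustration of the grouping in this query, the sketch below runs a `GROUP BY` count on a tiny in-memory SQLite table. SQLite is used here only as a stand-in for Hive, and the table name `superheroes`, the columns `name` and `gender`, and the sample rows are assumptions rather than the actual dataset schema. A `SELECT ... COUNT(*) ... GROUP BY` statement of this shape is also valid HiveQL.

```python
import sqlite3

# Hypothetical sample rows; the real dataset's columns and values may differ.
ROWS = [
    ("A-Bomb", "Male"),
    ("Angel Dust", "Female"),
    ("Batman", "Male"),
]


def count_by_gender(rows):
    """Count heroes per gender using an in-memory SQLite table (stand-in for Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE superheroes (name TEXT, gender TEXT)")
    conn.executemany("INSERT INTO superheroes VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT gender, COUNT(*) FROM superheroes "
        "GROUP BY gender ORDER BY gender"
    ).fetchall()
    conn.close()
    return result


print(count_by_gender(ROWS))
```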

Query 2: To get the average and maximum height and weight in the dataset
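
One possible reading of this query is the average and the maximum of both columns. The sketch below shows that reading on an in-memory SQLite table used as a stand-in for Hive; the column names `height` and `weight` and the three sample rows are assumptions. `AVG` and `MAX` are standard aggregates that Hive supports as well.

```python
import sqlite3

# Hypothetical sample rows: (name, height, weight).
ROWS = [
    ("Hero A", 180.0, 80.0),
    ("Hero B", 190.0, 90.0),
    ("Hero C", 200.0, 100.0),
]


def height_weight_stats(rows):
    """Return (avg height, max height, avg weight, max weight)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE superheroes (name TEXT, height REAL, weight REAL)"
    )
    conn.executemany("INSERT INTO superheroes VALUES (?, ?, ?)", rows)
    stats = conn.execute(
        "SELECT AVG(height), MAX(height), AVG(weight), MAX(weight) "
        "FROM superheroes"
    ).fetchone()
    conn.close()
    return stats


print(height_weight_stats(ROWS))
```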

Query 3: To get the number of superheroes with weight > 90

Query 4: To list the name and height of superheroes in descending order of height

Query 5: To get the distinct publishers in the superheroes dataset

Query 6: To get the name and the first five characters (substring from 1 to 5) of the publisher
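
The substring step can be sketched as follows, again with SQLite standing in for Hive. The column name `publisher` and the sample rows are assumptions. Both Hive and SQLite provide `substr(string, start, length)` with 1-based positions, so `substr(publisher, 1, 5)` keeps the first five characters.

```python
import sqlite3

# Hypothetical sample rows: (name, publisher).
ROWS = [
    ("Batman", "DC Comics"),
    ("Spider-Man", "Marvel Comics"),
]


def name_and_publisher_prefix(rows):
    """Return each name with the first five characters of its publisher."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE superheroes (name TEXT, publisher TEXT)")
    conn.executemany("INSERT INTO superheroes VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT name, substr(publisher, 1, 5) FROM superheroes ORDER BY name"
    ).fetchall()
    conn.close()
    return result


print(name_and_publisher_prefix(ROWS))
```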

Query 7: To get the id, name, and publisher, ordered by name

Query 8: To get the list of superheroes whose hair and eye color are both brown
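
A filter on two columns at once might look like the sketch below. SQLite stands in for Hive, and the column names `hair_color` and `eye_color`, the sample rows, and the decision to compare case-insensitively with `lower()` are all assumptions; the original query may have matched the exact stored value instead.

```python
import sqlite3

# Hypothetical sample rows: (name, hair_color, eye_color).
ROWS = [
    ("Hero A", "Brown", "Brown"),
    ("Hero B", "Brown", "Blue"),
    ("Hero C", "brown", "brown"),
]


def brown_hair_and_eyes(rows):
    """Return names of heroes whose hair and eye color are both brown."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE superheroes (name TEXT, hair_color TEXT, eye_color TEXT)"
    )
    conn.executemany("INSERT INTO superheroes VALUES (?, ?, ?)", rows)
    # lower() makes the match case-insensitive; both conditions must hold.
    result = conn.execute(
        "SELECT name FROM superheroes "
        "WHERE lower(hair_color) = 'brown' AND lower(eye_color) = 'brown' "
        "ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in result]


print(brown_hair_and_eyes(ROWS))
```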

Query 9: To get the sum of weights grouped by eye color

Query 10: To get the name and the concatenation of gender and race in the dataset

Task 2:

Steps to be followed before loading the dataset:

Query 1: To get the details of superheroes with race Human and publisher Wildstorm
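
A query combining two field conditions with `AND` can be sent to Solr's select handler as the `q` parameter. The sketch below only builds such a URL. The host, the core name `superheroes`, and the field names `Race` and `Publisher` are assumptions and would need to match the actual Solr setup.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; 8983 is Solr's default port, the core name is assumed.
BASE_URL = "http://localhost:8983/solr/superheroes/select"


def build_select_url(query):
    """Return a select URL with the query string URL-encoded into q."""
    return BASE_URL + "?" + urlencode({"q": query})


# Both conditions must match (field names are assumptions).
url = build_select_url("Race:Human AND Publisher:Wildstorm")
print(url)
```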

Query 2: To get the details of superheroes whose publisher name starts with "im" and ends with "cs"

Query 3: Proximity search
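
In Solr's standard query parser, a proximity search is written as a quoted phrase followed by `~` and a maximum distance. The source does not say which terms were searched, so the field `Publisher` and the phrase below are purely hypothetical; the helper only formats the clause.

```python
def proximity_query(field, phrase, distance):
    """Format a Solr proximity clause such as Field:"a b"~2."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return f'{field}:"{phrase}"~{distance}'


# Hypothetical example: the two words within one position of each other.
clause = proximity_query("Publisher", "Dark Comics", 1)
print(clause)
```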

Query 4: To get the list with weight in the range 150 to 300
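
Range queries in Solr use the `field:[low TO high]` form, where square brackets include the endpoints and curly braces exclude them. A small helper for formatting such a clause is sketched below; the field name `Weight` is an assumption about the schema.

```python
def range_query(field, low, high, inclusive=True):
    """Format a Solr range clause; brackets are inclusive, braces exclusive."""
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{low} TO {high}{close_b}"


clause = range_query("Weight", 150, 300)
print(clause)
```

Two such clauses can be joined with `OR` to cover disjoint ranges, as in the 0-50 or 100-150 query later in this task.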

Query 5: To get the list with publisher Image Comics and not DC Comics

Query 6: To get the list where alignment is bad

Query 7: To sort by id in descending order

Query 8: To filter on the Publisher in the first 20 rows

Query 9: To list superheroes with weight from 0 to 50 or from 100 to 150

Query 10: Boosting with ^
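
Appending `^` and a factor to a clause raises the relevance weight of documents that match it. The source does not show which clause was boosted, so the fields and values in this sketch are hypothetical; it only assembles the query string.

```python
def boosted(clause, factor):
    """Append a Solr boost factor to a query clause, e.g. Field:value^3."""
    if factor <= 0:
        raise ValueError("boost factor must be positive")
    return f"{clause}^{factor}"


# Hypothetical example: matches on Race:Human count more toward the score.
query = " OR ".join([boosted("Race:Human", 3), "Alignment:good"])
print(query)
```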