ICP 7 - PallaviArikatla/Big-Data-Programming GitHub Wiki

OBJECTIVE: To perform queries on Cassandra.

STEPS OF IMPLEMENTATION:

  • Check for the software and install Python 2.7.17, Java 8, Apache thrift and Cassandra.

  • Add environmental variables for Python, Java and Cassandra.

  • Go to the directory where Cassandra extracted folder is located, open command prompt and run the server with the following command:

cassandra.bat -f

  • Go to bin directory and run command prompt as administrator. Type the following command to connect to the server.

bin> cqlsh

  • With this we will be in Cassandra in the terminal where we can write the queries.

  • Create KeySpace with the following command:

CREATE KEYSPACE bdp7 WITH REPLICATION={'class':'SimpleStrategy','replication_factor':3};

Use the keyspace:

use bdp7;

  • Create table in Cassandra with table name as employee

create table employee(employee_id int primary key,department text,lastname text,years_with_company int,hiredate date,jobtitle text,salary int,managerid int);

  • Load the given dataset into the table as follows:

copy employee(employee_id,department,lastname,years_with_company,hiredate,jobtitle,salary,managerid) from 'C:\Users\Windows 10\Downloads\employee_entries.csv' with HEADER=true;

QUESTION 1:

List the empID,ename,jobtitle,and hiredate of employee from the employee table.

select * from employee;

Copied dataset content gets displayed.

QUESTION 2: List the name,salary of the employees who are clerks.

As the dataset doesn't contain any column on the name of clerks, create one.

select lastname,salary from employee where jobtitle='teamlead' allow filtering;

Insert into bdp7.employee(employee_id,department,lastname,years_with_company,hiredate,jobtitle,salary,managerid) values (9, 'Engineering', 'Emmie',3,'2013-08-11','clerk',65000,4);

select lastname,salary from employee where jobtitle='clerk' allow filtering;

Created clerk column and the loaded values gets displayed.

QUESTION 3: List the name,job,salary of every employee joined on ‘february18,2000’.

select lastname,jobtitle,salary from employee where hiredate='2000-02-18'allow filtering;

Employee details with hiredate as February 18, 2000 gets displayed.

QUESTION 4: List name and annual salary of all the employees.

select lastname,salary from employee;

Employees with in the dataset along with their salaries gets displayed.

QUESTION 5: Display employees’ names, salary and manager values of those employees whose salary is 45000 from employee table using SELECT statement.

Managerid and Name of the employee with the salary 45000 gets displayed.

BONUS QUESTION:

QUESTION 1: Write a query that displays last names with first letter capitalized and all other letters in lowercase and length of lastnames whose lastnames start with J, G or S.

Create custom index table with the following command:

CREATE CUSTOM INDEX firstname_idx ON employee (lastname) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false'};

Create a custom function to read length:

CREATE FUNCTION IF NOT EXISTS len (input text) CALLED ON NULL INPUT RETURNS int LANGUAGE java AS 'return input.length();';

select lastname from employee where lastname like 'H%';

select len(lastname) from employee where lastname like 'H%';

select lastname from employee where lastname like 'G%';

select len(lastname) from employee where lastname like 'G%';

QUESTION 2: Import any data from the given data set and apply any commands like above.

Type the following command:

CREATE FUNCTION IF NOT EXISTS displayname (column TEXT) RETURNS NULL ON NULL INPUT RETURNS text LANGUAGE javascript AS $$ column.charAt(0).toUpperCase() + column.slice(1) $$;

And to display all the last names from the employee table and to display names starting with a particular letter type the following command:

select displayname (lastname) from employee ;

select displayname (lastname) from employee where lastname like 'H%';

select displayname (lastname) from employee where lastname like 'G%';