ICP 7 - Murarishetti-Shiva-Kumar/Big-Data-Programming GitHub Wiki

Lesson Plan7: Independent Column based No SQL Tool - Cassandra

Steps for installation of cassandra

1.Download and install Python 2.7.X(https://www.python.org/downloads/release/python-2717/)

2.Download Cassandra Version 3.11.10 or latest (http://cassandra.apache.org/download/)

3.Extract it in your local system

4.Add environment variable CASSNADRA_HOME with value "C:\apache-cassandra-3.11.10"

5.Navigate to "C:\apache-cassandra-3.11.10\bin" folder

6.Open command terminal window as administrator, navigate to "C:\apache-cassandra-3.11.10\bin" and Type "cassandra.bat -f" for starting Cassandra server on localhost

7.Wait for server to initialize completely

8.Open new terminal windows navigate to "C:\apache-cassandra-3.11.10\bin" and type "cqlsh"

9.Open Cassandra Shell

Creation of Keyspace

1.Create the keyspace in Cassandra using the following command.

CREATE KEYSPACE icp7 WITH REPLICATION={'class':'SimpleStrategy','replication_factor':3};

Here SimpleStrategy is replication strategy and replication_factor is 3. We are using the simple strategy as network strategy because all the types being used here are simple types. The replication factor is 3 which indicates the copies of data.

Replication factor plays a key role in fault tolerance. During write operation the Coordinator sends a write request to the nodes and the default node will be the node selected in the partition. Incase if there is breakdown in any of the nodes or the default node the data will be acknowledged by the any one of the other two replicas since the replication factor is 3.

image

2.Now check whether the keyspace is created or not by listing all the keyspaces using the following command.

Desc keyspaces;

image

3.After creating Keyspace you have to use the key space.

use icp7;

image

1. List the empID,ename,jobtitle,and hiredate of employee from the employee table.

Create the table “employee”

create table employee(employee_id int primary key,department text,lastname text,years_with_company int,hiredate date,jobtitle text,salary int,managerid int);

Checking the table structure using

desc table employee;

image

Loading data into the employee table

Copy employee(employee_id,department,lastname,years_with_company,hiredate,jobtitle,salary,managerid) from 'C:\Users\mshiv\Downloads\employee_entries.csv' with HEADER=true;

image

Now, we shall display the details of employee from the table

select * from employee;

image

select employee_id,lastname,hiredate,jobtitle from employee;

image

2. List the name,salary of the employees who are clerks.

Since the role clerk is not present in the table i am inserting the record manually.

Insert into icp7.employee(employee_id, department,lastname,years_with_company,hiredate,jobtitle,salary,managerid) values (9, 'Engineering', 'Paul',3,'2018-08-23','clerk',50000,3);

image

select lastname,salary from employee where jobtitle='clerk' allow filtering;

image

3. List the name,job,salary of every employee joined on ‘february 18,2000’.

select lastname,jobtitle,salary from employee where hiredate='2000-02-18' allow filtering;

image

4. List name and annual salary of all the employees.

select lastname,salary from employee;

image

5. Display employees’ names, salary and manager values of those employees whose salary is 45000 from EMP table using SELECT statement.

select lastname,salary,managerid from employee where salary=45000 allow filtering;

image

Bonus Question

1. Write a query that displays last names with first letter capitalized and all other letters in lowercase and length of lastnames whose lastnames start with J, G or S.

Step 1: Enable user defined functions=true;(for java) enable_scripted_user_defined_functions:true;(for javascript) in location\apache-cassandra-3.11.10\conf\cassandra.yaml file. Initially this would be false by default, this option allows the user to write their own functions in cqlsh.

image

image

Step 2: After doing these changes use the “run cassandra.bat -f” command to restart the server again.(In this case close the previous running server.)

Step 3: Since lastname is not the primary key of the table and since the table is already created create a custom index inorder to access the table details with the lastname

Step 4: The custom index can be generated using the following command.

Here the pattern given is with the people whose name starts with a pattern. So the mode is “prefix”. If the pattern is at the end it should be “contains”. CREATE CUSTOM INDEX firstname_idx ON employee (lastname) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false'};

image

Describing the table

image

Step 5: Incase if the index is created by mistake we can drop it.

Drop index IF EXISTS icp7.firstname_idx;

Step 6: Now we shall create a custom function displayname using the javascript function using the following command.

CREATE FUNCTION IF NOT EXISTS displayname (column TEXT) RETURNS NULL ON NULL INPUT RETURNS text LANGUAGE javascript AS $$ column.charAt(0).toUpperCase() + column.slice(1) $$;

image

Display the data using the created function

select displayname (lastname) from employee ;

image

Now we shall display the data of the people whose name starts with “s”.

select displayname (lastname) from employee where lastname like 's%';

image

The schemas and commands can be found in file: https://github.com/Murarishetti-Shiva-Kumar/Big-Data-Programming/blob/main/ICP%207/Commands.sql