ICP 7 - Murarishetti-Shiva-Kumar/Big-Data-Programming GitHub Wiki
Lesson Plan7: Independent Column based No SQL Tool - Cassandra
Steps for installation of cassandra
1.Download and install Python 2.7.X(https://www.python.org/downloads/release/python-2717/)
2.Download Cassandra Version 3.11.10 or latest (http://cassandra.apache.org/download/)
3.Extract it in your local system
4.Add environment variable CASSNADRA_HOME with value "C:\apache-cassandra-3.11.10"
5.Navigate to "C:\apache-cassandra-3.11.10\bin" folder
6.Open command terminal window as administrator, navigate to "C:\apache-cassandra-3.11.10\bin" and Type "cassandra.bat -f" for starting Cassandra server on localhost
7.Wait for server to initialize completely
8.Open new terminal windows navigate to "C:\apache-cassandra-3.11.10\bin" and type "cqlsh"
9.Open Cassandra Shell
Creation of Keyspace
1.Create the keyspace in Cassandra using the following command.
CREATE KEYSPACE icp7 WITH REPLICATION={'class':'SimpleStrategy','replication_factor':3};
Here SimpleStrategy is replication strategy and replication_factor is 3. We are using the simple strategy as network strategy because all the types being used here are simple types. The replication factor is 3 which indicates the copies of data.
Replication factor plays a key role in fault tolerance. During write operation the Coordinator sends a write request to the nodes and the default node will be the node selected in the partition. Incase if there is breakdown in any of the nodes or the default node the data will be acknowledged by the any one of the other two replicas since the replication factor is 3.
2.Now check whether the keyspace is created or not by listing all the keyspaces using the following command.
Desc keyspaces;
3.After creating Keyspace you have to use the key space.
use icp7;
1. List the empID,ename,jobtitle,and hiredate of employee from the employee table.
Create the table “employee”
create table employee(employee_id int primary key,department text,lastname text,years_with_company int,hiredate date,jobtitle text,salary int,managerid int);
Checking the table structure using
desc table employee;
Loading data into the employee table
Copy employee(employee_id,department,lastname,years_with_company,hiredate,jobtitle,salary,managerid) from 'C:\Users\mshiv\Downloads\employee_entries.csv' with HEADER=true;
Now, we shall display the details of employee from the table
select * from employee;
select employee_id,lastname,hiredate,jobtitle from employee;
2. List the name,salary of the employees who are clerks.
Since the role clerk is not present in the table i am inserting the record manually.
Insert into icp7.employee(employee_id, department,lastname,years_with_company,hiredate,jobtitle,salary,managerid) values (9, 'Engineering', 'Paul',3,'2018-08-23','clerk',50000,3);
select lastname,salary from employee where jobtitle='clerk' allow filtering;
3. List the name,job,salary of every employee joined on ‘february 18,2000’.
select lastname,jobtitle,salary from employee where hiredate='2000-02-18' allow filtering;
4. List name and annual salary of all the employees.
select lastname,salary from employee;
5. Display employees’ names, salary and manager values of those employees whose salary is 45000 from EMP table using SELECT statement.
select lastname,salary,managerid from employee where salary=45000 allow filtering;
Bonus Question
1. Write a query that displays last names with first letter capitalized and all other letters in lowercase and length of lastnames whose lastnames start with J, G or S.
Step 1: Enable user defined functions=true;(for java) enable_scripted_user_defined_functions:true;(for javascript) in location\apache-cassandra-3.11.10\conf\cassandra.yaml file. Initially this would be false by default, this option allows the user to write their own functions in cqlsh.
Step 2: After doing these changes use the “run cassandra.bat -f” command to restart the server again.(In this case close the previous running server.)
Step 3: Since lastname is not the primary key of the table and since the table is already created create a custom index inorder to access the table details with the lastname
Step 4: The custom index can be generated using the following command.
Here the pattern given is with the people whose name starts with a pattern. So the mode is “prefix”. If the pattern is at the end it should be “contains”. CREATE CUSTOM INDEX firstname_idx ON employee (lastname) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false'};
Describing the table
Step 5: Incase if the index is created by mistake we can drop it.
Drop index IF EXISTS icp7.firstname_idx;
Step 6: Now we shall create a custom function displayname using the javascript function using the following command.
CREATE FUNCTION IF NOT EXISTS displayname (column TEXT) RETURNS NULL ON NULL INPUT RETURNS text LANGUAGE javascript AS $$ column.charAt(0).toUpperCase() + column.slice(1) $$;
Display the data using the created function
select displayname (lastname) from employee ;
Now we shall display the data of the people whose name starts with “s”.
select displayname (lastname) from employee where lastname like 's%';
The schemas and commands can be found in file: https://github.com/Murarishetti-Shiva-Kumar/Big-Data-Programming/blob/main/ICP%207/Commands.sql