ICP 7 - manaswinivedula/Big-Data-Programming GitHub Wiki

Apache Cassandra

Installation steps

  1. Installed Java version 1.8, python version 2.7.14, and Cassandra version`3.11.6

  2. After installation, I have set up the path of environmental variables for JAVA_HOME and Python.

  3. extract the files from the Cassandra zip folder and paste it in the programming files.

configuring Cassandra

  1. Opening the command prompt in administrator mode and changing the directory to the Cassandra bin. Then initializing the Cassandra server on the local system. This initialization will take some time

cd C:\Program Files\apache-cassandra-3.11.6-bin\apache-cassandra-3.11.6\bin cassandra.bat -f

  1. After Successful initialization open a new command prompt and again direct to the Cassandra bin directory and enter in to the Cassandra shell using the below commands.

cd C:\Program Files\apache-cassandra-3.11.6-bin\apache-cassandra-3.11.6\bin

cqlsh

creating a keyspace new with replication_factor 1 and checking whether the keyspace successfully created or not. Then created a table named employee in Cassandra server and then imported the local file into the Cassandra server using the following commands

create keyspace new with replication={'class':'SimpleStrategy', 'replication_factor':1};

desc keyspaces;

use new

create table employee (employee_id int PRIMARY KEY, department text, lastname text, years_with_company int, hiredate text, jobtitle text, salary int, managerid int);

copy employee (employee_id, department, lastname, years_with_company, hiredate, jobtitle, salary, managerid) from 'C:\Users\manas\Documents\employee_entries.csv' WITH DELIMITER='|' AND HEADER = TRUE;

Task 1

1.To select lastname, department,employee_id and hire date from the table employe.

select employee_id, lastname, department, hiredate from employee;

2. To find the names and salary of the clerks.

select lastname, salary from employee where jobtitle ='clerks' allow filtering;

3. To find the name, job title, and salaries of the persons who were hired on 18th February 2000

select lastname, jobtitle,salary from employee where hiredate='2000-02-18' allow filtering;

4. To select the names and salaries of all employee.

select lastname, salary from employee;

5. To find the name,salary and managerid of all employees whose salary equals to 45000.

select lastname,salary, managerid from employee where salary =45000 allow filtering;

Bonus

1. changed the Cassandra.YAML file settings enable_user_defined_functions: true enable_scripted user_defined_functions: true. created a user-defined scripted function named display name for changing the first letter of the name to be capitalized.

CREATE FUNCTION IF NOT EXISTS displayname (column TEXT) RETURNS NULL ON NULL INPUT RETURNS text LANGUAGE javascript AS $$ column.charAt(0).toUpperCase() + column.slice(1) $$;

captilizing the first letter of all the names from employee.

select displayname(lastname) from employee;

created a user-defined function len to find the length of the last name.

CREATE FUNCTION IF NOT EXISTS len (input text) CALLED ON NULL INPUT RETURNS int LANGUAGE java AS ' return input.length();';

created a custom index on the last name for the last name field of the employee table.

CREATE CUSTOM INDEX firstname_idx ON employee (lastname) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'PREFIX', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 'false'};

To find the name and length of the employees whose name starts with 'G'

select lastname, len(lastname) as length from employee where lastname like 'G%';

To find the name and length of the employees whose name starts with 'j'

select displayname(lastname), len(lastname) as length from employee where lastname like 'j%';

To find the name and length of the employees whose name starts with 's'

select displayname(lastname), len(lastname) as length from employee where lastname like 's%';

2. creating a new table course in Cassandra server and loading the courses_entries file from the local system to the Cassandra.

create table course (name text, location text, instructor text PRIMARY KEY);

copy course (name, location, instructor) from 'C:\Users\manas\Downloads\courses_entries.csv' WITH DELIMITER=',' AND HEADER = TRUE;

To select all records from course table.

To select instructors and course where the course is "Biology"

select * from course;

select name,instructor from course where name ='Biology' allow filtering;

References

1.https://umkc.app.box.com/s/6xc5q85r7aht8fx8wadb03se4l8d79o4

2.https://issues.apache.org/jira/browse/CASSANDRA-8222

3.https://umkc.app.box.com/s/xw305tyjz44wdo237l1gzr2fzvso2wwm