ICP 7 - navyagonug/CS5590-BIG-DATA-PROGRAMMING-USING-HADOOP-AND-SPARK GitHub Wiki

PROBLEM STATEMENT

Write a document and include your CQL screen shots for the commands below. Use the datasets provided to you in class and create a KEYSPACE with the name “test” or use appropriate KEYSPACE according to your choice. You have the flexibility to change data according to your requirements. 1.List the empno,ename,jobtitle,and hiredate of employee from the employee table. 2.List the name,salary of the employees who are clerks. 3.List the name,job,salary of every employee joined on ‘december 17,1980’, 4.List name and annual salary of all the employees. 5.List the department name & deptno for departments having deptno.>=20 6.Display employees’ names, salary and manager values of those employees whose salary is 500 from EMP table using SELECT statement.

DATASETS

The following link is used to download and make use of datasets employees and books_entries.

https://umkc.box.com/s/nvp8gmyxg6ina8txhxr4zc1asmqxfawm

FEATURES

Cassandra is installed on windows(Without using Datastax). Python and JDK are prerequisites that needs to be present on local machine before using cassandra. Python version must be 2.7. Later, Environment variables are added for both Cassandra and python.

APPROACH

Open the command line interface, type in Cassandra, This starts the Cassandra interface. Now, Open a new command line prompt and type in 'cqlsh'. Following queries are then entered as follows. In order to load the given dataset, It is first important to create a keyspace and a table. The following are the screenshots that depicts the creation of keyspace bigdata and table employeen. Once the table is created, Loading of employee.csv file is done by using 'copy' command.

QUERIES

Query1: Display details such as employee name, id, title of job and hire data.

Query2: Display name and salary of all those employees who are "clerks"

Query3: Display name and jobtitles of employees whose hire date is december 17,1980.

Query4: Display name and annual salaries of employees.

Query5: Display department name and number of those whose department number is greater than or equak to 20.

Query6: Display name and salary of those employees whose salary is equal to 500

BONUS

books_entries.csv dataset is used in this case. Following screenshots depicts the creation and loading of data into a new table books1.

The following queries displays the average count of number of pages. The second query lists all the fields from books1 table where the number of pages of books are 600. There is just a single book with 600 pages. The screenshots are as follows.