ICP 5 - PallaviArikatla/Big-Data-Programming GitHub Wiki

OBJECTIVE: Working on Sqoop import and export between HDFS and an RDBMS.

QUESTION 1: Use Sqoop to import and export MySQL tables to and from HDFS.

Steps:

  1. Start MySQL and log in with the help of the following commands:

sudo service mysqld start

mysql -u root -pcloudera

  2. Create a database db1 and switch to it as follows:

mysql>create database db1;

mysql>use db1;

  3. Create a table in MySQL and insert the desired values:

mysql>create table weights(st_id INT, st_name varchar(10), st_weights INT);

mysql>insert into weights values(1, "mike",70), (2,"nancy",50), (3,"max",45);

Check whether the table was created using the following command:

mysql>select * from weights;

The table gets created as follows:
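Given the three rows inserted above, the select should return something like (exact formatting may differ):

```
+-------+---------+------------+
| st_id | st_name | st_weights |
+-------+---------+------------+
|     1 | mike    |         70 |
|     2 | nancy   |         50 |
|     3 | max     |         45 |
+-------+---------+------------+
```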

  4. Exit MySQL using the command below:

exit;

  5. Import the table into HDFS using the following command:

sqoop import --connect jdbc:mysql://localhost/db1 --username root --password cloudera --table weights
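Since no --target-dir was given, Sqoop writes the imported data to the user's HDFS home directory under the table name, i.e. /user/cloudera/weights (the same path the export step later reads from). A quick sanity check from the terminal:

```
hdfs dfs -ls /user/cloudera/weights
hdfs dfs -cat /user/cloudera/weights/part-m-00000
```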

Visualization in Hue:

  6. Check for the table:

  7. Create another table in MySQL before exporting, with the help of the following commands:

$sudo service mysqld start

$mysql -u root -pcloudera

mysql>create table student(st_id INT, st_name varchar(10), st_weights INT);

mysql>exit;

  8. Export the table from HDFS to MySQL using Sqoop (the table name must match the one created above):

sqoop export --connect jdbc:mysql://localhost/db1 --username root --password cloudera --table student --export-dir /user/cloudera/weights/part-m-00000

We'll get the same table in MySQL, as shown in the Hue browser.
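To double-check the export from the terminal rather than Hue, a select against the target table should show the same three rows (assuming the credentials used above):

```
# Table name should match the one passed to --table in the sqoop export command.
mysql -u root -pcloudera -e 'select * from student;' db1
```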

QUESTION 2: Create Hive Tables through HQL Script, Use Sqoop to import and export tables to Relational Databases.

  1. Initialize by getting into Hive in the terminal, simply by typing hive.

  2. Create a table in Hive with the desired columns and load the data with the following commands:

    hive>create table users(st_id INT, st_name varchar(10), st_weights INT) row format delimited fields terminated by ',' stored as textfile;

    hive>load data local inpath '/home/cloudera/Downloads/users' into table users;

  3. Create another table or use any existing table.

Here I've created another table and loaded its data into the already existing users table with the above-mentioned command. This completes the importing part.
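The contents of /home/cloudera/Downloads/users aren't shown in this write-up; for the load to line up with the table definition, it needs to be a plain text file with one comma-delimited record per line, for example (hypothetical sample data):

```shell
# Hypothetical sample of the comma-delimited file the Hive LOAD DATA step expects;
# fields map positionally to (st_id, st_name, st_weights).
cat > users <<'EOF'
1,mike,70
2,nancy,50
3,max,45
EOF
# The second comma-separated field is st_name.
cut -d',' -f2 users
```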

  4. Create a table named usersNew in MySQL, in the db2 database that the export command below targets:

$sudo service mysqld start

$mysql -u root -pcloudera

mysql>create database db2;

mysql>use db2;

mysql>create table usersNew(st_id INT, st_name varchar(10), st_weights INT);

mysql>exit;

  5. Export the users data from HDFS to the RDBMS using the command below:

$sqoop export --connect jdbc:mysql://localhost/db2 --username root --password cloudera --table usersNew --export-dir /user/hive/warehouse/hivedb.db/users;

QUESTION 3: Perform three queries on the databases:

1. WordCount.

2. Statistics.

3. Identifying a pattern.

  1. WordCount:

a. Initiate by getting into Hive in the terminal: $hive

hive>CREATE TABLE word_count_movies_updated AS

SELECT word, count(1) AS count FROM (SELECT explode(split(concat_ws(',',cast(id as string),title,genre),',')) AS word FROM movies) w GROUP BY word ORDER BY word;
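What the query does: concat_ws joins id, title, and genre with commas, split breaks that string into an array, and explode turns the array into one row per token, which is then grouped and counted. The same pipeline can be mimicked in plain shell on hypothetical sample data:

```shell
# Hypothetical stand-in for the movies table, one id,title,genre record per line.
printf '1,Toy Story,Animation\n2,Heat,Action\n3,Jumanji,Adventure\n' > movies.csv
# tr splits on commas (split+explode), sort | uniq -c groups and counts,
# and the final sort orders by word, mirroring GROUP BY word ORDER BY word.
tr ',' '\n' < movies.csv | sort | uniq -c | sort -k2
```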

Output will be visualized as follows:

  2. To get the statistics of a Hive table, run the below command in Hive:

hive>analyze table movies compute statistics;
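Once computed, the statistics (row count, number of files, raw data size, etc.) are stored in the table parameters and can be viewed with:

```
hive>describe extended movies;
```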

  3. For identifying a pattern, run the below query in Hive:

hive>select title from movies where title like '%1995%';
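In HiveQL, % is a wildcard for any sequence of characters, so '%1995%' matches every title containing 1995 anywhere. The same filter can be mimicked with grep on hypothetical title data:

```shell
# Hypothetical movie titles; grep '1995' behaves like LIKE '%1995%'.
printf 'Toy Story (1995)\nHeat (1995)\nTwister (1996)\n' > titles.txt
grep '1995' titles.txt
```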