ICP 5 - Murarishetti-Shiva-Kumar/Big-Data-Programming GitHub Wiki

Lesson Plan5: Hadoop and SQL Transfer Tool - Sqoop

To use mysql in cloudera

"sudo service mysqld start" command starts the sql service.

image

To go inside the cloudera mysql "mysql -uroot -pcloudera". Here, user is root, password is cloudera

image

To get list of available databases

image

Creating Database in mysql

image

Using the created database

image

1. Use Sqoop to import and export mySQL Tables to HDFS.

1a. Importing Data from mysql to hadoop

Creating table and inserting data into it

image

image

Viewing the data in the table

image

Now, let's import data from mysql to hadoop using sqoop

sqoop import --connect jdbc:mysql://localhost/icp5db --username root --password cloudra --table acad --m 1;

Here we have set the number of mappers to 1. If we did not set the mapper value by default value of the mappers accessing RDBMS will be 4.

image

image

We shall view the data in hue browser

image

To view in the terminal

image

1b. Exporting data from hadoop to mysql

We shall create a table in mysql to export data into it

image

Exporting data from hadoop to mysql using sqoop command "sqoop export --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table acad_export --export-dir /user/cloudera/acad/part-m-00000;"

image

image

We shall check data in mysql

select * from acad_export;

image

2. Create Hive Tables through HQL Script, Use Sqoop to import and export tables to Relational Databases

2a. Consider the dataset which has the complex data types such as array, Map, Structs

image

image

LOAD DATA LOCAL INPATH '/home/cloudera/Downloads/employees.txt' INTO TABLE employees;

image

Viewing the loaded data in hive

image

Quering in hive

image

To get information of the table "DESCRIBE EXTENDED employees;"

image

Dropping table in hive

image

Exporting complex data like array can be done by using "varchar" type.

image

Creating table in mysql with required structure

image

exporting complex data from hadoop to mysql

image

2b. Exporting data from hadoop to mysql

Creating the table in hive

image

loading data into hive using the data set file

image

Creating table in mysql

image

Exporting data from hadoop to mysql

sqoop export --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table empNew --export-dir /user/hive/warehouse/emp -m 1;

image

image

Verifying data in mysql

image

Now, we shall import data from mysql to hadoop

sqoop import --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table empNew --hive-import --create-hive-table --hive-table deafult.empNew2 -m 1;

image

image

Validating data in mysql and hadoop

image

image

3. Perform three queries from datasets

3a. Create SQL Table and import as Hive

Creating a table in mysql.

image

loading the dataset into table "load data local inpath '/home/cloudera/Downloads/dividends.csv' into table dividends;"

image

validating data in hue browser

image

Querying data in hive

select * from dividends sort by date-cal limit 2;

image image

Now let's create table in mysql

image

Exporting data from hadoop to mysql using sqoop command "sqoop export --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table dividends --export-dir /user/hive/warehouse/dividends/dividends.csv -m 1;"

image

image

Validating data in mysql after exporting from hadoop to mysql

image

3b. Calculate Statistics, WordCount, Identify a pattern from data.

Calculating the wordcount, here data consists of the date and dividend values. so i have calculated the count of the years available in the data using mysql & hive

select year(date_cal), count(year(date_cal)) as "count" from dividends group by year(date_cal);

image

select year(date_cal), count(year(date_cal)) from dividends group by year(date_cal);

image

Calculating the statistics using compute statistics "analyze table dividends compute statistics;"

image

To identify patterns in the data, i have calculated which dates has the dividend values as '0.55'

select date_cal from dividends where value like '%0.55%';

image

All the commands can be found in: https://github.com/Murarishetti-Shiva-Kumar/Big-Data-Programming/blob/main/ICP%205/Queries.sql