ICP 5 - Murarishetti-Shiva-Kumar/Big-Data-Programming GitHub Wiki
Lesson Plan 5: Hadoop and SQL Transfer Tool - Sqoop
To use mysql in cloudera
The command "sudo service mysqld start" starts the MySQL service.
To open the cloudera MySQL shell, run "mysql -uroot -pcloudera". Here the user is root and the password is cloudera.
To get list of available databases
Creating Database in mysql
Using the created database
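The corresponding MySQL statements are standard; the database name icp5db is the one used in the Sqoop commands later in this page:

```sql
-- List the databases available on the server
SHOW DATABASES;

-- Create the database used throughout this ICP
CREATE DATABASE icp5db;

-- Switch to it so the tables below are created inside it
USE icp5db;
```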
1. Use Sqoop to import and export mySQL Tables to HDFS.
1a. Importing Data from mysql to hadoop
Creating table and inserting data into it
Viewing the data in the table
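A minimal sketch of this step; only the table name acad appears in the Sqoop command below, so the column names and sample rows here are assumptions:

```sql
-- Hypothetical schema for the acad table
CREATE TABLE acad (
  id INT PRIMARY KEY,
  name VARCHAR(50)
);

-- Hypothetical sample rows
INSERT INTO acad VALUES (1, 'hadoop'), (2, 'sqoop');

-- Viewing the data in the table
SELECT * FROM acad;
```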
Now, let's import data from mysql to hadoop using sqoop
sqoop import --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table acad --m 1;
Here we have set the number of mappers to 1. If the mapper count is not set, Sqoop uses 4 parallel mappers by default to access the RDBMS.
We shall view the data in hue browser
To view in the terminal
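One way to view the imported file from the terminal; the HDFS path matches the export-dir used in the export command below:

```shell
hdfs dfs -cat /user/cloudera/acad/part-m-00000
```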
1b. Exporting data from hadoop to mysql
We shall create a table in mysql to export data into it
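The export target table must match the structure of the data in HDFS; a sketch assuming the same hypothetical two-column schema as the original acad table:

```sql
-- Hypothetical schema; must mirror the imported acad data
CREATE TABLE acad_export (
  id INT PRIMARY KEY,
  name VARCHAR(50)
);
```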
Exporting data from hadoop to mysql using sqoop command "sqoop export --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table acad_export --export-dir /user/cloudera/acad/part-m-00000;"
We shall check data in mysql
select * from acad_export;
2. Create Hive Tables through HQL Script, Use Sqoop to import and export tables to Relational Databases
2a. Consider the dataset which has the complex data types such as array, Map, Structs
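A Hive table with the complex types mentioned above might be declared like this; the column names and delimiters are assumptions and must match the actual layout of employees.txt:

```sql
CREATE TABLE employees (
  name STRING,
  salary FLOAT,
  subordinates ARRAY<STRING>,        -- array type
  deductions MAP<STRING, FLOAT>,     -- map type
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>  -- struct type
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';
```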
LOAD DATA LOCAL INPATH '/home/cloudera/Downloads/employees.txt' INTO TABLE employees;
Viewing the loaded data in hive
Querying in hive
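Queries in Hive can address individual elements of complex columns; a sketch assuming hypothetical columns subordinates (array), deductions (map), and address (struct):

```sql
-- First element of an array column
SELECT name, subordinates[0] FROM employees;

-- Value for a specific map key
SELECT name, deductions['tax'] FROM employees;

-- One field of a struct column
SELECT name, address.city FROM employees;
```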
To get information of the table "DESCRIBE EXTENDED employees;"
Dropping table in hive
Complex data such as an array can be exported by storing it in a "varchar" column on the MySQL side.
Creating table in mysql with required structure
exporting complex data from hadoop to mysql
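The export follows the same pattern as the earlier Sqoop exports; a sketch in which the MySQL table name employees_export and the warehouse path are assumptions:

```shell
sqoop export --connect jdbc:mysql://localhost/icp5db \
  --username root --password cloudera \
  --table employees_export \
  --export-dir /user/hive/warehouse/employees -m 1
```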
2b. Exporting data from hadoop to mysql
Creating the table in hive
loading data into hive using the data set file
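A sketch of the Hive side of this step; the table name emp matches the warehouse path in the Sqoop export command below, while the columns and the local file path are assumptions:

```sql
-- Hypothetical schema for the emp table
CREATE TABLE emp (
  id INT,
  name STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Hypothetical dataset file path
LOAD DATA LOCAL INPATH '/home/cloudera/Downloads/emp.csv' INTO TABLE emp;
```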
Creating table in mysql
Exporting data from hadoop to mysql
sqoop export --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table empNew --export-dir /user/hive/warehouse/emp -m 1;
Verifying data in mysql
Now, we shall import data from mysql to hadoop
sqoop import --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table empNew --hive-import --create-hive-table --hive-table default.empNew2 -m 1;
Validating data in mysql and hadoop
3. Perform three queries from datasets
3a. Create SQL Table and import as Hive
Creating a table in hive for the dividends dataset.
loading the dataset into table "load data local inpath '/home/cloudera/Downloads/dividends.csv' into table dividends;"
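The dividends table referenced here is a Hive table (the "load data local inpath" syntax is HiveQL); a sketch of its creation, with the column names date_cal and value taken from the queries below:

```sql
CREATE TABLE dividends (
  date_cal STRING,
  value FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```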
validating data in hue browser
Querying data in hive
select * from dividends sort by date_cal limit 2;
Now let's create table in mysql
Exporting data from hadoop to mysql using sqoop command "sqoop export --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table dividends --export-dir /user/hive/warehouse/dividends/dividends.csv -m 1;"
Validating data in mysql after exporting from hadoop to mysql
3b. Calculate Statistics, WordCount, Identify a pattern from data.
Calculating the wordcount: the data consists of date and dividend values, so I calculated the count of each year present in the data using both mysql and hive
select year(date_cal), count(year(date_cal)) as "count" from dividends group by year(date_cal);
select year(date_cal), count(year(date_cal)) from dividends group by year(date_cal);
Calculating the statistics using compute statistics "analyze table dividends compute statistics;"
To identify patterns in the data, I calculated which dates have a dividend value of '0.55'
select date_cal from dividends where value like '%0.55%';
All the commands can be found in: https://github.com/Murarishetti-Shiva-Kumar/Big-Data-Programming/blob/main/ICP%205/Queries.sql