ICP 5 - Murarishetti-Shiva-Kumar/Big-Data-Programming GitHub Wiki
Lesson Plan 5: Hadoop and SQL Transfer Tool - Sqoop
To use mysql in cloudera
The command "sudo service mysqld start" starts the MySQL service.
To open the cloudera MySQL shell, run "mysql -uroot -pcloudera". Here the user is root and the password is cloudera.
To get list of available databases
Creating Database in mysql
Using the created database
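The corresponding MySQL statements are standard; the database name icp5db is the one used in the Sqoop commands later in this page:

```sql
-- List the databases available on the server
SHOW DATABASES;

-- Create the database used throughout this ICP
CREATE DATABASE icp5db;

-- Switch to it so the tables below are created inside it
USE icp5db;
```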
1. Use Sqoop to import and export mySQL Tables to HDFS.
1a. Importing Data from mysql to hadoop
Creating table and inserting data into it
Viewing the data in the table
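A minimal sketch of this step; only the table name acad appears in the Sqoop command below, so the column names and sample rows here are assumptions:

```sql
-- Hypothetical schema for the acad table
CREATE TABLE acad (
  id INT PRIMARY KEY,
  name VARCHAR(50)
);

-- Hypothetical sample rows
INSERT INTO acad VALUES (1, 'hadoop'), (2, 'sqoop');

-- Viewing the data in the table
SELECT * FROM acad;
```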
Now, let's import data from mysql to hadoop using sqoop
sqoop import --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table acad --m 1;
Here we have set the number of mappers to 1. If the mapper count is not set, Sqoop uses 4 parallel mappers by default to access the RDBMS.
We shall view the data in hue browser
To view in the terminal
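One way to view the imported file from the terminal; the HDFS path matches the export-dir used in the export command below:

```shell
hdfs dfs -cat /user/cloudera/acad/part-m-00000
```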
1b. Exporting data from hadoop to mysql
We shall create a table in mysql to export data into it
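The export target table must match the structure of the data in HDFS; a sketch assuming the same hypothetical two-column schema as the original acad table:

```sql
-- Hypothetical schema; must mirror the imported acad data
CREATE TABLE acad_export (
  id INT PRIMARY KEY,
  name VARCHAR(50)
);
```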
Exporting data from hadoop to mysql using sqoop command "sqoop export --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table acad_export --export-dir /user/cloudera/acad/part-m-00000;"
We shall check data in mysql
select * from acad_export;
2. Create Hive Tables through HQL Script, Use Sqoop to import and export tables to Relational Databases
2a. Consider the dataset which has the complex data types such as array, Map, Structs
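A Hive table with the complex types mentioned above might be declared like this; the column names and delimiters are assumptions and must match the actual layout of employees.txt:

```sql
CREATE TABLE employees (
  name STRING,
  salary FLOAT,
  subordinates ARRAY<STRING>,        -- array type
  deductions MAP<STRING, FLOAT>,     -- map type
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>  -- struct type
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';
```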
LOAD DATA LOCAL INPATH '/home/cloudera/Downloads/employees.txt' INTO TABLE employees;
Viewing the loaded data in hive
Querying in hive
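Queries in Hive can address individual elements of complex columns; a sketch assuming hypothetical columns subordinates (array), deductions (map), and address (struct):

```sql
-- First element of an array column
SELECT name, subordinates[0] FROM employees;

-- Value for a specific map key
SELECT name, deductions['tax'] FROM employees;

-- One field of a struct column
SELECT name, address.city FROM employees;
```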
To get information of the table "DESCRIBE EXTENDED employees;"
Dropping table in hive
Complex data such as an array can be exported by storing it in a "varchar" column on the MySQL side.
Creating table in mysql with required structure
exporting complex data from hadoop to mysql
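The export follows the same pattern as the earlier Sqoop exports; a sketch in which the MySQL table name employees_export and the warehouse path are assumptions:

```shell
sqoop export --connect jdbc:mysql://localhost/icp5db \
  --username root --password cloudera \
  --table employees_export \
  --export-dir /user/hive/warehouse/employees -m 1
```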
2b. Exporting data from hadoop to mysql
Creating the table in hive
loading data into hive using the data set file
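A sketch of the Hive side of this step; the table name emp matches the warehouse path in the Sqoop export command below, while the columns and the local file path are assumptions:

```sql
-- Hypothetical schema for the emp table
CREATE TABLE emp (
  id INT,
  name STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Hypothetical dataset file path
LOAD DATA LOCAL INPATH '/home/cloudera/Downloads/emp.csv' INTO TABLE emp;
```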
Creating table in mysql
Exporting data from hadoop to mysql
sqoop export --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table empNew --export-dir /user/hive/warehouse/emp -m 1;
Verifying data in mysql
Now, we shall import data from mysql to hadoop
sqoop import --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table empNew --hive-import --create-hive-table --hive-table default.empNew2 -m 1;
Validating data in mysql and hadoop
3. Perform three queries from datasets
3a. Create SQL Table and import as Hive
Creating a table in hive for the dividends dataset.
loading the dataset into table "load data local inpath '/home/cloudera/Downloads/dividends.csv' into table dividends;"
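The dividends table referenced here is a Hive table (the "load data local inpath" syntax is HiveQL); a sketch of its creation, with the column names date_cal and value taken from the queries below:

```sql
CREATE TABLE dividends (
  date_cal STRING,
  value FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```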
validating data in hue browser
Querying data in hive
select * from dividends sort by date_cal limit 2;
Now let's create table in mysql
Exporting data from hadoop to mysql using sqoop command "sqoop export --connect jdbc:mysql://localhost/icp5db --username root --password cloudera --table dividends --export-dir /user/hive/warehouse/dividends/dividends.csv -m 1;"
Validating data in mysql after exporting from hadoop to mysql
3b. Calculate Statistics, WordCount, Identify a pattern from data.
Calculating the wordcount: the data consists of date and dividend values, so I calculated the count of each year present in the data using both mysql and hive
select year(date_cal), count(year(date_cal)) as "count" from dividends group by year(date_cal);
select year(date_cal), count(year(date_cal)) from dividends group by year(date_cal);
Calculating the statistics using compute statistics "analyze table dividends compute statistics;"
To identify patterns in the data, I calculated which dates have a dividend value of '0.55'
select date_cal from dividends where value like '%0.55%';
All the commands can be found in: https://github.com/Murarishetti-Shiva-Kumar/Big-Data-Programming/blob/main/ICP%205/Queries.sql