ICP Assignment 5 - MadhuriSarode/BDP GitHub Wiki
Student ID : 24 : Madhuri Sarode
Student ID : 4 : Bhargavi
Student ID : 16 : Bhavana
Hadoop and SQL Transfer Tool -Sqoop
Designed for efficiently transferring bulk data between Hadoop and structured data stores such as relational databases.Sqoop act like ETL tool used to copy data between HDFS and SQL databases.
1)Sqoop - Import data from RDBMS to HDFS
mysql : Database creation and data addition
Login to SQL database using the following commands in the cloudera terminal.
The existing and default databases can be used using the command show databases;
If needed, new database can be created using the command create database database_name;
The newly created database can be used for further operations and can be directed to sql using the command
use database_name;
A table called person is created and 3 records are pushed into the table.
The import command is executed to transfer the data from RDBMS to HDFS as follows. Using the command, first sql connection is established, and then sqoop import command tells that the data needs to be imported to HDFS from the table specified in the command.
The hadoop file system is checked to confirm the data import using following command
The data is imported to HDFS file system to a specific target directory using the following command
The HDFS file system can be checked to confirm the data transfer
The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which are called rows in table.
The data in the HDFS file system is exported to the matching table in sql database.
As seen below, the person table is truncated
first and then export command is run to confirm the data export.
2)Hive - Data import and export between hive and SQL using sqoop
The data tables are created in hive using HQL. Existing hive table called user is used. The data from user table is moved to sql using sqoop export command. The path of the hive table is also specified in the command
An empty table is first created in the sql according to the table description of the hive table. And once sqoop export command runs, sql database is queried, we can see that user table is having data.
3)Create Hive Table and export to MySQL
Empty Stocks table is created in sql database
Using HQL, hive table is created and data is loaded into it.
Sqoop export command exports data into hive table from sql
The stocks table is filled with data from hive table after the export
4) Statistics
5) Word count
6)Identifying pattern