ICP Assignment 5 - MadhuriSarode/BDP GitHub Wiki

Student ID : 24 : Madhuri Sarode

Student ID : 4 : Bhargavi

Student ID : 16 : Bhavana

Hadoop and SQL Transfer Tool -Sqoop

Designed for efficiently transferring bulk data between Hadoop and structured data stores such as relational databases.Sqoop act like ETL tool used to copy data between HDFS and SQL databases.


1)Sqoop - Import data from RDBMS to HDFS

mysql : Database creation and data addition

Login to SQL database using the following commands in the cloudera terminal. The existing and default databases can be used using the command show databases; If needed, new database can be created using the command create database database_name; The newly created database can be used for further operations and can be directed to sql using the command use database_name;

A table called person is created and 3 records are pushed into the table.

The import command is executed to transfer the data from RDBMS to HDFS as follows. Using the command, first sql connection is established, and then sqoop import command tells that the data needs to be imported to HDFS from the table specified in the command.

The hadoop file system is checked to confirm the data import using following command

The data is imported to HDFS file system to a specific target directory using the following command

The HDFS file system can be checked to confirm the data transfer

The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which are called rows in table. The data in the HDFS file system is exported to the matching table in sql database. As seen below, the person table is truncated first and then export command is run to confirm the data export.


2)Hive - Data import and export between hive and SQL using sqoop

The data tables are created in hive using HQL. Existing hive table called user is used. The data from user table is moved to sql using sqoop export command. The path of the hive table is also specified in the command An empty table is first created in the sql according to the table description of the hive table. And once sqoop export command runs, sql database is queried, we can see that user table is having data.


3)Create Hive Table and export to MySQL

Empty Stocks table is created in sql database

Using HQL, hive table is created and data is loaded into it.

Sqoop export command exports data into hive table from sql

The stocks table is filled with data from hive table after the export


4) Statistics

5) Word count

6)Identifying pattern