ICP 5 - navyagonug/CS5590-BIG-DATA-PROGRAMMING-USING-HADOOP-AND-SPARK GitHub Wiki

PROBLEM STATEMENT

1. Use Sqoop to import and export MySQL tables to HDFS.
2. Create Hive tables through an HQL script, then use Sqoop to import and export tables to relational databases.
3. Perform three queries on the databases.

FEATURES

In this in-class programming exercise, Sqoop is used to transfer data (tables) from a relational database (in this case, MySQL) to Hadoop as well as Hive. A dataset is then loaded, and queries are run to perform a word count, identify patterns using regular expressions, and compute statistics on the data. MySQL, Sqoop, and Hive are used.

CONFIGURATIONS

Sqoop is downloaded from the official site and installed. Similarly, the MySQL JDBC connector is downloaded, installed, and moved to the appropriate location (Sqoop's lib directory).
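The setup described above can be sketched roughly as follows; the install path and connector version shown are assumptions, not the exact ones used in this ICP.

```shell
# Assumed install location for Sqoop; adjust to the actual path
export SQOOP_HOME=/usr/local/sqoop
export PATH="$SQOOP_HOME/bin:$PATH"

# Place the MySQL JDBC connector jar where Sqoop can load it
# (version number here is a placeholder)
cp mysql-connector-java-5.1.49.jar "$SQOOP_HOME/lib/"

# Verify the installation
sqoop version
```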

QUESTION

Use Sqoop to import and export MySQL tables to HDFS.

APPROACH

Initially, a table is created in MySQL and values are inserted into it. Then, using the import command, data is transferred from MySQL to HDFS. In the reverse direction, from HDFS to MySQL, a table with a matching schema must first be created manually in MySQL, and the export command then transfers the data into it. Below is a series of screenshots, along with the size of the data and the time taken to process it.
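The import and export steps above can be sketched as the following Sqoop invocations; the database name (`icpdb`), tables (`employee`, `employee_copy`), user, and HDFS paths are placeholder assumptions.

```shell
# MySQL -> HDFS: Sqoop reads the table over JDBC and writes it to HDFS.
# Database, table, user, and target-dir are illustrative placeholders.
sqoop import \
  --connect jdbc:mysql://localhost:3306/icpdb \
  --username root -P \
  --table employee \
  --target-dir /user/cloudera/employee \
  -m 1

# HDFS -> MySQL: the target table (employee_copy) must already exist
# in MySQL with a matching schema before exporting.
sqoop export \
  --connect jdbc:mysql://localhost:3306/icpdb \
  --username root -P \
  --table employee_copy \
  --export-dir /user/cloudera/employee \
  -m 1
```

The `-m 1` flag runs a single map task, which keeps the output in one file for a small table.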

QUESTION

Create Hive tables through an HQL script, then use Sqoop to import and export tables to relational databases.

APPROACH

A similar approach is followed: data is transferred between MySQL and Hive using the import and export commands.
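A minimal sketch of the Hive leg, assuming illustrative names (`icpdb`, `employee`, `emp_hive`, `employee_copy`) and a comma-delimited table layout:

```shell
# Create the Hive table from an HQL script (schema is a placeholder)
cat > create_emp.hql <<'EOF'
CREATE TABLE IF NOT EXISTS emp_hive (
  id INT,
  name STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
EOF
hive -f create_emp.hql

# MySQL -> Hive: --hive-import loads the rows directly into emp_hive
sqoop import \
  --connect jdbc:mysql://localhost:3306/icpdb \
  --username root -P \
  --table employee \
  --hive-import --hive-table emp_hive \
  -m 1

# Hive -> MySQL: export from the table's warehouse directory;
# the MySQL target table must already exist with a matching schema
sqoop export \
  --connect jdbc:mysql://localhost:3306/icpdb \
  --username root -P \
  --table employee_copy \
  --export-dir /user/hive/warehouse/emp_hive \
  --input-fields-terminated-by ',' \
  -m 1
```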

QUESTION

Perform three queries on the databases.

APPROACH

Data is first loaded into MySQL and then imported into Hive. The following screenshots show the queries along with their outputs.
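The three query types named in the FEATURES section (word count, regular-expression matching, and statistics) could look roughly like the following HiveQL; the table and column names (`lines`, `line`, `emp_hive`, `salary`) are assumptions for illustration.

```shell
# 1. Word count: split each line on spaces, explode into rows, count words
hive -e "
SELECT word, COUNT(*) AS cnt
FROM (SELECT explode(split(line, ' ')) AS word FROM lines) w
GROUP BY word;"

# 2. Pattern matching with a regular expression
#    (example: lines that start with a vowel)
hive -e "SELECT line FROM lines WHERE line RLIKE '^[AEIOUaeiou]';"

# 3. Basic statistics on a numeric column
hive -e "SELECT MIN(salary), MAX(salary), AVG(salary) FROM emp_hive;"
```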