ICP5:
Sqoop introduction:
- Hadoop is great for storing massive volumes of data in HDFS.
- It provides a scalable processing environment for structured and unstructured data.
- However, it is batch-oriented, and thus not suitable for interactive query applications.
- Sqoop acts as an ETL tool used to copy data between HDFS and SQL databases.
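As a quick sanity check that Sqoop can reach the database over JDBC, one can list the visible databases. This is a minimal sketch; the hostname, username, and MySQL setup are assumptions, not taken from this wiki:

```shell
# List the databases visible through the JDBC connection
# (assumed: MySQL on localhost, user "root", password prompted with -P)
sqoop list-databases \
  --connect jdbc:mysql://localhost/ \
  --username root \
  -P
```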
Part1:
1. Create a table in MySQL and import it into HDFS through Sqoop.
2. Export the table from HDFS to MySQL.
Creation of the table in MySQL:
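The wiki shows this step only as a screenshot; below is a minimal sketch of what the MySQL side might look like. The database `testdb`, table `employee`, and its columns are hypothetical names, not from the original:

```sql
-- Assumed example database and table; names and columns are illustrative only
CREATE DATABASE IF NOT EXISTS testdb;
USE testdb;

CREATE TABLE employee (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  salary DOUBLE
);

INSERT INTO employee VALUES
  (1, 'alice', 50000),
  (2, 'bob', 60000);
```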
Importing into HDFS using Sqoop:
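A minimal sketch of the import command, assuming the hypothetical `testdb.employee` table above; the target directory is likewise an assumption:

```shell
# Copy the MySQL table into HDFS as delimited text files
# (database, table, and path names are assumptions)
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root -P \
  --table employee \
  --target-dir /user/cloudera/employee \
  -m 1
```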
Part2: Create Hive tables using HBase
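The wiki shows this only as a screenshot. One standard way to do it is Hive's HBase storage handler; the table name and column mapping below are assumptions, not the wiki's actual table:

```sql
-- Hive table backed by an HBase table via the HBase storage handler
-- (table name "hbase_stocks" and the column mapping are illustrative)
CREATE TABLE hbase_stocks (key STRING, price STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:price")
TBLPROPERTIES ("hbase.table.name" = "stocks");
```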
Creating the target table in MySQL:
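A sketch of the export target, which must already exist in MySQL before `sqoop export` runs; the schema mirrors the hypothetical `employee` table from above:

```sql
-- Target table for the export; must match the HDFS file layout
CREATE TABLE employee_export (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  salary DOUBLE
);
```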
Exporting from HDFS to MySQL:
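A sketch of the export command, again with the assumed paths and names carried over from the import step:

```shell
# Push the HDFS files back into the MySQL target table
# (paths and field delimiter are assumptions matching the import above)
sqoop export \
  --connect jdbc:mysql://localhost/testdb \
  --username root -P \
  --table employee_export \
  --export-dir /user/cloudera/employee \
  --input-fields-terminated-by ','
```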
The resulting table in MySQL:
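To verify the export, a plain SELECT on the MySQL side suffices (again with the hypothetical table name):

```sql
-- Confirm the rows arrived from HDFS
SELECT * FROM employee_export;
```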
Part3:
Choose one of the following datasets:
I chose the stock dataset and downloaded it.
Create a table in Hive and load the dataset:
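A minimal sketch of the Hive side, assuming a CSV stock file; the table name `stocks`, its columns, and the local path are guesses, since the wiki only shows screenshots:

```sql
-- Illustrative stock table; column names are assumptions
CREATE TABLE stocks (
  trade_date STRING,
  open FLOAT,
  high FLOAT,
  low FLOAT,
  close FLOAT,
  volume BIGINT,
  symbol STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/home/cloudera/stocks.csv' INTO TABLE stocks;
```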
Creating the table in MySQL to receive the data from Hive:
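A sketch of the MySQL target; its schema mirrors the assumed Hive table above:

```sql
-- MySQL target for the Hive data (schema is an assumption)
CREATE TABLE stocks (
  trade_date VARCHAR(20),
  open DOUBLE,
  high DOUBLE,
  low DOUBLE,
  close DOUBLE,
  volume BIGINT,
  symbol VARCHAR(10)
);
```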
Exporting from Hive to MySQL using Sqoop (in Sqoop terms, moving data out of Hadoop is an export):
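A sketch of the command, assuming the Hive table lives under the default warehouse directory; the path and delimiter are assumptions:

```shell
# Export the Hive table's underlying HDFS files into MySQL
# (warehouse path and field delimiter are assumptions)
sqoop export \
  --connect jdbc:mysql://localhost/testdb \
  --username root -P \
  --table stocks \
  --export-dir /user/hive/warehouse/stocks \
  --input-fields-terminated-by ','
```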
Form 3 intuitive questions from your dataset:
1. Statistics
2. WordCount
3. Identifying a pattern
1. Statistical query in Hive using the stock table:
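A sketch of a per-symbol statistics query over the assumed `stocks` columns:

```sql
-- Per-symbol price statistics (column names are assumptions)
SELECT symbol,
       MIN(close) AS min_close,
       MAX(close) AS max_close,
       AVG(close) AS avg_close
FROM stocks
GROUP BY symbol;
```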
2. WordCount query in Hive:
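One way to read "WordCount" on a stock table is counting how often each symbol occurs; the query below is a sketch under that assumption:

```sql
-- Count occurrences of each symbol, WordCount-style
SELECT symbol, COUNT(*) AS cnt
FROM stocks
GROUP BY symbol
ORDER BY cnt DESC;
```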
WordCount Output:
3. Identifying a pattern:
I used the LIKE operator in Hive with the pattern %24% to match values containing "24" anywhere in the stock dataset.
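A sketch of such a query; casting to STRING keeps LIKE well-defined on a numeric column (the choice of the `close` column is an assumption):

```sql
-- Rows whose closing price contains "24" somewhere in its digits
SELECT *
FROM stocks
WHERE CAST(close AS STRING) LIKE '%24%';
```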
This is how the result of the pattern query is shown in Hive.
BONUS:
1. Save your query results into a Hive table.
2. Use complex datatypes for your queries.
I saved the pattern result into a separate Hive table using a single query, as follows:
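A sketch using CREATE TABLE ... AS SELECT (CTAS); the table name `pattern_result` is hypothetical:

```sql
-- Persist the pattern-matching rows into their own Hive table
CREATE TABLE pattern_result AS
SELECT *
FROM stocks
WHERE CAST(close AS STRING) LIKE '%24%';
```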
The saved pattern output, shown in its separate table:
2. Use complex datatypes for your queries:
I created a table with complex datatypes in Hive as follows:
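A sketch of a Hive table using ARRAY, MAP, and STRUCT columns; the schema and delimiters are assumptions, not the wiki's actual table:

```sql
-- Illustrative complex-type table (all names are assumptions)
CREATE TABLE stocks_complex (
  symbol STRING,
  prices ARRAY<FLOAT>,
  stats  MAP<STRING, FLOAT>,
  info   STRUCT<exchange:STRING, sector:STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':';
```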
Using the complex datatypes in a query as follows:
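A sketch of accessing the complex columns, assuming the schema above:

```sql
-- Index into the array, look up a map key, and read a struct field
SELECT symbol,
       prices[0]     AS first_price,
       stats['avg']  AS avg_price,
       info.exchange AS exchange
FROM stocks_complex;
```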
Grouping on the complex datatype column with a GROUP BY in Hive:
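Hive cannot GROUP BY an ARRAY directly, so a common pattern is to explode it with a LATERAL VIEW first; grouping on a STRUCT field works directly. Both sketches assume the schema above:

```sql
-- Group on a struct field directly
SELECT info.exchange, COUNT(*) AS n
FROM stocks_complex
GROUP BY info.exchange;

-- Or explode the array, then group on its elements
SELECT p, COUNT(*) AS n
FROM stocks_complex
LATERAL VIEW explode(prices) t AS p
GROUP BY p;
```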
Thus, the import and export jobs were run using Sqoop commands in Hadoop.