Lab 004 - jazzwang/hive_labs GitHub Wiki
- 首先,請刪除 Sqoop (2) 產生的 HDFS 目錄 mysql_data
user@master ~ $ hadoop fs -rmr mysql_data
否則,等一下執行 sqoop create-hive-table 或 sqoop import 時,會出現以下的訊息。
13/12/22 14:14:55 INFO hive.HiveImport: Hive import complete. 13/12/22 14:14:55 INFO hive.HiveImport: Export directory is not empty, keeping it.
- 請輸入如下指令並將「帳號」處替換成您的帳號:
user@master ~ $ export DBID=帳號 user@master ~ $ sqoop create-hive-table --connect "jdbc:mysql://sql.3du.me/$DBID" --table MSSQL_DATA --username $DBID -P Enter password: 輸入密碼
- 若資料表 Schema 轉換有正常執行,您將看到以下的類似訊息:
13/12/22 13:44:43 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 13/12/22 13:44:43 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 13/12/22 13:44:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 13/12/22 13:44:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mysql_data` AS t LIMIT 1 13/12/22 13:44:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mysql_data` AS t LIMIT 1 .. 略 ..
- 檢查 Hive 的 HDFS 目錄是否有產生資料表
user@master ~ $ hadoop fs -ls /hive/warehouse Found 3 items drwx------ - jazz supergroup 0 2013-12-17 00:24 /hive/warehouse/dummy drwx------ - user supergroup 0 2013-12-22 13:20 /hive/warehouse/mssql_data drwx------ - user supergroup 0 2013-12-22 12:25 /hive/warehouse/mysql_data
- 檢查 Hive mysql_data 資料表是否有內容
user@master ~ $ hadoop fs -ls /hive/warehouse/mysql_data
- 請輸入如下指令(與前面不同的是加上
--hive-import --hive-table mysql_data
參數)
user@master ~ $ export DBID=帳號 user@master ~ $ sqoop import --connect "jdbc:mysql://sql.3du.me/$DBID" --table mysql_data --username $DBID -P --hive-import --hive-table mysql_data Enter password: 輸入密碼
- 若資料表匯入有正常執行,您將看到以下的類似訊息:
13/12/22 13:51:58 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override 13/12/22 13:51:58 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc. 13/12/22 13:51:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 13/12/22 13:51:58 INFO tool.CodeGenTool: Beginning code generation 13/12/22 13:51:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mysql_data` AS t LIMIT 1 13/12/22 13:51:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mysql_data` AS t LIMIT 1 ... 略 ... 13/12/22 13:54:59 INFO hive.HiveImport: OK 13/12/22 13:54:59 INFO hive.HiveImport: Time taken: 5.73 seconds 13/12/22 13:55:00 INFO hive.HiveImport: Loading data to table default.mysql_data 13/12/22 13:55:00 INFO hive.HiveImport: OK 13/12/22 13:55:00 INFO hive.HiveImport: Time taken: 1.693 seconds 13/12/22 13:55:00 INFO hive.HiveImport: Hive import complete.
- 轉換之結果,可於 Hive 儲存於 HDFS 的路徑查到。
user@master ~ $ hadoop fs -ls /hive/warehouse/mysql_data Found 2 items -rw------- 3 user user 0 2013-12-22 13:54 /hive/warehouse/mysql_data/_SUCCESS -rw------- 3 user user 16 2013-12-22 13:54 /hive/warehouse/mysql_data/part-m-00000
- 檢視 part-m-***** 轉換結果
user@master ~ $ hadoop fs -cat /hive/warehouse/mysql_data/part-m-00000 No encryption was performed by peer. 13/12/22 14:00:36 INFO util.NativeCodeLoader: Loaded the native-hadoop library 1Hello 2World
- "Sqoop: Big data conduit between NoSQL and RDBMS", Surajit Paul, Advisory Consultant, IBM, 23 Jul 2013