Hive Blob Optimization Known Issue: CTAS query is failing to move final output data from HDFS to S3 bucket
- set hive.blobstore.optimizations.enabled=false;
- create database destination location 's3://athinaiad/hivestore/destination';
- CREATE EXTERNAL TABLE IF NOT EXISTS testEmpty (name STRING, gender STRING, age INT, company STRING, city STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 's3://athenaiad/datasources/searchablefiles2/';
- CREATE TABLE IF NOT EXISTS destination.whoBasic2 AS SELECT name, gender, age FROM testEmpty where 1!=1;
The statements above reproduce the failure. When you hit it, first check whether the SELECT query in the CTAS is returning any data.
- Hive does not rename a temporary directory correctly when hive.blobstore.optimizations.enabled=false is set and a query returns 0 records. As a result, the job cannot find the directory to upload and fails.
-- If you are certain that your query will always return 0 records, set hive.blobstore.use.blobstore.as.scratchdir=true; the table can then be created with the same query.
-- If you are not sure whether your query returns 0 or more records, first check whether your "select col1,col2 from " returns 0 records. If it does, set hive.blobstore.use.blobstore.as.scratchdir=true before running the CREATE TABLE AS SELECT query, then set it back to hive.blobstore.use.blobstore.as.scratchdir=false afterwards.
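
The second workaround can be sketched as the session below, reusing the tables from the reproduction statements earlier on this page and assuming hive.blobstore.optimizations.enabled=false is still set in the session. Treat it as a minimal sketch rather than a verified fix: the `LIMIT 1` probe is an assumption added here as a cheap way to see whether the SELECT returns any rows, and the decision to toggle the setting is made by hand after reading the probe's output.

```sql
-- Probe (assumption, not part of the original workaround text):
-- run the same SELECT that the CTAS uses and see whether any row comes back.
SELECT name, gender, age FROM testEmpty WHERE 1!=1 LIMIT 1;

-- Only if the probe returned no rows: write scratch data to the blobstore
-- for this one CTAS.
set hive.blobstore.use.blobstore.as.scratchdir=true;

CREATE TABLE IF NOT EXISTS destination.whoBasic2 AS SELECT name, gender, age FROM testEmpty WHERE 1!=1;

-- Restore the setting so later queries in the session are unaffected.
set hive.blobstore.use.blobstore.as.scratchdir=false;
```

If the probe returns a row, skip both `set` statements and run the CTAS unchanged.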