Hive Blob Optimization Known Issue: CTAS query fails to move final output data from HDFS to the S3 bucket

Steps to reproduce:

  1. set hive.blobstore.optimizations.enabled=false;

  2. create database destination location 's3://athinaiad/hivestore/destination';

  3. CREATE EXTERNAL TABLE IF NOT EXISTS testEmpty (name STRING, gender STRING, age INT, company STRING, city STRING)
     ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 's3://athenaiad/datasources/searchablefiles2/';

  4. CREATE TABLE IF NOT EXISTS destination.whoBasic2 AS SELECT name, gender, age FROM testEmpty where 1!=1;

With the settings above, the CTAS fails when its SELECT returns no rows, so first check whether the SELECT query in the CTAS returns any data.
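One way to make this check, sketched here with the table and predicate from step 4, is to run the SELECT part of the CTAS on its own as a row count. The alias row_count is only an illustrative name.

```sql
-- Sketch only: run the SELECT from step 4 by itself to see how many rows the CTAS would write.
-- row_count is an illustrative alias, not something the original steps define.
SELECT COUNT(*) AS row_count
FROM testEmpty
WHERE 1!=1;
```

With the `WHERE 1!=1` predicate the count is 0 by construction, which is the case that triggers the failure described here. For a real query, substitute its own FROM and WHERE clauses.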

Workarounds:

  • Hive does not rename a temporary directory correctly when hive.blobstore.optimizations.enabled=false is set and a query returns 0 records. As a result, the job cannot find the directory to upload and fails.

-- If you are certain that your query will always return 0 records, you can set hive.blobstore.use.blobstore.as.scratchdir=true, and the table can then be created with the same query.

-- If you are not sure whether your query returns 0 or more records, first check whether your "select col1,col2 from " returns 0 records. If it does, set hive.blobstore.use.blobstore.as.scratchdir=true before running the CREATE TABLE AS SELECT query, and then set it back to hive.blobstore.use.blobstore.as.scratchdir=false afterwards.
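The set-and-restore sequence described above might look like the following in a Hive session. This is a minimal sketch that reuses the CTAS from step 4. It assumes the property was false beforehand, as the workaround text implies.

```sql
-- Sketch only: wrap the CTAS in the scratch-directory setting described in the workaround.
-- Assumes the session value was false before; adjust the final SET if your default differs.
SET hive.blobstore.use.blobstore.as.scratchdir=true;

CREATE TABLE IF NOT EXISTS destination.whoBasic2 AS
SELECT name, gender, age
FROM testEmpty
WHERE 1!=1;

-- Restore the previous value so later queries in the session are unaffected.
SET hive.blobstore.use.blobstore.as.scratchdir=false;
```

Because SET applies per session, all three statements need to run in the same Hive session for the override to cover the CTAS.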
