Restoring Hive MetaStore from one EMR release version to Another - isgaur/AWS-BigData-Solutions GitHub Wiki

The Hive metastore can be moved from one AWS EMR release version to another by dumping its MySQL database on the old cluster and loading the dump on the new one. The example below goes from EMR 5.24.1 to EMR 5.29.0. Follow the steps in order.


    1. Launch EMR 5.24.1 cluster
    	In "Step 1: Software and Steps" -
    	Select "Use for Hive table metadata" only under AWS Glue Data Catalog settings. Make sure not to select "Use for Spark table metadata".

    2. Once the EMR 5.24.1 cluster is launched, SSH into the master node

    3. Take metastore backup and upload it to your s3 bucket
    	$ sudo mysqldump hive > hive.sql
    	$ aws s3 cp ./hive.sql s3://<your_s3_bucket>/

    4. Launch EMR 5.29 cluster
    	In "Step 1: Software and Steps" -
    	Select "Use for Hive table metadata" only under AWS Glue Data Catalog settings. Make sure not to select "Use for Spark table metadata".

    5. Load mysql dump from your S3 bucket to EMR 5.29 cluster
    	$ aws s3 cp s3://<your_s3_bucket>/hive.sql ./
    	$ sudo mysql hive < hive.sql

    6. Stop and Start the hive-hcatalog-server -
    	$ sudo stop hive-hcatalog-server
    	$ sudo start hive-hcatalog-server

    7. Verify that the Hive metastore port (9083) and the MySQL port (3306) are listening

    	$ netstat -anp | grep 9083
    	$ netstat -anp | grep 3306

    8. Make a note of the master node's internal hostname -
    	$ hostname -f
    	<your_ip>.ec2.internal

    9. Log in to the metastore database and update the DBS table as below -
    	$ sudo mysql -D hive
    	> UPDATE DBS SET DB_LOCATION_URI='hdfs://<your_ip>.ec2.internal:8020/user/hive/warehouse';
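The `update` in step 9 writes the same warehouse path into every row of DBS. If the dump holds more than the default database, a narrower rewrite may be safer. The sketch below only prints SQL that swaps the old master hostname for the new one in `DBS.DB_LOCATION_URI` and `SDS.LOCATION`. The function name `build_location_sql` and both example hostnames are hypothetical, and it assumes the stored locations use the `hdfs://<host>:8020` form shown above.

```shell
#!/usr/bin/env bash
# Sketch only: print SQL that replaces the old NameNode host with the new one
# in the metastore location columns. Nothing is executed against MySQL here.
# Assumption: locations are stored as hdfs://<host>:8020/...

build_location_sql() {
  local old_host="$1" new_host="$2"
  cat <<EOF
UPDATE DBS SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, 'hdfs://${old_host}:8020', 'hdfs://${new_host}:8020');
UPDATE SDS SET LOCATION = REPLACE(LOCATION, 'hdfs://${old_host}:8020', 'hdfs://${new_host}:8020');
EOF
}

# Hypothetical hostnames; print the statements so they can be reviewed first.
build_location_sql "ip-10-0-0-1.ec2.internal" "ip-10-0-0-2.ec2.internal"
```

After reviewing the output, the statements could be applied on the new master node with something like `build_location_sql "<old_host>" "$(hostname -f)" | sudo mysql -D hive`. Take a fresh dump first so the change can be rolled back.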

Now launch the Hive CLI and create a test database and table:

    $ hive
    > create database test;
    > create table test.test1 (id int);
    > exit;

Then launch spark-shell and confirm that Spark sees the table:

    $ spark-shell
    > spark.sql("use test")
    > spark.sql("show tables").show()
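The same check can be scripted instead of typed interactively. The sketch below is one possible way to do it. It uses `hive -e` and swaps in `spark-sql -e` for the spark-shell session, adds `if not exists` so it can be rerun, and defaults to a dry run that only prints the commands. The `run` and `smoke_test` helpers and the `DRY_RUN` variable are hypothetical names, not part of EMR.

```shell
#!/usr/bin/env bash
# Sketch only: non-interactive version of the Hive/Spark smoke test.
# DRY_RUN=1 (the default here) prints each command instead of running it;
# set DRY_RUN=0 on the master node to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

smoke_test() {
  # Create a database and a table through Hive.
  run hive -e "create database if not exists test; create table if not exists test.test1 (id int);"
  # Check that Spark lists the same table.
  run spark-sql -e "show tables in test"
}

smoke_test
```

If the Spark command lists `test1`, both engines are reading the same table metadata.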

