How to move data from GCP Cloud Storage to AWS S3 - isgaur/AWS-BigData-Solutions GitHub Wiki
These instructions provide step-by-step information for moving data from GCP Cloud Storage to AWS S3. The steps can be performed in either a GCP or an AWS environment using a virtual machine (for example, Google Dataproc or AWS EC2/EMR). I will use AWS EC2 to demonstrate them:
-
Log in to the AWS EC2 machine.
-
Create a directory with a name of your choice (for example, Data_Transfer) and change into it by executing the command below:
mkdir Data_Transfer && cd Data_Transfer
-
The recommended way to install gsutil is as part of the Google Cloud SDK. Download the Google Cloud SDK by executing the command below:
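The download command is not shown in this step. A minimal sketch follows, assuming the SDK version matches the archive name used in the next step (280.0.0) and that the archive is served from Google's public download location. `SDK_VERSION`, `SDK_ARCHIVE`, `SDK_URL`, and `download_sdk` are illustrative names, not part of the original instructions:

```shell
# Assumption: version taken from the archive name used in the untar step.
SDK_VERSION="280.0.0"
SDK_ARCHIVE="google-cloud-sdk-${SDK_VERSION}-linux-x86_64.tar.gz"
# Assumption: Google's public download location for versioned SDK archives.
SDK_URL="https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/${SDK_ARCHIVE}"

# Hypothetical helper: download the archive into the current directory.
# -f fails on HTTP errors, -L follows redirects, -O keeps the remote file name.
download_sdk() {
  curl -fLO "$SDK_URL"
}
```

Running `download_sdk` from inside Data_Transfer should leave the archive where the untar step expects it.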
-
Untar it using:
tar -xf google-cloud-sdk-280.0.0-linux-x86_64.tar.gz
-
Go inside the extracted folder (you are already in Data_Transfer from the earlier step) using:
cd google-cloud-sdk
-
Execute the commands below:
source path.bash.inc
source completion.bash.inc
-
Verify the installation of gsutil using:
gsutil -v
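If the version check fails, the tools are probably not on the PATH of the current shell. A small sketch of a PATH check, where `check_tools` is a hypothetical helper name introduced here for illustration:

```shell
# Hypothetical helper: report whether each named tool is on PATH.
# Returns non-zero if any tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools gsutil gcloud` should report both tools as found after the two `source` commands have been run in the same shell.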
-
Get the JSON credential file from your GCP account and point GOOGLE_APPLICATION_CREDENTIALS at it:
export GOOGLE_APPLICATION_CREDENTIALS="/Path/to/GCP/Credential-File/MyFirst Project-5f16867d2542.json"
-
Perform GCP authentication to authorize gcloud to access the Cloud Platform with Google user credentials using the command below, and complete the authentication process:
gcloud auth login
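`gcloud auth login` is interactive. On a headless EC2 machine it may be more convenient to authorize with the service account key file instead. The sketch below assumes the JSON file from the previous step is a service account key; `activate_service_account` is a hypothetical helper name:

```shell
# Hypothetical helper: authorize gcloud (and the bundled gsutil) with a
# service account key instead of an interactive browser login.
# Argument 1 (optional): path to the key file; defaults to
# GOOGLE_APPLICATION_CREDENTIALS if that is set.
activate_service_account() {
  key_file="${1:-${GOOGLE_APPLICATION_CREDENTIALS:-}}"
  if [ ! -r "$key_file" ]; then
    echo "credential file not readable: $key_file" >&2
    return 1
  fi
  gcloud auth activate-service-account --key-file="$key_file"
}
```

Calling `activate_service_account` with no arguments uses the path exported in the previous step.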
-
Create a .boto file in your home directory on EC2 (for example /home/ec2; gsutil reads ~/.boto by default) and add the following:
[Credentials]
aws_access_key_id = your-aws-access-key-id
aws_secret_access_key = your-aws-secret-access-key
[s3]
# Modify the region per your requirement
host = s3.us-west-2.amazonaws.com
use-sigv4 = True
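The same file can be generated from the shell. This is a sketch under stated assumptions: `write_boto_config` is a hypothetical helper, it overwrites any existing file at the target path, and it keeps the region comment off the `host` line because an inline `#` comment may be read as part of the value:

```shell
# Hypothetical helper: write a boto config that lets gsutil reach S3.
# Arguments: access key id, secret access key, region, output path
# (output path defaults to ~/.boto). Overwrites an existing file.
write_boto_config() {
  key_id="$1"
  secret="$2"
  region="$3"
  out="${4:-$HOME/.boto}"
  cat > "$out" <<EOF
[Credentials]
aws_access_key_id = ${key_id}
aws_secret_access_key = ${secret}

[s3]
host = s3.${region}.amazonaws.com
use-sigv4 = True
EOF
  # The file holds secrets, so restrict it to the owner.
  chmod 600 "$out"
}
```

For example, `write_boto_config "your-aws-access-key-id" "your-aws-secret-access-key" us-west-2` writes the configuration shown above to ~/.boto.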
-
Once done, list the bucket to confirm that you can communicate with Google Cloud Storage:
gsutil ls gs://demogsutil # Lists the GCP bucket you want to migrate data from.
-
Once the listing is verified, use the following command to copy the data from the GCP Cloud Storage bucket to AWS S3:
gsutil -m cp -r gs://gcp-bucket s3://s3-bucket # Copies the data recursively from the GCP Cloud Storage bucket named gcp-bucket to the AWS S3 bucket named s3-bucket
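After the copy finishes, a rough sanity check is to compare the total size of the two buckets using `gsutil du -s`. This is only a sketch: matching byte totals do not prove that every object was copied intact, and `bucket_bytes` and `compare_buckets` are hypothetical helper names:

```shell
# Hypothetical helper: print the total size in bytes of a bucket URL.
# `gsutil du -s` prints "<bytes>  <url>"; keep only the first field.
bucket_bytes() {
  gsutil du -s "$1" | awk '{print $1}'
}

# Hypothetical helper: compare total bytes of source and destination.
# Returns non-zero when the totals differ.
compare_buckets() {
  src_bytes="$(bucket_bytes "$1")"
  dst_bytes="$(bucket_bytes "$2")"
  echo "source: ${src_bytes} bytes, destination: ${dst_bytes} bytes"
  [ "$src_bytes" = "$dst_bytes" ]
}
```

For example, `compare_buckets gs://gcp-bucket s3://s3-bucket` prints both totals and succeeds only when they match.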
-
For future operations, if necessary, use the following command to sync the data between GCP Cloud Storage and AWS S3:
gsutil -m rsync -r gs://gcp-bucket s3://s3-bucket # gsutil rsync makes the contents under dst_url the same as the contents under src_url by copying any missing or changed files/objects
Options
-m Causes supported operations (acl ch, acl set, cp, mv, rm, rsync, and setmeta) to run in parallel. This can significantly improve performance if you are performing operations on a large number of files over a reasonably fast network connection.
-r Recurses into directories.
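Before syncing for real, it can help to preview what rsync would do. The sketch below uses rsync's `-n` (dry run) option, which reports the planned operations without copying anything; `sync_buckets` is a hypothetical wrapper introduced here for illustration:

```shell
# Hypothetical helper: preview or run a bucket-to-bucket sync.
# Arguments: source URL, destination URL, mode ("run" to copy for real;
# anything else, or nothing, performs a dry run with -n).
sync_buckets() {
  src="$1"
  dst="$2"
  mode="${3:-dry-run}"
  if [ "$mode" = "run" ]; then
    gsutil -m rsync -r "$src" "$dst"
  else
    gsutil -m rsync -r -n "$src" "$dst"
  fi
}
```

For example, `sync_buckets gs://gcp-bucket s3://s3-bucket` only prints what would be copied, and adding `run` as a third argument performs the sync. Defaulting to the dry run is a deliberate choice so that a mistyped bucket name does not start a large transfer.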
Reference: https://cloud.google.com/storage/docs/gsutil