BigQuery to S3 Data Migration Documentation - SimPPL/arbiter-documentation GitHub Wiki

Introduction

This guide provides step-by-step instructions on how to migrate data from BigQuery to S3. It covers setting up the necessary credentials, storing them securely in AWS Secrets Manager, creating an AWS Glue connection, and using AWS Glue to create a transfer job. The guide also includes details on customizing the transfer script and provides a link to a video resource for further assistance.

⚠️ Steps 3 to 29 are a one-time setup process.

Table of Contents

Step-by-Step Guide

Navigation to Project on GCP

Step 1: Navigate to Google Cloud Console & login

Step 2: Switch to 'JournalismAI Parrot' project

BigQuery to S3 Data Migration Documentation - Step 2

Service Account Creation on GCP

⚠️ The Service Account on GCP & the Secret on AWS have already been created for Arbiter Data Migration. You can skip to Step 30.

Step 3: Click "IAM & Admin"

BigQuery to S3 Data Migration - Step 3

Step 4: Create a service account with BigQuery access & store the JSON credentials file

BigQuery to S3 Data Migration - Step 4

Step 5: Copy the contents of this file
This is an example of the JSON credentials file.

BigQuery to S3 Data Migration - Step 5

Step 6: Navigate to https://toolbox.googleapps.com/apps/encode_decode & paste the contents of the file. Convert it to Base64 format.

BigQuery to S3 Data Migration - Step 6

Step 7: Save the Base64 output for further use
This is an example of such an output.

BigQuery to S3 Data Migration - Step 7
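Steps 5–7 can also be done locally instead of via the web tool. A minimal sketch using only the Python standard library (the credentials dict below is a stand-in for the real JSON file from Step 4, which contains fields like "private_key"):

```python
import base64
import json

# Stand-in for the service account JSON credentials file from Step 4
# (placeholder values, not real credentials).
creds = json.dumps({"type": "service_account", "project_id": "example-project"})

# Encode to Base64 so the multi-line JSON can be stored as a
# single-line secret value in Step 13.
encoded = base64.b64encode(creds.encode("utf-8")).decode("ascii")

# Decoding recovers the original JSON exactly.
assert base64.b64decode(encoded).decode("utf-8") == creds
print(encoded)
```

With the real file, you would read its bytes with `open(..., "rb")` and encode those instead of the sample dict.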

Adding Secret to AWS

⚠️ We have already created a "Big_Query_Service_Account_Key" for data migration.

Step 8: Navigate to AWS Console & login

Step 9: Search for and open "Secrets Manager"

BigQuery to S3 Data Migration - Step 9

Step 10: Click "Store a new secret"

BigQuery to S3 Data Migration - Step 10

Step 11: Click "Other type of secret"

BigQuery to S3 Data Migration - Step 11

Step 12: Under the 'Key/value' tab, click the text field under 'Key/value pairs' & type "credentials"

BigQuery to S3 Data Migration - Step 12

Step 13: Paste the Base64 output from Step 7 into the value field

BigQuery to S3 Data Migration - Step 13
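For reference, whatever later consumes this secret (here, the Glue BigQuery connection from Step 25) must reverse the Step 6 encoding to recover the credentials JSON. A stdlib sketch of that decode step, using sample values rather than the real secret:

```python
import base64
import json

# Sample Base64 string as it would be stored under the "credentials"
# key in Step 13 (placeholder values, not real credentials).
secret_value = base64.b64encode(
    b'{"type": "service_account", "project_id": "example-project"}'
).decode("ascii")

# Reverse the Step 6 encoding to get the original credentials JSON back.
credentials = json.loads(base64.b64decode(secret_value))
print(credentials["project_id"])  # -> example-project
```

This is why the Base64 step matters: the raw JSON spans multiple lines and contains quotes, while the encoded form is a single opaque string that fits cleanly in one secret value.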

Step 14: Click "Next"

BigQuery to S3 Data Migration - Step 14

Step 15: Click the "Secret name" field & name the secret appropriately

BigQuery to S3 Data Migration - Step 15

Step 16: Click "Next"

BigQuery to S3 Data Migration - Step 16

Step 17: Click "Next" again on the next page

BigQuery to S3 Data Migration - Step 17

Step 18: Click "Store"

BigQuery to S3 Data Migration - Step 18

Connection Creation on AWS Glue

Step 19: On AWS Console, search for & open "AWS Glue"

BigQuery to S3 Data Migration - Step 19

Step 20: Click "Data connections"

BigQuery to S3 Data Migration - Step 20

Step 21: Click "Create connection"

BigQuery to S3 Data Migration - Step 21

Step 22: Find "Google BigQuery" in the Data Sources

BigQuery to S3 Data Migration - Step 22

Step 23: Click "Next"

BigQuery to S3 Data Migration - Step 23

Step 24: Click "Choose one secret" to choose an AWS Secret

BigQuery to S3 Data Migration - Step 24

Step 25: Choose the secret you created earlier
Here, we use the "Big_Query_Service_Account_Key" secret created for data migration.

BigQuery to S3 Data Migration - Step 25

Step 26: Click "Next"

BigQuery to S3 Data Migration - Step 26

Step 27: Add a Name and Description for your new connection

BigQuery to S3 Data Migration - Step 27

Step 28: Click "Next"

BigQuery to S3 Data Migration - Step 28

Step 29: Click "Create connection"

BigQuery to S3 Data Migration - Step 29

Data Migration using AWS Glue

Step 30: Search for & click "AWS Glue"

BigQuery to S3 Data Migration - Step 30

Step 31: Go to "ETL jobs"

BigQuery to S3 Data Migration - Step 31

⚠️ You can create a custom transfer job using "Visual ETL" or "Script editor"

Step 32: Under 'Your Jobs', click on "truthsocial_data-transfer"
This is a script already created for the migration. A few parameters in it need to be changed based on the dataset or tables to be transferred; the following steps demonstrate this in detail.

BigQuery to S3 Data Migration - Step 32

Step 33: Edit 'connectionName' for the GoogleBigQuery node if you created a new connection
Otherwise, leave it as 'Big Query Connection' as shown.

BigQuery to S3 Data Migration - Step 33

Step 34: Navigate to the Google Cloud Console

BigQuery to S3 Data Migration - Step 34

Step 35: Search and open "BigQuery"

BigQuery to S3 Data Migration - Step 35

Step 36: Click on the project name shown at the top

BigQuery to S3 Data Migration - Step 36

Step 37: Copy the parent project ID for 'JournalismAI Parrot' project
The parent project ID is "infinite-rope-363317" in our case.

BigQuery to S3 Data Migration - Step 37

Step 38: Paste the copied ID in the "parentProject" field in the script on AWS

BigQuery to S3 Data Migration - Step 38

Step 39: Go back to GCP & copy the name of the dataset to be transferred
Here "bluesky_social" is the dataset name.

BigQuery to S3 Data Migration - Step 39

Step 40: Paste the dataset's name under 'connection_options' for the BigQuery node in the AWS Glue script

BigQuery to S3 Data Migration - Step 40

Step 41: Paste the dataset's name under 'connection_options' for the S3 node in the AWS Glue script
The folder structure depicted is that of the S3 bucket as it is currently organized; the path can be customized as required.

BigQuery to S3 Data Migration - Step 41

Step 42: On GCP, copy the table names under the dataset to be transferred
Here "blocks", "follows", etc. are the table names.

BigQuery to S3 Data Migration - Step 42

Step 43: Paste the copied table names into the "table_names" array

BigQuery to S3 Data Migration - Step 43
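The parameters edited in Steps 33–43 fit together roughly as below. This is a hedged pure-Python sketch of the values involved, not the actual Glue script: the bucket name is a placeholder, and the real script passes these values to Glue's BigQuery and S3 connectors rather than building plain dicts.

```python
# Example values from this guide (adjust per transfer).
connection_name = "Big Query Connection"  # Step 33
parent_project = "infinite-rope-363317"   # Steps 37-38
dataset = "bluesky_social"                # Steps 39-41
table_names = ["blocks", "follows"]       # Steps 42-43

# For each table, the script reads dataset.table from BigQuery and
# writes it to a matching S3 prefix ("my-bucket" is a placeholder).
transfers = []
for table in table_names:
    transfers.append({
        "source_options": {
            "connectionName": connection_name,
            "parentProject": parent_project,
            "table": f"{dataset}.{table}",
        },
        "s3_path": f"s3://my-bucket/{dataset}/{table}/",
    })

print(transfers[0]["s3_path"])
```

Adding a table to the migration then only means appending its name to `table_names`, which is why the script loops over the array rather than hard-coding each table.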

Step 44: Click "Save"

BigQuery to S3 Data Migration - Step 44

Step 45: Click "Run"

BigQuery to S3 Data Migration - Step 45

Step 46: Click "Run details" to view the logs of the running job

BigQuery to S3 Data Migration - Step 46

Resources

PDF of Scribe Documentation of the above steps: BigQuerytoS3DataMigrationDocumentation.pdf