Case: Teradata migrate to BigQuery
https://resources.pythian.com/hubfs/White-Papers/Migrate-Teradata-to-Google-BigQuery.pdf
Pre-migration considerations
A very thorough understanding of the current Teradata environment:
- Business Process Logic
- Teradata Data Model(s)
- BTEQ, stored procedures, and other code
- Pipeline dependencies and scheduling requirements
- Monitoring, auditing, and logging requirements
- Security requirements for the entire stack
- Reporting requirements, including KPIs and SLAs
- CI/CD and development processes

In BigQuery there is no direct equivalent for a Teradata Set Table. We recommend data from all tables be deduplicated on fetch.
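As a minimal sketch of that dedupe-on-fetch recommendation, the query below collapses exact duplicate rows after a staging load, using the google-cloud-bigquery Python client. The project, dataset, and table names are hypothetical, and `SELECT DISTINCT *` assumes the table has only scalar (groupable) columns.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical project/dataset/table names.
# SELECT DISTINCT collapses exact duplicate rows, approximating the row
# uniqueness a Teradata Set Table would have enforced; it assumes the
# table contains only scalar (groupable) columns.
dedup_sql = """
CREATE OR REPLACE TABLE `my_project.my_dataset.orders` AS
SELECT DISTINCT *
FROM `my_project.my_dataset.orders_staging`
"""

client.query(dedup_sql).result()  # wait for the query job to finish
```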
It’s critical to review and plan how data will be transformed as it is migrated into the BigQuery environment.
- Data type mapping is challenging: there are data type differences between Teradata and BigQuery.
- Generally, the most defensive (widest compatible) data type is used by default when loading into BigQuery (see the mapping sketch below).
- If a tighter compatible type needs to be preserved, do some additional massaging of the data once it is in BigQuery (the T in ELT).
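For illustration only, here is the kind of "defensive" default mapping meant above, expressed as a Python dict; verify the exact mapping against the official Teradata-to-BigQuery migration documentation for your schema.

```python
# Illustrative "defensive" Teradata -> BigQuery type defaults: the widest
# compatible BigQuery type is chosen so nothing is lost on load; narrowing
# to a tighter type can be done later inside BigQuery (the T in ELT).
TERADATA_TO_BIGQUERY = {
    "BYTEINT": "INT64",
    "SMALLINT": "INT64",
    "INTEGER": "INT64",
    "BIGINT": "INT64",
    "FLOAT": "FLOAT64",
    "DECIMAL": "NUMERIC",      # or BIGNUMERIC/STRING for very high precision
    "CHAR": "STRING",
    "VARCHAR": "STRING",
    "CLOB": "STRING",
    "BYTE": "BYTES",
    "VARBYTE": "BYTES",
    "BLOB": "BYTES",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP",
}
```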
Timeline
A realistic timeline for most migrations is anywhere from 9 to 24 months.
Details of the migration phase
Typical migration steps
- Identify the base data set and point-in-time data (Teradata Queries) for the initial load.
- Use the BigQuery Data Transfer Service or an equivalent tool/process to extract the base data to GCS and then import to BigQuery.
- Identify the process/queries for capturing delta data (while the rest of the migration process is ongoing) and how the delta data will be loaded into BigQuery.
- Identify data concurrency and consistency test cases.
- Create the process for loading delta data from Teradata to BigQuery (via the extract-to-GCS process), as sketched below.
- Execute the data concurrency & consistency test cases.
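A minimal sketch of the delta-load step referenced above, using the google-cloud-bigquery client to append extract files from GCS into an existing table. The project, bucket path, dataset, and table names are placeholders, and the sketch assumes the deltas were exported as Avro.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project id

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    # Append delta rows to the table that already holds the base load.
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/teradata/delta/orders/*.avro",  # placeholder extract path
    "my_dataset.orders",                            # placeholder target table
    job_config=job_config,
)
load_job.result()  # block until the load job completes
print(f"Loaded {load_job.output_rows} delta rows")
```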
DATA TRANSFORMATIONS & INTEGRATION
Best practices
Pythian chose to use the Teradata Hadoop Connector to migrate the data from Teradata into BigQuery.
- We set up a Dataproc instance with the Hadoop connector installed and wrote scripts to pull the data from the Teradata instance.
- The data is pulled over and stored in GCS buckets, and then imported into BigQuery (see the sketch below).
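The notes do not show the actual Teradata Hadoop Connector job, which is submitted as a Hadoop job with its own command-line syntax. As a rough PySpark illustration of the same pull-to-GCS step on Dataproc (not the connector invocation itself), assuming the Teradata JDBC driver jar is on the cluster and using placeholder connection details:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("teradata-to-gcs").getOrCreate()

# Placeholder host, database, table, and credentials; the Teradata JDBC
# driver jar must be on the Dataproc cluster for this read to work.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:teradata://teradata-host/DATABASE=sales")
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("dbtable", "sales.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Dataproc ships with the GCS connector, so gs:// paths work directly;
# the resulting files can then be loaded into BigQuery.
df.write.mode("overwrite").parquet("gs://my-bucket/teradata/orders/")
```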
Why not the BigQuery Data Transfer Service
- The customer did not allow us to write custom queries to pull the data, while the BigQuery Data Transfer Service required a datetime column to be defined for incremental extraction.
- Several tables within the Teradata instance did not have a datetime column we could use, so the entire table would have been pulled every time.
When to use MS SQL Server
The data source was archived data from Teradata that was not being changed in any way.
- The customer copied the data files onto the Google Transfer Appliance.
- The data was shipped to Google and copied to a GCS bucket.
- We created a SQL Server instance in GCP using the data files.
- We wrote Sqoop jobs to export the data and load it into BigQuery (see the sketch below).
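The white paper notes do not reproduce the actual Sqoop commands; as a hedged sketch, a Sqoop import against the SQL Server instance might look roughly like the call below (wrapped in Python to match the other sketches). Host, credentials, table, and bucket are placeholders, and it assumes the SQL Server JDBC driver is available to Sqoop and gs:// paths are usable via the GCS connector on the cluster. The Avro files it writes to GCS can then be loaded into BigQuery with a load job like the one sketched earlier.

```python
import subprocess

# Placeholder connection details and paths; assumes the SQL Server JDBC
# driver jar is available to Sqoop and the GCS connector is installed
# on the cluster so --target-dir can point at a gs:// path.
sqoop_import = [
    "sqoop", "import",
    "--connect", "jdbc:sqlserver://10.0.0.5:1433;databaseName=archive",
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl/sqlserver.password",
    "--table", "archived_orders",
    "--as-avrodatafile",
    "--target-dir", "gs://my-bucket/sqlserver/archived_orders/",
    "--num-mappers", "4",
]
subprocess.run(sqoop_import, check=True)
```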