Architecture: Data Lake - davidkhala/gcp-collection GitHub Wiki

Lifecycle

1. Ingest

Batch

  • Storage Transfer Service
  • BigQuery Data Transfer Service
  • Transfer Appliance
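Storage Transfer Service jobs are defined as a TransferJob resource and created through the `transferJobs.create` REST method. The sketch below only builds the JSON request body for a one-off bucket-to-bucket copy into the lake. It does not call the API, so it runs without a GCP project. The helper name `build_gcs_transfer_job` and the project and bucket names are hypothetical. Authentication, the HTTP call, and other source types (S3, HTTP lists) are not covered.

```python
import json
from datetime import date


def build_gcs_transfer_job(project_id: str, source_bucket: str, sink_bucket: str,
                           run_on: date, description: str = "") -> dict:
    """Return a TransferJob body for the Storage Transfer Service
    transferJobs.create REST method (hypothetical helper, bucket-to-bucket copy)."""
    run_date = {"year": run_on.year, "month": run_on.month, "day": run_on.day}
    return {
        "description": description,
        "projectId": project_id,
        "status": "ENABLED",
        "transferSpec": {
            # Copy objects from an existing bucket into the data lake landing bucket.
            "gcsDataSource": {"bucketName": source_bucket},
            "gcsDataSink": {"bucketName": sink_bucket},
        },
        "schedule": {
            # Equal start and end dates request a one-time run on that day.
            "scheduleStartDate": run_date,
            "scheduleEndDate": dict(run_date),
        },
    }


if __name__ == "__main__":
    # Hypothetical project and bucket names.
    job = build_gcs_transfer_job("my-project", "legacy-exports", "my-lake-raw",
                                 date(2024, 1, 15), "one-off backfill")
    print(json.dumps(job, indent=2))
```

For recurring ingestion, you would drop `scheduleEndDate` or set a later date instead of repeating the start date.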

Streaming

  • Pub/Sub
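When you publish to Pub/Sub over REST (`POST https://pubsub.googleapis.com/v1/projects/PROJECT/topics/TOPIC:publish`), each message's `data` must be a base64-encoded string, and `attributes` is a string-to-string map. The sketch below builds that request body from JSON-serializable records. The helper name `build_publish_body` is hypothetical, and sending the request (with credentials) is left out.

```python
from __future__ import annotations

import base64
import json


def build_publish_body(records: list[dict], attributes: dict[str, str] | None = None) -> dict:
    """Build the JSON body for Pub/Sub's REST topics.publish method.

    Each record is serialized to compact JSON and base64-encoded, because the
    REST API expects PubsubMessage.data as a base64 string.
    """
    messages = []
    for record in records:
        payload = json.dumps(record, separators=(",", ":")).encode("utf-8")
        message = {"data": base64.b64encode(payload).decode("ascii")}
        if attributes:
            # Attribute keys and values must be strings; copy so callers can't mutate them later.
            message["attributes"] = dict(attributes)
        messages.append(message)
    return {"messages": messages}


if __name__ == "__main__":
    # Hypothetical event records from a clickstream source.
    body = build_publish_body([{"user": "u1", "event": "view"}], {"source": "web"})
    print(json.dumps(body))
```

Official client libraries handle the encoding for you. The explicit version just makes the wire format visible.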

2. Store

[Figure: storage decision tree]
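The decision tree image is not reproduced here. As a rough stand-in, the function below encodes a heavily simplified storage choice based on Google Cloud's general product positioning. It is an assumption for illustration, not a transcription of the original figure. The flag names are hypothetical.

```python
def suggest_storage(structured: bool, analytical: bool = False,
                    relational: bool = False, global_scale: bool = False,
                    high_throughput: bool = False) -> str:
    """Very simplified storage choice; NOT a copy of the wiki's decision tree."""
    if not structured:
        return "Cloud Storage"  # files, media, and the raw landing zone of the lake
    if analytical:
        return "BigQuery"  # SQL analytics over large datasets
    if relational:
        # Globally distributed, horizontally scalable vs. regional managed SQL engines.
        return "Spanner" if global_scale else "Cloud SQL"
    # Non-relational: wide-column store for high-throughput, low-latency access, else documents.
    return "Bigtable" if high_throughput else "Firestore"


if __name__ == "__main__":
    print(suggest_storage(structured=True, analytical=True))
```

Real decisions also weigh cost, latency, consistency, and existing tooling, which a few boolean flags cannot capture.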

3. Process and Analyze

Data cleansing and normalization

  • Cloud Dataprep Data Harvest
  • Dataplex ETL
  • Dataflow and Cloud Data Fusion for data integration
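The tools above apply cleansing and normalization rules at scale. Dataprep uses recipes, and Dataflow runs Apache Beam transforms. To show what such rules typically do, here is a plain-Python sketch that runs without a GCP project. It trims whitespace, canonicalizes key and value case, rejects rows without an id, and de-duplicates on id. The field names (`id`, `email`, `country`) and the helpers `normalize_record` and `cleanse` are hypothetical.

```python
from __future__ import annotations


def normalize_record(raw: dict[str, str | None]) -> dict[str, str] | None:
    """Trim keys/values, lower-case keys and emails, upper-case country codes.

    Returns None (a reject) when the record has no non-empty "id".
    """
    record = {k.strip().lower(): v.strip() for k, v in raw.items() if v is not None}
    if not record.get("id"):
        return None  # reject: missing primary key
    if "email" in record:
        record["email"] = record["email"].lower()
    if "country" in record:
        record["country"] = record["country"].upper()
    return record


def cleanse(rows: list[dict[str, str | None]]) -> list[dict[str, str]]:
    """Normalize every row, drop rejects, and keep the first record per id."""
    seen: set[str] = set()
    out: list[dict[str, str]] = []
    for raw in rows:
        rec = normalize_record(raw)
        if rec is None or rec["id"] in seen:
            continue
        seen.add(rec["id"])
        out.append(rec)
    return out


if __name__ == "__main__":
    rows = [{" ID ": "1", "Email": " A@Example.com ", "country": "us"},
            {"id": "1", "email": "dup@example.com"},
            {"id": "", "email": "no-id@example.com"}]
    print(cleanse(rows))
```

In a Beam pipeline the same per-record logic would typically sit in a `Map`/`Filter` step. Global de-duplication would need a grouping step rather than an in-memory set.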

Warehouse

  • Dataproc and BigQuery
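On the BigQuery side, lake files in Cloud Storage are commonly loaded into warehouse tables with a load job. The REST `jobs.insert` method accepts a `configuration.load` object for this. The sketch below only builds that object and validates a couple of inputs. The helper name `build_load_job` and the dataset, table, and bucket names are hypothetical. Dataproc (Spark/Hadoop) processing is not shown.

```python
from __future__ import annotations

SUPPORTED_FORMATS = {"CSV", "NEWLINE_DELIMITED_JSON", "AVRO", "PARQUET", "ORC"}


def build_load_job(project_id: str, dataset_id: str, table_id: str,
                   source_uris: list[str], source_format: str = "PARQUET",
                   append: bool = True) -> dict:
    """Return a BigQuery jobs.insert body that loads Cloud Storage files into a table."""
    if source_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported source format: {source_format}")
    if not source_uris or not all(uri.startswith("gs://") for uri in source_uris):
        raise ValueError("source URIs must be non-empty gs:// paths")
    return {
        "configuration": {
            "load": {
                "sourceUris": list(source_uris),
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "sourceFormat": source_format,
                # Append new partitions of data, or replace the table contents.
                "writeDisposition": "WRITE_APPEND" if append else "WRITE_TRUNCATE",
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical names; Parquet carries its own schema, so none is given here.
    print(build_load_job("my-project", "lake", "events", ["gs://my-lake-curated/events/*.parquet"]))
```

Parquet is used as the default here because it is self-describing. For CSV or JSON you would usually also supply a schema or enable `autodetect`.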

4. Explore and Visualize