GCP Service to Service Integration - vidyasekaran/GCP GitHub Wiki


Which GCP services integrate with other services, and how? https://cloud.google.com/architecture/data-lifecycle-cloud-platform


  Source            |      Destination    |   Integration Method

  Amazon S3         |      Cloud Storage  |    Storage Transfer Service
                                               If you are transferring large numbers of files on a 
                                               daily basis from Amazon S3 to Cloud Storage, you 
                                               can use the Storage Transfer Service to transfer 
                                               files from sources including Amazon S3 and HTTP/HTTPS 
                                               services. You can set up regularly recurring transfers, 
                                               and Storage Transfer Service supports several advanced 
                                               options. This service takes advantage of the large network 
                                               bandwidth between major cloud providers and uses advanced 
                                               bandwidth-optimization techniques to achieve very high 
                                               transfer speeds.

                                               https://cloud.google.com/storage-transfer-service
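As a sketch, a recurring daily S3-to-Cloud-Storage transfer can be created from the CLI (the bucket names and the credentials file below are placeholders):

```shell
# Create a transfer job from an S3 bucket to a Cloud Storage bucket,
# repeating daily; AWS credentials are read from a local JSON file.
gcloud transfer jobs create s3://my-s3-bucket gs://my-gcs-bucket \
  --source-creds-file=aws-creds.json \
  --schedule-repeats-every=1d
```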

The topics below are covered:

Scenario 1: transferring files from on-premises servers

Scenario 2: transferring files from other cloud providers

https://cloud.google.com/architecture/mobile-gaming-analysis-telemetry#streaming_pipeline


  App Engine        |      Cloud Logging  |   For example, apps running on App Engine automatically 
                                              log the details of each request and response to 
                                              Cloud Logging. You can also write custom logging 
                                              messages to stdout and stderr, which Cloud Logging 
                                              automatically collects and displays in the Logs Viewer.
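Beyond plain text, Cloud Logging parses JSON lines written to stdout as structured log entries; `severity` and `message` are standard structured-logging fields, while `component` here is just an example custom field:

```shell
# A single JSON line on stdout becomes a structured log entry in
# Cloud Logging, with "severity" mapped to the entry's severity.
echo '{"severity": "ERROR", "message": "payment failed", "component": "checkout"}'
```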

  Compute Engine/GKE |     Cloud Logging  |   Cloud Logging provides a logging agent, based on **fluentd**, 
                                              that you can run on virtual machine (VM) instances hosted 
                                              on Compute Engine as well as container clusters managed 
                                              by GKE. The agent streams log data from common 
                                              third-party apps and system software to Cloud Logging.

  Cloud Storage     |      BigQuery       |   An app outputs batch CSV files 
                                              to the object store of Cloud Storage.       
                                              From there, the import function 
                                              of BigQuery, an analytics data warehouse, 
                                              can pull the data in for analysis and querying.
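The import step can also be done with the `bq` CLI; a minimal sketch, where the dataset, table, and bucket names are placeholders:

```shell
# Load all CSV batch files from the bucket into a BigQuery table,
# inferring the schema and skipping each file's header row.
bq load --source_format=CSV --autodetect --skip_leading_rows=1 \
  mydataset.events gs://my-bucket/batch/*.csv
```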


 Cloud Storage     |       Pub/Sub        |  Create a storage bucket and a Pub/Sub 
                                             notification topic, then set a Pub/Sub 
                                             notification for that bucket so that when you 
                                             store an image or file in Cloud Storage, 
                                             a message is sent to the notification topic. 
                                             
                                             export REGION=us-central1

                                             export GCS_NOTIFICATION_TOPIC="gcs-notification-topic"

                                             export GCS_NOTIFICATION_SUBSCRIPTION="gcs-notification-subscription"

                                             export PROJECT=$(gcloud config get-value project)

                                             export VIDEO_CLIPS_BUCKET=${PROJECT}_videos

                                             In Cloud Shell, create a Pub/Sub topic:

                                             gcloud pubsub topics create ${GCS_NOTIFICATION_TOPIC}

                                             Create a Pub/Sub subscription for the topic:

                                             gcloud pubsub subscriptions create ${GCS_NOTIFICATION_SUBSCRIPTION} \
                                               --topic=${GCS_NOTIFICATION_TOPIC}

                                             Create a bucket to store the input video clips:

                                             gsutil mb -c standard -l ${REGION} gs://${VIDEO_CLIPS_BUCKET}

                                            **Create a Pub/Sub notification for the bucket:**

                                             gsutil notification create -t ${GCS_NOTIFICATION_TOPIC} 
                                            -f json gs://${VIDEO_CLIPS_BUCKET}

                                             Now that you have configured notifications, the system sends 
                                             a Pub/Sub message to the topic that you created every 
                                             time you upload a file to the bucket.
                                             
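The setup above can be verified end to end by uploading a test object and pulling the resulting message; the bucket, topic, and subscription names are the variables defined earlier, and the file name is a placeholder:

```shell
# Upload a test object; Cloud Storage publishes an OBJECT_FINALIZE
# notification to the configured topic.
gsutil cp sample.mp4 gs://${VIDEO_CLIPS_BUCKET}

# Pull one message from the subscription to confirm delivery.
gcloud pubsub subscriptions pull ${GCS_NOTIFICATION_SUBSCRIPTION} \
  --auto-ack --limit=1
```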

API Reference : https://cloud.google.com/storage/docs/reference/libraries#client-libraries-usage-java https://github.com/vidyasekaran/GCP/wiki/GCP-Solutions


 Cloud Storage     |       Dataflow        | You can set up a Dataflow pipeline that polls 
                                             Cloud Storage every 10 seconds for new text files 
                                             and outputs each line to a Pub/Sub topic. 
                                             (Using Dataflow - Create job from template)
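For reference, launching the Google-provided "Text Files on Cloud Storage to Pub/Sub (Stream)" template from the CLI might look like this; the job name, bucket, topic, and project are placeholders:

```shell
# Launch the streaming GCS-text-to-Pub/Sub template; it polls the
# input pattern for new files and publishes each line to the topic.
gcloud dataflow jobs run gcs-text-to-pubsub \
  --region=us-central1 \
  --gcs-location=gs://dataflow-templates-us-central1/latest/Stream_GCS_Text_to_Cloud_PubSub \
  --parameters=inputFilePattern=gs://my-bucket/input/*.txt,outputTopic=projects/my-project/topics/my-topic
```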

 Cloud Storage     |       Pub/Sub/CloudSQL | Write a Cloud Function triggered by a bucket event 
                                              (for example, file creation) and add code to publish 
                                              the object details to Pub/Sub or any other service.

  Cloud Storage     |      CloudScheduler/   | Write a Cloud Function triggered by a bucket event 
                           CloudFunction       (for example, file creation) and add code to publish 
                                               the object details to Pub/Sub or any other service.   

  Pub/Sub           |      BigQuery/BigTable | Ingesting user interaction and server events: to make use 
                           /CloudStorage       of user interaction events from end-user apps or server 
                                               events from your system, you can forward them to Pub/Sub 
                                               and then use a stream processing tool such as **Dataflow**, 
                                               which delivers them to BigQuery, Bigtable, Cloud Storage, 
                                               and other databases. Pub/Sub allows you to gather events 
                                               from many clients simultaneously.

https://cloud.google.com/pubsub/docs/overview


                                              NOTE: You can also use **Cloud Data Fusion** to ingest data from pub/sub to BigQuery
                                              Reference : https://codelabs.developers.google.com/codelabs/real-time-csv-cdf-bq#0

  Pub/Sub           | Cloud Function  |       Create a Cloud Function and set its trigger to a Pub/Sub topic; 
                                              when a message is published to the topic, the function is invoked, 
                                              so you can ingest the data into any service from there.
                                              NOTE: You can also use **Cloud Function** to ingest data from pub/sub to BigQuery
                                              https://medium.com/@milosevic81/copy-data-from-pub-sub-to-bigquery-496e003228a1
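Deploying such a Pub/Sub-triggered function might look like this; the function name, topic, runtime, and entry point are placeholders:

```shell
# Deploy a function that Cloud Functions invokes with each message
# published to the given topic.
gcloud functions deploy ingest-handler \
  --runtime=python310 \
  --trigger-topic=my-topic \
  --entry-point=handle_message \
  --source=.
```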

 CloudScheduler    | Pub/Sub          |      Create a job scheduled to run every minute that writes to a topic.
                                             https://cloud.google.com/scheduler/docs/quickstart
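Such a job can be created directly against a Pub/Sub target; a minimal sketch, where the job name, topic, and message body are placeholders:

```shell
# Publish a message to the topic every minute.
gcloud scheduler jobs create pubsub tick-job \
  --schedule="* * * * *" \
  --topic=my-topic \
  --message-body='{"tick": true}' \
  --location=us-central1
```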

 CloudScheduler    | Cloud Functions  |      Create a job scheduled to run every minute that invokes a Cloud Function.
                                          
                                             https://rominirani.com/google-cloud-functions-tutorial-using-the-cloud-scheduler-to-trigger-your-functions-756160a95c43
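For an HTTP-triggered Cloud Function, the equivalent job targets the function's URL and can authenticate with an OIDC token; the URL and service account below are placeholders:

```shell
# Invoke the function's HTTPS endpoint every minute, authenticating
# with an OIDC identity token minted for the given service account.
gcloud scheduler jobs create http invoke-func-job \
  --schedule="* * * * *" \
  --uri=https://us-central1-my-project.cloudfunctions.net/my-func \
  --http-method=POST \
  --oidc-service-account-email=scheduler@my-project.iam.gserviceaccount.com \
  --location=us-central1
```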

Ingest to/from BigQuery - loading efficiency, tools, and methods explained

https://cloud.google.com/blog/topics/developers-practitioners/bigquery-explained-data-ingestion

Building a Mobile Gaming Analytics Platform - a Reference Architecture

Covers

Real-time processing of individual events using a streaming processing pattern

Bulk processing of aggregated events using a batch processing pattern

https://cloud.google.com/architecture/mobile-gaming-analysis-telemetry#streaming_pipeline


Pub/Sub has global endpoints and leverages Google’s global front-end load balancer to support data ingestion across all Google Cloud regions, with minimal latency.

**Migrating from On-Prem to GCP: Storage (Cloud Storage, Databases, VMs (Velostrata))**

https://bluemedora.com/migrating-from-on-prem-to-gcp-storage/

Deciding which data storage service to use, with details about each datastore service (e.g., Bigtable):

https://cloud.google.com/architecture/data-lifecycle-cloud-platform