GCP Solutions
Refer to the Key Features of each service, where you can find solutions for how to ingest and process data to/from that GCP service.
Example: Cloud Storage - https://cloud.google.com/storage/ - Easily transfer data to Cloud Storage.
Setting up authentication while using Cloud Storage client libraries
https://cloud.google.com/storage/docs/reference/libraries#windows
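As a minimal sketch of the above, here is one way to authenticate the Python Cloud Storage client library with a service-account key; the key path and bucket name are placeholders, not values from this wiki.

```python
# Minimal sketch: authenticate the Cloud Storage client library with a
# service-account key file (path and bucket name below are placeholders).
from google.cloud import storage

# Option 1: point the client at a key file explicitly.
client = storage.Client.from_service_account_json("path/to/service-account-key.json")

# Option 2: rely on Application Default Credentials, e.g. after setting
# GOOGLE_APPLICATION_CREDENTIALS or running `gcloud auth application-default login`.
# client = storage.Client()

bucket = client.bucket("my-example-bucket")   # hypothetical bucket name
blob = bucket.blob("uploads/sample.txt")
blob.upload_from_string("hello from the storage client library")
```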
Google Cloud services you can use to manage data throughout its entire lifecycle, from initial acquisition to final visualization (good for solutioning):
https://cloud.google.com/architecture/data-lifecycle-cloud-platform
https://cloud.google.com/architecture/building-a-streaming-video-analytics-pipeline?hl=en
I. Creating a Pub/Sub notification for Cloud Storage
How do you send notifications for files uploaded to Cloud Storage to Pub/Sub subscribers?
You create a Pub/Sub topic "gcs-notification-topic", create a subscription "gcs-notification-subscription" for that topic, and then create a Pub/Sub notification for the bucket. The system then sends a Pub/Sub message to the topic ("gcs-notification-topic") every time you upload a file to the bucket.
export REGION=us-central1
export GCS_NOTIFICATION_TOPIC="gcs-notification-topic"
export GCS_NOTIFICATION_SUBSCRIPTION="gcs-notification-subscription"
export PROJECT=$(gcloud config get-value project)
export VIDEO_CLIPS_BUCKET=${PROJECT}_videos
In Cloud Shell, create a Pub/Sub topic:
gcloud pubsub topics create ${GCS_NOTIFICATION_TOPIC}
Create a Pub/Sub subscription for the topic:
gcloud pubsub subscriptions create ${GCS_NOTIFICATION_SUBSCRIPTION} --topic=${GCS_NOTIFICATION_TOPIC}
Create a bucket to store the input video clips:
gsutil mb -c standard -l ${REGION} gs://${VIDEO_CLIPS_BUCKET}
Create a Pub/Sub notification for the bucket:
gsutil notification create -t ${GCS_NOTIFICATION_TOPIC} -f json gs://${VIDEO_CLIPS_BUCKET}
Now that you have configured notifications, the system sends a Pub/Sub message to the topic that you created every time you upload a file to the bucket.
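To check the notifications, here is a minimal sketch of pulling messages from the subscription created above; the project ID is a placeholder, and the attribute names shown are the ones Cloud Storage notifications attach to each message.

```python
# Minimal sketch: pull Cloud Storage notification messages from the
# subscription created above (project ID below is a placeholder).
from google.cloud import pubsub_v1

project_id = "my-project"                       # placeholder
subscription = "gcs-notification-subscription"  # from the steps above

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(project_id, subscription)

response = subscriber.pull(request={"subscription": sub_path, "max_messages": 10})
for received in response.received_messages:
    attrs = received.message.attributes
    # OBJECT_FINALIZE events carry the bucket and object name as attributes.
    print(attrs.get("eventType"), attrs.get("bucketId"), attrs.get("objectId"))
    subscriber.acknowledge(request={"subscription": sub_path,
                                    "ack_ids": [received.ack_id]})
```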
NOTE: When you write a Cloud Function, you can upload your code either by pasting it inline, uploading a zip, or linking a Cloud Source repository.
Reference : https://www.cloudsavvyit.com/4975/how-to-run-gcp-cloud-functions-periodically-with-cloud-scheduler/
**I.a** Also, when you create a topic, you can trigger a Cloud Function from it and write code to store the message payload in Cloud Storage (see the sketch below).
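A minimal sketch of such a function, using the 1st-gen Pub/Sub trigger signature; the bucket name is a placeholder and the function would be deployed with --trigger-topic pointing at your topic.

```python
# Minimal sketch of a Pub/Sub-triggered Cloud Function (1st-gen signature)
# that writes each message payload to a Cloud Storage object.
# Bucket name is a placeholder; deploy with --trigger-topic=<your-topic>.
import base64
from google.cloud import storage

BUCKET = "my-archive-bucket"  # hypothetical bucket

def archive_message(event, context):
    """Triggered by a message published to the configured Pub/Sub topic."""
    payload = base64.b64decode(event["data"]).decode("utf-8") if "data" in event else ""
    client = storage.Client()
    blob = client.bucket(BUCKET).blob(f"messages/{context.event_id}.txt")
    blob.upload_from_string(payload)
```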
**II.** How do you send events from Pub/Sub to other services like Cloud Storage, BigQuery, and Bigtable?
Refer : https://cloud.google.com/pubsub/docs/overview
Ingesting user interaction and server events: to make use of user interaction events from end-user apps or server events from your system, you can forward them to Pub/Sub and then use a stream processing tool such as Dataflow, which delivers them to BigQuery, Bigtable, Cloud Storage, and other databases. Pub/Sub allows you to gather events from many clients simultaneously.
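A minimal sketch of that Pub/Sub-to-BigQuery path as a streaming Apache Beam pipeline; the topic, table, and single-column schema are placeholders for illustration, not part of the referenced docs.

```python
# Minimal sketch of a streaming Apache Beam pipeline that reads events from
# Pub/Sub and writes them to BigQuery. Topic, table, and the one-column
# schema below are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
     | "Decode" >> beam.Map(lambda b: {"raw_event": b.decode("utf-8")})
     | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
           "my-project:analytics.events",
           schema="raw_event:STRING",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```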
III. You can also set up a Dataflow pipeline that polls every 10 seconds for new text files in Cloud Storage and outputs each line to a Pub/Sub topic (using Dataflow - Create job from template).
I was able to create a Dataflow job from a template to read input files from Cloud Storage and publish them to a Pub/Sub topic; with a subscription set on the topic, I could view the messages by clicking Pull.
NOTE: Using Dataflow's Create job from template, we can read data from any number of supported sources to destinations.
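The same template job can be launched from the command line; a sketch using the Google-provided "Cloud Storage Text to Pub/Sub (Stream)" template, where the region, bucket pattern, and topic are placeholders to replace with your own values:

gcloud dataflow jobs run gcs-text-to-pubsub \
  --region=us-central1 \
  --gcs-location=gs://dataflow-templates-us-central1/latest/Stream_GCS_Text_to_Cloud_PubSub \
  --parameters=inputFilePattern=gs://MY_BUCKET/*.txt,outputTopic=projects/MY_PROJECT/topics/MY_TOPIC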
You can schedule a Google Cloud Dataflow pipeline with the help of Cloud Scheduler: https://cloud.google.com/community/tutorials/schedule-dataflow-jobs-with-cloud-scheduler
**IV.** You can create a bucket and write a Cloud Function by opening the bucket's three-dot action menu and selecting "Process with Cloud Functions" - write code that runs on a specific event like file creation/deletion to publish it to a Pub/Sub topic, or process it and put it in another bucket (see the sketch below). (Refer to Renga course - session 192.)
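A minimal sketch of such a function, using the 1st-gen Cloud Storage trigger; the project and topic IDs are placeholders, and it would be deployed with --trigger-resource=<bucket> --trigger-event=google.storage.object.finalize.

```python
# Minimal sketch of a Cloud Storage-triggered Cloud Function (1st gen) that
# publishes the name of each newly created object to a Pub/Sub topic.
# Project and topic IDs are placeholders.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
TOPIC_PATH = publisher.topic_path("my-project", "gcs-notification-topic")  # placeholders

def on_file_created(event, context):
    """Triggered when a file is created in the configured bucket."""
    message = {"bucket": event["bucket"], "name": event["name"]}
    publisher.publish(TOPIC_PATH, json.dumps(message).encode("utf-8"))
```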
How to Run GCP Cloud Functions Periodically with Cloud Scheduler
https://www.cloudsavvyit.com/4975/how-to-run-gcp-cloud-functions-periodically-with-cloud-scheduler/
We create a job in Cloud Scheduler, where three target types are possible (HTTP, Pub/Sub, App Engine HTTP).
Schedule a job in Cloud Scheduler that puts a message body on a Pub/Sub topic.
I created a Cloud Scheduler job that runs every minute ("* * * * *") with the message body "helloworld", set the target type to Pub/Sub with the topic I created, and added a subscription to it. I ran the Cloud Scheduler job once and was able to view and pull messages from the Pub/Sub subscription.
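The equivalent job can be created from the command line; a sketch where the job name, topic, and location are placeholders (the topic here just reuses the one created earlier):

gcloud scheduler jobs create pubsub helloworld-job \
  --schedule="* * * * *" \
  --topic=gcs-notification-topic \
  --message-body="helloworld" \
  --location=us-central1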
You can write a batch job using Apache Beam, run it using Cloud Dataflow, and schedule it to run at a specific interval using Cloud Scheduler.
How to Deploy Your Apache Beam Pipeline in Google Cloud Dataflow
Reference for creating a Cloud Dataflow template: https://cloud.google.com/dataflow/docs/guides/templates/creating-templates
You can schedule a Google Cloud Dataflow pipeline: https://cloud.google.com/community/tutorials/schedule-dataflow-jobs-with-cloud-scheduler
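A minimal sketch of such a batch Apache Beam pipeline submitted to Cloud Dataflow; the project, region, bucket, and file paths are placeholders, and adding a --template_location option (per the template guide above) would stage it as a classic template instead of running it immediately.

```python
# Minimal sketch: run a batch Apache Beam pipeline on Cloud Dataflow.
# Project, region, bucket, and file paths below are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
)

with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")
     | "LineLength" >> beam.Map(lambda line: len(line))
     | "Format" >> beam.Map(str)
     | "Write" >> beam.io.WriteToText("gs://my-bucket/output/line_lengths"))
```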