GCP High Availability - vidyasekaran/GCP GitHub Wiki
High Availability and Data Protection on Google Cloud
https://www.youtube.com/watch?v=VG5LV-ad2I0
Implement business continuity aligned with the needs of your business.
Three levels of protection, in increasing order of complexity and cost:
- Backup: protect data from accidental deletion, corruption or other unintended loss.
- Cross-zone HA: ensure continuous data and app availability in case of a zone failure.
- Cross-region HA: ensure data replication and app recoverability with minimal downtime in case of a region failure.
Protect IaaS Workloads
Compute Engine offers several types of storage options for your instances. Each of the following storage options has unique price and performance characteristics:
Compute
HA
Regional MIG
Regional GKE Cluster
Backup
VM machine images
Anthos data protection for GKE Cluster
DR
Deployment Manager
Storage
HA
Regional Persistent Disk (https://cloud.google.com/compute/docs/disks#repds)
Regional persistent disks provide durable storage and replication of data between two zones in the same region. If you are designing robust systems or high availability services on Compute Engine, use regional persistent disks combined with other best practices such as backing up your data using snapshots. Regional persistent disks are also designed to work with regional managed instance groups.
Regional GCS Bucket
https://cloud.google.com/storage/
Standard Storage: Good for “hot” data that’s accessed frequently, including websites, streaming videos, and mobile apps.
Nearline Storage: Low cost. Good for data that can be stored for at least 30 days, including data backup and long-tail multimedia content.
Coldline Storage: Very low cost. Good for data that can be stored for at least 90 days, including disaster recovery.
Archive Storage: Lowest cost. Good for data that can be stored for at least 365 days, including regulatory archives.
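A minimal Python sketch of the rule of thumb above: pick the cheapest storage class whose minimum storage duration fits how long the data will be kept. It deliberately ignores access frequency, retrieval fees and early-deletion charges, which also matter in practice. The function name `choose_storage_class` is hypothetical; the class identifiers are the ones Cloud Storage uses.

```python
# (storage class, minimum storage duration in days), cheapest first.
STORAGE_CLASSES = [
    ("ARCHIVE", 365),
    ("COLDLINE", 90),
    ("NEARLINE", 30),
    ("STANDARD", 0),
]


def choose_storage_class(expected_retention_days: int) -> str:
    """Return the cheapest class whose minimum duration fits the retention period.

    Simplified model: only retention time is considered, not access patterns.
    """
    if expected_retention_days < 0:
        raise ValueError("retention must be non-negative")
    for name, min_days in STORAGE_CLASSES:
        if expected_retention_days >= min_days:
            return name
    return "STANDARD"  # not reached: STANDARD has no minimum duration


if __name__ == "__main__":
    for days in (7, 45, 120, 400):
        print(f"{days} days -> {choose_storage_class(days)}")
```

Hot data that is read often should still go to Standard even if it is kept for years; this sketch only captures the duration side of the decision.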
Backup
Persistent Disk snapshot, Persistent Disk clone, Filestore backup, GCS Object Versioning
DR
Geo Redundant Cloud Storage
App
Backup
Persistent Disk application-consistent snapshots (snapshot hooks)
SAP-certified Cloud Storage Backint agent for SAP HANA
Protect PaaS Workloads
HA/DR
Cloud SQL - Cloud SQL HA
https://cloud.google.com/sql/docs/mysql/high-availability#normal
https://cloud.google.com/sql/docs/mysql/configure-ha
https://cloud.google.com/sql/docs/mysql/replication#cross-region-read-replicas
Cloud Spanner - multi-zone/multi-region Cloud Spanner
Cloud Bigtable - Cloud Bigtable multi-zone/multi-region replication
Google BigQuery - BigQuery regional instances
Backup
Cloud SQL - Cloud SQL (MySQL, PostgreSQL, SQL Server) automated backups, Cloud SQL import/export, point-in-time recovery
Cloud Spanner - Cloud Spanner backups, Cloud Spanner exports
Cloud Bigtable - Cloud Bigtable managed backups, Cloud Bigtable exports
Google BigQuery - BigQuery cross-region dataset copy
http://showcase.withgoogle.com/cloud-sql (to test and do R&D)
https://cloud.withgoogle.com/next (data modernization)
Cloud SQL
- Cloud SQL supports MySQL, PostgreSQL and SQL Server.
- Supports data protection (backups) and disaster recovery (RPO settings during DB configuration). Offers automated and manual (on-demand) backups.
- Automated backups are taken every day at the time set by the user, and we can also ask Google to take an on-demand backup at any time.
- Offers point-in-time recovery, internally using the transaction logs.
- Offers synchronous and asynchronous replication.
RPO (recovery point objective): the amount of data loss, measured in time, that we can tolerate. For example, if losing 5 minutes of data is acceptable, the RPO is 5 minutes. Highly critical systems need a lower RPO.
**Cloud SQL offers two mechanisms for disaster recovery**
a. Timestamp-based recovery
Allows setting the recovery point in milliseconds, up to 7 days back, to recover a database hit by a disaster. This means the database can be restored from backup to a point just a few milliseconds before the disaster took place.
b. Transaction-based recovery
Allows recovery up to a specified transaction (a position in the transaction log) instead of a timestamp.
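The two mechanisms can be illustrated with a toy model in Python (not Cloud SQL's actual implementation): restore a base backup, then replay transaction-log entries either up to a timestamp or up to a given transaction number. The `LogEntry` type and `restore` function are hypothetical, and the model assumes transaction numbers and commit timestamps increase together.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    txn_id: int        # transaction number, increasing in commit order
    timestamp_ms: int  # commit time in milliseconds
    key: str
    value: str


def restore(base_backup: Dict[str, str], log: List[LogEntry],
            until_ts_ms: Optional[int] = None,
            until_txn: Optional[int] = None) -> Dict[str, str]:
    """Rebuild the database from a backup plus the transaction log.

    Exactly one stopping point must be given: a timestamp (a) or a
    transaction number (b). Entries after the stopping point are not replayed.
    """
    if (until_ts_ms is None) == (until_txn is None):
        raise ValueError("pass exactly one of until_ts_ms or until_txn")
    state = dict(base_backup)  # copy so the backup itself is never modified
    for entry in sorted(log, key=lambda e: e.txn_id):
        if until_ts_ms is not None and entry.timestamp_ms > until_ts_ms:
            break
        if until_txn is not None and entry.txn_id > until_txn:
            break
        state[entry.key] = entry.value
    return state


if __name__ == "__main__":
    backup = {"balance": "100"}
    log = [
        LogEntry(1, 1_000, "balance", "150"),
        LogEntry(2, 2_000, "balance", "175"),
        LogEntry(3, 2_500, "balance", "0"),  # the "disaster": a bad update
    ]
    print(restore(backup, log, until_ts_ms=2_499))  # {'balance': '175'}
    print(restore(backup, log, until_txn=1))        # {'balance': '150'}
```

Stopping one millisecond before the bad update keeps every good change, which is the point of timestamp-based recovery.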
RTO (recovery time objective): how long it takes to bring the database back up (for example from a backup) after an incident.
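To make RPO and RTO concrete, here is a small back-of-the-envelope check in Python. With daily backups only, the worst-case data loss is a full day; adding point-in-time recovery shrinks it to roughly the log lag. The function names, the 30-minute RTO target and the 20-minute restore time are hypothetical illustrations; only the 5-minute RPO comes from the example above.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, pitr_enabled: bool,
                         log_lag: timedelta = timedelta(0)) -> timedelta:
    """Achieved RPO of a simple recovery plan.

    Without point-in-time recovery, a failure just before the next backup loses
    everything written since the last one. With PITR, only changes not yet in
    the transaction log are lost; `log_lag` is an assumed figure for that gap.
    """
    return log_lag if pitr_enabled else backup_interval


def meets_targets(achieved_rpo: timedelta, expected_rto: timedelta,
                  target_rpo: timedelta, target_rto: timedelta) -> bool:
    """A plan is acceptable only if it satisfies both objectives."""
    return achieved_rpo <= target_rpo and expected_rto <= target_rto


if __name__ == "__main__":
    target_rpo = timedelta(minutes=5)     # "5 minutes of data loss is fine"
    target_rto = timedelta(minutes=30)    # hypothetical target
    restore_time = timedelta(minutes=20)  # hypothetical measured restore time
    daily = timedelta(days=1)

    for pitr in (False, True):
        rpo = worst_case_data_loss(daily, pitr_enabled=pitr)
        ok = meets_targets(rpo, restore_time, target_rpo, target_rto)
        print(f"PITR={pitr}: worst-case loss {rpo}, meets targets: {ok}")
```

RPO is driven by how often data is captured, while RTO is driven by how fast it can be restored; a plan has to satisfy both.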
High availability
High availability usually involves replication, health checks and workflow management to automatically recover the DB in the event of an incident.
HA involves setting up redundancy (also called a cluster setup), which GCP requires for the instance to be covered by its SLA.
When HA is enabled, Cloud SQL sets up a regional configuration: the primary DB runs in one zone (say zone a) and a standby DB in another zone (zone b) of the same region, with synchronous replication, so every write is committed to both the primary and the standby. If the primary fails, Cloud SQL automatically fails over to the standby. As part of this workflow, the primary's static IP address and instance name move to the standby, so an application only needs to retry its connection to reach the new primary; to the application, the DB simply looks unavailable for a short time while the failover completes.
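Because the instance keeps its IP address, the application side of failover mostly comes down to retrying the connection. A minimal Python sketch of retry with exponential backoff and jitter, assuming a hypothetical `connect` callable that raises `ConnectionError` while the failover runs (a real database driver raises its own exception types, which you would catch instead). `make_flaky_connect` is a hypothetical demo stand-in.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], max_attempts: int = 6,
                       base_delay_s: float = 0.5, max_delay_s: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds, backing off exponentially with jitter.

    `connect` stands for whatever opens a connection to the instance's
    unchanged IP address; it is assumed to raise ConnectionError while the
    failover is in progress.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle the outage
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))  # full jitter avoids reconnect storms
    raise RuntimeError("unreachable")


def make_flaky_connect(failures: int) -> Callable[[], str]:
    """Demo stand-in for a real driver call: fails `failures` times, then succeeds."""
    calls = {"n": 0}

    def connect() -> str:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise ConnectionError("primary unavailable during failover")
        return "connected"

    return connect


if __name__ == "__main__":
    print(connect_with_retry(make_flaky_connect(2), sleep=lambda s: None))
```

Capping the delay keeps reconnection prompt once the standby takes over, and the random jitter stops many clients from hammering the new primary at the same instant.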
- Offers cross-region replication to guard against a regional failure: over a secure private network (global VPC), a replica can be created in another region without much configuration.
- Offers asynchronous replication to replicas in multiple zones: the primary commits transactions before the replica writes are complete, which keeps the system performant, so replicas may lag a few milliseconds behind. These replicas can be used to generate reports or serve as a reporting system.
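The difference between the synchronous standby (HA) and asynchronous read replicas can be shown with a toy in-memory Python model: a write returns only after the standby has the change, while replica changes are queued and applied later, which is where replica lag comes from. The `Primary` class is purely illustrative and says nothing about how Cloud SQL ships changes internally.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class Primary:
    """Toy model: a synchronous standby versus an asynchronous read replica."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.standby: Dict[str, str] = {}  # synchronous: updated before write() returns
        self.replica: Dict[str, str] = {}  # asynchronous: updated later by replicate()
        self._pending: Deque[Tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.standby[key] = value           # the commit waits for the standby copy
        self._pending.append((key, value))  # the replica copy is shipped later

    def replicate(self, max_changes: int = 1) -> None:
        """Apply up to `max_changes` queued changes to the async replica."""
        for _ in range(min(max_changes, len(self._pending))):
            key, value = self._pending.popleft()
            self.replica[key] = value

    def replica_lag(self) -> int:
        """Number of committed changes the replica has not applied yet."""
        return len(self._pending)


if __name__ == "__main__":
    db = Primary()
    for i in range(3):
        db.write(f"order-{i}", "paid")
    db.replicate(max_changes=1)
    print("standby in sync:", db.standby == db.data)
    print("replica lag (changes):", db.replica_lag())
```

A report run against the replica at this moment would miss the last two orders, which is acceptable for reporting but is why the synchronous standby, not the replica, is the HA failover target.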
Recommendations:
a. Take daily backups and enable point-in-time recovery (millisecond granularity).
b. Take additional on-demand backups around schema changes.
c. Enable in-region high availability (synchronous replication).
d. Use in-region and cross-region replicas (asynchronous replication). They provide a near-real-time standby DB that can serve reporting and be made available with a low RTO.
HA for Compute (Learn with Mahesh: GCP Professional Cloud Architect certification)
**Compute Engine** - deploy the workload in 2 different zones so that if one goes down, the other can still serve.
Solution: create a managed instance group (MIG) and select the multi-zone option so that it deploys instances across multiple zones. Also enable autoscaling and use a minimum of 3 zones.
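Spreading a MIG over 3 zones only helps if the surviving zones can carry the load. A quick Python sizing sketch, assuming instances are spread as evenly as possible across zones and that you know how many instances peak traffic needs (`required_instances` and the function name are hypothetical):

```python
import math


def instances_for_zone_failure(required_instances: int, zones: int = 3) -> int:
    """Smallest group size that still leaves `required_instances` running
    after losing the most heavily loaded zone.

    Assumes an as-even-as-possible spread, so the largest zone holds
    ceil(total / zones) instances.
    """
    if zones < 2:
        raise ValueError("at least 2 zones are needed to survive a zone failure")
    if required_instances < 0:
        raise ValueError("required_instances must be non-negative")
    total = required_instances
    while total - math.ceil(total / zones) < required_instances:
        total += 1
    return total


if __name__ == "__main__":
    for need in (4, 5, 6):
        print(f"need {need} -> run {instances_for_zone_failure(need)} across 3 zones")
```

For example, needing 6 instances at peak means running 9 (3 per zone), since losing any one zone still leaves 6; with only 2 zones the overhead doubles, which is one reason to prefer 3.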
Achieving HA in Managed Instance Group
We can set up a load balancer to distribute load evenly and autoscaling to cater for rising traffic; for high availability, a managed instance group supports autohealing and auto-updating.
**Autohealing** - monitors instances and replaces unhealthy ones with healthy ones. **Auto-updating** - updates instance software or applies patches without downtime.
To configure a MIG for HA, we need to set up a health check, which enables autohealing: the MIG probes its instances for failures (such as HTTP 500 responses) and replaces unhealthy instances with healthy ones.
Create a health check named "health-check" in Compute Engine with the criteria below:
- Name: health-check
- Check interval: 10 sec; Timeout: 5 sec (wait up to 5 seconds for a response to each probe)
- Healthy threshold: 2 consecutive successes
- Unhealthy threshold: 3 consecutive failures
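The thresholds above behave like a small state machine: an instance is marked unhealthy only after 3 consecutive failed probes, and healthy again only after 2 consecutive successes. A Python sketch of that logic, as a simplified model that ignores probe timing (`health_states` is a hypothetical name). With a 10-second check interval, 3 failures take roughly 30 seconds to accumulate, so a failure goes unnoticed for about that long before autohealing can act.

```python
from typing import Iterable, List


def health_states(probe_results: Iterable[bool], healthy_threshold: int = 2,
                  unhealthy_threshold: int = 3,
                  initially_healthy: bool = True) -> List[bool]:
    """Return the health state after each probe (True = healthy).

    A single failed or successful probe never flips the state on its own;
    only a run of consecutive results reaching the threshold does.
    """
    healthy = initially_healthy
    successes = failures = 0
    states = []
    for ok in probe_results:
        if ok:
            successes += 1
            failures = 0
        else:
            failures += 1
            successes = 0
        if healthy and failures >= unhealthy_threshold:
            healthy = False
        elif not healthy and successes >= healthy_threshold:
            healthy = True
        states.append(healthy)
    return states


if __name__ == "__main__":
    probes = [True, False, False, False, True, True]
    print(health_states(probes))
```

Requiring consecutive results keeps a single slow or dropped probe from triggering an unnecessary instance replacement.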
Now go to Instance groups, edit the group, and under Autohealing select the health check named "health-check" created above, with an initial delay of 90 seconds.
Now test it by going to a VM instance and simulating a failure.
For auto-updating (applying updates to instances without downtime), a MIG lets you control the speed and scope of the rollout so that servers are not taken down. Partial rollouts, such as canary testing, are also possible.
**Instance Group** --> edit: select a rolling update (meaning a gradual update) and set a target of 20%, so that 20% of the instances are created from the new template and roughly that share of traffic goes to them. Change the update mode from proactive (too disruptive, since it proactively deletes and recreates instances); you can instead choose opportunistic, meaning updates are applied only when we manually recreate a server or the MIG automatically starts a new one.
Maximum surge - the maximum number of temporary instances added above the target size while updating.
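A simplified Python model of the rollout arithmetic: how many instances a 20% canary touches, and how a proactive rolling update with a given maximum surge (and, as an assumption here, maximum unavailable set to 0) proceeds in waves, temporarily growing the group by at most `max_surge` instances. Rounding the canary size up is an assumption of this sketch; check the MIG documentation for the exact rounding rules. Both function names are hypothetical.

```python
import math
from typing import List


def canary_size(group_size: int, percent: int) -> int:
    """Instances moved to the new template for a canary (rounding up is assumed)."""
    return math.ceil(group_size * percent / 100)


def rolling_waves(instances_to_update: int, max_surge: int) -> List[int]:
    """Simplified proactive rolling update with max unavailable = 0.

    Each wave creates up to `max_surge` new instances, then deletes the same
    number of old ones, so capacity never drops below the target size.
    """
    if max_surge < 1:
        raise ValueError("max_surge must be >= 1 when max unavailable is 0")
    waves = []
    remaining = instances_to_update
    while remaining > 0:
        batch = min(max_surge, remaining)
        waves.append(batch)
        remaining -= batch
    return waves


if __name__ == "__main__":
    group = 10
    print("canary instances:", canary_size(group, 20))
    print("waves with max surge 3:", rolling_waves(group, 3))
    print("peak group size:", group + 3)
```

A larger maximum surge finishes the update in fewer waves but briefly costs more, since the extra instances run alongside the old ones.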
Kubernetes Engine - containerized by default. For the location type, if we select zonal we get 1 control plane (master) and 3 nodes, so there is no HA if the master goes down. If we instead select regional, we get 3 masters and can set the number of nodes per zone, e.g. 3, meaning we get 3 worker nodes in every zone. A regional cluster is restricted to one region; however, we can set up multi-cluster ingress, which lets us run 2 clusters in 2 different regions, so even if one region goes down we can still serve traffic from the other.
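The zonal versus regional trade-off is easy to see with a little arithmetic. The Python sketch below models a cluster as control-plane replicas plus nodes per zone, then removes one zone; the `ClusterTopology` class is hypothetical and assumes the lost zone held one control-plane replica and one zone's worth of nodes.

```python
from dataclasses import dataclass


@dataclass
class ClusterTopology:
    control_plane_zones: int  # zones hosting a control-plane (master) replica
    node_zones: int           # zones hosting worker nodes
    nodes_per_zone: int

    def total_nodes(self) -> int:
        return self.node_zones * self.nodes_per_zone

    def api_available(self) -> bool:
        """The Kubernetes API is reachable while any control-plane replica survives."""
        return self.control_plane_zones > 0

    def after_zone_outage(self) -> "ClusterTopology":
        """Topology left after losing one zone (simplified model)."""
        return ClusterTopology(max(self.control_plane_zones - 1, 0),
                               max(self.node_zones - 1, 0),
                               self.nodes_per_zone)


if __name__ == "__main__":
    zonal = ClusterTopology(control_plane_zones=1, node_zones=1, nodes_per_zone=3)
    regional = ClusterTopology(control_plane_zones=3, node_zones=3, nodes_per_zone=3)
    for name, cluster in (("zonal", zonal), ("regional", regional)):
        hit = cluster.after_zone_outage()
        print(f"{name}: {cluster.total_nodes()} nodes -> {hit.total_nodes()} after outage, "
              f"API available: {hit.api_available()}")
```

Setting 3 nodes per zone in a regional cluster therefore means 9 nodes in total, and losing a zone still leaves 6 nodes and a working control plane.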
App Engine (PaaS) - it's a regional resource, **but only selected locations offer multi-regional support by default**, for example us-central (Iowa) and europe-west, which therefore have high availability. Autoscaling is also enabled for these regions.
If you configure App Engine to run in a location closer to you, you need to enable autoscaling so that if one instance goes down, requests are still served from another zone.
Cloud Functions - it's serverless, so Google takes care of HA.
App Engine flexible - we can enable automatic scaling or manual scaling, where we get 2 instances.
HA and Failover design for GCP Storage
Bigtable - for HA, set up replication: create 2 clusters in different zones or regions, so that data is replicated across zones or regions and stays highly available.
Cloud SQL - enable HA and automated backups during instance creation, under "Enable auto backups & high availability": enable automated backups and create a failover replica.
Spanner - regional resource; you can choose a regional or multi-region configuration. A regional configuration replicates data across 3 zones, whereas a multi-region configuration replicates data across multiple regions.
Memorystore - choose the instance tier: Basic (no HA) or Standard (includes a failover replica in a separate zone for HA; the tier cannot be downgraded later). A Basic instance runs in a single zone without replication, so if that zone goes down there is no HA.
The services below are serverless, so HA is managed by GCP.
Datastore - serverless NoSQL; HA is managed by GCP.
Firestore - the next generation of Datastore; serverless, and HA is managed by GCP.
Filestore - no HA; it is zonal in nature.
Cloud Storage - replication is already built in; it's serverless.