GCP High Availability - vidyasekaran/GCP GitHub Wiki

High Availability and Data Protection on Google Cloud

https://www.youtube.com/watch?v=VG5LV-ad2I0

Implement business continuity aligned to the needs of your business:

| Backup | Cross-Zone HA | Cross-Region HA |
| --- | --- | --- |
| Protect data from accidental deletion, corruption, or other unintended loss. | Ensure continuous data and app availability in case of a zone failure. | Ensure data replication and app recoverability with minimal downtime in case of a region failure. |

Complexity and cost increase from left (Backup) to right (Cross-Region HA).

Protect IaaS Workloads

Compute Engine offers several storage options for your instances; each has unique price and performance characteristics:

Compute

- HA: Regional MIG, regional GKE cluster
- Backup: VM machine images, Anthos data protection for GKE clusters
- DR: Deployment Manager

Storage


HA

Regional Persistent Disk (https://cloud.google.com/compute/docs/disks#repds)

Regional persistent disks provide durable storage and replication of data between two zones in the same region. If you are designing robust systems or high availability services on Compute Engine, use regional persistent disks combined with other best practices such as backing up your data using snapshots. Regional persistent disks are also designed to work with regional managed instance groups.
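As a sketch, a regional persistent disk can be created with gcloud; the disk name, region, replica zones, and size below are hypothetical:

```shell
# Create a regional persistent disk replicated across two zones
# of us-central1 (names and size are illustrative).
gcloud compute disks create my-regional-disk \
  --region=us-central1 \
  --replica-zones=us-central1-a,us-central1-b \
  --size=200GB \
  --type=pd-balanced
```

The disk can then be attached to an instance with `--disk-scope=regional`, and it pairs naturally with a regional MIG for failover.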

Regional GCS Bucket

https://cloud.google.com/storage/

Standard Storage: Good for “hot” data that’s accessed frequently, including websites, streaming videos, and mobile apps.

Nearline Storage: Low cost. Good for data that can be stored for at least 30 days, including data backup and long-tail multimedia content.

Coldline Storage: Very low cost. Good for data that can be stored for at least 90 days, including disaster recovery.

Archive Storage: Lowest cost. Good for data that can be stored for at least 365 days, including regulatory archives.
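A bucket's default storage class is chosen at creation; a sketch with a placeholder bucket name and location:

```shell
# Create a bucket whose objects default to Nearline storage
# (suited to data kept at least 30 days, e.g. backups).
gcloud storage buckets create gs://my-backup-bucket \
  --location=us-central1 \
  --default-storage-class=NEARLINE
```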

Backup

- Persistent Disk snapshots
- Persistent Disk clones
- Filestore backups
- GCS object versioning
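Two of these backup mechanisms can be sketched with gcloud; the disk, zone, snapshot, and bucket names are hypothetical:

```shell
# Take an on-demand snapshot of a zonal persistent disk.
gcloud compute disks snapshot my-disk \
  --zone=us-central1-a \
  --snapshot-names=my-disk-snap-001

# Enable object versioning on a GCS bucket so overwritten or
# deleted objects are retained as noncurrent versions.
gcloud storage buckets update gs://my-backup-bucket --versioning
```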

DR

Geo Redundant Cloud Storage

App


Backup

- Application-consistent persistent disk snapshots (pre/post snapshot hooks)
- SAP-certified cloud storage
- Backint agent for SAP HANA


Protect PaaS Workloads

HA/DR

Cloud SQL - Cloud SQL HA
https://cloud.google.com/sql/docs/mysql/high-availability#normal

https://cloud.google.com/sql/docs/mysql/configure-ha

https://cloud.google.com/sql/docs/mysql/replication#cross-region-read-replicas

Cloud Spanner - multi-zone/multi-region Cloud Spanner

https://cloud.google.com/blog/topics/developers-practitioners/demystifying-cloud-spanner-multi-region-configurations

Cloud Bigtable - Bigtable multi-zone/multi-region replication

Google BigQuery - Bigquery regional instances

Backup

Cloud SQL - automated backups (MySQL, PostgreSQL, SQL Server), Cloud SQL import/export, point-in-time recovery

Cloud Spanner - Cloud Spanner backups, Cloud Spanner exports

Cloud Bigtable - Bigtable managed backups, Bigtable exports

Google BigQuery - BigQuery cross-region dataset copy


https://youtu.be/ghdleRvGExg

http://showcase.withgoogle.com/cloud-sql (to test / do R&D)

https://cloud.withgoogle.com/next (data modernization)


Cloud SQL

  1. Cloud SQL supports MySQL, PostgreSQL, and SQL Server.

Supports data protection (backups) and disaster recovery (RPO settings during DB configuration). Offers automated and manual backups.

  1. Automated backups are taken every day at a time set by the end user; we can also trigger an on-demand backup.

  2. Offers point-in-time recovery by internally using transaction logs.

  3. Offers synchronous and asynchronous replication.

RPO (recovery point objective) - the duration of data loss we can tolerate; e.g., we may be fine with 5 minutes of data loss. Highly critical systems need a lower RPO.

**Cloud SQL offers two mechanisms for disaster recovery:**

a. Timestamp-based recovery

Allows setting a recovery point with millisecond granularity, going back up to 7 days, to recover a database hit by a disaster. That is, from backups plus transaction logs we can recover the state up to a few milliseconds before the disaster took place.
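Point-in-time recovery restores the state into a new instance; a sketch with hypothetical instance names and timestamp:

```shell
# Clone a Cloud SQL instance to its state at a specific timestamp
# (requires point-in-time recovery / binary logging to be enabled).
gcloud sql instances clone my-instance my-instance-recovered \
  --point-in-time='2024-05-01T10:00:00.000Z'
```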

b. Transaction-based recovery

Allows us to specify the recovery point as a number of transactions.

RTO (recovery time objective) - how long it takes to bring the database back up from backup.

High availability

High availability usually involves replication, health checks, and workflow management to auto-recover the DB in the event of an incident.

HA involves setting up redundancy (also called a cluster setup), which GCP requires for the instance to qualify for its SLA.

When high availability is enabled, a redundant standby DB is created in another zone (zone B) with the primary in zone A, using synchronous replication; on failure it automatically fails over to the standby zone. Because a static IP is used, the application only has to retry its connection to reach the failover DB, so to the app the DB appears down for only a moment. As part of the failover workflow, the IP address and DB name of the primary are transferred to the standby; during normal operation, every write is committed to both the primary and the standby instance.

  1. Offers cross-region replication - guards against regional failure; uses a secure private network and global VPC; you can create a replica in another region without much configuration.

  2. Offers asynchronous replication in multiple zones, where the primary commits transactions before replica writes complete; this helps build performant systems, at the cost of replicas lagging by a few milliseconds. These replicas can be used to generate reports or serve as a reporting system.

Recommendations -

a. Take daily backups and enable millisecond-granularity point-in-time recovery.

b. Consider taking additional on-demand backups around schema changes.

c. Enable the in-region high-availability (synchronous) configuration.

d. Use in-region and cross-region replication (asynchronous); this gives a near real-time standby DB for reporting, accessible with a lower RTO.
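The recommendations above might be wired up roughly as follows; the instance names, tier, and regions are illustrative:

```shell
# Primary with daily automated backups, binary logging for
# point-in-time recovery, and in-region (synchronous) HA.
gcloud sql instances create my-instance \
  --database-version=MYSQL_8_0 \
  --tier=db-n1-standard-2 \
  --region=us-central1 \
  --availability-type=REGIONAL \
  --backup-start-time=03:00 \
  --enable-bin-log \
  --retained-transaction-log-days=7

# Cross-region (asynchronous) read replica, usable for reporting.
gcloud sql instances create my-instance-replica \
  --master-instance-name=my-instance \
  --region=europe-west1
```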

HA for Compute (Learn with Mahesh : GCP Professional cloud architect certification)

https://youtu.be/cAo--7CzYmI

**Compute Engine** - Workloads should be deployed in 2 different zones so that if one goes down the other can serve.

Solution: Create a managed instance group and select the multi-zone option so that it deploys across multiple zones. Also enable autoscaling and use a minimum of 3 zones.
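A regional (multi-zone) MIG with autoscaling can be sketched like this; the template and group names are placeholders:

```shell
# Instance template the group will stamp out.
gcloud compute instance-templates create my-template \
  --machine-type=e2-medium \
  --image-family=debian-12 \
  --image-project=debian-cloud

# Regional MIG: instances are spread across zones in the region.
gcloud compute instance-groups managed create my-mig \
  --region=us-central1 \
  --template=my-template \
  --size=3

# Autoscale on CPU utilization.
gcloud compute instance-groups managed set-autoscaling my-mig \
  --region=us-central1 \
  --min-num-replicas=3 \
  --max-num-replicas=6 \
  --target-cpu-utilization=0.6
```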

Achieving HA in Managed Instance Group

https://youtu.be/RE-0oNgeOOw

We can set up a load balancer to distribute load evenly and autoscaling to cater to constantly rising traffic; for high availability, a managed instance group provides auto-healing and auto-updating configurations.

**Auto-healing** - observes instances and replaces unhealthy ones with healthy instances. **Auto-updating** - updates instance software or patches without downtime.

To configure a MIG with HA, we set up a health check, which enables auto-healing: the MIG probes instances for failures (such as HTTP 500 responses) and replaces unhealthy instances with healthy ones.

Create a health check named "health-check" in Compute Engine with the health criteria shown below:

Name : health-check

Check interval: 10 sec; Timeout: 5 sec (wait 5 seconds for a response to the probe)

Healthy threshold: 2 consecutive successes

Unhealthy threshold: 3 consecutive failures

Now go to the instance group - Edit - under the autohealing health check, set a 90-second initial delay and select the health check named "health-check" that you created earlier.

Now test by going to a VM instance and simulating a failure.
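The same console steps can be sketched with gcloud (the check and group names follow the walkthrough; the region and port are assumptions):

```shell
# Health check matching the criteria above: 10s interval, 5s timeout,
# 2 consecutive successes = healthy, 3 consecutive failures = unhealthy.
gcloud compute health-checks create http health-check \
  --port=80 \
  --check-interval=10s \
  --timeout=5s \
  --healthy-threshold=2 \
  --unhealthy-threshold=3

# Attach it to the MIG for autohealing with a 90s initial delay.
gcloud compute instance-groups managed update my-mig \
  --region=us-central1 \
  --health-check=health-check \
  --initial-delay=90s
```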

For auto-updating - applying updates to instances without downtime - a MIG lets you control the speed and scope of an automatic update without taking servers down. Partial rollouts are also possible via canary testing.

**Instance Group** --> Edit - select rolling update (i.e., a gradual update) and provide 20%, so only that share of instances is replaced with newly created instances at a time. The update mode can be proactive (more disruptive), which proactively deletes and recreates instances, or opportunistic, where updates are applied only when you manually restart a server or the MIG auto-starts a new one.

Maximum surge - the maximum number of temporary instances to add while updating.
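A rolling update along those lines might look like this; the group and template names are hypothetical:

```shell
# Gradually roll the group to a new template: at most 20% extra
# (surge) instances at a time, none taken below capacity.
gcloud compute instance-groups managed rolling-action start-update my-mig \
  --region=us-central1 \
  --version=template=my-template-v2 \
  --max-surge=20% \
  --max-unavailable=0

# Opportunistic mode instead: apply the new template only when
# instances are recreated anyway.
# gcloud compute instance-groups managed rolling-action start-update my-mig \
#   --region=us-central1 --version=template=my-template-v2 --type=opportunistic
```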

Kubernetes Engine - containerized by default. The location type matters: if we select zonal, we get 1 control plane (master) and 3 nodes, so there is no HA if the control plane goes down. If we select regional, we get 3 control planes and can set the node count to 3, meaning 3 worker nodes in every zone. Selecting a region still restricts us to one region; however, we can set up multi-cluster ingress, which lets us run 2 clusters in 2 different regions, so even if one region goes down we can still serve from the other.
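A regional GKE cluster matching that description (replicated control plane, 3 nodes per zone) might be created like this; the cluster name and region are placeholders:

```shell
# Regional cluster: control plane replicated across the region's
# zones; --num-nodes is per zone (3 zones x 3 = 9 workers here).
gcloud container clusters create my-cluster \
  --region=us-central1 \
  --num-nodes=3
```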

App Engine (PaaS) - it's a regional resource, **but only selected locations offer multi-regional support by default**, for example us-central (Iowa) and europe-west, so it has high availability. Autoscaling is also enabled for these regions.

If you configure App Engine to run in a location closer to your users, you need to select autoscaling so that if one instance goes down traffic is still served from another zone.

Cloud Functions - it's serverless, so Google takes care of HA.

App Engine flexible environment - we can enable autoscaling, or manual scaling where we get 2 instances.

HA and Failover design for GCP Storage

https://youtu.be/MHNKzdFPfTg

Bigtable - for HA, set up replication: create 2 clusters in different regions or zones. Replicating across zones or regions is what achieves high availability.
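A replicated Bigtable instance with two clusters in different regions can be sketched as follows; the instance/cluster IDs and zones are illustrative:

```shell
# Two clusters in different regions; Bigtable replicates between
# them automatically once both exist.
gcloud bigtable instances create my-bt-instance \
  --display-name="My HA instance" \
  --cluster-config=id=cluster-a,zone=us-central1-a,nodes=1 \
  --cluster-config=id=cluster-b,zone=us-east1-b,nodes=1
```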

Cloud SQL - enable HA during instance creation along with automated backups: under "Enable auto backups & HA", enable automated backups and create a failover replica.

Spanner - you can choose a regional or multi-regional configuration. If you select regional, you get replication across 3 zones; if you choose multi-region, you get multi-regional replication.
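The choice of configuration is made at instance creation; a sketch with placeholder names:

```shell
# Regional config: data replicated across 3 zones in us-central1.
gcloud spanner instances create my-spanner \
  --config=regional-us-central1 \
  --nodes=1 \
  --description="Regional demo instance"

# A multi-region config (e.g. nam3) replicates across regions:
# gcloud spanner instances create my-spanner --config=nam3 --nodes=1 \
#   --description="Multi-region demo"
```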

Memorystore - instance tiers: Basic (no HA) and Standard (includes a failover replica in a separate zone for HA; an instance cannot be downgraded later). Because the replica lives in a different zone, Standard tier survives a zone outage.

The services below are serverless, so HA is managed by GCP.

Datastore - serverless NoSQL - HA managed by GCP.

Firestore - Datastore's next generation - serverless, and HA is managed by GCP.

Filestore - no HA - zonal in nature.

Cloud Storage - replication is built in; it's serverless.