Azure Redundancy - barialim/architecture GitHub Wiki

Basic Architecture of Azure

[Figure: Azure region pairs and availability zones]

Geography

As you can see from the image above, at the highest level we have an Azure Geography. An Azure geography is an area of the world that contains one or more Azure Regions.

Each geography is a discrete market, typically containing one or more regions, that preserves data residency and compliance boundaries. For more information, see Microsoft's documentation on Azure global infrastructure.

Region

A region is a set of datacenters deployed within a latency-defined perimeter and connected through a dedicated regional low-latency network. This ensures that Azure services within an Azure region offer the best possible performance and security.

An Azure region is made up of one or more datacenters. If availability zones are enabled, the region contains a minimum of three availability zones, each made up of one or more datacenters. In other words, an Azure region contains one or more datacenters, or three or more availability zones where they are enabled.

In simple terms, an Azure region is a group of one or more datacenters.

[Figure: Azure region]

You have the flexibility to deploy your applications and data to any Azure region you want. You can even deploy across multiple regions to deliver cross-region resiliency.

Azure is generally available in 60+ regions around the world, and it's still growing.

Cross-region Resiliency

In general, resilience is the ability of a software system to react to problems in one of its components and still provide the best possible service. Both your software and the underlying infrastructure must be resilient. If there is a problem, the end user should not notice it: the request must be handled and processed by another region, and the end user should get the same level of service. We can get this resiliency by deploying our application and data in at least two regions. In this example, our application and data are deployed in two regions, Region A and Region B.

[Figure: Azure resiliency]

Suppose there is a region-level failure and Region A goes down. Azure Traffic Manager detects the failure and sends all requests to Region B. The end user gets the same response and does not even know there was a region-level failure. When Region A is back online, Traffic Manager distributes traffic between both regions again, depending on your Traffic Manager configuration.
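The failover behaviour described above can be sketched as a small simulation. This is a toy model, not the Traffic Manager API: the `Region` class, the `route` function, and the round-robin distribution are assumptions made for illustration only, and a real deployment would rely on Traffic Manager's own endpoint monitoring and routing methods.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical stand-in for a deployment in one Azure region."""
    name: str
    healthy: bool = True


def route(regions: list[Region], request_id: int) -> str:
    """Return the name of the region that should serve a request.

    Healthy regions share traffic round-robin by request id; unhealthy
    regions are skipped, so traffic fails over to whatever is still up.
    """
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    return healthy[request_id % len(healthy)].name


region_a = Region("Region A")
region_b = Region("Region B")
regions = [region_a, region_b]

# Both regions up: traffic alternates between them.
normal = [route(regions, i) for i in range(4)]

# Region A goes down: every request is served by Region B.
region_a.healthy = False
during_outage = [route(regions, i) for i in range(4)]

# Region A recovers: traffic is shared again.
region_a.healthy = True
recovered = [route(regions, i) for i in range(4)]
```

The caller never sees which region answered, which is the point: the request succeeds either way, and only the routing decision changes.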

Region Pairs

A region pair consists of two regions within the same geography. Most regions in a geography are paired to ensure business continuity and disaster recovery (BCDR). See Microsoft's documentation for the list of Azure regional pairs.

Azure serializes platform updates (rolls them out sequentially) so that only one region in a pair is updated at a time. If an outage affects multiple regions, at least one region in each pair will be prioritized for recovery.

[Figure: Azure region pairs]

Azure provides several storage solutions that take advantage of paired regions to ensure data availability. For example, Azure Geo-redundant Storage (GRS) replicates data to a secondary region automatically, ensuring that data is durable even in the event that the primary region isn't recoverable.

⚠️ Note that not all Azure services automatically replicate data, nor do all Azure services automatically fall back from a failed region to its pair. In such cases, the customer must configure recovery and replication.

Is it compulsory to use Azure regional pairs?

No. Customers can leverage Azure services to architect a resilient service without relying on Azure's regional pairs. However, we recommend that you configure business continuity and disaster recovery (BCDR) across regional pairs to benefit from isolation and improve availability. For applications that support multiple active regions, we recommend using both regions in a region pair where possible. This gives applications optimal availability and minimal recovery time in the event of a disaster. Whenever possible, design your application for maximum resiliency and ease of disaster recovery.

Benefits of Paired regions

  • Physical separation between datacenters: When possible, Azure prefers at least 300 miles of separation between datacenters in a regional pair, although this isn't practical or possible in all geographies. Physical datacenter separation reduces the likelihood of natural disasters, civil unrest, power outages, or physical network outages affecting both regions at once. Isolation is subject to the constraints within the geography (geography size, power/network infrastructure availability, regulations, etc.).
  • Automatic platform-provided replication: Some services, such as Geo-redundant Storage, replicate automatically to the paired region. If one of the regions goes down, the data is still available from the other region in the pair.
  • Region recovery in the event of an outage: If several regions worldwide are down, Azure prioritizes recovery of one region out of every pair. So if you want your apps and data to be highly available, deploy them in paired regions. With this setup, if both regions are down, Azure prioritizes recovering at least one region from the pair, so your apps and data become available again sooner. If applications are deployed across regions that are not paired, recovery might be delayed; in the worst case, the chosen regions may be the last two to be recovered.
  • Sequential system updates: Planned Azure system updates are rolled out to paired regions sequentially (not at the same time) to minimize downtime, the effect of bugs, and logical failures in the rare event of a bad update.
  • Data residency, compliance, and legal requirements: With the exception of Brazil South, regions within a region pair are from the same geography. This helps meet data residency, compliance, and legal requirements.
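Two of the benefits above, choosing a paired secondary region and updating paired regions sequentially, can be sketched as follows. The `REGION_PAIRS` table is a small illustrative subset and should be checked against Microsoft's published list before use; `paired_region` and `update_waves` are hypothetical helpers, and the two-wave split is a simplified model rather than Azure's actual rollout process.

```python
# Illustrative subset of Azure region pairs (assumption: verify against
# Microsoft's current list of regional pairs before relying on it).
REGION_PAIRS = {
    "East US": "West US",
    "West US": "East US",
    "North Europe": "West Europe",
    "West Europe": "North Europe",
    "UK South": "UK West",
    "UK West": "UK South",
}


def paired_region(region: str) -> str:
    """Return the paired region to use as a BCDR secondary."""
    try:
        return REGION_PAIRS[region]
    except KeyError:
        raise ValueError(f"no pair recorded for {region!r}") from None


def update_waves(regions: list[str]) -> tuple[list[str], list[str]]:
    """Split regions into two update waves so paired regions never share one.

    A region goes into the second wave only if its pair is already in the
    first, so a bad update can hit at most one region of any pair at a time.
    """
    first: list[str] = []
    second: list[str] = []
    for region in regions:
        if REGION_PAIRS.get(region) in first:
            second.append(region)
        else:
            first.append(region)
    return first, second


secondary = paired_region("North Europe")
waves = update_waves(["East US", "West US", "North Europe", "West Europe"])
```

With this input, `waves` puts East US and North Europe in the first wave and their pairs in the second, so each pair always has one region that has not yet received the update.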

Availability Zone

An Azure Availability Zone (AZ) is a unique physical location within an Azure region.

Each Availability Zone is made up of one or more datacenters with independent power, cooling, and networking. An AZ can be thought of as a datacenter, or a group of datacenters, within an Azure region. In other words, an Azure region contains one or more datacenters, or three or more availability zones where they are enabled.

AZs are an infrastructure-level HA offering that protects your application and data services from datacenter failure. ⭐

⚠️ Note: Not all regions have Availability Zones. Make sure you check whether your chosen region supports them.

Regions that support AZ have a minimum of three separate zones to ensure resiliency. ⭐

[Figure: Availability zone]

Zone-redundant services replicate your applications and data across Availability Zones to protect against single points of failure.
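As a rough illustration of why spreading across zones removes a single point of failure, the sketch below places instances round-robin across three zones and counts how many survive the loss of one zone. The function names are hypothetical and the placement logic is an assumption for illustration; zone-redundant Azure services handle placement themselves.

```python
def spread_across_zones(instance_count: int,
                        zones: tuple[str, ...] = ("1", "2", "3")) -> list[str]:
    """Assign instances to zones round-robin.

    No zone ends up with more than one instance above any other zone.
    """
    if instance_count < 0:
        raise ValueError("instance_count must be non-negative")
    return [zones[i % len(zones)] for i in range(instance_count)]


def surviving_instances(placement: list[str], failed_zone: str) -> int:
    """Count the instances that are not in the failed zone."""
    return sum(1 for zone in placement if zone != failed_zone)


placement = spread_across_zones(5)            # zones 1, 2, 3, 1, 2
survivors = surviving_instances(placement, "1")
```

Five instances across three zones leave three running when zone 1 fails, whereas five instances in a single zone would leave none.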

With AZs, Azure offers an industry-best 99.99% VM uptime SLA (Service Level Agreement) when two or more VMs are deployed across two or more zones.
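To put an uptime percentage into perspective, it helps to convert it into a downtime budget. The helper below is a plain arithmetic sketch; the function name and the 365-day year are assumptions, and the actual SLA terms define how uptime is measured.

```python
def allowed_downtime_minutes(sla_percent: float, days: float = 365) -> float:
    """Return the downtime, in minutes, that an uptime SLA permits over `days`."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    return (1 - sla_percent / 100) * total_minutes


four_nines = allowed_downtime_minutes(99.99)   # about 52.56 minutes per year
three_nines = allowed_downtime_minutes(99.9)   # about 525.6 minutes per year
```

Each additional "nine" cuts the permitted downtime by a factor of ten, from roughly 8.8 hours a year at 99.9% to under an hour at 99.99%.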

Availability Set

https://k21academy.com/microsoft-azure/az-104/az-104-region-availability-zone-availability-sets-and-fault-domainupdate-domain-in-microsoft-azure/

Availability options for Azure Virtual Machines

https://docs.microsoft.com/en-us/azure/virtual-machines/availability

High Availability

High Availability (HA) describes systems that are dependable enough to operate continuously without failing.

Why is HA important

To reduce interruptions and downtime, it is essential to be ready for unexpected events that can bring down servers.

At times, emergencies will bring down even the most robust, reliable software and systems. HA systems minimize the impact of these events, and can often recover automatically from component or even server failures.

How to achieve HA

An HA system should be able to recover quickly from any form of failure state to minimize interruption or outage for the end user.

Best Practices for achieving HA

  • Eliminate single points of failure. ⭐
  • Ensure that all systems and data are backed up for simple recovery. ⭐
  • Use load balancing to distribute application and network traffic across servers or other hardware.
  • Continuously monitor the health of backend servers.
  • Distribute resources geographically in case of power outages or natural disasters. ⭐
  • Implement reliable crossover or failover. In terms of storage, a redundant array of independent disks (RAID) or a storage area network (SAN) are common approaches.
  • Set up a system that detects failures as soon as they occur.
  • Design system parts for high availability and test their functionality before implementation.
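Several of the practices above (health monitoring, failure detection, and load balancing only across working servers) can be combined in one small sketch. Everything here is hypothetical: the `Backend` and `HealthMonitor` names, the consecutive-failure threshold of three, and the rule that a single successful probe restores a backend are illustrative choices, not a description of any particular load balancer.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical backend server tracked by the monitor."""
    name: str
    failures: int = 0
    in_rotation: bool = True


class HealthMonitor:
    """Take a backend out of rotation after repeated failed probes."""

    def __init__(self, backends: list[Backend], failure_threshold: int = 3):
        self.backends = {b.name: b for b in backends}
        self.failure_threshold = failure_threshold

    def record_probe(self, name: str, ok: bool) -> None:
        """Record the result of one health probe for the named backend."""
        backend = self.backends[name]
        if ok:
            # A successful probe resets the counter and restores the backend.
            backend.failures = 0
            backend.in_rotation = True
        else:
            backend.failures += 1
            if backend.failures >= self.failure_threshold:
                backend.in_rotation = False

    def available(self) -> list[str]:
        """Return the backends that should currently receive traffic."""
        return [b.name for b in self.backends.values() if b.in_rotation]


monitor = HealthMonitor([Backend("web-1"), Backend("web-2")])
for _ in range(3):
    monitor.record_probe("web-1", ok=False)
after_failures = monitor.available()     # web-1 has been removed

monitor.record_probe("web-1", ok=True)
after_recovery = monitor.available()     # web-1 is back in rotation
```

Requiring several consecutive failures before removing a backend trades a slightly slower reaction for fewer false alarms from a single dropped probe.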

Difference between HA and Redundancy

Redundancy is a hardware-based approach. On the other hand, implementing HA strategies nearly always involves software. ⭐

Redundancy alone cannot ensure HA. ⭐ A system also needs failure-detection mechanisms.

HA vs Fault Tolerance

HA and fault tolerance both refer to techniques for delivering high levels of uptime (the machine is available and operational). ⭐

However, fault-tolerant and high-availability strategies achieve that goal differently. ⭐

Fault Tolerance

more...

Terminology