High Availability | Fault Tolerance | Durability - FullstackCodingGuy/Developer-Fundamentals GitHub Wiki

Typical Example

HA and FT work best together when the aim is zero service disruption and zero downtime.
HA is achieved by using a load balancer (LB) to distribute the workload across multiple servers, preferably in multiple Availability Zones (AZs, i.e. separate data centers).
Auto scaling is enabled to scale the servers across AZs based on demand.
FT is achieved through additional backup support for the infrastructure, supplied by the provider.
If a server goes down or becomes faulty/unhealthy, the Auto Scaling group (ASG) automatically launches a new server to handle the load.
If an Availability Zone itself fails, the remaining AZs handle the load by adding more servers, because metrics such as CPU utilization % shoot up with the additional traffic.

The Difference

HA

Minimal Service Interruption
Designed to ensure No Single Point of Failure (Redundancy)
Uptime is measured in %, e.g. 99.99%, i.e. how many 9s of availability the service is guaranteed to support
Replication can be synchronous or asynchronous
Lower cost compared to FT
Ways to create HA
- Elastic Load Balancing - for distributing incoming traffic to multiple nodes
- EC2 Auto Scaling
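To make the "how many 9s" measure above concrete, here is a minimal sketch that converts an uptime percentage into the downtime it permits per year. The function name `allowed_downtime_minutes` and the 365-day year are assumptions made for illustration.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the maximum downtime, in minutes, permitted over the period."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * 24 * 60


if __name__ == "__main__":
    # Each extra 9 cuts the permitted downtime by a factor of ten.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

For example, 99.99% uptime leaves roughly 52.6 minutes of downtime per year.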

FT

No service interruption
This requires specialized hardware with instantaneous failover
The goal is therefore no downtime at all
Synchronous replication is a must: data is replicated in real time to ensure zero data loss
Higher cost compared to HA, as it also involves replicating hardware systems
Ways to create FT
- Fault-tolerant Network Interface Cards (NICs) - add an additional network interface card as a backup
- Disk Mirroring (RAID 1) - add additional hard drives to mirror the data (data written to drive 1 is immediately written to drive 2 as well)
- Synchronous DB replication
- Redundant power backup for the data center

How does load balancing work?

What is the difference between ASG and ALB?

In the AWS environment, Application Load Balancer (ALB) allows you to efficiently route traffic to the right servers. In other words, ALB plays the role of a conductor's baton directing an orchestra. An Auto Scaling group (ASG) enables additional servers to be activated when application traffic increases.

Load Balancer

Amazon Web Services (AWS) provides a broad set of tools for building resilient and scalable infrastructure. Application Load Balancer (ALB) and Auto Scaling groups (ASG) are two basic components of modern web architectures that AWS offers for this purpose.

Application Load Balancer (ALB) allows you to efficiently route traffic to the right servers.

ALB is the gateway controller of your web traffic. It also enables incoming user requests to be distributed to multiple targets, such as EC2 instances, containers, and IP addresses in multiple Availability Zones. ALB not only distributes the load effectively, but also adds redundancy critical for high availability. ALB quickly redirects web traffic to healthy instances when a server fails. This routing also provides a smooth user experience.

ASG

Auto Scaling Group (ASG) enables additional servers to be activated when application traffic increases.

ASG dynamically adjusts the number of EC2 instances in response to traffic demands. It is extremely important to structure the ASG well. With a well-structured ASG, you don’t just scale; you also maintain the optimum balance between performance and cost. When demand increases, ASG launches new cloud servers to meet the load. ASG scales back in when demand drops, ensuring you only pay for the resources you need. This allows you to benefit from AWS resources in a cost-effective manner.
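The scaling behaviour described above can be approximated with a simple proportional rule: size the fleet so that the average metric returns to a target value, then clamp the result to the group's minimum and maximum size. This is a simplified sketch and not the exact algorithm AWS uses; the function name `desired_capacity` and all parameter names are assumptions made for illustration.

```python
import math


def desired_capacity(
    current_instances: int,
    current_metric: float,
    target_metric: float,
    min_size: int,
    max_size: int,
) -> int:
    """Estimate how many instances keep the average metric near its target."""
    if current_instances <= 0 or target_metric <= 0:
        raise ValueError("current_instances and target_metric must be positive")
    # Total load is roughly instances * average metric; divide by the target
    # to get the fleet size that would bring the average back to the target.
    raw = math.ceil(current_instances * current_metric / target_metric)
    # Never go below the minimum or above the maximum group size.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    # 4 instances at 80% average CPU with a 50% target: scale out.
    print(desired_capacity(4, 80.0, 50.0, min_size=2, max_size=10))
    # 4 instances at 20% average CPU with a 50% target: scale in.
    print(desired_capacity(4, 20.0, 50.0, min_size=2, max_size=10))
```

The clamp to `min_size` and `max_size` is what keeps cost bounded when traffic spikes and keeps a safe baseline when traffic drops.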

Using Monitoring and Metrics for Auto Scaling: The Role of CloudWatch

If you want to monitor a system you created in AWS and keep it under control, the service to use is AWS CloudWatch.

AWS CloudWatch plays a critical role in keeping your entire system under control. This service monitors your applications based on metrics such as network input/output or CPU usage. If the monitored values exceed the thresholds you specify, CloudWatch triggers alarms.

These alarms can be configured to notify you when any metric exceeds its threshold or when an anomaly is detected, and they can even trigger the ASG to scale out (add instances).
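As a rough model of how a threshold alarm behaves, the sketch below reports an alarm only when the metric has stayed above the threshold for a number of consecutive periods. It is a simplified illustration in plain Python, not a call into the CloudWatch API; the function name `alarm_state` and its parameters are assumptions, and only the three state names mirror the ones CloudWatch uses.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float, evaluation_periods: int) -> str:
    """Evaluate the most recent datapoints against a static threshold."""
    if evaluation_periods <= 0:
        raise ValueError("evaluation_periods must be positive")
    if len(datapoints) < evaluation_periods:
        # Not enough history yet to decide either way.
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    # Requiring every recent datapoint to breach avoids reacting to one spike.
    return "ALARM" if all(value > threshold for value in recent) else "OK"


if __name__ == "__main__":
    cpu_percent = [35.0, 42.0, 81.0, 88.0, 93.0]  # hypothetical samples
    print(alarm_state(cpu_percent, threshold=75.0, evaluation_periods=3))
```

Requiring several consecutive breaches is a common way to avoid scaling on a single short spike.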

The Practicality of Scalability

It is important to understand how ALB and ASG work effectively together.

If you use these two services together effectively, you will provide a much better experience to the users and establish a system that is effective in terms of cost and time.

The practical implementation of ALB and ASG in AWS is extremely comprehensive. Imagine you have an e-commerce website.

For example, there is a Black Friday sale and your website is under heavy traffic. What will happen in this case? Thousands of users will flock to your website. But you don’t need to worry. ALB will ensure that no single server carries too much load. At the same time, the ASG will receive information from CloudWatch metrics and activate additional servers to absorb the increased traffic. This not only prevents possible disasters, but also ensures that you maintain speed and reliability. The result is satisfied customers and a successful sale.

High Availability

Types of load balancer (aka Load Balancing Techniques)

Random
Round Robin (Basic)
Weighted Round Robin - e.g. assign more weight to higher-performing servers; more weight = more requests
Ratio Based - e.g. a server twice the size receives twice as much traffic
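The techniques listed above can be sketched in a few lines. This is a minimal illustration, assuming servers are identified by plain strings and weights are small positive integers; the function names are made up for the example, and real load balancers use more refined schedulers.

```python
import itertools
import random
from typing import Dict, Iterator, List


def pick_random(servers: List[str]) -> str:
    """Random: every request goes to a server chosen uniformly at random."""
    return random.choice(servers)


def round_robin(servers: List[str]) -> Iterator[str]:
    """Round robin: hand out servers in a fixed, repeating order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Weighted round robin: a server with weight w appears w times per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


if __name__ == "__main__":
    rr = round_robin(["a", "b", "c"])
    print([next(rr) for _ in range(5)])

    # "big" has twice the weight, so it receives twice as many requests.
    wrr = weighted_round_robin({"big": 2, "small": 1})
    print([next(wrr) for _ in range(6)])
```

Ratio-based balancing can be expressed with the same weighted scheduler by setting each weight in proportion to server size.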

Durability

Fault Tolerance

Follow the design-for-failure principle to make your application fault tolerant.

Avoid single point of failure
Assume everything fails, and design backwards
Goal: Applications should continue to function even if the underlying physical hardware fails or is removed/replaced
Design your recovery process
Trade off business needs vs cost of high availability
Use multiple availability zones
Replicate data across multiple AZs
Use real-time monitoring (CloudWatch)
Use EBS (Elastic Block Store) for persistent file systems
Take EBS snapshots and use S3 for backups

AWS takes care of adding redundancy at the infrastructure level, such as backups for network cards, disk storage, and power.

Problems:

For issues such as intermittent network problems, service throttling, and application timeouts, the application should detect the failure quickly and handle it appropriately.

Solution 1: Retries with backoff

Read here
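A minimal sketch of retries with exponential backoff follows. The helper name `retry_with_backoff`, its default values, and the use of random jitter are assumptions chosen for illustration; production code would usually catch only the specific transient exceptions it expects.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying failures with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # assumption: every exception is treated as transient
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The cap doubles on each attempt: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A random wait below the cap spreads out retries from many clients.
            sleep(random.uniform(0, cap))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_call() -> str:
        # Hypothetical operation that fails twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("transient network issue")
        return "ok"

    print(retry_with_backoff(flaky_call))
```

The `sleep` parameter is injectable so the wait can be replaced in tests without slowing them down.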

Solution 2: Circuit Breaker Pattern - for service delays and performance issues.

Read Here
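The circuit breaker idea can be sketched as a small wrapper that stops calling a failing dependency for a while and then lets one trial call through. This is a simplified illustration, not a production library; the class name, the thresholds, and the injectable `clock` are all assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, operation: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of waiting on a struggling service.
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so one trial call is allowed through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would typically catch `CircuitOpenError` and return a fallback response, such as cached data, while the dependency recovers.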

Polyglot Persistence in Event-driven Microservices

Each microservice performs transactions on its own database, or on more than one database.

For instance, if one step fails in one of the microservices due to a timeout, the work already committed by the other services must be undone to keep the data consistent, because a single local ACID transaction cannot span several services' databases. To achieve this, consider the Saga Orchestrator pattern, which executes a series of compensating tasks to reverse/roll back the changes made by the preceding transactions.

Saga pattern for distributed transactions (similar to Newton's third law: every action has a reaction, and every operation has a compensating operation).
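The compensation idea can be sketched with a tiny orchestrator that runs steps in order and, when one fails, undoes the completed steps in reverse. The step names (order, payment, shipping) and the function `run_saga` are hypothetical and stand in for real service calls; a real orchestrator would also persist its progress and retry failed compensations.

```python
from typing import Callable, List, Tuple

# Each step is (name, action, compensation).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            # The failed step made no committed change, so only the steps
            # that completed before it are compensated, newest first.
            for _done_name, undo in reversed(completed):
                undo()
            return False
        completed.append((name, compensate))
    return True


if __name__ == "__main__":
    log: List[str] = []

    def ship_order() -> None:
        # Hypothetical step that times out.
        raise TimeoutError("shipping service timed out")

    saga: List[Step] = [
        ("order", lambda: log.append("create_order"), lambda: log.append("cancel_order")),
        ("payment", lambda: log.append("charge_card"), lambda: log.append("refund_card")),
        ("shipping", ship_order, lambda: log.append("cancel_shipment")),
    ]
    print(run_saga(saga), log)
```

Compensations run newest first so that each undo sees the state its original action left behind.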
