AWS - Concepts
Basics
Operational Excellence
Design Principles
The following are design principles for operational excellence in the cloud:
- Perform operations as code: In the cloud, you can apply the same engineering discipline that you use for application code to your entire environment. You can define your entire workload (applications, infrastructure, etc.) as code and update it with code. You can script your operations procedures and automate their process by launching them in response to events. By performing operations as code, you limit human error and create consistent responses to events (a sketch follows this list).
- Make frequent, small, reversible changes: Design workloads that are scalable and loosely coupled to permit components to be updated regularly. Automated deployment techniques together with smaller, incremental changes reduce the blast radius and allow for faster reversal when failures occur. This increases confidence to deliver beneficial changes to your workload while maintaining quality and adapting quickly to changes in market conditions.
- Refine operations procedures frequently: As you evolve your workloads, evolve your operations appropriately. As you use operations procedures, look for opportunities to improve them. Hold regular reviews and validate that all procedures are effective and that teams are familiar with them. Where gaps are identified, update procedures accordingly. Communicate procedural updates to all stakeholders and teams. Gamify your operations to share best practices and educate teams.
- Anticipate failure: Perform “pre-mortem” exercises to identify potential sources of failure so that they can be removed or mitigated. Test your failure scenarios and validate your understanding of their impact. Test your response procedures to ensure they are effective and that teams are familiar with their process. Set up regular game days to test workload and team responses to simulated events.
- Learn from all operational failures: Drive improvement through lessons learned from all operational events and failures. Share what is learned across teams and through the entire organization.
- Use managed services: Reduce operational burden by using AWS managed services where possible. Build operational procedures around interactions with those services.
- Implement observability for actionable insights: Gain a comprehensive understanding of workload behavior, performance, reliability, cost, and health. Establish key performance indicators (KPIs) and leverage observability telemetry to make informed decisions and take prompt action when business outcomes are at risk. Proactively improve performance, reliability, and cost based on actionable observability data.
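As an illustration of the "Perform operations as code" principle, here is a minimal boto3 sketch that turns a routine operational task (nightly AMI backups) into code. The `Backup` tag key and the naming scheme are hypothetical.

```python
import datetime
import boto3

ec2 = boto3.client("ec2")

def backup_tagged_instances():
    """Create an AMI of every instance tagged Backup=true (a routine ops task expressed as code)."""
    stamp = datetime.datetime.utcnow().strftime("%Y-%m-%d")
    reservations = ec2.describe_instances(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
    )["Reservations"]
    for reservation in reservations:
        for instance in reservation["Instances"]:
            ec2.create_image(
                InstanceId=instance["InstanceId"],
                Name=f"backup-{instance['InstanceId']}-{stamp}",
                NoReboot=True,  # snapshot without stopping the instance
            )

if __name__ == "__main__":
    backup_tagged_instances()
```

Run on a schedule (for example from an EventBridge rule), a script like this replaces a manual runbook step with a repeatable, reviewable piece of code.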
Best Practices
- Organization
- Prepare
- Operate
- Evolve
Security
Design Principles
- Implement a strong identity foundation: Implement the principle of least privilege and enforce separation of duties with appropriate authorization for each interaction with your AWS resources. Centralize identity management, and aim to eliminate reliance on long-term static credentials.
- Maintain traceability: Monitor, alert, and audit actions and changes to your environment in real time. Integrate log and metric collection with systems to automatically investigate and take action.
- Apply security at all layers: Apply a defense in depth approach with multiple security controls. Apply to all layers (for example, edge of network, VPC, load balancing, every instance and compute service, operating system, application, and code).
- Automate security best practices: Automated software-based security mechanisms improve your ability to securely scale more rapidly and cost-effectively. Create secure architectures, including the implementation of controls that are defined and managed as code in version-controlled templates (a sketch follows this list).
- Protect data in transit and at rest: Classify your data into sensitivity levels and use mechanisms, such as encryption, tokenization, and access control where appropriate.
- Keep people away from data: Use mechanisms and tools to reduce or eliminate the need for direct access or manual processing of data. This reduces the risk of mishandling or modification and human error when handling sensitive data.
- Prepare for security events: Prepare for an incident by having incident management and investigation policy and processes that align to your organizational requirements. Run incident response simulations and use tools with automation to increase your speed for detection, investigation, and recovery.
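A minimal sketch of "security as code": a script that enforces one baseline control (S3 Block Public Access) across every bucket in the account. The control is a standard S3 API; which buckets it touches depends on the account.

```python
import boto3

s3 = boto3.client("s3")

# Enforce a baseline control: block public access on every bucket in the account.
for bucket in s3.list_buckets()["Buckets"]:
    s3.put_public_access_block(
        Bucket=bucket["Name"],
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
```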
Best Practices
Reliability
Design Principles
There are five design principles for reliability in the cloud:
- Automatically recover from failure: By monitoring a workload for key performance indicators (KPIs), you can start automation when a threshold is breached. These KPIs should be a measure of business value, not of the technical aspects of the operation of the service. This provides for automatic notification and tracking of failures, and for automated recovery processes that work around or repair the failure. With more sophisticated automation, it’s possible to anticipate and remediate failures before they occur (a monitoring/alarm sketch follows this list).
- Test recovery procedures: In an on-premises environment, testing is often conducted to prove that the workload works in a particular scenario. Testing is not typically used to validate recovery strategies. In the cloud, you can test how your workload fails, and you can validate your recovery procedures. You can use automation to simulate different failures or to recreate scenarios that led to failures before. This approach exposes failure pathways that you can test and fix before a real failure scenario occurs, thus reducing risk.
- Scale horizontally to increase aggregate workload availability: Replace one large resource with multiple small resources to reduce the impact of a single failure on the overall workload. Distribute requests across multiple, smaller resources to verify that they don’t share a common point of failure.
- Stop guessing capacity: A common cause of failure in on-premises workloads is resource saturation, when the demands placed on a workload exceed the capacity of that workload (this is often the objective of denial of service attacks). In the cloud, you can monitor demand and workload utilization, and automate the addition or removal of resources to maintain the more efficient level to satisfy demand without over- or under-provisioning. There are still limits, but some quotas can be controlled and others can be managed (see Manage Service Quotas and Constraints).
- Manage change in automation: Changes to your infrastructure should be made using automation. The changes that must be managed include changes to the automation, which then can be tracked and reviewed.
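A sketch of the "automatically recover from failure" idea: alarm on a business-level KPI and notify an SNS topic that triggers recovery automation. The namespace, metric name, threshold, and topic ARN are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on a KPI (here: checkout latency published as a custom metric)
# and notify an SNS topic that drives the recovery automation.
cloudwatch.put_metric_alarm(
    AlarmName="checkout-latency-high",
    Namespace="MyShop/KPIs",                 # hypothetical custom namespace
    MetricName="CheckoutLatencyMs",
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=500.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-recovery"],  # placeholder topic ARN
)
```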
Best Practices
Cost optimization
Design Principles
- Implement Cloud Financial Management: To achieve financial success and accelerate business value realization in the cloud, invest in Cloud Financial Management and Cost Optimization. Your organization should dedicate time and resources to build capability in this new domain of technology and usage management. Similar to your Security or Operational Excellence capability, you need to build capability through knowledge building, programs, resources, and processes to become a cost-efficient organization.
- Adopt a consumption model: Pay only for the computing resources that you require and increase or decrease usage depending on business requirements, not by using elaborate forecasting. For example, development and test environments are typically only used for eight hours a day during the work week. You can stop these resources when they are not in use for a potential cost savings of 75% (40 hours versus 168 hours); a scheduling sketch follows this list.
- Measure overall efficiency: Measure the business output of the workload and the costs associated with delivering it. Use this measure to know the gains you make from increasing output and reducing costs.
- Stop spending money on undifferentiated heavy lifting: AWS does the heavy lifting of data center operations like racking, stacking, and powering servers. It also removes the operational burden of managing operating systems and applications with managed services. This permits you to focus on your customers and business projects rather than on IT infrastructure.
- Analyze and attribute expenditure: The cloud makes it simple to accurately identify the usage and cost of systems, which then permits transparent attribution of IT costs to individual workload owners. This helps measure return on investment (ROI) and gives workload owners an opportunity to optimize their resources and reduce costs.
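A sketch of the consumption model for dev/test environments: stop instances outside working hours. The `Environment` tag values are assumptions; the 75% figure comes from running 40 of the 168 hours in a week.

```python
import boto3

ec2 = boto3.client("ec2")

# Dev/test machines are needed ~40 of the 168 hours in a week, so stopping
# them outside working hours saves roughly 75% of their compute cost.
def stop_dev_instances():
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Environment", "Values": ["dev", "test"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
```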
Best Practices
Performance efficiency
Design Principles
- Democratize advanced technologies: Make advanced technology implementation smoother for your team by delegating complex tasks to your cloud vendor. Rather than asking your IT team to learn about hosting and running a new technology, consider consuming the technology as a service. For example, NoSQL databases, media transcoding, and machine learning are all technologies that require specialized expertise. In the cloud, these technologies become services that your team can consume, permitting your team to focus on product development rather than resource provisioning and management.
- Go global in minutes: Deploying your workload in multiple AWS Regions around the world permits you to provide lower latency and a better experience for your customers at minimal cost (a brief multi-Region sketch follows this list).
- Use serverless architectures: Serverless architectures remove the need for you to run and maintain physical servers for traditional compute activities. For example, serverless storage services can act as static websites (removing the need for web servers) and event services can host code. This removes the operational burden of managing physical servers, and can lower transactional costs because managed services operate at cloud scale.
- Experiment more often: With virtual and automatable resources, you can quickly carry out comparative testing using different types of instances, storage, or configurations.
- Consider mechanical sympathy: Understand how cloud services are consumed and always use the technology approach that aligns with your workload goals. For example, consider data access patterns when you select database or storage approaches.
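"Go global in minutes" largely comes down to the fact that every Region is reachable through the same APIs; only the Region-scoped client changes. The Region list below is arbitrary.

```python
import boto3

# The same provisioning routine can be pointed at any Region simply by
# creating a Region-scoped client.
def deploy_everywhere(regions=("us-east-1", "eu-west-1", "ap-southeast-1")):
    for region in regions:
        ec2 = boto3.client("ec2", region_name=region)
        zones = ec2.describe_availability_zones()["AvailabilityZones"]
        print(region, "->", [z["ZoneName"] for z in zones])
        # ...launch the Region-specific copy of the stack here...
```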
Best Practices
Sustainability
Design Principles
- Understand your impact: Measure the impact of your cloud workload and model the future impact of your workload. Include all sources of impact, including impacts resulting from customer use of your products, and impacts resulting from their eventual decommissioning and retirement. Compare the productive output with the total impact of your cloud workloads by reviewing the resources and emissions required per unit of work. Use this data to establish key performance indicators (KPIs), evaluate ways to improve productivity while reducing impact, and estimate the impact of proposed changes over time.
- Establish sustainability goals: For each cloud workload, establish long-term sustainability goals such as reducing the compute and storage resources required per transaction. Model the return on investment of sustainability improvements for existing workloads, and give owners the resources they must invest in sustainability goals. Plan for growth, and architect your workloads so that growth results in reduced impact intensity measured against an appropriate unit, such as per user or per transaction. Goals help you support the wider sustainability goals of your business or organization, identify regressions, and prioritize areas of potential improvement.
- Maximize utilization: Right-size workloads and implement efficient design to verify high utilization and maximize the energy efficiency of the underlying hardware. Two hosts running at 30% utilization are less efficient than one host running at 60% due to baseline power consumption per host. At the same time, reduce or minimize idle resources, processing, and storage to reduce the total energy required to power your workload.
- Anticipate and adopt new, more efficient hardware and software offerings: Support the upstream improvements your partners and suppliers make to help you reduce the impact of your cloud workloads. Continually monitor and evaluate new, more efficient hardware and software offerings. Design for flexibility to permit the rapid adoption of new efficient technologies.
- Use managed services: Sharing services across a broad customer base helps maximize resource utilization, which reduces the amount of infrastructure needed to support cloud workloads. For example, customers can share the impact of common data center components like power and networking by migrating workloads to the AWS Cloud and adopting managed services, such as AWS Fargate for serverless containers, where AWS operates at scale and is responsible for their efficient operation. Use managed services that can help minimize your impact, such as automatically moving infrequently accessed data to cold storage with Amazon S3 Lifecycle configurations (a sketch follows this list) or Amazon EC2 Auto Scaling to adjust capacity to meet demand.
- Reduce the downstream impact of your cloud workloads: Reduce the amount of energy or resources required to use your services. Reduce the need for customers to upgrade their devices to use your services. Test using device farms to understand expected impact and test with customers to understand the actual impact from using your services.
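A sketch of the S3 Lifecycle idea mentioned above: transition objects to colder storage classes as they age. The bucket name, prefix, and day counts are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Move infrequently accessed objects to colder (and less cost/energy intensive) storage.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-archive-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```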
Best Practices
- AWS Well-Architected
- Practicing Continuous Integration and Continuous Delivery on AWS
- Implementing Microservices on AWS
- Serverless Applications Lens - AWS Well-Architected Framework
- Running Containerized Microservices on AWS
- Best practices arranged by migration phase
- Best practices arranged by pillars
- Cloud provides greater flexibility in managing resources and cost
- Minimal upfront investment, as the customer does not have to purchase any physical infrastructure
- Provides just-in-time infrastructure
- No long-term contracts or commitments
- Rich automation: infrastructure becomes scriptable using APIs and shell
- Automatic scaling based on load: scale out (adding more resources of the same size), scale in (removing resources), scale up (increasing the size of a resource), scale down (decreasing the size of a resource)
- Increased agility in the software development lifecycle
- Benefits of HA (high availability) and disaster recovery
- Cloud provides a scalable architecture: infrastructure that can expand and contract depending on the load
- Cloud infrastructure can scale easily, either horizontally or vertically
- Provides virtually unlimited scalability
- Horizontal scaling: scale out (increasing the number of web servers or nodes), scale in (decreasing the number of web servers or nodes)
- Vertical scaling: scale up (increasing the processing capacity/memory/resources of a server), scale down (decreasing the processing capacity/memory/resources of a server); both approaches are sketched in code below this list
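A boto3 sketch of the two approaches, assuming a hypothetical Auto Scaling group `web-asg` and a placeholder instance ID: horizontal scaling changes how many instances run, vertical scaling changes how big a single instance is.

```python
import boto3

autoscaling = boto3.client("autoscaling")
ec2 = boto3.client("ec2")

# Horizontal scaling (scale out/in): change the NUMBER of instances.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="web-asg",   # hypothetical group
    DesiredCapacity=6,
)

# Vertical scaling (scale up/down): change the SIZE of one instance.
# (The instance must be stopped before its type can be changed.)
instance_id = "i-0123456789abcdef0"   # placeholder
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": "m5.xlarge"})
ec2.start_instances(InstanceIds=[instance_id])
```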
- Cloud offers many building blocks from which to construct a system
- The cloud may not have exactly the same services, components, or software as non-cloud infrastructure; the application architecture has to embrace cloud-native solutions in order to maximize the cloud's benefits
- Think about failure while designing the product; over time the product becomes resilient to failure
- Avoid single points of failure (e.g., hosting the web app and the DB on the same instance)
- To mitigate a single point of failure, use a load-balanced environment
Even in this scenario, having multiple web servers connect to a single database server still leaves a single point of failure.
To mitigate this, use Amazon RDS database instances along with an Elastic Load Balancer, where redundancy and managed scaling are built in, avoiding the single point of failure (a minimal sketch follows).
Leverage redundancy in software, web servers, DB nodes, and network resources to avoid single points of failure.
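A sketch of removing the database single point of failure with a Multi-AZ RDS instance; the identifier, instance class, and credentials are placeholders, and real deployments should pull credentials from a secrets store.

```python
import boto3

rds = boto3.client("rds")

# A Multi-AZ RDS instance keeps a synchronous standby in another
# Availability Zone, removing the database as a single point of failure.
rds.create_db_instance(
    DBInstanceIdentifier="shop-db",          # hypothetical identifier
    Engine="mysql",
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=100,
    MasterUsername="admin",
    MasterUserPassword="change-me-please",   # use Secrets Manager in practice
    MultiAZ=True,                            # automatic failover to the standby
)
```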
The ability of the cloud to scale resources to match the demand.
Two ways of scaling (both sketched in code below this list):
- Scaling at a fixed time interval (scheduled scaling)
- Scaling on demand, based on metrics: when a metric reaches a certain threshold, resources are added to meet the demand
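Both modes map to Auto Scaling API calls; a sketch assuming a group named `web-asg`:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scheduled scaling: fixed time interval (cron expression, UTC).
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="business-hours-scale-out",
    Recurrence="0 8 * * MON-FRI",
    DesiredCapacity=6,
)

# Demand-based scaling: target-tracking on average CPU utilization.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 60.0,
    },
)
```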
Decoupling or loose coupling refers to a design principle concerned with minimizing dependencies between components in order to improve the scalability of applications.
- Loose coupling enables the components of an application to scale independently (see the SQS example further below)
It is about decreasing latency and increasing throughput. It is also about how important it is to utilize cloud resources efficiently.
- Get to know all the services, and select the appropriate services for the use case to maximize efficiency and performance.
Security responsibility is shared between the customer and Amazon. Amazon is responsible for security OF the cloud (the physical and network infrastructure); the customer is responsible for security IN the cloud, such as account and user management.
For IaaS (e.g., EC2): the customer also manages the guest OS, patching, applications, and data.
For PaaS (e.g., RDS, Elastic Beanstalk): AWS manages the OS and platform; the customer manages the data and access configuration.
For SaaS-style (abstracted) services (e.g., S3, DynamoDB): the customer is mainly responsible for the data, access policies, and client-side configuration.
EC2 Classic (old)
Latest EC2
Block-level storage (EBS) provides file systems for data storage.
- Snapshots are stored in S3 incrementally, and snapshots can be used to restore the data in new Regions/Availability Zones (see the sketch below)
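Assuming this section refers to EBS snapshots, here is a sketch of taking a snapshot and copying it to another Region so the volume can be restored there; the volume ID and Regions are placeholders.

```python
import boto3

ec2_useast1 = boto3.client("ec2", region_name="us-east-1")
ec2_euwest1 = boto3.client("ec2", region_name="eu-west-1")

# Snapshots are incremental and stored in S3 behind the scenes; copying one to
# another Region lets you restore the volume there.
snapshot = ec2_useast1.create_snapshot(
    VolumeId="vol-0123456789abcdef0",        # placeholder volume
    Description="nightly backup",
)
ec2_useast1.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

ec2_euwest1.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snapshot["SnapshotId"],
    Description="cross-region copy for DR",
)
```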
Issues in manual scaling:
automatic scaling:
Auto Scaling depends on three main components:
1. Launch configuration or launch template (what to launch): specifies the AMI, the EC2 instance configuration, security groups, storage, etc.
2. Auto Scaling group (where and how many to launch): defines the subnets/Availability Zones and the minimum, maximum, and desired number of instances.
3. Scaling policy (when to launch): defines the monitoring thresholds that trigger launching (or terminating) instances.
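A sketch of the first two components with boto3 (the names, AMI ID, security group, and subnets are placeholders); a scaling policy like the target-tracking one sketched earlier can then be attached to the group.

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# 1. Launch template -- what to launch
ec2.create_launch_template(
    LaunchTemplateName="web-template",           # hypothetical name
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",       # placeholder AMI
        "InstanceType": "t3.micro",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)

# 2. Auto Scaling group -- where and how many to launch
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-template", "Version": "$Latest"},
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # placeholder subnets
)
# 3. A scaling policy (when to launch) is then attached to the group.
```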
It allows the user to monitor resource utilization, performance, network traffic, and load, and to set alarm notifications.
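Assuming this refers to Amazon CloudWatch, here is a sketch of publishing a custom metric that can then be graphed or alarmed on; the namespace and metric name are hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a custom application metric that CloudWatch can graph and alarm on.
cloudwatch.put_metric_data(
    Namespace="MyShop/KPIs",                 # hypothetical namespace
    MetricData=[
        {"MetricName": "OrdersPlaced", "Value": 42, "Unit": "Count"},
    ],
)
```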
It automatically provisions resources: it takes care of capacity planning, load balancing, auto scaling, and application health monitoring.
It is a provisioning engine to automate infrastructure needs; the difference from Beanstalk is that the user can perform more granular configuration in OpsWorks.
A scripted way of automating deployments, using a template file (in JSON or YAML format) that specifies the components/resources needed.
- Use cases include replicating a dev environment to QA or staging, etc.
It is used to help configure and launch the required resources from an existing stack template (see the sketch below).
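A sketch of launching a stack from a template with boto3; the template is kept inline as a dict for brevity, whereas real templates normally live in version control as JSON or YAML files.

```python
import json
import boto3

cloudformation = boto3.client("cloudformation")

# A minimal template: a single S3 bucket resource.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "StaticAssetsBucket": {"Type": "AWS::S3::Bucket"},
    },
}

cloudformation.create_stack(
    StackName="staging-copy-of-dev",          # e.g. replicating dev into staging
    TemplateBody=json.dumps(template),
)
```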
It is a deployment service; it coordinates deployments to EC2 instances.
Typical setup, but not scalable.
A cache within each app server instance is also not an ideal solution.
ElastiCache supports two types of caching engines:
- Memcached
- Redis
- Write-through pattern
Pros
Increases the cache hit rate, as all the data is kept in the cache; data is updated in the cache regardless of demand.
Cons
Requires more storage, as all the data is kept in memory.
- Lazy loading (cache-aside)
Pros
Only the data that is actually needed is kept in memory, so the memory requirement is lower.
Cons
Higher cache miss rate, which lowers performance (each miss requires a trip to the database). A lazy-loading sketch follows.
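A sketch of the lazy-loading (cache-aside) pattern against an ElastiCache for Redis endpoint, using the redis-py client; the endpoint, key scheme, TTL, and `load_from_database` helper are hypothetical.

```python
import json
import redis  # ElastiCache for Redis is protocol-compatible with redis-py

cache = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)  # placeholder endpoint

def load_from_database(product_id: str) -> dict:
    # Placeholder for the real database read (e.g. RDS or DynamoDB).
    return {"ProductId": product_id, "name": "example"}

def get_product(product_id: str) -> dict:
    """Lazy loading: read from the cache, fall back to the database on a miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                       # cache hit
    item = load_from_database(product_id)               # cache miss
    cache.setex(key, 300, json.dumps(item))             # populate with a 5-minute TTL
    return item
```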
CloudFront caches resources at edge locations close to users; when a request comes in, it is routed to the lowest-latency edge location, which serves the resource (fetching it from the regional/origin location if it is not already cached).
Objects stored in S3 are highly available and durable. S3 historically followed an "eventual consistency" model: whenever an object changed, there was a delay in propagating the change to all replicas, so the storage could return an object even after a delete request had been made. (Since December 2020, S3 provides strong read-after-write consistency.)
So it is best suited to objects that do not change much, such as archives, videos, and images.
The maximum object size is 5 TB, and there is no limit on the number of objects stored. Objects can be accessed via a REST API (see the sketch below).
An extension of S3, for data that is retrieved infrequently.
Data is transitioned from S3 to Glacier when it is ready to be archived.
Example: store the videos and high-quality images in S3 and store the thumbnails in RRS (Reduced Redundancy Storage).
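A sketch of writing an object and handing out time-limited REST access to it via a presigned URL; the bucket name and key are placeholders.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-media-bucket"                   # hypothetical bucket

# Objects are written and read through S3's REST API (wrapped here by boto3).
with open("cat.jpg", "rb") as f:
    s3.put_object(Bucket=bucket, Key="images/cat.jpg", Body=f)

# A presigned URL grants time-limited HTTPS access to a single object.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": "images/cat.jpg"},
    ExpiresIn=3600,
)
print(url)
```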
- hosting static websites (https://www.linkedin.com/learning/aws-essential-training-for-architects/use-s3-for-web-application-hosting?autoSkip=true&resume=false)
- static file storage
- Versioning
- Caching
- Throttling
- Scaling
- Security
- Authentication & authorization
- Monitoring
- Functions as the unit of scale
- Abstracts the runtime
- The function
- some custom code/script that performs business logic
- Event Sources
- a trigger that executes the function, e.g., trigger the function when an object is added to an S3 bucket (when a bucket event occurs); see the sketch below
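A minimal Lambda function whose event source is an S3 bucket notification; the handler simply logs each newly created object.

```python
# Runs whenever an object is created in the bucket that notifies this function.
def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object s3://{bucket}/{key}")
    return {"processed": len(event["Records"])}
```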
study & expand
DynamoDB - NoSQL, Schema-less, scalable database service with low latency, high performance, high throughput
- Data is stored on SSDs
- Data is automatically replicated across multiple Availability Zones
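A sketch of basic reads and writes with the boto3 resource API, assuming a hypothetical `Users` table with partition key `UserId`:

```python
import boto3

# Table "Users" with partition key "UserId" is assumed to exist.
table = boto3.resource("dynamodb").Table("Users")

table.put_item(Item={"UserId": "u-123", "name": "Alice", "plan": "pro"})

response = table.get_item(Key={"UserId": "u-123"})
print(response.get("Item"))
```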
It is a reliable, durable, highly scalable distributed system for passing messages between components.
Used to build loosely coupled systems (minimizing the dependencies).
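A sketch of the decoupling: the producer and the consumer only share a queue, not direct knowledge of each other (the queue name and message body are placeholders).

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="orders")["QueueUrl"]  # or get_queue_url for an existing queue

# Producer: the web tier only knows about the queue, not the worker.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"orderId": "o-42"}')

# Consumer: a separate worker polls, processes, then deletes the message.
messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
for msg in messages.get("Messages", []):
    print("processing", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```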
Used to configure and coordinate the tasks in a given workflow.
Example: Tightly Coupled E-Commerce System
- Push notifications rather than pull
- Posting to a topic causes a message to send immediately
- SNS lets us push notifications, whereas SQS requires applications to poll constantly (a pull approach)
- Delivery modes include email and text message (SMS)
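A sketch of the push model with boto3; the topic name and email address are placeholders.

```python
import boto3

sns = boto3.client("sns")
topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

# Subscribers (email, SMS, SQS queues, Lambda functions, HTTP endpoints)
# receive the message as soon as it is published -- push, not poll.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")  # placeholder address
sns.publish(TopicArn=topic_arn, Subject="Order shipped", Message="Order o-42 left the warehouse")
```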
- Simple Monthly Calculator: a tool to analyze services and usage and provide cost metrics
- Detailed billing reports by account/service/tag, at monthly, daily, or hourly granularity
- Cost Explorer: a UI for interactive cost reports (a programmatic sketch follows this list)
- Billing alarms: use CloudWatch and SNS to get billing notifications whenever a threshold is reached
- Create budgets
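Cost Explorer also has an API; a sketch that pulls one month's unblended cost per service (the dates are placeholders).

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer API

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},  # placeholder dates
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for group in response["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])
```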
- Load Balancers
- EC2 Instances for the application/api deployments
- S3 buckets for storage needs
- Lambda for serverless computing
- DynamoDB for nosql database requirements
- RDS for database needs
- CloudWatch for monitoring resources and alert systems
- CloudFront for CDN solutions
Fault tolerance refers to the ability of a system (computer, network, cloud cluster, etc.) to continue operating without interruption when one or more of its components fail.
It is about how well the system is able to withstand the load when one or more of its components fail.
The objective of creating a fault-tolerant system is to prevent disruptions arising from a single point of failure, ensuring the high availability and business continuity of mission-critical applications or systems.
It is about avoiding loss of service by ensuring that enough resources are available to serve the load.
High availability refers to a system’s ability to avoid loss of service by minimizing downtime. It’s expressed in terms of a system’s uptime, as a percentage of total running time. Five nines, or 99.999% uptime, is considered the “holy grail” of availability.
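The "nines" translate directly into an allowed-downtime budget; a quick calculation:

```python
# Allowed downtime per (365-day) year for common availability targets.
for availability in (0.99, 0.999, 0.9999, 0.99999):
    downtime_minutes = (1 - availability) * 365 * 24 * 60
    print(f"{availability:.3%} uptime -> {downtime_minutes:.1f} minutes of downtime per year")
```

Five nines works out to roughly five minutes of downtime per year.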
Frequently asked questions topics
- AWS Well-Architected helps cloud architects build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads.