AWS - Concepts

Basics

Cloud Concepts

6 Pillars - AWS Well-Architected and the Six Pillars

Operational Excellence

Reference

Design Principle

The following are design principles for operational excellence in the cloud:

  • Perform operations as code: In the cloud, you can apply the same engineering discipline that you use for application code to your entire environment. You can define your entire workload (applications, infrastructure, etc.) as code and update it with code. You can script your operations procedures and automate their process by launching them in response to events. By performing operations as code, you limit human error and create consistent responses to events.

  • Make frequent, small, reversible changes: Design workloads that are scalable and loosely coupled to permit components to be updated regularly. Automated deployment techniques together with smaller, incremental changes reduces the blast radius and allows for faster reversal when failures occur. This increases confidence to deliver beneficial changes to your workload while maintaining quality and adapting quickly to changes in market conditions.

  • Refine operations procedures frequently: As you evolve your workloads, evolve your operations appropriately. As you use operations procedures, look for opportunities to improve them. Hold regular reviews and validate that all procedures are effective and that teams are familiar with them. Where gaps are identified, update procedures accordingly. Communicate procedural updates to all stakeholders and teams. Gamify your operations to share best practices and educate teams.

  • Anticipate failure: Perform “pre-mortem” exercises to identify potential sources of failure so that they can be removed or mitigated. Test your failure scenarios and validate your understanding of their impact. Test your response procedures to ensure they are effective and that teams are familiar with their process. Set up regular game days to test workload and team responses to simulated events.

  • Learn from all operational failures: Drive improvement through lessons learned from all operational events and failures. Share what is learned across teams and through the entire organization.

  • Use managed services: Reduce operational burden by using AWS managed services where possible. Build operational procedures around interactions with those services.

  • Implement observability for actionable insights: Gain a comprehensive understanding of workload behavior, performance, reliability, cost, and health. Establish key performance indicators (KPIs) and leverage observability telemetry to make informed decisions and take prompt action when business outcomes are at risk. Proactively improve performance, reliability, and cost based on actionable observability data.

Best Practices

  • Organization
  • Prepare
  • Operate
  • Evolve
Security

Design Principle

  • Implement a strong identity foundation: Implement the principle of least privilege and enforce separation of duties with appropriate authorization for each interaction with your AWS resources. Centralize identity management, and aim to eliminate reliance on long-term static credentials.

  • Maintain traceability: Monitor, alert, and audit actions and changes to your environment in real time. Integrate log and metric collection with systems to automatically investigate and take action.

  • Apply security at all layers: Apply a defense in depth approach with multiple security controls. Apply to all layers (for example, edge of network, VPC, load balancing, every instance and compute service, operating system, application, and code).

  • Automate security best practices: Automated software-based security mechanisms improve your ability to securely scale more rapidly and cost-effectively. Create secure architectures, including the implementation of controls that are defined and managed as code in version-controlled templates.

  • Protect data in transit and at rest: Classify your data into sensitivity levels and use mechanisms, such as encryption, tokenization, and access control where appropriate.

  • Keep people away from data: Use mechanisms and tools to reduce or eliminate the need for direct access or manual processing of data. This reduces the risk of mishandling or modification and human error when handling sensitive data.

  • Prepare for security events: Prepare for an incident by having incident management and investigation policy and processes that align to your organizational requirements. Run incident response simulations and use tools with automation to increase your speed for detection, investigation, and recovery.

Best Practices

Reliability

Design Principle

There are five design principles for reliability in the cloud:

  • Automatically recover from failure: By monitoring a workload for key performance indicators (KPIs), you can start automation when a threshold is breached. These KPIs should be a measure of business value, not of the technical aspects of the operation of the service. This provides for automatic notification and tracking of failures, and for automated recovery processes that work around or repair the failure. With more sophisticated automation, it’s possible to anticipate and remediate failures before they occur.

  • Test recovery procedures: In an on-premises environment, testing is often conducted to prove that the workload works in a particular scenario. Testing is not typically used to validate recovery strategies. In the cloud, you can test how your workload fails, and you can validate your recovery procedures. You can use automation to simulate different failures or to recreate scenarios that led to failures before. This approach exposes failure pathways that you can test and fix before a real failure scenario occurs, thus reducing risk.

  • Scale horizontally to increase aggregate workload availability: Replace one large resource with multiple small resources to reduce the impact of a single failure on the overall workload. Distribute requests across multiple, smaller resources to verify that they don’t share a common point of failure.

  • Stop guessing capacity: A common cause of failure in on-premises workloads is resource saturation, when the demands placed on a workload exceed the capacity of that workload (this is often the objective of denial of service attacks). In the cloud, you can monitor demand and workload utilization, and automate the addition or removal of resources to maintain the more efficient level to satisfy demand without over- or under-provisioning. There are still limits, but some quotas can be controlled and others can be managed (see Manage Service Quotas and Constraints).

  • Manage change in automation: Changes to your infrastructure should be made using automation. The changes that must be managed include changes to the automation, which then can be tracked and reviewed.

Best Practices

Cost optimization

Design Principle

  • Implement Cloud Financial Management: To achieve financial success and accelerate business value realization in the cloud, invest in Cloud Financial Management and Cost Optimization. Your organization should dedicate time and resources to build capability in this new domain of technology and usage management. Similar to your Security or Operational Excellence capability, you need to build capability through knowledge building, programs, resources, and processes to become a cost-efficient organization.

  • Adopt a consumption model: Pay only for the computing resources that you require and increase or decrease usage depending on business requirements, not by using elaborate forecasting. For example, development and test environments are typically only used for eight hours a day during the work week. You can stop these resources when they are not in use for a potential cost savings of 75% (40 hours versus 168 hours).

  • Measure overall efficiency: Measure the business output of the workload and the costs associated with delivering it. Use this measure to know the gains you make from increasing output and reducing costs.

  • Stop spending money on undifferentiated heavy lifting: AWS does the heavy lifting of data center operations like racking, stacking, and powering servers. It also removes the operational burden of managing operating systems and applications with managed services. This permits you to focus on your customers and business projects rather than on IT infrastructure.

  • Analyze and attribute expenditure: The cloud makes it simple to accurately identify the usage and cost of systems, which then permits transparent attribution of IT costs to individual workload owners. This helps measure return on investment (ROI) and gives workload owners an opportunity to optimize their resources and reduce costs.

Best Practices

Performance efficiency

Design Principle

  • Democratize advanced technologies: Make advanced technology implementation smoother for your team by delegating complex tasks to your cloud vendor. Rather than asking your IT team to learn about hosting and running a new technology, consider consuming the technology as a service. For example, NoSQL databases, media transcoding, and machine learning are all technologies that require specialized expertise. In the cloud, these technologies become services that your team can consume, permitting your team to focus on product development rather than resource provisioning and management.

  • Go global in minutes: Deploying your workload in multiple AWS Regions around the world permits you to provide lower latency and a better experience for your customers at minimal cost.

  • Use serverless architectures: Serverless architectures remove the need for you to run and maintain physical servers for traditional compute activities. For example, serverless storage services can act as static websites (removing the need for web servers) and event services can host code. This removes the operational burden of managing physical servers, and can lower transactional costs because managed services operate at cloud scale.

  • Experiment more often: With virtual and automatable resources, you can quickly carry out comparative testing using different types of instances, storage, or configurations.

  • Consider mechanical sympathy: Understand how cloud services are consumed and always use the technology approach that aligns with your workload goals. For example, consider data access patterns when you select database or storage approaches.

Best Practices

Sustainability

Design Principle

  • Understand your impact: Measure the impact of your cloud workload and model the future impact of your workload. Include all sources of impact, including impacts resulting from customer use of your products, and impacts resulting from their eventual decommissioning and retirement. Compare the productive output with the total impact of your cloud workloads by reviewing the resources and emissions required per unit of work. Use this data to establish key performance indicators (KPIs), evaluate ways to improve productivity while reducing impact, and estimate the impact of proposed changes over time.

  • Establish sustainability goals: For each cloud workload, establish long-term sustainability goals such as reducing the compute and storage resources required per transaction. Model the return on investment of sustainability improvements for existing workloads, and give owners the resources they must invest in sustainability goals. Plan for growth, and architect your workloads so that growth results in reduced impact intensity measured against an appropriate unit, such as per user or per transaction. Goals help you support the wider sustainability goals of your business or organization, identify regressions, and prioritize areas of potential improvement.

  • Maximize utilization: Right-size workloads and implement efficient design to verify high utilization and maximize the energy efficiency of the underlying hardware. Two hosts running at 30% utilization are less efficient than one host running at 60% due to baseline power consumption per host. At the same time, reduce or minimize idle resources, processing, and storage to reduce the total energy required to power your workload.

  • Anticipate and adopt new, more efficient hardware and software offerings: Support the upstream improvements your partners and suppliers make to help you reduce the impact of your cloud workloads. Continually monitor and evaluate new, more efficient hardware and software offerings. Design for flexibility to permit the rapid adoption of new efficient technologies.

  • Use managed services: Sharing services across a broad customer base helps maximize resource utilization, which reduces the amount of infrastructure needed to support cloud workloads. For example, customers can share the impact of common data center components like power and networking by migrating workloads to the AWS Cloud and adopting managed services, such as AWS Fargate for serverless containers, where AWS operates at scale and is responsible for their efficient operation. Use managed services that can help minimize your impact, such as automatically moving infrequently accessed data to cold storage with Amazon S3 Lifecycle configurations or Amazon EC2 Auto Scaling to adjust capacity to meet demand.

  • Reduce the downstream impact of your cloud workloads: Reduce the amount of energy or resources required to use your services. Reduce the need for customers to upgrade their devices to use your services. Test using device farms to understand expected impact and test with customers to understand the actual impact from using your services.

Best Practices

Additional References, Whitepapers


IaaS - Infrastructure as a service

image

PaaS - Platform as a service

image

SaaS - Software as a service

image

Benefits

  • Cloud provides greater flexibility in managing resources and cost
  • Minimum upfront investments as customer does not have to purchase any physical infrastructure
  • Provides Just in time infrastructure
  • No long term contracts or commitments
  • Rich automation - infrastructure becomes scriptable using APIs and shell tools
  • Automatic scaling based on load: scale out - adding more resources of the same size; scale in - removing resources; scale up - increasing the size of a resource; scale down - decreasing the size of a resource
  • Increased agility in the software development lifecycle
  • Benefits of HA (High availability) and disaster recovery

Elasticity (i.e Flexibility)

  • Cloud provides scalable architecture - infrastructure that can expand and contract depending on the load

  • Cloud infrastructure can scale easily, horizontally or vertically

  • Provides virtually unlimited scalability

  • Horizontal scaling - Scale out (increasing the no. of web servers or nodes), Scale in (decreasing the no. of web servers or nodes)

  • Vertical scaling - Scale up (increasing the processing capacity/memory/resources of a server), Scale down (decreasing the processing capacity/memory/resources of a server)

Scaling in manual vs automatic approach

image

image

Constraints

  • Cloud has many building blocks to construct a system
  • Cloud may not have exact equivalents of the services, components, or software used in non-cloud infrastructure; the application architecture has to support cloud-native solutions in order to maximize the cloud benefits

Cloud Practices

Design for failure

  • Think about failure while designing the product, so that the product becomes resilient to failure
  • Avoid single points of failure (e.g., hosting the web app and DB on the same instance)
  • To mitigate a single point of failure, use a load-balanced environment

Even in this scenario, having multiple web servers connect to a single database server still leaves a single point of failure.

To mitigate this, use Amazon RDS database instances along with an Elastic Load Balancer, where scaling and redundancy are built in to avoid a single point of failure.

image

Redundancy

Leverage redundancy in terms of software/web servers/db nodes/network resources to avoid the single point of failure.

Implement elasticity

Ability of cloud to scale resources to match the demand

2 ways of scaling

  • Scaling at fixed time interval

image

  • Scaling on demand based on metrics - when a metric reaches a certain threshold, resources are added to fulfil the demand

image

Decouple your components

Decoupling (or loose coupling) is a design principle concerned with minimizing dependencies between components in order to improve the scalability of applications.

  • Loose coupling enables the application's components to scale independently

Optimize for performance

It's about decreasing latency and increasing throughput, and about how important it is to utilize cloud resources efficiently.

  • Get to know the available services and select the appropriate ones for your use cases to maximize efficiency and performance.

Keep things secure

image

image

Optimize for cost

Keeping Things Secure

Shared Responsibility Model

image

Security responsibility is shared between the customer and Amazon. Amazon is responsible for security OF the cloud (the cloud infrastructure, physical infrastructure, and network infrastructure). The customer is responsible for security IN the cloud, such as account and user management.

For IaaS

image

For PaaS

image

For SaaS

image

IAM: Master account

image

image

Creating AWS Account

IAM: Groups, Roles and Permissions

New User, Key pair, Security Groups

image

VPC (Virtual Private Cloud) aka Private Network

image

EC2 Classic (old)

image

Latest EC2

image

image

Region, Availability Zone

image

Elastic Block Storage

Block-level storage volumes that attach to EC2 instances; a file system is created on top of a volume for data storage.

image

  • Snapshots are stored in S3 incrementally; snapshots can be copied and used to restore the data in other regions/Availability Zones (see the sketch below)
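
A minimal boto3 sketch of that workflow, assuming hypothetical volume IDs and regions: create the incremental snapshot, then copy it so it can be restored in another region.

```python
import boto3

# Hypothetical volume ID and regions, for illustration only.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an incremental snapshot of an EBS volume (stored in S3 behind the scenes).
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="Nightly backup of the data volume",
)

# Copy the snapshot to another region so data can be restored there.
ec2_west = boto3.client("ec2", region_name="us-west-2")
copy = ec2_west.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId=snapshot["SnapshotId"],
    Description="Cross-region copy for disaster recovery",
)
print(copy["SnapshotId"])
```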

image

RDS - Relational Database Service

image

image

Elasticity - Implementing and automating infrastructure

Autoscaling

Issues in manual scaling:

image

automatic scaling:

image

image

Autoscaling depends on 3 main components

1. Launch configuration (what to launch) - specifies the AMI, EC2 instance configuration, security group, storage, etc. (launch templates are the newer equivalent)

2. Auto Scaling Group (where to launch) - defines limits and how many instances/how much capacity to launch

3. Scaling Policy (when to launch) - defines the monitoring thresholds that trigger launching or terminating instances (see the sketch below)
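
A hedged boto3 sketch of these three pieces, using a launch template (the successor to launch configurations); the AMI, security group, and subnet IDs are hypothetical placeholders.

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# 1. What to launch: a launch template.
ec2.create_launch_template(
    LaunchTemplateName="web-template",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",
        "InstanceType": "t3.micro",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)

# 2. Where and how many to launch: the Auto Scaling group and its capacity limits.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-template"},
    MinSize=2,
    MaxSize=6,
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",
)

# 3. When to launch: a target-tracking policy that keeps average CPU near 50%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 50.0,
    },
)
```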

image

Cloudwatch

CloudWatch allows the user to monitor resource utilization, performance, network traffic, and load, and to set alarms and notifications.
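
For example, a boto3 sketch of an alarm that notifies an SNS topic when average CPU stays high; the instance ID and topic ARN are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm fires when average CPU stays above 80% for two consecutive 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-web-1",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```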

image

image

Beanstalk

Elastic Beanstalk automatically provisions resources; it takes care of capacity planning, load balancing, autoscaling, and application health monitoring.

image

OpsWorks

OpsWorks is a provisioning engine to automate infrastructure needs; the difference from Beanstalk is that the user can perform more granular configuration in OpsWorks.

image

image

image

CloudFormation

A scripted way of automating deployments, using a template file (JSON or YAML) that specifies the components/resources needed.

  • Use case: replicating a dev environment to QA or staging, etc. (see the sketch below)
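
A minimal boto3 sketch of that idea: the same template (here an inline Python dict defining a single S3 bucket) can be launched as a stack once per environment. The stack and bucket names are hypothetical.

```python
import json
import boto3

cloudformation = boto3.client("cloudformation")

# A minimal template kept inline as a Python dict; real templates are usually JSON or YAML files.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AppBucket": {
            "Type": "AWS::S3::Bucket",
            # Bucket name is a hypothetical placeholder.
            "Properties": {"BucketName": "my-app-staging-assets"},
        }
    },
}

# The same template can be launched repeatedly, e.g. once per environment.
cloudformation.create_stack(
    StackName="staging-environment",
    TemplateBody=json.dumps(template),
)
```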

image

image

image

CloudFormer

CloudFormer helps create a CloudFormation template from the resources already running in your account, which can then be used to configure and launch a matching stack.

image

image

CodeDeploy

CodeDeploy coordinates application deployments to EC2 instances (and other compute targets).
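
A hedged boto3 sketch of kicking off a deployment, assuming an application and deployment group already exist and the revision bundle sits in S3; all names are placeholders.

```python
import boto3

codedeploy = boto3.client("codedeploy")

# Hypothetical application, deployment group, and S3 revision location.
response = codedeploy.create_deployment(
    applicationName="web-app",
    deploymentGroupName="production",
    revision={
        "revisionType": "S3",
        "s3Location": {
            "bucket": "my-deploy-artifacts",
            "key": "web-app-1.2.3.zip",
            "bundleType": "zip",
        },
    },
    description="Deploy version 1.2.3",
)
print(response["deploymentId"])
```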

image

image

image

Optimize for Performance

ElastiCache

image

Typical setup, but not scalable

image

A cache local to each app server instance is also not an ideal solution, since each server's cache is duplicated and can become inconsistent.

ElastiCache supports two caching engines:

  1. MemCached

image

  2. Redis

image

Caching Strategies or Patterns

  1. Write Through Pattern

image

Pros

It increases the cache hit rate, as all the data is kept in the cache; data is updated in the cache regardless of whether it is in demand.

Cons

Requires more storage, as all the data is kept in memory (even data that is rarely read).

  2. Lazy Load

image

Pros

Only the data that is actually requested is kept in memory, so the memory requirement is lower.

Cons

Higher cache-miss rate on first access, hence lower performance for cold data (a sketch of both patterns follows).
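
A sketch of both patterns against a Redis-style cache, assuming the redis-py client, a placeholder ElastiCache endpoint, and stubbed-out database helpers.

```python
import json
import redis  # assumes the redis-py client and an ElastiCache Redis endpoint

# Hypothetical endpoint and data-access functions, for illustration only.
cache = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com", port=6379)

def load_from_database(user_id):
    return {"id": user_id, "name": "example"}  # placeholder for a real query

def save_to_database(user_id, record):
    pass  # placeholder for a real write

# Write-through: update the cache on every write, so reads rarely miss.
def update_user(user_id, record):
    save_to_database(user_id, record)
    cache.set(f"user:{user_id}", json.dumps(record))

# Lazy loading (cache-aside): populate the cache only when a read misses.
def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)          # cache hit
    record = load_from_database(user_id)   # cache miss
    cache.set(f"user:{user_id}", json.dumps(record), ex=300)  # expire after 5 minutes
    return record
```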

image

CloudFront - CDN

CloudFront caches resources at edge locations close to users; when a request comes in, it is routed to the lowest-latency edge location, which fetches the resource from the regional/origin locations if it is not already cached.



Serverless

Storage Options

image

image

Object Storage (S3)

Objects in S3 are highly available and durable. S3 historically followed an "Eventual Consistency" model: whenever an object changed, there was a delay in propagating the change to all replicas, which could cause the storage to return an object even after a delete request was made. (S3 now provides strong read-after-write consistency.)

So it is best suited for objects that do not change often, such as archives, videos, and images.

image

The maximum object size is 5 TB, with no limit on the number of objects stored. Objects can be accessed via a REST API (see the sketch below).
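
A minimal boto3 sketch of working with objects, using a hypothetical bucket; the presigned URL shows one common way to expose an object over HTTPS without making it public.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and key names.
bucket = "my-media-archive"

# Upload an object (up to 5 TB per object; multipart upload is used for large files).
s3.upload_file("video.mp4", bucket, "videos/video.mp4")

# Download it again.
s3.download_file(bucket, "videos/video.mp4", "/tmp/video.mp4")

# Generate a time-limited URL so clients can fetch the object directly.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": "videos/video.mp4"},
    ExpiresIn=3600,
)
print(url)
```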

Glacier

An extension of S3 for data that is retrieved infrequently.

image

Data is transitioned from S3 to Glacier when it is ready to be archived, typically via lifecycle rules (a sketch follows).
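
A hedged boto3 sketch of such a lifecycle rule, with a hypothetical bucket and prefix: objects transition to Glacier after 90 days and expire later.

```python
import boto3

s3 = boto3.client("s3")

# Objects under "logs/" move to Glacier after 90 days and are deleted about a year later.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-media-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 455},
            }
        ]
    },
)
```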

File Storage

image

RRS - Reduced Redundancy Storage

image

Example: store the videos and high-quality images in S3 and store the thumbnails in RRS.

Optimizing for performance: serverless architectures

S3

API Gateway - a managed service that acts as a layer between clients and your application, providing:

  1. Versioning
  2. Caching
  3. Throttling
  4. Scaling
  5. Security
  6. Authentication & authorization
  7. Monitoring

image

image

AWS Lambda

  • Functions as the unit of scale
  • Abstracts the runtime

2 Main Components of Lambda

  • The function
    • some custom code/script that performs business logic
  • Event Sources
    • a trigger that executes the function, e.g. invoke the function when an object is added to an S3 bucket (see the handler sketch below)
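
A minimal sketch of the "function" half, written as a Python handler for the S3 object-created event shape; the event source half would be the bucket's notification configuration pointing at this function.

```python
import urllib.parse

def lambda_handler(event, context):
    # S3 event notifications deliver one or more records per invocation.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        print(f"New object uploaded: s3://{bucket}/{key}")
    return {"status": "ok"}
```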

image

When to & When not to use the Lambda functions?

study & expand


DynamoDB - NoSQL, Schema-less, scalable database service with low latency, high performance, high throughput

  • Data is stored on SSDs
  • Data is automatically replicated across multiple Availability Zones (a minimal sketch of reads and writes follows)
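
A minimal boto3 sketch of writing and reading an item, assuming a hypothetical "users" table whose partition key is user_id.

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Hypothetical table with a partition key named "user_id".
table = dynamodb.Table("users")

# Write an item; no schema is needed beyond the key attributes.
table.put_item(Item={"user_id": "42", "name": "Alice", "plan": "pro"})

# Read it back by key.
response = table.get_item(Key={"user_id": "42"})
print(response.get("Item"))
```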

SQS - Simple Queue Service

It is a reliable, durable, highly scalable distributed system for passing messages between components.

Used to build loosely coupled systems (minimizing the dependencies).
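
A minimal boto3 sketch of that decoupling: the producer and consumer only share the queue, not each other; the queue name is a placeholder.

```python
import boto3

sqs = boto3.client("sqs")

# Hypothetical queue; create_queue is idempotent by name.
queue_url = sqs.create_queue(QueueName="order-events")["QueueUrl"]

# Producer: publish a message without knowing anything about the consumer.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"order_id": 1001}')

# Consumer: poll for work, process it, then delete the message.
messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
for message in messages.get("Messages", []):
    print("processing", message["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```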

SWF - Simple Workflow service

To configure and coordinate the tasks in the given workflow.

Example: Tightly Coupled E-Commerce System

SNS - Simple Notification Service

image

  • Push notifications rather than pull
  • Posting to a topic causes the message to be sent to subscribers immediately
  • SNS pushes notifications to subscribers, whereas SQS requires applications to poll constantly (pull approach)
  • Delivery protocols include email, SMS (text message), HTTP/S, SQS, and Lambda (see the sketch below)
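
A minimal boto3 sketch of the push model, with hypothetical topic and subscriber values.

```python
import boto3

sns = boto3.client("sns")

# Hypothetical topic; create_topic is idempotent by name.
topic_arn = sns.create_topic(Name="order-updates")["TopicArn"]

# Subscribers can be email, SMS, HTTP/S endpoints, SQS queues, or Lambda functions.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")

# Publishing pushes the message to every subscriber immediately.
sns.publish(TopicArn=topic_arn, Subject="Order shipped", Message="Order 1001 has shipped.")
```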


Cost Optimization

  • Simple Monthly Calculator tool (since replaced by the AWS Pricing Calculator) - to analyze services and usage and provide cost estimates
  • Get detailed billing reports by account/service/tag - monthly, daily, or hourly
  • Cost Explorer - a UI for interactive cost reports
  • Billing Alarms - use CloudWatch and SNS to get billing notifications whenever a threshold is reached
  • Create Budgets

Matching supply with demand

image


General Topics

Essential Cloud Components

  • Load Balancers
  • EC2 Instances for the application/api deployments
  • S3 buckets for storage needs
  • Lambda for serverless computing
  • DynamoDB for NoSQL database requirements
  • RDS for relational database needs
  • CloudWatch for monitoring resources and alert systems
  • CloudFront for CDN solutions

Fault Tolerance

Fault tolerance refers to the ability of a system (computer, network, cloud cluster, etc.) to continue operating without interruption when one or more of its components fail.

It's about how the system is able to withstand the load when one or more of its components fail.

The objective of creating a fault-tolerant system is to prevent disruptions arising from a single point of failure, ensuring the high availability and business continuity of mission-critical applications or systems.

High Availability

It's about avoiding loss of service by ensuring enough resources are available to serve the load.

High availability refers to a system’s ability to avoid loss of service by minimizing downtime. It’s expressed in terms of a system’s uptime, as a percentage of total running time. Five nines, or 99.999% uptime, is considered the “holy grail” of availability.


Frequently asked questions topics

FAQs


RDS


Autoscaling / Automating infrastructure


Storage (S3, DynamoDB)


  • AWS Well-Architected helps cloud architects build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads.