AWS - kamialie/knowledge_corner GitHub Wiki


Overview

Architecture

Well-Architected Framework

Well-Architected Tool - evaluates a workload against the framework's 5 pillars. It asks questions and generates a report from the answers, showing what can be improved or optimized. Can be used to monitor architecture changes over time, set goals, etc.

Operational excellence

The ability to run and monitor systems to deliver business value, and continuously improve processes and procedures.

Design principles:

  • operations as code - infrastructure as code
  • small frequent reversible changes - automate deployments, be able to revert at any time
  • annotated documentation - automated documentation creation
  • refine operations frequently - the whole team is aware of procedures
  • anticipate failure - learn from failures

Security

The ability to protect information, systems, and assets through risk assessments and mitigation strategies.

Design principles:

  • strong identity foundation - centralized privilege management, principle of least privilege, avoid long-term credentials
  • enable traceability - logs and metrics integration for fast response
  • automate best practices
  • apply encryption in transit and at rest
  • perform data integrity checks
  • reduce or limit direct access or manual processing of data

Reliability

The ability to recover from infrastructure or service disruptions, dynamically acquire computational resources to meet demand, and mitigate disruptions such as misconfigurations or network issues.

Design principles:

  • recovery planning and testing - simulate or recreate failures
  • automate recovery
  • dynamic scaling - maintain optimal level to satisfy demand, and distribute work to avoid single point of failure

Performance efficiency

The ability to use compute resources efficiently to meet requirements as demands change and technologies evolve.

Design principles:

  • try out and experiment with new options and advanced technologies
  • go global in minutes
  • strive to use more serverless
  • find best fit for the given workload
  • evaluate trade-offs (e.g. caching)

Cost optimization

The ability to run systems at the lowest possible cost.

Design principles:

  • adopt consumption models - pay only for what is used (strive to use managed services and serverless)
  • measure efficiency
  • analyze and attribute expenditure - identify system components and associated costs (leverage tags)

Migration to the cloud

Cloud Adoption Framework (AWS CAF)

Organizes guidance in 6 areas (perspectives) to focus on for the migration:

  • business
  • people
  • governance
  • platform
  • security
  • operations

Migration strategies

  • rehosting - move applications as is
  • replatforming - moving with few cloud optimizations (not core)
  • refactoring - re-architecting and developing the application with cloud-native features
  • repurchasing - moving from a traditional license to a software-as-a-service model (changing existing vendor to cloud-based version)
  • retaining - keeping critical applications (might require major refactoring or can be postponed)
  • retiring - removing applications that are not being used

Pricing and support

Fundamental drivers of cost:

  • compute - hourly from start to termination
  • storage - per GB
  • data transfer - pay for outbound; usually no charge for inbound or between AWS services within the same region (e.g. EC2 and S3)
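
The three cost drivers above add up in a simple way. The sketch below is only an illustration of the arithmetic: the function name and every rate in it are made-up assumptions, not real AWS prices.

```python
def estimate_monthly_cost(instance_hours, hourly_rate,
                          storage_gb, storage_rate_per_gb,
                          outbound_gb, outbound_rate_per_gb):
    """Sum the three basic cost drivers; all rates are supplied by the caller."""
    compute = instance_hours * hourly_rate          # billed while the instance runs
    storage = storage_gb * storage_rate_per_gb      # billed per GB stored
    transfer = outbound_gb * outbound_rate_per_gb   # only outbound traffic is billed here
    return {
        "compute": round(compute, 2),
        "storage": round(storage, 2),
        "data_transfer": round(transfer, 2),
        "total": round(compute + storage + transfer, 2),
    }


# Hypothetical rates: $0.01/hour, $0.02/GB stored, $0.09/GB outbound.
print(estimate_monthly_cost(720, 0.01, 100, 0.02, 50, 0.09))
```

Inbound traffic is left out on purpose, since it usually carries no charge.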

Free tier services

Pricing Calculator estimates the cost per service, service group or total infrastructure.

Organizations provides the Consolidated billing feature (free), which groups the billing info of multiple accounts into one by creating a single payer account that can view and pay the combined bills of all linked accounts. Usage is aggregated across accounts, so volume discounts (e.g. total volume stored in S3), Reserved Instance discounts, and Savings Plans can be shared between them.

Total Cost of Ownership calculator (TCO) creates a report on estimated savings on moving from on-prem to AWS.

Budgets

Budgets can be used to plan service usage, costs, and instance reservations. It can track cost per service as well as Reserved Instance and Savings Plans utilization and coverage. It also provides a fully customizable alert system (e.g. when spend reaches a certain percentage of the budget). Data is updated 3 times a day.

Cost Explorer

Provides cost analytics (reports, visualizations) over a specified period of time, forecasts (up to 12 months), and recommendations (all accessible via API as well). Costs can be grouped in many ways (e.g. by resource or region), including by tags.

Support plans

Support plans:

| Plan | Cost | Support | Trusted Advisor | Other |
|---|---|---|---|---|
| Basic | free | 24/7 customer service limited to account and billing info, documentation, support forums | 7 core checks | Personal Health Dashboard - alert and remediation guidance when AWS is experiencing events that may affect you |
| Developer | $29/month or 3% of AWS costs | Basic, plus email access to customer service; one person is specified as a primary contact (can also ask technical questions) | 7 core checks | same as Basic |
| Business | $100/month or 3-10% of AWS costs | Developer, plus direct phone and chat access to customer support | full set of best-practice checks | Infrastructure event management (extra fee) |
| Enterprise | $15000/month or 3-10% of AWS costs | Business, plus 15-minute SLA for business-critical workloads | full set of best-practice checks | dedicated Technical Account Manager (TAM), concierge support team |

SLA (Service-level Agreement) specifies response time. Paid support plans (all except Basic) are on month-to-month basis.

Management

Organizations

Management of multiple AWS accounts (global service). Provides consolidated billing across all accounts (single payment method).

The main account is the management account (previously called the master account; it can't be changed), while other accounts are member accounts. A member account can only be part of one organization.

Organizational units (OU) group multiple accounts with similar business or security requirements. OU can include other OUs.

Policies can be attached to individual members or OUs.

Service Control Policies

Service Control Policies (SCPs) restrict which AWS services, resources, and individual API calls the users and roles in an account can access. An SCP can be attached to the organization root, an OU, or an individual member account, and affects all users, groups, and roles within the affected accounts, including each account's root user (which IAM policies cannot restrict). SCPs do not apply to the management (master) account. An explicit DENY at a higher level (e.g. an OU) cannot be overridden by an ALLOW at a lower level (e.g. an account in that OU).
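
The rule that an explicit DENY always wins can be sketched as a small evaluation function. This is a deliberately simplified model, not the real policy engine: the `scp_permits` name and the dictionary layout are assumptions made for illustration, and real SCPs also support conditions, resource patterns, and wildcards within action names. It additionally assumes that an action has to be allowed at every level between the root and the account.

```python
def scp_permits(action, policy_chain):
    """Decide whether SCPs leave an action available to an account.

    policy_chain lists the policies from the organization root down to the
    account; each entry is a dict with optional 'allow' and 'deny' sets of
    action names ('*' matches every action).
    """
    def matches(patterns):
        return "*" in patterns or action in patterns

    # An explicit deny at any level wins, whatever is allowed elsewhere.
    if any(matches(level.get("deny", set())) for level in policy_chain):
        return False
    # Otherwise the action must be allowed at every level of the chain.
    return all(matches(level.get("allow", set())) for level in policy_chain)


# Hypothetical chain: root allows everything, the OU denies bucket deletion,
# the account tries to allow it again.
chain = [
    {"allow": {"*"}},
    {"allow": {"*"}, "deny": {"s3:DeleteBucket"}},
    {"allow": {"s3:DeleteBucket", "ec2:RunInstances"}},
]
print(scp_permits("s3:DeleteBucket", chain))   # the deny on the OU still applies
print(scp_permits("ec2:RunInstances", chain))
```

Even when this check passes, the user or role still needs an IAM policy that grants the action; SCPs only limit what can be granted.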

Service Catalog

Managed catalog of IT services to be used within organizations. Serves as an organizational catalog for the cloud. Supports lifecycle for service releases.

Resource Access Manager

Allows sharing resources with other AWS accounts, outside or within an Organization.

Resources (not all):

  • VPC subnets (cannot be from the default VPC) - accounts cannot view, modify, or delete resources owned by other accounts on the subnet, but resources can communicate with each other using private IPs and reference security groups
  • Transit Gateway
  • Route53 Resolver Rules
  • License Manager Configurations

Control Tower

Creates a multi-account environment that follows best practices in operational efficiency, security, and governance.

Centralizes users across all accounts, provides templates that can be used to create new accounts, and integrates guardrails (specific protections that are turned on, e.g. CloudTrail).

Systems Manager

Provides operational data and automation across infrastructure (e.g. updating a system library on a predefined set of VMs).

Gives a secure way of accessing servers using AWS credentials. Securely stores commonly used parameters.

OpsWorks

Configuration management service. Provides managed instances of Chef and Puppet.

WorkSpaces

Managed cloud desktop (Virtual Desktop Infrastructure). Paid on demand. Integrated with Microsoft Active Directory. Supports Linux and Windows.

Limits

Limits, also called quotas, restrict the usage of cloud resources.

API rate limits set the maximum number of API requests a client can make. Every API has its own limit, e.g. S3 GetObject, EC2 DescribeInstances. When requests are throttled (intermittent ThrottlingException errors are the usual sign), clients can either implement an exponential backoff strategy (the AWS SDKs already include a mechanism for it) or ask AWS to increase the API throttling limit.
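
A minimal sketch of exponential backoff with jitter, assuming a generic throttled operation. `ThrottlingError` and `call_with_backoff` are hypothetical names for illustration; with an AWS SDK the built-in retry handling would normally be used instead of hand-written code like this.

```python
import random
import time


class ThrottlingError(Exception):
    """Stand-in for a throttling error returned by an API (hypothetical)."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying throttled calls with growing random delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # The delay ceiling doubles on every attempt: base, 2*base, 4*base, ...
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: wait a random time up to the ceiling so that many
            # clients do not all retry at the same moment.
            sleep(random.uniform(0, ceiling))


# Usage: an operation that is throttled twice before it succeeds.
attempts = {"count": 0}

def flaky_operation():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottlingError()
    return "ok"

print(call_with_backoff(flaky_operation))
```

The `sleep` parameter exists only so the delay can be replaced in tests; the default waits for real.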

Service Quotas set the maximum number of resources for a given service, e.g. the maximum number of vCPUs for On-Demand instances. An increase can be requested by creating a ticket or through the Service Quotas API.

DevOps

CodeCommit - managed source code repository. Uses git and controls access with IAM policies.

CodeBuild - continuous integration service.

CodeDeploy - managed deployment service; works with EC2, Fargate, Lambda and on-prem.

CodePipeline - works with previously mentioned services to create a pipeline (continuous delivery).

CodeStar - bootstrapping; complete continuous delivery toolchain for custom applications.

CLI

Home page, command reference.

CLI is also available in AWS Console as CloudShell (terminal icon to the right of search bar). Not available in all regions. Automatically assumes credentials of the current user - no configuration needed.

Various examples. Omitting the --profile NAME parameter implies the default profile; specify a profile if another one should be used:

# create an s3 bucket, mb stands for make bucket
$ aws s3 mb s3://test-bucket

# copy files between S3 and EC2:
$ aws s3 cp s3://test-bucket/file.txt file.txt
$ aws s3 cp file.txt s3://test-bucket/file.txt
$ aws s3 cp s3://test-bucket1/file.txt s3://test-bucket/file.txt

# syncing directories (prefixes) between S3 and EC2; sync works on directories, not single files:
$ aws s3 sync s3://test-bucket1 s3://test-bucket2
$ aws s3 sync . s3://test-bucket
$ aws s3 sync s3://test-bucket1 .

# create an ec2 instance 
$ aws ec2 run-instances --image-id {take from images page} --instance-type t2.micro

# list instances (shows full info in json format):
$ aws ec2 describe-instances

# query specific info from previous command (second line also utilizes filtering):
$ aws ec2 describe-instances --query 'Reservations[].Instances[].PublicIpAddress'
$ aws ec2 describe-instances --query 'Reservations[].Instances[].PublicIpAddress' --filters "Name=platform,Values=windows"

# stop/terminate an instance:
$ aws ec2 stop-instances --instance-ids {id}
$ aws ec2 terminate-instances --instance-ids {id}
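
To show what the `--query 'Reservations[].Instances[].PublicIpAddress'` expression above extracts, here is a rough Python equivalent working on a hand-made sample of `describe-instances` output. The sample data and the `public_ips` helper are assumptions for illustration. One real difference is worth noting: `--filters` is applied by the service before results are returned, while `--query` is applied by the CLI to the returned JSON; the `platform` argument below only imitates the filter locally.

```python
def public_ips(describe_instances_output, platform=None):
    """Collect public IPs from every instance in every reservation."""
    ips = []
    for reservation in describe_instances_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if platform is not None and instance.get("Platform") != platform:
                continue  # imitate --filters "Name=platform,Values=..."
            ip = instance.get("PublicIpAddress")
            if ip is not None:  # instances without a public IP are skipped
                ips.append(ip)
    return ips


# Made-up sample shaped like the describe-instances JSON output.
sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-1", "PublicIpAddress": "203.0.113.10", "Platform": "windows"},
            {"InstanceId": "i-2"},
        ]},
        {"Instances": [
            {"InstanceId": "i-3", "PublicIpAddress": "203.0.113.20"},
        ]},
    ]
}

print(public_ips(sample))
print(public_ips(sample, platform="windows"))
```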

Create roles on the same IAM page as IAM->User, then attach a role to a running EC2 instance or select it when creating one. The instance then receives temporary credentials that are rotated automatically and can be read like this:

$ curl 169.254.169.254/latest/meta-data/iam/security-credentials/
$ curl 169.254.169.254/latest/meta-data/iam/security-credentials/{output from previous command}/

Credentials

Configure credentials with the command below; it prompts for the Access Key ID and Secret Access Key, which are obtained from the IAM->User->Create User page and can be downloaded as csv. This is the hard-coded way and is not preferred (use roles instead).

$ aws configure [--profile NAME]
$ cat ~/.aws/credentials

CLI looks for credentials in this order:

  1. Command line options, e.g. --region, --profile
  2. Environment variables, e.g. AWS_ACCESS_KEY_ID
  3. Credentials file - ~/.aws/credentials
  4. Configuration file - ~/.aws/config
  5. Container credentials (ECS tasks)
  6. Instance profile

Pagination

By default CLI uses a page size of 1,000 items, which means one API call to retrieve 1,000 items. If a given command implies retrieving 2,500 items, default CLI behavior would be making 3 separate API calls; results, however, are merged together before being displayed.

  • --page-size <number> - changes how many items each underlying API call retrieves; the full result is still returned. Can help eliminate timeout errors caused by retrieving too many items in one call.
  • --max-items <number> - limits the number of items retrieved and displayed; the output also includes a NextToken that can be used to retrieve the next set of items. No NextToken attribute in the response indicates there are no more items to retrieve.
  • --starting-token <token> - accepts a NextToken from a previous response to retrieve the next set of items
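
The interplay of the three options can be simulated without calling AWS. In this sketch `fetch_page` plays the role of a single API call and `list_items` plays the role of the CLI; both names are hypothetical, and the token is a plain index here, whereas real tokens are opaque strings.

```python
def fetch_page(items, page_size, start=0):
    """Simulate one API call: return one page and the next start index (or None)."""
    page = items[start:start + page_size]
    next_index = start + page_size
    return page, (next_index if next_index < len(items) else None)


def list_items(items, page_size=1000, max_items=None, starting_token=None):
    """Call the API page by page, merge the pages, and stop at max_items."""
    start = int(starting_token) if starting_token else 0
    position, collected, api_calls = start, [], 0
    while position is not None:
        page, position = fetch_page(items, page_size, position)
        api_calls += 1
        collected.extend(page)
        if max_items is not None and len(collected) >= max_items:
            break
    if max_items is not None:
        collected = collected[:max_items]
    consumed = start + len(collected)
    # A token is returned only while there is something left to retrieve.
    next_token = str(consumed) if consumed < len(items) else None
    return {"Items": collected, "NextToken": next_token, "ApiCalls": api_calls}


data = list(range(2500))
everything = list_items(data)                              # default page size
first = list_items(data, page_size=100, max_items=250)     # smaller pages, capped output
second = list_items(data, page_size=100, max_items=250,
                    starting_token=first["NextToken"])     # continue where it stopped
print(everything["ApiCalls"], len(everything["Items"]), everything["NextToken"])
print(first["ApiCalls"], len(first["Items"]), first["NextToken"])
print(second["Items"][0], second["NextToken"])
```

Lowering the page size increases the number of calls but not the result, while `max_items` shortens the result and hands back a token for the rest.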

Other services

Marketplace - software from third-party providers.

Step Functions

Enables orchestration of workflows. Supports serverless architecture. Charges occur on state transition and services leveraged. Workflow is defined using Amazon States Language.

In general, Step Functions helps to visualize serverless applications and to automate and track triggers for each step. Usually the output of one step is the input of the next; all state changes and actions are logged. Mainly used to orchestrate Lambda functions, but can also be used with EC2, ECS, on-prem servers, and API Gateway.

Flow is represented as a JSON state machine. Task is a single step or unit of work within state machine.
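
As an illustration, a two-task state machine in Amazon States Language can be built as a Python dictionary and serialized to JSON. The state names, the account ID, and the Lambda ARNs are hypothetical placeholders; `execution_order` is a helper written for this sketch that only follows `Next` pointers and ignores branching state types such as Choice or Parallel.

```python
import json

# Hypothetical workflow: validate an order, then charge the customer.
state_machine = {
    "Comment": "Hypothetical two-step order workflow",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
            # Retries happen only because they are declared explicitly.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "ChargeCustomer",
        },
        "ChargeCustomer": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-customer",
            "End": True,
        },
    },
}


def execution_order(definition):
    """Follow Next pointers from StartAt until a state marked End."""
    order, name = [], definition["StartAt"]
    while True:
        order.append(name)
        state = definition["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


print(json.dumps(state_machine, indent=2))
print(execution_order(state_machine))
```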

Workflows

Step functions provide different types of workflows that apply to different types of tasks that need to be automated/orchestrated.

Standard workflow fits well for long-running, durable workflows that could run up to a year. Full execution history is available up to 90 days after execution. By default tasks are not executed more than once, unless retry is explicitly stated. Works for non-idempotent actions.

Express workflow is designed for short-lived (up to 5 minutes), high-volume, event-driven workflows. Tasks may run more than once or concurrently, so it works well for idempotent actions. Synchronous and asynchronous types are available: the synchronous type starts the workflow, waits for completion, and returns the result, while the asynchronous type starts the workflow and does not return a result; results can be found later in the logs.

Simple Workflow Service

Coordinates work done by multiple applications running on EC2.

Step Functions is recommended for new applications, but Simple Workflow Service (SWF) can be used if there is a need to intervene in the process or to return values from child processes to a parent process.

AppSync

Stores and syncs data across mobile and web apps. Uses GraphQL.

Learning and certification
