Misc Points - vedratna/aws-learning GitHub Wiki

  • NFS -> 2049 (TCP/UDP)
  • DNS -> 53 (TCP/UDP)
  • ELB and Auto Scaling can be used to manage scalability across multiple AZs within a single region.
  • For multi-region scalability, a Route 53 routing policy should be used.
  • Point-in-time recovery can restore an RDS database to within 5 minutes of a crash, using automated backups and transaction logs.
  • RTO - Recovery Time Objective - the maximum acceptable time to recover from a disaster. If the RTO is 2 hours, the system can afford at most 2 hours of downtime after a disaster.
  • RPO - Recovery Point Objective - the maximum acceptable duration of data loss. If the RPO is 1 hour, the system can afford to lose at most 1 hour of data in a disaster; in short, it must be recoverable with all data up to 1 hour before the disaster. If the system crashed at 4 PM, it must be recoverable with at least the data it had at 3 PM the same day.
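For illustration, the RPO check above comes down to simple date arithmetic between the last recovery point and the failure time; `meets_rpo` is a hypothetical helper, not an AWS API:

```python
from datetime import datetime, timedelta

def meets_rpo(last_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """Data loss is the gap between the last recovery point and the failure;
    the RPO is met when that gap does not exceed it."""
    return (failure_time - last_backup) <= rpo

# Crash at 4 PM with the last backup taken at 3:30 PM: 30 minutes of loss,
# which fits within a 1-hour RPO.
crash = datetime(2024, 1, 1, 16, 0)
backup = datetime(2024, 1, 1, 15, 30)
print(meets_rpo(backup, crash, timedelta(hours=1)))  # True
```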
  • An A record maps a domain/subdomain to an IP address.
  • A CNAME maps a domain/subdomain to another domain/subdomain, but it does not support the zone apex (naked domain).
  • An Alias record is specific to Route 53, and it does support the zone apex (naked domain). It is also very useful for mapping a web URL to internal AWS resources like ELB, S3, and others, because Route 53 takes care of updating the underlying A records whenever the resource's IP address changes.
  • AWS Backup is a single place from which you can automate backups for storage services such as EBS, RDS, S3, EFS, and others by creating a backup plan.
  • AWS DataSync is used to transfer on-premises data to the AWS cloud. It is generally up to 10x faster than comparable transfer tools.
  • Snowball import is for offline transfer of more than 10 TB of data, while DataSync is used for online data transfer.
  • Storage Gateway is for transfer, continuous synchronization, and ongoing access of data. It acts as a server, and the transferred data can be accessed through it, so it can be used for the regular, routine data operations we do on premises. DataSync, by contrast, acts as a client and is used only for transferring data; transferred data can't be accessed through DataSync.
  • In multicast communication, the server sends data to a particular multicast IP address, and clients who intend to receive that data listen on the same multicast address. These clients can be on different networks. A group of clients listening to the same multicast address is known as a host group.
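For illustration, whether an IPv4 address falls in the multicast range (224.0.0.0/4) can be checked with Python's standard `ipaddress` module; `is_multicast` here is a hypothetical helper (the module also exposes a built-in `.is_multicast` property):

```python
import ipaddress

# The entire IPv4 multicast block reserved for host groups.
MULTICAST_BLOCK = ipaddress.ip_network("224.0.0.0/4")

def is_multicast(addr: str) -> bool:
    """True if addr is a multicast (host-group) address."""
    return ipaddress.ip_address(addr) in MULTICAST_BLOCK

print(is_multicast("239.1.1.1"))  # True
print(is_multicast("10.0.0.5"))   # False
```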
  • Cross-zone load balancing applies when an ELB distributes traffic among a fleet of EC2 instances deployed across multiple AZs. With cross-zone balancing disabled, the ELB splits traffic evenly between AZs; with it enabled, traffic is split evenly across all instances regardless of AZ. For example, with 2 instances in AZ1 and 3 in AZ2: without cross-zone balancing, traffic is split between the 2 zones (so the 2 instances in AZ1 share half the traffic), while with cross-zone balancing all 5 instances receive equal traffic.
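The traffic split in the example above can be sketched numerically; `per_instance_share` is an illustrative helper, not an AWS API:

```python
def per_instance_share(az_instance_counts, cross_zone):
    """Return each instance's fraction of total traffic, grouped by AZ."""
    if cross_zone:
        # Traffic is split evenly across every instance, regardless of AZ.
        total = sum(az_instance_counts)
        return [[1 / total] * n for n in az_instance_counts]
    # Without cross-zone balancing, each AZ gets an equal share,
    # which is then split among that AZ's instances.
    az_share = 1 / len(az_instance_counts)
    return [[az_share / n] * n for n in az_instance_counts]

# AZ1 has 2 instances, AZ2 has 3.
print(per_instance_share([2, 3], cross_zone=False))  # AZ1 instances get 0.25 each
print(per_instance_share([2, 3], cross_zone=True))   # all 5 instances get 0.2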
  • For legacy or custom (non-standard) HTTP web servers, it is advisable to use the ELB at the TCP level rather than the HTTP level.
  • Dynamic port mapping with an Application Load Balancer makes it easier to run multiple tasks from the same Amazon ECS service on an Amazon ECS cluster. With the Classic Load Balancer, you must statically map port numbers on a container instance. The Classic Load Balancer does not allow you to run multiple copies of a task on the same instance because the ports conflict. An Application Load Balancer uses dynamic port mapping, so you can run multiple tasks from a single service on the same container instance.
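As an illustrative sketch of the idea (not the actual ECS/ALB mechanism), dynamic port mapping amounts to handing each task a free host port from the ephemeral range; the allocator class and the port range below are assumptions:

```python
import itertools

class HostPortAllocator:
    """Hand out free host ports from an ephemeral range, the way dynamic
    port mapping lets several copies of a task share one container instance
    without port conflicts. Hypothetical sketch, not ECS internals."""

    def __init__(self, start=32768, end=60999):  # common Linux ephemeral range
        self._ports = itertools.count(start)
        self._end = end

    def allocate(self) -> int:
        port = next(self._ports)
        if port > self._end:
            raise RuntimeError("no free host ports left on this instance")
        return port

# Three copies of the same task land on distinct host ports.
alloc = HostPortAllocator()
task_ports = {f"task-{i}": alloc.allocate() for i in range(3)}
print(task_ports)  # {'task-0': 32768, 'task-1': 32769, 'task-2': 32770}
```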
  • ENI is bound to AZ. ENI defined in one AZ can not be associated with the resource in another AZ.
  • VM Import/Export: an AMI created from an on-premises VM imported into AWS can be exported back from AWS to on-premises, but an AMI created directly from AWS resources cannot be exported to on-premises.
  • Replication between primary instance and read replica in RDS is always asynchronous.
  • Creating an EBS volume from a Storage Gateway S3 snapshot and attaching it to EC2 is faster than creating another Storage Gateway from the snapshot and attaching that to EC2.
  • Elastic Load Balancing provides access logs that capture detailed information about requests sent to your load balancer. Each log contains information such as the time the request was received, the client's IP address, latencies, request paths, and server responses. You can use these access logs to analyze traffic patterns and troubleshoot issues. Access logging is an optional feature of Elastic Load Balancing that is disabled by default. After you enable access logging for your load balancer, Elastic Load Balancing captures the logs and stores them in the Amazon S3 bucket that you specify as compressed files. You can disable access logging at any time.
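For illustration, a single access-log line can be picked apart with Python's standard `shlex` module (the sample line below is illustrative, and the field positions are assumptions based on the documented ALB access-log format):

```python
import shlex

# Illustrative ALB access-log line: type, timestamp, ELB id, client,
# target, three latencies, ELB and target status codes, byte counts,
# then the quoted request and user agent.
LINE = ('http 2024-01-01T12:00:00.000000Z app/my-alb/abc123 '
        '192.168.1.10:34567 10.0.0.5:80 0.000 0.002 0.000 200 200 '
        '120 5310 "GET http://example.com:80/index.html HTTP/1.1" "curl/8.0" - -')

def parse_alb_line(line: str) -> dict:
    fields = shlex.split(line)  # keeps quoted fields (request, user agent) intact
    return {
        "timestamp": fields[1],
        "client_ip": fields[3].rsplit(":", 1)[0],
        "elb_status": int(fields[8]),
        "request": fields[12],
    }

entry = parse_alb_line(LINE)
print(entry["client_ip"], entry["elb_status"])  # 192.168.1.10 200
```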
  • AWS does not allow multicast on its infrastructure. You can work around this limitation by creating a virtual overlay network that runs at the OS level of the instances, with the AWS infrastructure becoming the underlay network. Overlay networking means creating tunnels between the different IP interfaces using VXLANs. The tunnels' endpoints are the private IP addresses on the ENIs of the EC2 instances. This way, the OS on each EC2 instance can run IP multicast over these tunnels, and to AWS these appear as ordinary IP packets between the private IPv4 addresses of EC2 instances within the VPC. VMware NSX Cloud is an example of a solution that can establish overlay networking over the AWS cloud.
  • For read-only access to centralized CloudTrail logging across multiple accounts: configure a CloudTrail trail in each AWS account and have the logs delivered to a single S3 bucket in a separate account created specifically for storing logs. Give the auditor read-only access to this bucket.
  • You can't configure two default routes in a single route table. Also, you can't attach two route tables to a subnet at the same time.
  • The MySQL database engine has an embedded replication engine, and the required replication can be done over a VPN connection from your company's AWS VPC to the corporate data center.
  • When archiving many small or related files to Glacier: first compress the files, then concatenate them into a single Amazon Glacier archive. Store the byte ranges of the compressed files, along with other search metadata, in an Amazon RDS database with regular snapshotting. When restoring data, query the database for files that match the search criteria, and recreate the files from the retrieved byte ranges.
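The byte-range bookkeeping described above can be sketched as follows (illustrative only; in practice the index would live in RDS rather than an in-memory dict, and the blob would be uploaded to Glacier):

```python
import gzip
import io

def build_archive(files: dict) -> tuple:
    """Compress each file, concatenate into one blob, and record the
    inclusive byte range of every member so it can be restored alone."""
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        start = blob.tell()
        blob.write(gzip.compress(data))          # each member is self-contained
        index[name] = (start, blob.tell() - 1)   # inclusive byte range
    return blob.getvalue(), index

archive, index = build_archive({"a.txt": b"alpha", "b.txt": b"beta"})

# Restore one file by slicing its byte range out of the archive,
# as a Glacier range retrieval would.
start, end = index["b.txt"]
print(gzip.decompress(archive[start:end + 1]))  # b'beta'
```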
  • Amazon CloudSearch is a fully-managed service in the AWS Cloud that makes it easy to set up, manage, and scale a search solution for your website or application.
  • Amazon CloudSearch provides several benefits over running your own self-managed search service including easy configuration, auto scaling for data and traffic, self-healing clusters, and high availability with Multi-AZ. With a few clicks in the AWS Management Console, you can create a search domain and upload the data you want to make searchable, and Amazon CloudSearch automatically provisions the required resources and deploys a highly tuned search index.
  • A search service and a storage service are complementary. A search service requires that your documents already be stored somewhere, whether it's in files of a file system, data in Amazon S3, or records in an Amazon DynamoDB or Amazon RDS instance. The search service is a rapid retrieval system that makes those items searchable with sub-second latencies through a process called indexing.
  • Point-in-time recovery from a manual snapshot is not possible in RDS; it is possible only with automated backups.
  • The AWS Serverless Application Model (SAM) is an open-source framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, and event source mappings. With just a few lines per resource, you can define the application you want and model it using YAML. During deployment, SAM transforms and expands the SAM syntax into AWS CloudFormation syntax, enabling you to build serverless applications faster.
  • Multiple EC2 instances in the same region, same VPC, and across multiple AZs can access an Amazon EFS file system at the same time. Note that access is VPC-specific: all instances should be in the same VPC, and you can create only one mount target per AZ. However, using VPC peering or Transit Gateway, EFS can be mounted from a different VPC and a different account.
  • Amazon Redshift is a fully managed, petabyte-scale data warehouse in AWS. It provides fast querying over large amounts of structured data. It can store very large data sets but cannot ingest large volumes of data in real time.
  • An Amazon Redshift cluster is created in a single AZ. However, you can restore a backup in a different AZ, and you can copy a snapshot to a different region.
  • Amazon Redshift Spectrum can be used to query large amounts of S3 data directly. When a Redshift cluster needs to run a compute/query operation on a very large amount of S3 data, it forwards the query to Redshift Spectrum, which queries the data and returns results back to Redshift, where the final computation happens.
  • Amazon Redshift workload management (WLM) enables users to flexibly manage priorities within workloads so that short, fast-running queries won't get stuck in queues behind long-running queries. Amazon Redshift WLM creates query queues at runtime according to service classes, which define the configuration parameters for various types of queues, including internal system queues and user-accessible queues. From a user perspective, a user-accessible service class and a queue are functionally equivalent.
  • Redshift is primarily used for OLAP scenarios whereas RDS is used for OLTP scenarios
  • The EMR Kinesis connector enables EMR to read and analyze Kinesis stream data through simple Pig or Hive scripts. Without the connector, EMR needs an independent stream-processing application for the same purpose.
  • Step Functions do not directly support Mechanical Turk. You will need to use Amazon SWF for this scenario.
  • You can set up CloudFront with origin failover for scenarios that require high availability. To get started, you create an origin group with two origins: a primary and a secondary. If the primary origin is unavailable, or returns specific HTTP response status codes that indicate a failure, CloudFront automatically switches to the secondary origin.
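The origin-group failover above can be sketched in a few lines (illustrative logic, not CloudFront internals; the set of failover status codes below is an assumption close to the ones CloudFront lets you choose):

```python
# Status codes that trigger failover to the secondary origin (assumed set;
# CloudFront lets you pick from codes such as 500, 502, 503, 504, 403, 404).
FAILOVER_CODES = {500, 502, 503, 504}

def fetch_with_failover(fetch, primary, secondary):
    """Try the primary origin; on a failure status code, retry the secondary."""
    status, body = fetch(primary)
    if status in FAILOVER_CODES:
        status, body = fetch(secondary)
    return status, body

# Demo with a fake fetch function standing in for a real HTTP request.
origins = {"primary.example.com": (503, b""), "backup.example.com": (200, b"ok")}
fake_fetch = lambda host: origins[host]
print(fetch_with_failover(fake_fetch, "primary.example.com", "backup.example.com"))
# (200, b'ok')
```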
  • Amazon now allows you to enable Domain Name System Security Extensions (DNSSEC) signing for all existing and new public hosted zones, and enable DNSSEC validation for Amazon Route 53 Resolver. Amazon Route 53 DNSSEC provides data origin authentication and data integrity verification for DNS and can help customers meet compliance mandates, such as FedRAMP.
  • When you enable DNSSEC signing on a hosted zone, Route 53 cryptographically signs each record in that hosted zone. Route 53 manages the zone-signing key, and you can manage the key-signing key in AWS Key Management Service (AWS KMS). Amazon’s domain name registrar, Route 53 Domains, already supports DNSSEC, and customers can now register domains and host their DNS on Route 53 with DNSSEC signing enabled. When you enable DNSSEC validation on the Route 53 Resolver in your VPC, it ensures that DNS responses have not been tampered with in transit. This can prevent DNS Spoofing.
  • There is a default limit of 50 VPC peering connections per VPC.
  • The default limit for shared VPC subnets is 100.
  • To add tags to—or edit or delete tags of—multiple resources at once, use Tag Editor. With Tag Editor, you search for the resources that you want to tag, and then manage tags for the resources in your search results.
  • You can use tags to organize your resources, and cost allocation tags to track your AWS costs on a detailed level. After you activate cost allocation tags, AWS uses the cost allocation tags to organize your resource costs on your cost allocation report, to make it easier for you to categorize and track your AWS costs. AWS provides two types of cost allocation tags, AWS generated tags and user-defined tags. AWS, or AWS Marketplace ISV defines, creates, and applies the AWS generated tags for you, and you define, create, and apply user-defined tags. You must activate both types of tags separately before they can appear in Cost Explorer or on a cost allocation report.
  • After you have created and applied the user-defined tags, you can activate them by using the Billing and Cost Management console for cost allocation tracking. Cost Allocation Tags appear on the console after you've enabled Cost Explorer, Budgets, AWS Cost and Usage Reports, or legacy reports. After you activate the AWS services, they appear on your cost allocation report. You can then use the tags on your cost allocation report to track your AWS costs. Tags are not applied to resources that were created before the tags were created.
  • CloudFront signed URLs and signed cookies provide the same basic functionality: they allow you to control who can access your content. If you want to serve private content through CloudFront and you're trying to decide whether to use signed URLs or signed cookies, consider the following.
  • Use signed URLs for the following cases:
      • You want to use an RTMP distribution. Signed cookies aren't supported for RTMP distributions.
      • You want to restrict access to individual files, for example, an installation download for your application.
      • Your users are using a client (for example, a custom HTTP client) that doesn't support cookies.
  • Use signed cookies for the following cases:
      • You want to provide access to multiple restricted files, for example, all of the files for a video in HLS format or all of the files in the subscribers' area of a website.
      • You don't want to change your current URLs.
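For illustration, the canned policy document underlying a CloudFront signed URL is a small JSON object; `canned_policy` is a hypothetical helper, and producing the actual signed URL additionally requires signing this policy with a CloudFront RSA key pair (omitted here):

```python
import json
import time

def canned_policy(resource_url: str, expires_epoch: int) -> str:
    """Build the canned policy JSON for a CloudFront signed URL: one
    resource plus an expiry condition, serialized without whitespace."""
    policy = {
        "Statement": [{
            "Resource": resource_url,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
        }]
    }
    return json.dumps(policy, separators=(",", ":"))

# Grant access to one file for the next hour.
print(canned_policy("https://d111111abcdef8.cloudfront.net/video.mp4",
                    int(time.time()) + 3600))
```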