AWS DB Services DynamoDB - devian-al/AWS-Solutions-Architect-Prep GitHub Wiki

DynamoDB Simplified

Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It is a fully managed, multi-region, multi-master, durable NoSQL database with built-in security, backup and restore, and in-memory caching for internet-scale applications.

DynamoDB Key Details

  • The main components of DynamoDB are
    • a table, which is a collection of items
    • an item, which is equivalent to a row in a SQL database
    • attributes, which are the key-value pairs (fields) within an item
  • The convenience of non-relational DBs is that each row can look entirely different based on your use case. There doesn't need to be uniformity.

For example, if you need a new column for a particular entry you don't also need to ensure that that column exists for the other entries.

  • DynamoDB supports both document and key-value based models.
  • It is a great fit for mobile, web, gaming, ad-tech, IoT, etc.
  • DynamoDB stores data on SSDs, which helps keep latency low.
  • Data is replicated across 3 geographically distinct data centers.
  • The default consistency model is Eventually Consistent Reads, but Strongly Consistent Reads are also available.
  • The difference between the two consistency models is the one-second rule.
    • With Eventually Consistent Reads, all copies of the data usually become consistent within one second.
    • A repeated read after a short period of time should return the updated data.
    • However, if a read must be guaranteed to return the updated data in under a second, use Strongly Consistent Reads.
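
As a minimal sketch of how the consistency model is chosen per request: the GetItem API accepts a ConsistentRead flag, which is false (eventually consistent) unless set. The table name, key attribute, and helper function below are hypothetical and only build the request; nothing is sent to AWS.

```python
def build_get_item_request(table_name, key, strongly_consistent=False):
    """Build keyword arguments for a low-level DynamoDB GetItem call."""
    return {
        "TableName": table_name,
        "Key": key,
        # Omitting ConsistentRead (or passing False) gives an eventually
        # consistent read; True requests a strongly consistent read.
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical table and key, for illustration only.
request = build_get_item_request(
    "Orders",
    {"order_id": {"S": "INV00023"}},
    strongly_consistent=True,
)

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("dynamodb").get_item(**request)
```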

If you face a scenario that requires the schema, or the structure of your data, to change frequently, then you have to pick a database which provides a non-rigid and flexible way of adding or removing new types of data. This is a classic example of choosing between a relational database and non-relational (NoSQL) database. In this scenario, pick DynamoDB.

  • A relational database system does not scale well for the following reasons
    • It normalizes data and stores it on multiple tables that require multiple queries to write to disk.
    • It generally incurs the performance costs of an ACID-compliant transaction system.
    • It uses expensive joins to reassemble required views of query results.

High cardinality is good for DynamoDB I/O performance. The more distinct values your partition key has, the better, because requests are then spread more evenly across the partitioned space.

  • DynamoDB makes use of parallel processing to achieve predictable performance. You can visualize each partition or node as an independent DB server of fixed size with each partition or node responsible for a defined block of data. In SQL terminology, this concept is known as sharding but of course DynamoDB is not a SQL-based DB. With DynamoDB, data is stored on Solid State Drives (SSD).
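
The effect of key cardinality on how work spreads across partitions can be illustrated with a small simulation. This is a sketch under stated assumptions: DynamoDB's internal hash function is not public, so MD5 is used purely as a stand-in, and the partition count and key names are made up.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Map a partition key value to a partition number (MD5 is a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def distribution(keys, partitions=4):
    """Count how many requests land on each partition."""
    counts = Counter(partition_for(k, partitions) for k in keys)
    return [counts.get(p, 0) for p in range(partitions)]


# Low cardinality: 1000 requests, but only 2 distinct key values.
low = distribution(["US", "DE"] * 500)

# High cardinality: 1000 requests, each with a distinct key value.
high = distribution([f"customer-{i}" for i in range(1000)])

print("low cardinality :", low)
print("high cardinality:", high)
```

With only two distinct key values, at most two partitions receive any traffic, while the distinct keys spread over all of them.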

DynamoDB Core Components Summary

  • Partitions

    • Amazon DynamoDB stores data in partitions.
    • A partition is an allocation of storage for a table that is automatically replicated across multiple AZs within an AWS Region.
    • Partition management is handled entirely by DynamoDB — you never have to manage partitions yourself.
    • DynamoDB allocates sufficient partitions to your table so that it can handle your provisioned throughput requirements.
    • DynamoDB allocates additional partitions to a table in the following situations:
      • If you increase the table’s provisioned throughput settings beyond what the existing partitions can support.
      • If an existing partition fills to capacity and more storage space is required.

    Best practices for partition keys:

    • Use high-cardinality attributes – e.g. e-mailid, employee_no, customerid, sessionid, orderid, and so on.
    • Use composite attributes – e.g. customerid+productid+countrycode as the partition key and order_date as the sort key.
    • Cache popular items – use DynamoDB Accelerator (DAX) for caching reads.
    • Add random numbers or digits from a predetermined range for write-heavy use cases – e.g. add a random suffix to an invoice number such as INV00023-04593.
  • Tables (a collection of items)

  • Items (a collection of attributes)

  • Attributes

    • Scalar Attributes (e.g. Strings, Numbers, Binaries)
    • Nested Attributes (can be nested up to 32 levels deep)
  • Primary Keys (to uniquely identify each item in a table)

    • Each item in the table must have the primary key attribute(s)
    • Consists of 1 or 2 attributes: a Partition Key alone (1 attribute), or a Partition Key and a Sort Key (2 attributes)
    • Each primary key attribute must be either string, number, or binary
    • Other Names:
      • Partition Key = Hash Attribute
      • Sort Key = Range Attribute

  • Secondary Indexes (to provide more querying flexibility)

    • You can create one or more secondary indexes on a table.
    • A secondary index lets you query the data in the table using an alternate key, in addition to queries against the primary key.
    • Kinds:
      • Global secondary index
        • An index with a partition key and sort key that can be different from those on the table.
        • A GSI is used to speed up queries on non-key attributes
        • Can be created when you create your table or at any time later.
        • You can define up to 20 global secondary indexes per table (default quota).
      • Local secondary index
        • An index that has the same partition key as the table, but a different sort key.
        • It gives you a different view of your data, organized by an alternative sort key.
        • Any queries based on this sort key are much faster using the index than the main table.
        • The key benefit of an LSI is that you can query on additional values in the table other than the partition key / sort key.
        • An LSI must be created at table creation time.
        • You can define up to 5 local secondary indexes per table.
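
The relationship between the table's primary key, a local secondary index, and a global secondary index can be sketched as the parameters of a CreateTable request. The table, attribute, and index names are hypothetical; the function only builds the request and does not call AWS.

```python
def build_create_table_request():
    """Build CreateTable parameters with one LSI and one GSI (hypothetical names)."""
    return {
        "TableName": "Orders",
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "order_total", "AttributeType": "N"},
            {"AttributeName": "product_id", "AttributeType": "S"},
        ],
        # Primary key: partition key (HASH) plus sort key (RANGE).
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        # LSI: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_total",
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "order_total", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # GSI: partition key and sort key may both differ from the table's.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by_product",
                "KeySchema": [
                    {"AttributeName": "product_id", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


create_request = build_create_table_request()

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").create_table(**create_request)
```

The LSI has to be part of this initial request, while a GSI like the one above could also be added to an existing table later.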

Capacity Unit Consumption

  • CUC for Reads – for an item up to 4 KB in size, a strongly consistent read request consumes one read capacity unit, while an eventually consistent read request consumes 0.5 of a read capacity unit.
    • GetItem – reads a single item from a table.
    • BatchGetItem – reads up to 100 items from one or more tables.
    • Query – reads multiple items that have the same partition key value.
    • Scan – reads all of the items in a table.
  • CUC for Writes
    • PutItem – writes a single item to a table.
    • UpdateItem – modifies a single item in the table.
    • DeleteItem – removes a single item from a table.
    • BatchWriteItem – writes up to 25 items to one or more tables.
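
The capacity arithmetic for a single request can be sketched as follows, assuming the standard unit sizes: reads are metered in 4 KB steps and writes in 1 KB steps, with item sizes rounded up to the next step. The function names are hypothetical.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read capacity unit covers up to 4 KB
WRITE_UNIT_BYTES = 1024      # one write capacity unit covers up to 1 KB


def read_capacity_units(item_size_bytes, strongly_consistent=True):
    """Capacity units consumed by reading one item of the given size."""
    units = max(math.ceil(item_size_bytes / READ_UNIT_BYTES), 1)
    # An eventually consistent read costs half as much.
    return units if strongly_consistent else units / 2


def write_capacity_units(item_size_bytes):
    """Capacity units consumed by writing one item of the given size."""
    return max(math.ceil(item_size_bytes / WRITE_UNIT_BYTES), 1)


print(read_capacity_units(6 * 1024))                             # 2
print(read_capacity_units(6 * 1024, strongly_consistent=False))  # 1.0
print(write_capacity_units(3584))                                # 4 (3.5 KB rounds up)
```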

Throughput Management

  • Provisioned throughput – manually defined maximum amount of capacity that an application can consume from a table or index. If your application exceeds your provisioned throughput settings, it is subject to request throttling. Free tier eligible.
  • Reserved capacity – with reserved capacity, you pay a one-time upfront fee and commit to a minimum usage level over a period of time, for cost-saving solutions.
  • Amazon DynamoDB on-demand is a flexible capacity mode for DynamoDB capable of serving thousands of requests per second without capacity planning. When you choose on-demand capacity mode, DynamoDB instantly accommodates your workloads as they ramp up or down to any previously reached traffic level.
    • If a workload’s traffic level hits a new peak, DynamoDB adapts rapidly to accommodate the workload. DynamoDB on-demand offers simple pay-per-request pricing for read and write requests so that you only pay for what you use, making it easy to balance costs and performance.

DynamoDB Auto Scaling

  • When you use the AWS Management Console to create a new table, DynamoDB auto scaling is enabled for that table by default.
  • Uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns.
  • You create a scaling policy for a table or a global secondary index.
  • The scaling policy specifies whether you want to scale read capacity or write capacity (or both), and the minimum and maximum provisioned capacity unit settings for the table or index. The scaling policy also contains a target utilization, which is the percentage of consumed provisioned throughput at a point in time.
  • DynamoDB auto scaling doesn’t prevent you from manually modifying provisioned throughput settings.

If you enable DynamoDB auto scaling for a table that has one or more global secondary indexes, AWS highly recommends that you also apply auto scaling uniformly to those indexes.
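
A scaling policy of the kind described above can be sketched as two Application Auto Scaling requests: one registering the table's read capacity as a scalable target with minimum and maximum values, and one attaching a target-tracking policy. The table name, limits, policy name, and helper function are hypothetical; the code only builds the requests.

```python
def build_autoscaling_requests(table_name, min_units, max_units, target_percent):
    """Build requests that scale a table's read capacity toward a target utilization."""
    resource_id = f"table/{table_name}"
    dimension = "dynamodb:table:ReadCapacityUnits"

    scalable_target = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_units,
        "MaxCapacity": max_units,
    }
    scaling_policy = {
        "PolicyName": f"{table_name}-read-target-tracking",  # hypothetical name
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target utilization: consumed capacity as a percentage of provisioned.
            "TargetValue": float(target_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    }
    return scalable_target, scaling_policy


target, policy = build_autoscaling_requests("Orders", 5, 100, 70)

# With boto3 installed and credentials configured, these could be sent as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
```

Write capacity and each global secondary index would need their own target and policy, using the matching scalable dimension and metric type.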

DynamoDB Global Tables

  • Global Tables is a multi-region, multi-master replication solution for fast local performance of globally distributed apps.
  • Global Tables replicates your Amazon DynamoDB tables automatically across your choice of AWS regions.
  • It is based on DynamoDB streams and is multi-region redundant for data recovery or high availability purposes. Application failover is as simple as redirecting your application’s DynamoDB calls to another AWS region.
  • Global Tables eliminates the difficult work of replicating data between regions and resolving update conflicts, enabling you to focus on your application’s business logic. You do not need to rewrite your applications to make use of Global Tables.
  • Replication latency with Global Tables is typically under one second.

Monitoring

  • Amazon CloudWatch Alarms – Watch a single metric over a time period that you specify, and perform one or more actions based on the value of the metric relative to a given threshold over a number of time periods.
  • Amazon CloudWatch Logs – Monitor, store, and access your log files from AWS CloudTrail or other sources.
  • Amazon CloudWatch Events – Match events and route them to one or more target functions or streams to make changes, capture state information, and take corrective action.
  • AWS CloudTrail Log Monitoring – Share log files between accounts, monitor CloudTrail log files in real time by sending them to CloudWatch Logs, write log processing applications in Java, and validate that your log files have not changed after delivery by CloudTrail.
    • Using the information collected by CloudTrail, you can determine the request that was made to DynamoDB, the IP address from which the request was made, who made the request, when it was made, and additional details.

DynamoDB Accelerator (DAX)

  • Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache that can reduce Amazon DynamoDB response times from milliseconds to microseconds, even at millions of requests per second.
  • With DAX, your applications remain fast and responsive, even when unprecedented request volumes come your way. There is no tuning required.
  • DAX lets you scale on-demand out to a ten-node cluster, giving you millions of requests per second.
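  • DAX is a write-through cache: writes go through DAX to DynamoDB, and the cached copy is updated at the same time, so later reads of that item are served from the cache. The benefit is on reads; the write itself is not made faster.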
  • Just like DynamoDB, DAX is fully managed. You no longer need to worry about management tasks such as hardware or software provisioning, setup and configuration, software patching, operating a reliable, distributed cache cluster, or replicating data over multiple instances as you scale.
  • This means there is no need for developers to manage the caching logic. DAX is completely compatible with existing DynamoDB API calls.
  • DAX enables you to provision one DAX cluster for multiple DynamoDB tables, multiple DAX clusters for a single DynamoDB table or somewhere in between giving you maximal flexibility.
  • DAX is designed for HA so in the event of a failure of one AZ, it will fail over to one of its replicas in another AZ. This is also managed automatically.
  • DAX is not recommended if you need strongly consistent reads.
  • DAX is useful for read-intensive workloads, but not write-intensive ones.
  • DAX supports server-side encryption as well as encryption in transit.
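
The write-through idea can be illustrated with a toy in-memory cache. This is not the DAX client API, only a sketch of the pattern: a plain dict stands in for the DynamoDB table, and the class and attribute names are made up. Eviction, TTLs, and clustering are left out.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a backing store (a plain dict here)."""

    def __init__(self, backing_store):
        self.store = backing_store  # stands in for the DynamoDB table
        self.cache = {}
        self.store_reads = 0        # how many reads reached the backing store

    def get(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: the store is not touched
        self.store_reads += 1       # cache miss: read through to the store
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.store[key] = value     # the write always goes to the store...
        self.cache[key] = value     # ...and the cached copy is updated with it


table = {"sku-1": "on sale"}
dax_like = WriteThroughCache(table)

dax_like.get("sku-1")               # miss, reads the store
dax_like.get("sku-1")               # hit, served from the cache
dax_like.put("sku-2", "new item")   # written to the store and the cache
dax_like.get("sku-2")               # hit, no extra store read

print(dax_like.store_reads)         # 1
```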

DAX Use Cases

  • Applications that require the fastest possible response time for reads.
  • Applications that read a small number of items more frequently than others, for example limited-time on-sale items in an ecommerce store.
  • Applications that are read-intensive, but are also cost-sensitive. Offload read activity to a DAX cluster and reduce the number of read capacity units that you need to purchase for your DynamoDB tables.
  • Applications that require repeated reads against a large set of data. This will avoid eating up all your DynamoDB resources which are needed by other applications.

DynamoDB Streams

  • A DynamoDB stream is an ordered flow of information about changes to items in an Amazon DynamoDB table. When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table.
  • Each stream record also contains the name of the table, the event timestamp, and other metadata.
  • Stream records are organized into groups, or shards.
  • Each shard acts as a container for multiple stream records, and contains information required for accessing and iterating through these records.
  • Amazon DynamoDB is integrated with AWS Lambda so that you can create triggers — pieces of code that automatically respond to events in DynamoDB Streams.
  • Immediately after an item in the table is modified, a new record appears in the table's stream.
  • AWS Lambda polls the stream and invokes your Lambda function synchronously when it detects new stream records.
  • The Lambda function can perform any actions you specify, such as sending a notification or initiating a workflow.
  • With triggers, you can build applications that react to data modifications in DynamoDB tables.
  • Whenever an application creates, updates, or deletes items in the table, DynamoDB Streams writes a stream record with the primary key attribute(s) of the items that were modified.
    • A stream record contains information about a data modification to a single item in a DynamoDB table.
    • You can configure the stream so that the stream records capture additional information, such as the "before" and "after" images of modified items.
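
A Lambda trigger for a stream might look like the sketch below. It assumes the stream is configured to include both old and new images; the handler name, the sample event, and the attribute values are hypothetical, and the event is trimmed to the fields used here.

```python
def handler(event, context):
    """Summarize each stream record: what changed, for which key."""
    changes = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        changes.append({
            "event": record["eventName"],    # INSERT, MODIFY, or REMOVE
            "keys": change["Keys"],          # primary key of the modified item
            "old": change.get("OldImage"),   # present only if the stream captures it
            "new": change.get("NewImage"),
        })
    return changes


# Trimmed, hypothetical event in the shape Lambda passes for DynamoDB Streams.
sample_event = {
    "Records": [
        {
            "eventName": "MODIFY",
            "dynamodb": {
                "Keys": {"order_id": {"S": "INV00023"}},
                "OldImage": {"order_id": {"S": "INV00023"}, "status": {"S": "NEW"}},
                "NewImage": {"order_id": {"S": "INV00023"}, "status": {"S": "PAID"}},
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {"Keys": {"order_id": {"S": "INV00024"}}},
        },
    ]
}

summary = handler(sample_event, None)
```

In a real function, the loop body is where a notification would be sent or a workflow started.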

Access Control

  • All authentication and access control is managed using IAM.
  • DynamoDB supports identity-based policies:
    • Attach a permissions policy to a user or a group in your account.
    • Attach a permissions policy to a role (grant cross-account permissions).
  • You can use the IAM condition key dynamodb:LeadingKeys to restrict users to items whose partition key matches their own user ID, so that they can access only their own records.
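
A policy restricting users to their own records could be sketched as below, assuming users sign in through Amazon Cognito and the table's partition key holds the Cognito identity ID. The table ARN, account number, and list of actions are hypothetical.

```python
import json


def own_items_policy(table_arn):
    """Build an identity-based policy limited to items keyed by the caller's ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:Query",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                ],
                "Resource": [table_arn],
                "Condition": {
                    # Every partition key value in the request must equal the
                    # caller's identity; IAM substitutes the policy variable.
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": [
                            "${cognito-identity.amazonaws.com:sub}"
                        ]
                    }
                },
            }
        ],
    }


policy_document = own_items_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/UserNotes"  # hypothetical ARN
)
print(json.dumps(policy_document, indent=2))
```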

Other Notes

  • DynamoDB does not support strongly consistent reads across AWS regions.
  • When you create a table or index in provisioned capacity mode, you must specify your throughput capacity requirements for read and write activity in terms of read and write capacity units:
    • One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units.
    • One write capacity unit represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units.
  • Throttling prevents your application from consuming too many capacity units.
    • DynamoDB can throttle read or write requests that exceed the throughput settings for a table, and can also throttle requests that exceed the throughput settings for an index.

    When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException.
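
A common way to handle throttled requests is to retry with exponential backoff and jitter. The AWS SDKs already retry throttled requests automatically, so the following is only a sketch of the idea: ThrottledError is a stand-in for ProvisionedThroughputExceededException, and the function name, parameters, and delays are hypothetical.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call operation(), retrying throttled calls with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt; random jitter spreads retries out.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))


# Simulated operation that is throttled twice before succeeding.
attempts = {"count": 0}


def flaky_write():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ThrottledError("throughput exceeded")
    return "ok"


result = call_with_backoff(flaky_write, sleep=lambda seconds: None)
```

Passing a no-op sleep keeps the demonstration instant; production code would leave the default in place.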