DynamoDB

Fast and flexible NoSQL database (no need to define a schema upfront):
- Consistent, single-digit millisecond latency at any scale
- Fully managed
- Supports key-value data models; supported document formats are JSON, HTML and XML
- Use cases: a great fit for mobile, web, gaming, ad tech, IoT and many other applications

Serverless:
- Integrates well with Lambda; DynamoDB can be configured to automatically scale
- A popular choice for developers and architects who are designing serverless applications

Performance: SSD storage
Resilience: Spread across 3 geographically distinct data centers
Consistency: Eventually consistent reads (default) or strongly consistent reads
- Eventually consistent reads: consistency across all copies of data is usually reached within a second. Best for read performance.
- Strongly consistent reads: a strongly consistent read always reflects all successful writes; writes are reflected across all 3 locations at once. Best for read consistency.

ACID Transactions: DynamoDB Transactions provide the ability to perform ACID transactions (Atomic, Consistent, Isolated, Durable): read or write multiple items across multiple tables as an all-or-nothing operation (see the sketch below).
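A minimal sketch of an all-or-nothing write using boto3's `transact_write_items` (the `Orders` and `Inventory` tables and their attribute names are hypothetical, for illustration only):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Both operations succeed together or fail together (ACID transaction).
dynamodb.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "Orders",  # hypothetical table
                "Item": {"order_id": {"S": "o-1001"}, "status": {"S": "PLACED"}},
            }
        },
        {
            "Update": {
                "TableName": "Inventory",  # hypothetical table
                "Key": {"product_id": {"S": "p-42"}},
                "UpdateExpression": "SET stock = stock - :one",
                # If out of stock, the condition fails and the whole
                # transaction is rolled back, including the Put above.
                "ConditionExpression": "stock >= :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
)
```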

Primary Keys
- DynamoDB stores and retrieves data based on a primary key
- Two types: a simple partition key, or a composite key (partition key + sort key)

Partition key:
- Based on a unique attribute (like a customer ID, product ID, email address, etc.)
- The value of the partition key is input to an internal hash function, which determines the partition (physical location) on which the data is stored
- If you are using the partition key alone as your primary key, then no 2 items can have the same partition key

Composite key (partition key + sort key):
- Used when the partition key is not unique (e.g. forum posts, where users post multiple messages: a combination of user_id and a sort key such as a timestamp)
- A unique combination: items in the table may have the same partition key, but they must have a different sort key
- Storage: all items with the same partition key are stored together and then sorted according to the sort key value (see the sketch below)
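A minimal sketch of creating a table with a composite key, following the forum example above (the `ForumPosts` table name is hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical table: many posts per user, so user_id alone is not unique.
dynamodb.create_table(
    TableName="ForumPosts",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},     # partition key
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
)
```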

Access Control
- IAM: authentication and access control is managed using AWS IAM
- IAM permissions: you can create IAM users within your AWS account with specific permissions to access and create DynamoDB tables
- IAM roles: you can also create IAM roles, enabling temporary access to DynamoDB

Restricting User Access
- You can also use a special IAM condition to restrict user access to only their own records
- This is done by adding a condition to an IAM policy that allows access only to items where the partition key value matches the user's ID
- The "dynamodb:LeadingKeys" condition key in the policy allows users to access only the items where the partition key value matches their user ID (see the policy sketch below)
- This gives you fine-grained access control with IAM
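A minimal sketch of such a policy, expressed as a Python dict and created via boto3 (the table name, policy name and use of the Cognito identity variable as the user ID are assumptions for illustration):

```python
import json
import boto3

# Hypothetical policy: users may only touch items whose partition key
# equals their own Cognito identity ID (fine-grained access control).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:*:*:table/ForumPosts",
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Partition key value must match the caller's identity.
                    "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
                }
            },
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="ForumPostsOwnRowsOnly",  # hypothetical name
    PolicyDocument=json.dumps(policy),
)
```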

Secondary Indexes
- Flexible querying: query based on an attribute that is not the primary key
- DynamoDB allows you to run a query on non-primary-key attributes using global secondary indexes and local secondary indexes
- A secondary index allows you to perform fast queries on specific attributes in a table: you select the attributes that you want included in the index and run your searches on the index, rather than on the entire dataset

Local Secondary Index
- Primary key: same partition key as your original table, but a different sort key
- A different view: gives you a different view of your data, organized according to an alternative sort key
- Faster queries: any queries based on this sort key are much faster using the index than the main table
- Add at creation time: can only be created when you are creating your table; you cannot add, remove or modify it later
- Example: user_id (same partition key), sort key: country

Global Secondary Index
- A completely different primary key: different partition key and sort key
- View your data differently: gives you a completely different view of the data
- Speeds up queries: speeds up queries relating to this alternative partition and sort key
- Flexible: you can create it when you create your table, or add it later (see the sketch below)
- Example: email address (different partition key), sort key: last login
- There is an initial quota of 20 global secondary indexes per table; you can request a service quota increase if you need more
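A minimal sketch of adding a GSI to an existing table, matching the email/last-login example (index and attribute names are illustrative; an on-demand table is assumed, so no ProvisionedThroughput is set on the index):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical GSI: query ForumPosts by email address instead of user_id.
dynamodb.update_table(
    TableName="ForumPosts",
    AttributeDefinitions=[
        {"AttributeName": "email", "AttributeType": "S"},
        {"AttributeName": "last_login", "AttributeType": "N"},
    ],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "email-last_login-index",
                "KeySchema": [
                    {"AttributeName": "email", "KeyType": "HASH"},
                    {"AttributeName": "last_login", "KeyType": "RANGE"},
                ],
                # Copy all attributes into the index.
                "Projection": {"ProjectionType": "ALL"},
            }
        }
    ],
)
```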

Query
- A query operation finds items in a table based on the primary key attribute and a distinct value to search for. For example, selecting an item where the user ID is equal to 212 will select all the attributes for that item (e.g., first name, surname, email address)
- Refine queries: use an optional sort key name and value to refine the results. For example, if your sort key is a timestamp, you can refine the query to only select items with a timestamp from the last 7 days
- By default, a query returns all the attributes for the items you select, but you can use the ProjectionExpression parameter if you only want specific attributes returned (e.g. if you only want to see the email address rather than all the attributes)
- Sort key: results are always sorted by the sort key
- Numeric order: by default in ascending numeric order (1, 2, 3, 4, 5)
- ASCII: strings sort by ASCII character code values
- Reverse the order: you can reverse the order by setting the ScanIndexForward parameter to False (only works for queries, despite the name)
- Eventually consistent: by default, queries are eventually consistent
- Strongly consistent: you need to explicitly set the query to be strongly consistent
These parameters appear together in the sketch below.
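A minimal sketch combining the options above on the hypothetical ForumPosts table (key names and values are assumptions):

```python
import time
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("ForumPosts")  # hypothetical table

week_ago = int(time.time()) - 7 * 24 * 3600  # timestamp for 7 days ago

response = table.query(
    # Partition key must match exactly; sort key refines the results.
    KeyConditionExpression=Key("user_id").eq("u-212") & Key("timestamp").gte(week_ago),
    ProjectionExpression="email",   # return only this attribute
    ScanIndexForward=False,         # newest first (descending sort key order)
    ConsistentRead=True,            # opt in to strongly consistent reads
)
items = response["Items"]
```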

Scan
- A scan operation examines every item in the table. By default, it returns all data attributes
- Use the ProjectionExpression parameter to refine the scan to only return the attributes you want (e.g., if you only want to see the email address rather than all the attributes)
- You can use filters, but a scan still goes over the whole table, dumping the data and then refining the results based on the filter
- Sequential by default: a scan operation processes data sequentially, returning 1 MB increments before moving on to retrieve the next 1 MB of data, scanning one partition at a time
- Parallel is possible: you can configure DynamoDB to use parallel scans instead by logically dividing a table or index into segments and scanning each segment in parallel (see the sketch below)
- Beware: it is best to avoid parallel scans if your table or index is already incurring heavy read or write activity from other applications. Isolate scan operations to specific tables and segregate them from your mission-critical traffic, even if that means writing data to 2 different tables
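A minimal sketch of a parallel scan using Segment/TotalSegments, assuming the hypothetical ForumPosts table and 4 worker threads:

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

TOTAL_SEGMENTS = 4  # how many logical slices to scan in parallel

def scan_segment(segment):
    """Scan one logical segment of the table, following pagination."""
    # boto3 resources are not thread-safe, so build one per worker.
    table = boto3.resource("dynamodb").Table("ForumPosts")  # hypothetical
    items, start_key = [], None
    while True:
        kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key  # resume after last page
        page = table.scan(**kwargs)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:  # no more pages in this segment
            return items

with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    results = pool.map(scan_segment, range(TOTAL_SEGMENTS))
all_items = [item for segment_items in results for item in segment_items]
```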

Query or Scan:

- Query is more efficient than a scan: a scan dumps the entire table and filters out the values to provide the desired result, removing the unwanted data
- Extra step: this adds an extra step of removing the data you don't want. As the table grows, the scan operation takes longer
- Provisioned throughput: a scan operation on a large table can use up the provisioned throughput for that table in just a single operation
- Improving performance: set a smaller page size (e.g. set the page size to return 40 items, as in the sketch below). Running a larger number of smaller operations allows other requests to succeed without throttling
- But in general, avoid scan operations if you can. Design tables in a way that lets you use the Query, GetItem or BatchGetItem APIs
- When using Query or Scan, DynamoDB returns all of the item attributes by default. To get just some, rather than all, of the attributes, use a ProjectionExpression
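A minimal sketch of a small-page scan using the Limit parameter (the `process` handler and table name are hypothetical placeholders):

```python
import boto3

table = boto3.resource("dynamodb").Table("ForumPosts")  # hypothetical table

def process(items):
    """Hypothetical per-page handler, stands in for your own logic."""
    print(f"got {len(items)} items")

# Smaller pages mean smaller bursts of consumed capacity per request,
# leaving headroom for other traffic between pages.
start_key = None
while True:
    kwargs = {"Limit": 40}  # page size of 40 items
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key
    page = table.scan(**kwargs)
    process(page["Items"])
    start_key = page.get("LastEvaluatedKey")
    if not start_key:  # reached the end of the table
        break
```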

DynamoDB Provisioned Throughput
- Measured in capacity units
- Specify requirements: when you create your table, you specify your requirements in terms of read capacity units and write capacity units
- Write capacity units: 1 write capacity unit = 1 x 1 KB write per second
- Read capacity units: 1 read capacity unit = 1 x strongly consistent read of 4 KB per second, or 2 x eventually consistent reads of 4 KB per second (default)
- If your application reads or writes larger items, it will consume more capacity units and cost you more as well (see the worked example below)
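A worked example of the arithmetic, with made-up numbers: 80 strongly consistent reads per second of 6 KB items round up to 2 x 4 KB units each, so 160 RCUs; eventually consistent halves that to 80. A 2.5 KB item rounds up to 3 WCUs per write.

```python
import math

def read_capacity_units(item_kb, reads_per_sec, strongly_consistent=True):
    # 1 RCU = one strongly consistent 4 KB read/sec,
    # or two eventually consistent 4 KB reads/sec.
    units_per_read = math.ceil(item_kb / 4)
    if not strongly_consistent:
        units_per_read /= 2
    return reads_per_sec * units_per_read

def write_capacity_units(item_kb, writes_per_sec):
    # 1 WCU = one 1 KB write/sec; item size rounds up to the next KB.
    return writes_per_sec * math.ceil(item_kb)

print(read_capacity_units(6, 80))                             # 160 RCUs
print(read_capacity_units(6, 80, strongly_consistent=False))  # 80.0 RCUs
print(write_capacity_units(2.5, 10))                          # 30 WCUs
```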

On-Demand Capacity
- Charges apply for reading, writing and storing data
- DynamoDB instantly scales up and down based on the activity of your application (you don't need to specify read/write capacity at creation time)
- Great for: unpredictable workloads, and new applications where you don't know the usage pattern yet
- You pay only for what you use (pay per request)

When to use each pricing model? (Both modes are shown in the sketch after these lists.)

On-demand:
- Unknown workloads
- Unpredictable application traffic
- Spiky, short-lived peaks
- A pay-per-use model is desired
- It might be more difficult to predict the cost

Provisioned capacity:
- Read and write capacity requirements can be forecasted
- Predictable application traffic
- Application traffic is consistent or increases gradually
- You have more control over the cost
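A minimal sketch of choosing each billing mode at table creation time (table names and capacity figures are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Provisioned mode: you forecast and pay for capacity up front.
dynamodb.create_table(
    TableName="SteadyTrafficTable",  # hypothetical
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)

# On-demand mode: pay per request, no capacity to specify.
dynamodb.create_table(
    TableName="SpikyTrafficTable",  # hypothetical
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```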

DynamoDB Accelerator (DAX)
- A fully managed, clustered, in-memory cache for DynamoDB
- Delivers up to a 10x read performance improvement: microsecond performance for millions of requests per second
- Ideal for read-heavy and bursty workloads, like auction applications, gaming, and retail sites during Black Friday sales
- DAX is a write-through caching service: data is written to the cache and the backend store (the DynamoDB table) at the same time
- This allows you to point your DynamoDB API calls at the DAX cluster (see the sketch below). If the item you are querying is in the cache (cache hit), DAX returns the result
- If the item is not available (cache miss), then DAX performs an eventually consistent GetItem operation against DynamoDB and returns the result of the API call
- Reduces the read load on DynamoDB tables. You may be able to reduce the provisioned read capacity on your table and save money on your AWS bill

What is it not suitable for?
- DAX improves response times for eventually consistent reads only; it is not suitable for applications that require strongly consistent reads
- Applications that are mainly write-intensive
- Applications that do not perform many read operations
- Applications that do not require microsecond response times
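A minimal sketch using the amazondax client library (the cluster endpoint, region and keys are hypothetical; the constructor shape follows the AWS TryDax sample, so treat it as an assumption to verify against your library version). The DAX client mirrors the low-level DynamoDB client API, so existing call sites only need to swap clients:

```python
import botocore.session
from amazondax import AmazonDaxClient

session = botocore.session.get_session()

# Hypothetical DAX cluster endpoint.
dax = AmazonDaxClient(
    session,
    region_name="us-east-1",
    endpoints=["mycluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111"],
)

# Cache hit: served from DAX in microseconds.
# Cache miss: DAX performs an eventually consistent GetItem against
# DynamoDB, caches the item, and returns it.
item = dax.get_item(
    TableName="ForumPosts",  # hypothetical table
    Key={"user_id": {"S": "u-212"}, "timestamp": {"N": "1700000000"}},
)
```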

DynamoDB TTL (Time To Live)
- Defines an expiry time for your data
- Expired items are marked for deletion (and will be deleted within 48 hours)
- Great for removing irrelevant or old data (e.g. session data, event logs and temporary data)
- Reduces the cost of your table by automatically removing data which is no longer relevant
- Expressed in UNIX/epoch time (a numerical value of seconds since 1970)
- When the current time is greater than the TTL, the item is expired and marked for deletion (see the sketch below)
- You can filter out expired items from your queries
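A minimal sketch of enabling TTL and writing an expiring item (the `Sessions` table and `expires_at` attribute are hypothetical names):

```python
import time
import boto3

client = boto3.client("dynamodb")

# Tell DynamoDB which attribute holds the epoch-seconds expiry time.
client.update_time_to_live(
    TableName="Sessions",  # hypothetical table
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Write an item that expires 24 hours from now (UNIX/epoch seconds).
boto3.resource("dynamodb").Table("Sessions").put_item(
    Item={
        "session_id": "s-abc123",
        "expires_at": int(time.time()) + 24 * 3600,
    }
)
```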

DynamoDB Streams
- A time-ordered sequence of item-level modifications (e.g. insert, update, delete)
- Logs are encrypted at rest and stored for 24 hours
- Can be used to trigger a Lambda function
- Accessed via a dedicated endpoint
- The primary key is recorded by default; before and after images of items can also be stored
- Use cases: audit or archive transactions, trigger an event based on a particular transaction, or replicate data across multiple tables
- Applications can take actions based on the contents of the stream
- A DynamoDB stream can be an event source for Lambda: Lambda polls the DynamoDB stream and executes based on an event

Summary:

- Sequence of modifications: DynamoDB Streams is a time-ordered sequence of item-level modifications in your DynamoDB tables
- Encrypted and stored: data is stored for 24 hours only
- Lambda event source: can be used as an event source for Lambda, so you can create applications that take actions based on events in your DynamoDB table (see the handler sketch below)
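A minimal sketch of a Lambda handler consuming a DynamoDB stream; the record shape follows the standard stream event, and the before/after images assume the stream is configured with NEW_AND_OLD_IMAGES:

```python
def handler(event, context):
    """Invoked when Lambda polls the DynamoDB stream and finds records."""
    for record in event["Records"]:
        action = record["eventName"]         # INSERT, MODIFY or REMOVE
        keys = record["dynamodb"]["Keys"]    # primary key, recorded by default
        if action == "MODIFY":
            # Before/after images, present with NEW_AND_OLD_IMAGES view type.
            before = record["dynamodb"].get("OldImage")
            after = record["dynamodb"].get("NewImage")
            print(f"Item {keys} changed from {before} to {after}")
        else:
            print(f"{action}: {keys}")
```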

Provisioned Throughput and Exponential Backoff
- ProvisionedThroughputExceededException: your request rate is too high for the read/write capacity provisioned on your DynamoDB table
- Using the AWS SDK: the SDK will automatically retry the requests until successful
- Not using the AWS SDK: reduce your request frequency and use exponential backoff
- In addition to simple retries, all AWS SDKs use exponential backoff: progressively longer waits between consecutive retries, for improved flow control. For example: fail, wait 50 ms, retry; if it fails again, wait 100 ms and retry; if it fails again, wait 200 ms and retry (see the sketch below)

If the requests still fail after about 1 minute of retries, your request size may be exceeding the throughput of your provisioned read/write capacity.
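A minimal sketch of hand-rolled exponential backoff, for illustration only (the AWS SDKs, including boto3, already do this automatically; the table name and retry figures are assumptions):

```python
import random
import time

import boto3
import botocore.exceptions

table = boto3.resource("dynamodb").Table("ForumPosts")  # hypothetical table

def put_with_backoff(item, max_retries=6, base_delay=0.05):
    """Retry throttled writes with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return table.put_item(Item=item)
        except botocore.exceptions.ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise  # only back off on throttling errors
            # Waits of 50 ms, 100 ms, 200 ms, ... plus a little random jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.01))
    raise RuntimeError("still throttled; consider raising provisioned capacity")
```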