AWS ‐ Stateless Applications | DynamoDB | ElastiCache for MemCached & Redis | Caching Strategies - FullstackCodingGuy/Developer-Fundamentals GitHub Wiki


Stateless Application:

  • Past transactions are not remembered

  • Every transaction is treated as if it were the first

  • Each request is served as a single, completely independent operation

  • Credentials are kept on the client side; the server generates a unique token that carries user identification information, and tokens have an expiry time

  • Easy to create and maintain; helps guarantee uptime; instances can be added or removed on demand

  • Consistent behavior across application instances, with the same behavior on all servers, which increases efficiency
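The token-based flow above can be sketched with Python's standard library. This is a minimal illustration of a server-issued, self-contained token with an expiry, not a production auth scheme; the secret, claim names, and helper names are assumptions for the example (real systems typically use a standard such as JWT).

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side-secret"  # hypothetical key; store securely in practice


def issue_token(user_id: str, ttl_seconds: int = 3600) -> str:
    """Create a self-contained token carrying the user's identity and expiry."""
    payload = json.dumps({"sub": user_id, "exp": time.time() + ttl_seconds}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    # payload and signature are base64-encoded separately, joined with "."
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def verify_token(token: str):
    """Return the user id if the token is authentic and unexpired, else None."""
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        return None                      # signature mismatch: token was tampered with
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        return None                      # token has expired
    return claims["sub"]
```

Because the token itself carries the identity and expiry, any server holding the secret can verify it, so no per-user session state needs to live on one particular server.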

Stateful Application:

  • States are reused across transactions
  • Transactions are contextual
  • Requests run on the same server
  • The user session is stored/tracked/remembered on the same server, including preferences and activity

DynamoDB & Stateless Applications

  • It is a NoSQL database: schemaless, storing JSON-like documents
  • API Operations
    • Control Plane - exposes API operations that let us manage tables (create, update, delete, etc.)
    • Data Plane - CRUD operations on the data in a table
    • Streams - capture a time-ordered sequence of item-level changes made to a table
    • Transactions - support ACID (atomicity, consistency, isolation, durability) and allow operations to be rolled back
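The all-or-nothing transaction semantics noted above can be illustrated with a small in-memory sketch. This is not the DynamoDB API; the function name and snapshot/rollback mechanism are assumptions chosen purely to show the behavior (either every operation applies, or none does).

```python
def transact_write(table: dict, operations) -> None:
    """Apply every operation or none: snapshot first, roll back on failure.

    An illustrative stand-in for ACID write semantics, not a real client.
    """
    snapshot = dict(table)           # remember pre-transaction state
    try:
        for op in operations:
            op(table)                # each op mutates the table in place
    except Exception:
        table.clear()
        table.update(snapshot)       # roll back: restore the snapshot
        raise
```

If any operation raises, the table is restored to exactly its pre-transaction contents before the error propagates.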

DynamoDB Naming Rules

  • Names are case sensitive
  • Table names must be 3-255 characters
  • There are reserved words to avoid
  • Only certain special characters are allowed (underscore, hyphen, dot)

DynamoDB data types

  • Scalar - stores a single value (string, number, binary, boolean, null)
  • Document - a nested structure (list or map, e.g. a JSON object)
  • Set - a collection of unique scalar values

Read Consistency

  • Eventually consistent reads (the default)
  • Strongly consistent reads

Read/Write Capacity Modes - control how much throughput the table is allowed

  • On-demand mode - scales automatically; you are charged per request
  • Provisioned mode - you reserve read/write capacity in advance

Partitions

  • Storage on SSDs
  • Automatic replication across Availability Zones
  • No management required; AWS takes care of it

Data Distribution - all stored data is indexed by one of the following:

  • Partition Key
  • Partition Key and Sort Key (a composite key)
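The composite-key idea can be sketched with an in-memory table. This is a toy stand-in, not a DynamoDB client; the table layout and function names are assumptions for the example. The point is that the (partition key, sort key) pair uniquely identifies an item, and a query on the partition key returns its items ordered by sort key.

```python
# In-memory sketch of a table indexed by (partition key, sort key).
table = {}


def put_item(pk: str, sk: str, item: dict) -> None:
    """The composite key uniquely identifies an item; re-puts overwrite."""
    table[(pk, sk)] = item


def query(pk: str) -> list:
    """Return all items in one partition, ordered by sort key."""
    return [table[key] for key in sorted(k for k in table if k[0] == pk)]


put_item("user#1", "order#2024-01-03", {"total": 30})
put_item("user#1", "order#2024-01-01", {"total": 10})
put_item("user#2", "order#2024-01-02", {"total": 20})
```

Prefixing keys like `user#1` / `order#...` mirrors the common single-table pattern of packing entity type and id into the key strings.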

AWS ElastiCache

  • It works with one of two engines: Memcached or Redis

AWS ElastiCache for Memcached and Stateless Apps

  • Memcached is a distributed in-memory data store
    • supports automatic failure detection and recovery when running in cluster mode: failed nodes are detected and replaced

Choosing Caching

  • Speed and Cost
    • choose caching when you want faster data reads and reduced latency
    • it is less expensive than serving every read from the database, because some of the read workload is offloaded from the database to the cache
  • Data access patterns

Reasons to cache data

  • the traditional way (reading from the database) is expensive and slow
  • cache frequently accessed data
  • caching is best suited for static, key-value data


AWS ElastiCache for Redis


ElastiCache Node

  • It is the smallest building block of a cluster. A node has a fixed amount of RAM, runs in a secure network, and has its own DNS name so it is reachable on the network


Node Types

  • General Purpose - normal usage
  • Compute Optimized - CPU-intensive workloads
  • Memory Optimized - memory-intensive workloads


Shard

  • Sharding adds more nodes so that a large dataset can be split across them; it lets applications keep low read latency under higher throughput demands.
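The key-to-shard mapping behind this can be sketched by hashing each key and taking it modulo the shard count. This is a simplified illustration, not ElastiCache's actual scheme (Redis Cluster, for instance, maps keys to 16384 hash slots using CRC16); the shard count and function names here are assumptions.

```python
import hashlib

NUM_SHARDS = 3  # assumed shard count for the sketch
shards = {i: {} for i in range(NUM_SHARDS)}


def shard_for(key: str) -> int:
    """Deterministically map a key to one shard by hashing it."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put(key: str, value) -> None:
    shards[shard_for(key)][key] = value


def get(key: str):
    return shards[shard_for(key)].get(key)
```

Because the mapping is deterministic, reads and writes for the same key always land on the same shard, so the dataset (and its throughput) is divided across nodes without any central lookup table.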

ElastiCache Clusters

  • A cluster is a collection of one or more ElastiCache nodes; each node runs an instance of the cache engine

Setting up

Selecting Node Size

  • Memory requirements for the data

  • Software version of Redis or Memcached

  • Application write load (a write-heavy workload needs more compute and RAM than a read-heavy one)

  • Selecting node size also depends on:
    • Sharding - standalone or multiple shards
    • Local zones - location within the Availability Zones/regions

Caching Strategies

  • A cache is a high-speed data store in RAM; it holds a subset of data that is transient in nature (i.e., not permanent/transactional data).
  • It increases the overall performance of the application
  • It lowers database cost by reducing the number of read (I/O) operations against the main database, especially since the primary database charges based on throughput
  • It reduces the load on the backend database under heavy traffic, helping avoid crashes

The ElastiCache service makes it easy to deploy and operate the data store in the cloud

Considerations

  • Use appropriate design patterns
  • Data Availability
    • TTL - Expiry of data

Strategy 1 - Read-Through or Lazy Loading - load data into the cache only when necessary

Key characteristics of Lazy loading

  • On-Demand Retrieval: Data is loaded into the cache only when requested, avoiding unnecessary preloading.
  • Resource Conservation: Efficient use of memory and bandwidth as only actively used data is loaded.
  • Optimized Performance: Reduces initial load times, particularly beneficial for large datasets or non-essential content.
  • Dynamic Content Loading: Well-suited for applications with dynamic or unpredictable data access patterns.
  • Improved Responsiveness: Enhances user experience by prioritizing the loading of essential data.
  • Scalability: Adaptable to changing workloads and scalable as it focuses on what is actively needed.
  • Reduced Upfront Costs: Avoids the cost of loading entire datasets into memory at the start.
  • Appropriate for Infrequently Accessed Data: Ideal for scenarios where not all data is frequently accessed or required immediately.

Key advantages of lazy loading:

  • Resource Efficiency: Conserves resources by loading data into memory only when it is needed.
  • Faster Initial Load Times: Reduces the time required for an application or webpage to become initially responsive.
  • Improved User Experience: Enhances user experience by prioritizing the loading of essential content first.
  • Adaptability to User Behavior: Well-suited for applications with dynamic or unpredictable data access patterns.
  • Reduced Bandwidth Usage: Minimizes the amount of data transferred over the network, improving performance.
  • Scalability: Adaptable to changing workloads, making it suitable for scalable applications.

Key disadvantages of Lazy loading are :

  • Potential Latency: Introduces latency as data is loaded on demand, impacting real-time responsiveness.
  • Complex Implementation: Implementation can be complex, especially for large or intricate systems.
  • Increased Code Complexity: May require additional code to manage and handle on-demand loading effectively.
  • Challenges in Predicting User Behavior: Requires a good understanding of user behavior to effectively optimize data loading.
  • Dependency on Network Conditions: Performance may be affected by slow or unreliable network conditions.
  • Potential for Overhead: In certain cases, the overhead of on-demand loading may outweigh the benefits.
  • Advantages

    • The read handler returns data from the cache if it exists and has not expired (cache hit); if not (cache miss), it fetches the data from the source, writes it to the cache, and returns it to the caller.
    • This way, only requested data is cached; a cache miss adds a little latency, but subsequent calls are faster.
    • This strategy can serve stale data, but it does not fail with empty nodes.
  • Disadvantages

    • Cache miss penalty - each cache miss triggers three steps: the application reads the cache (miss), then reads from the source, then writes to the cache and returns the value to the caller
    • Stale data - data is written to the cache only on a cache miss, so the cache is not updated when the source is modified; a workaround is the "Write-Through" strategy
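The lazy-loading flow can be condensed into a few lines. This is a minimal in-memory sketch: the `db` dict stands in for the backing database and `cache` for ElastiCache, and the counter exists only to make the cache-miss penalty visible.

```python
db = {"greeting": "hello"}      # stands in for the backing database
cache = {}                      # stands in for the cache layer
calls = {"db_reads": 0}         # instrumentation for the sketch


def read_through(key: str):
    """Lazy loading: serve from cache on a hit; on a miss, read the
    source, populate the cache, then return the value."""
    if key in cache:
        return cache[key]       # cache hit: one fast read
    calls["db_reads"] += 1      # cache miss penalty: extra round trip
    value = db[key]             # read from the source
    cache[key] = value          # populate the cache for next time
    return value
```

The first read for a key pays the miss penalty; every subsequent read for the same key is served from the cache without touching the database.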

Write-Through Strategy - add or update data in the cache whenever data in the source is updated

  • Advantages
    • No stale data - the cache is never stale because it is updated in step with the source
    • This strategy ensures the data is fresh, but it fails with empty nodes
  • Disadvantages
    • Write penalty - every write goes to both the cache and the source, adding latency to write operations in the application
    • Missing data - when a new node is spun up due to a node failure or scaling, data is missing from the cache until it is written again
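Write-through is even shorter to sketch: every write updates both stores together. As above, the `db` and `cache` dicts are illustrative stand-ins, not real clients.

```python
db = {}      # stands in for the backing database
cache = {}   # stands in for the cache layer


def write_through(key: str, value) -> None:
    """Write-through: update the source and the cache in the same
    operation, so the cache never serves stale data (at the cost of
    extra latency on every write)."""
    db[key] = value
    cache[key] = value
```

The trade-off against lazy loading is visible here: reads never miss for data written this way, but each write does double work, and a freshly started (empty) cache node has nothing until writes repopulate it.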

TTL - Time to live

  • The read-through strategy allows stale data but does not fail with empty nodes.
  • The write-through strategy ensures the data is fresh but fails with empty nodes.

Adding a TTL combines the advantages of both strategies
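A TTL can be sketched by storing an expiry timestamp next to each value and evicting on read. This is a toy in-memory version of the idea; real cache engines such as Redis expire keys for you (e.g. via `SETEX`/`EXPIRE`).

```python
import time

cache = {}  # key -> (value, expires_at)


def put_with_ttl(key: str, value, ttl_seconds: float) -> None:
    """Store a value together with the time at which it expires."""
    cache[key] = (value, time.time() + ttl_seconds)


def get(key: str):
    """Return the value if present and unexpired; evict and miss otherwise."""
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() >= expires_at:
        del cache[key]      # expired: evict so the next read refreshes from source
        return None
    return value
```

Because stale entries age out automatically, a lazy-loading cache bounded by a TTL cannot serve data older than the TTL, while still tolerating empty nodes.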

Read-Through

Async strategies

Write-Behind - the inverse of write-through: the source is updated lazily/asynchronously from the cache, with a delay (for example, via a queue)

Refresh-Ahead - the cache proactively refreshes frequently used entries before they expire
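Write-behind can be sketched with an explicit queue of pending source writes. In a real system the queue would be drained by a background worker or a message broker; here a manual `flush()` stands in for that asynchronous step, and all names are assumptions for the sketch.

```python
from collections import deque

cache = {}       # stands in for the cache layer
db = {}          # stands in for the backing database
pending = deque()  # queued writes, drained asynchronously in a real system


def write_behind(key: str, value) -> None:
    """Write-behind: update the cache immediately and enqueue the
    source write to be applied later."""
    cache[key] = value
    pending.append((key, value))


def flush() -> None:
    """Stand-in for the async worker that drains queued writes to the source."""
    while pending:
        key, value = pending.popleft()
        db[key] = value
```

The write returns as soon as the cache is updated, which is why write-behind has low write latency; the cost is a window during which the source lags behind the cache (and data loss if the queue is lost before flushing).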


Quiz

Which is used to uniquely identify an entry in a DynamoDB table?


Which is an advantage offered by an ElastiCache Cluster?


Which kind of data is best used with a data storage service like Memcached?


Which command is used to connect to a Redis instance in cluster mode?


Which command is used in the AWS CLI to list tables in DynamoDB?


What is meant by a shard in ElastiCache?


What is used for efficient sorting and searching of items in a DynamoDB table?


Which port is needed to open an incoming connection to an EFS instance?


Which main advantage does caching offer? Ans: Fast Retrieval of data

Which advantages do stateless applications offer over stateful applications?


Which key feature does ElastiCache for Redis offer to deal with data availability?


Which capacity does the elastic file system store?


Which is the default port used in an ElastiCache Cluster? Ans: 6379