AWS - Stateless Applications | DynamoDB | ElastiCache for Memcached & Redis | Caching Strategies - FullstackCodingGuy/Developer-Fundamentals GitHub Wiki
Stateless Application:
- Past transactions are not remembered; every transaction is treated as if it were the first.
- Each request provides a single, completely independent service.
- Credentials are stored on the client; the server issues a unique token that carries user identification information, and tokens have an expiry time.
- Easy to create and maintain: instances can be added or removed on demand, which helps guarantee uptime.
- Consistent behavior across application instances (the same request behaves the same on every server), which increases efficiency.
Stateful Application:
- States are reusable
- Transactions are contextual
- Requests run on the same server
- The user session (preferences and activity) is stored, tracked, and remembered on that same server
DynamoDB & Stateless Applications
- It is a NoSQL database: schemaless, storing JSON-like documents
- API Operations
  - Control Plane - operations for managing tables themselves (create, update, delete, describe)
  - Data Plane - CRUD operations on the items stored in a table
  - Streams - a time-ordered log of the data modifications made to a table
  - Transactions - ACID support (atomicity, consistency, isolation, durability); a failed transaction is rolled back
DynamoDB Naming Rules
- Names are case sensitive
- Table names must be 3-255 characters
- There are reserved words that cannot be used directly as attribute names
- Only a limited set of special characters is allowed (letters, digits, underscore, hyphen, dot)
DynamoDB data types
- Scalar - stores a single value (string, number, binary, boolean, null)
- Document - nested structures (list and map, i.e. JSON-like objects)
- Set - a collection of unique scalar values
Read Consistency
- Eventually consistent reads (default; may return slightly stale data)
- Strongly consistent reads (return the most up-to-date data, at higher cost and latency)
Read/Write Capacity Modes - control how much throughput the table is allowed
- On-demand mode - scales automatically; you pay per request for the capacity you actually use
- Provisioned mode - you reserve read/write capacity in advance
Partitions
- Data is stored on SSDs
- Data is automatically replicated across Availability Zones
- No partition management is required; AWS takes care of it
Data Distribution - all stored data is indexed by one of the following
- Partition key only
- Partition key and sort key (composite key)
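DynamoDB decides which internal partition stores an item by hashing its partition key. A minimal sketch of the idea (the real hash function and partition count are internal to DynamoDB; `md5` and a fixed partition count here are illustrative assumptions only):

```python
import hashlib

NUM_PARTITIONS = 3  # illustrative; DynamoDB manages the partition count itself

def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition by hashing it (sketch only)."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Items with the same partition key always land on the same partition;
# with a composite key, all items sharing a partition key are stored
# together, ordered by their sort key.
```

This is why a well-distributed partition key matters: keys that hash to the same partition concentrate the load ("hot partition").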
AWS ElastiCache
- It works with one of two engines: Memcached or Redis
AWS ElastiCache for Memcached and Stateless Apps
- Memcached is a distributed in-memory data store
- In cluster mode, ElastiCache supports automatic failure detection and recovery: failed nodes are detected and replaced, and auto-discovery keeps clients aware of node membership changes
Choosing Caching
- Speed and cost
  - Choose caching when you want faster data reads and reduced latency
  - It is less expensive than scaling the database, because some of the read traffic is offloaded from the database to the cache
- Data access patterns
Reasons to cache data
- The traditional way (reading from the database every time) is comparatively slow and expensive
- Frequently accessed data benefits most
- Caching is best suited for relatively static, key-value data
AWS ElastiCache for Redis
ElastiCache Node
- The smallest building block of a cluster. It has a fixed size, runs in a secure network, is backed by RAM, and has a DNS name so it is reachable on the network
Node Types
- General purpose - normal usage
- Compute optimized - CPU-intensive workloads
- Memory optimized - memory-heavy workloads
Shard
- Sharding adds more nodes so that large datasets can be stored, and enables apps to read with low latency under higher throughput demands
ElastiCache Clusters
- A cluster is a collection of one or more ElastiCache nodes; each node runs an instance of the cache engine
Setting up
Selecting Node Size
- Memory requirements for the data
- Software version of Redis or Memcached
- Application write load (a write-heavy workload needs larger compute and more RAM than a read-heavy one)
- Sharding model - standalone or multiple shards
- Location - Local Zones and the Availability Zones within the Region
Caching Strategies
- A cache is a high-speed data store in RAM; it stores a subset of data that is transient in nature (i.e. not permanent/transactional data)
- It increases the overall performance of the application
- It lowers database cost by reducing the number of read (I/O) operations against the main database, especially when the primary database charges by throughput
- It reduces load on the backend database and helps it avoid crashing under heavy load
The ElastiCache service makes it easy to deploy and operate such a data store in the cloud
Considerations
- Use appropriate design patterns
- Data Availability
- TTL - Expiry of data
Strategy 1 - Read-Through or Lazy Loading - load data into the cache only when it is actually requested
Key characteristics of Lazy loading
- On-Demand Retrieval: Data is loaded into the cache only when requested, avoiding unnecessary preloading.
- Resource Conservation: Efficient use of memory and bandwidth as only actively used data is loaded.
- Optimized Performance: Reduces initial load times, particularly beneficial for large datasets or non-essential content.
- Dynamic Content Loading: Well-suited for applications with dynamic or unpredictable data access patterns.
- Improved Responsiveness: Enhances user experience by prioritizing the loading of essential data.
- Scalability: Adaptable to changing workloads and scalable as it focuses on what is actively needed.
- Reduced Upfront Costs: Avoids the cost of loading entire datasets into memory at the start.
- Appropriate for Infrequently Accessed Data: Ideal for scenarios where not all data is frequently accessed or required immediately.
Key advantages of lazy loading:
- Resource Efficiency: Conserves resources by loading data into memory only when it is needed.
- Faster Initial Load Times: Reduces the time required for an application or webpage to become initially responsive.
- Improved User Experience: Enhances user experience by prioritizing the loading of essential content first.
- Adaptability to User Behavior: Well-suited for applications with dynamic or unpredictable data access patterns.
- Reduced Bandwidth Usage: Minimizes the amount of data transferred over the network, improving performance.
- Scalability: Adaptable to changing workloads, making it suitable for scalable applications.
Key disadvantages of lazy loading:
- Potential Latency: Introduces latency as data is loaded on-demand, impacting real-time responsiveness.
- Complex Implementation: Implementation can be complex, especially for large or intricate systems.
- Increased Code Complexity: May require additional code to manage and handle on-demand loading effectively.
- Challenges in Predicting User Behavior: Requires a good understanding of user behavior to effectively optimize data loading.
- Dependency on Network Conditions: Performance may be affected by slow or unreliable network conditions.
- Potential for Overhead: In certain cases, the overhead of on-demand loading may outweigh the benefits.
Advantages
- The read handler returns the data if it exists in the cache (cache hit); if it is missing or expired (cache miss), it acquires the data from the source, writes it to the cache, and returns it.
- This way only requested data is cached; a cache miss adds a little latency, but subsequent calls are faster.
- This strategy can serve stale data, but it does not fail with empty nodes.
Disadvantages
- Cache miss penalty - each cache miss causes three steps: read the cache (miss), read from the source, then write to the cache before returning to the caller.
- Stale data - data is written to the cache only when a cache miss happens, so the cache is not updated when the source is modified; the workaround is the "Write-Through" strategy.
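The read path above can be sketched with plain dictionaries standing in for Redis/Memcached and the primary database (the names `cache`, `db`, and `get_lazy` are illustrative, not a real client API):

```python
db = {"user:1": "Alice"}   # stand-in for the primary database
cache = {}                 # stand-in for the cache (e.g. Redis/Memcached)

def get_lazy(key):
    """Lazy loading (cache-aside): read the cache first, fall back to the DB."""
    if key in cache:            # cache hit: no database call
        return cache[key]
    value = db.get(key)         # cache miss: read from the source...
    if value is not None:
        cache[key] = value      # ...and populate the cache for next time
    return value

get_lazy("user:1")   # first call: miss, reads the DB and fills the cache
get_lazy("user:1")   # second call: hit, served from the cache
```

Note the miss path performs all three steps listed above, while only the data that was actually requested ends up cached.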
Write-Through Strategy - add or update data in the cache whenever data in the source is updated
- Advantages
  - No stale data - the cache is never stale because it is updated together with the source
  - This strategy ensures the data is fresh, but it fails with empty nodes
- Disadvantages
  - Write penalty - every write goes to both the cache and the source, which adds latency to write operations in the application
  - Missing data - when a new node is spun up (after a node failure or while scaling up), data is missing from the cache until it is next added or updated via the database
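A minimal write-through sketch, again with dictionaries standing in for the cache and database (names are illustrative):

```python
db = {}      # stand-in for the primary database
cache = {}   # stand-in for the cache

def put_write_through(key, value):
    """Write-through: every write updates the source AND the cache together."""
    db[key] = value      # write to the source of truth
    cache[key] = value   # keep the cache in sync, so reads never see stale data

put_write_through("user:1", "Alice")
```

The extra cache write on every update is exactly the "write penalty" noted above.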
TTL - Time to Live
- The read-through strategy can serve stale data, but it does not fail with empty nodes.
- The write-through strategy ensures the data is fresh, but it fails with empty nodes.
Adding a TTL to cached entries combines the advantages of both strategies.
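A TTL can be sketched by storing an expiry timestamp next to each value, similar in spirit to Redis `SETEX` (the functions below are an in-memory illustration, not a Redis client):

```python
import time

cache = {}  # key -> (value, expiry timestamp)

def set_with_ttl(key, value, ttl_seconds):
    """Store a value that expires after ttl_seconds."""
    cache[key] = (value, time.monotonic() + ttl_seconds)

def get(key):
    """Return the value if present and fresh; expired entries count as a miss."""
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:  # expired: evict and treat as a miss
        del cache[key]
        return None
    return value

set_with_ttl("session:1", "alice", ttl_seconds=0.05)
assert get("session:1") == "alice"   # fresh entry is served
time.sleep(0.06)
assert get("session:1") is None      # expired: lazy loading would reload it
```

An expired entry becomes a cache miss, so stale data from the read-through path is bounded by the TTL, while the write-through path still keeps fresh entries in sync.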
Async strategies
Write-Behind - the inverse of write-through: the cache is updated immediately, while the source is updated lazily/asynchronously after a delay (for example via a queue)
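Write-behind can be sketched with a queue that defers the database write; in practice the flush would run in a background worker, but here it is called inline for clarity (all names are illustrative):

```python
from collections import deque

db = {}             # stand-in for the primary database
cache = {}          # stand-in for the cache
pending = deque()   # queue of writes waiting to be flushed to the DB

def put_write_behind(key, value):
    """Write-behind: update the cache now, persist to the DB later."""
    cache[key] = value
    pending.append((key, value))

def flush():
    """Drain the queue; in a real system a background worker would do this."""
    while pending:
        key, value = pending.popleft()
        db[key] = value

put_write_behind("user:1", "Alice")
assert "user:1" not in db    # the DB write is deferred
flush()
assert db["user:1"] == "Alice"
```

The trade-off: writes are fast (no DB round trip on the hot path), but queued writes can be lost if the cache node fails before the flush.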
Refresh-Ahead - the cache proactively refreshes frequently accessed entries shortly before they expire, so hot keys rarely produce a cache miss
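Refresh-ahead can be sketched as a read path that reloads an entry when it is close to expiring; a real implementation would refresh asynchronously, but the reload is inline here for brevity (names and the margin value are illustrative):

```python
import time

db = {"price:acme": 100}   # stand-in for the source of truth
cache = {}                 # key -> (value, expiry timestamp)
TTL = 1.0
REFRESH_MARGIN = 0.25      # refresh entries expiring within this window

def get_refresh_ahead(key):
    """Serve from cache, but proactively reload entries nearing expiry."""
    entry = cache.get(key)
    now = time.monotonic()
    if entry is None or now >= entry[1] - REFRESH_MARGIN:
        value = db[key]                   # (re)load before the entry expires
        cache[key] = (value, now + TTL)
        return value
    return entry[0]
```

Compared with plain TTL expiry, hot keys are refreshed before they lapse, so readers rarely pay the cache-miss penalty.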
More details here
- https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/Strategies.html
- https://dev.to/aws-builders/lazy-loading-vs-write-through-a-guide-to-performance-optimization-28ka