AWS ‐ Stateless Applications | DynamoDB | ElastiCache for MemCached & Redis | Caching Strategies - FullstackCodingGuy/Developer-Fundamentals GitHub Wiki


Stateless Application:

  • Past transactions are not remembered

  • Every transaction is treated as if it were the first

  • Each request is served as a single, completely independent operation

  • Credentials are kept on the client side; the server generates a unique token that carries user identification information, and tokens have an expiry time

  • Easy to create and maintain; helps guarantee uptime; instances can be added or removed on demand

  • Consistent behavior across application instances, with the same behavior on all servers, which increases efficiency
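The token-based flow above can be sketched with Python's standard library. This is a minimal illustration of a server-issued, self-contained token with an expiry, not a production auth scheme; the secret, claim names, and helper names are assumptions for the example (real systems typically use a standard such as JWT).

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side-secret"  # hypothetical key; store securely in practice


def issue_token(user_id: str, ttl_seconds: int = 3600) -> str:
    """Create a self-contained token carrying the user's identity and expiry."""
    payload = json.dumps({"sub": user_id, "exp": time.time() + ttl_seconds}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    # payload and signature are base64-encoded separately, joined with "."
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())


def verify_token(token: str):
    """Return the user id if the token is authentic and unexpired, else None."""
    payload_b64, sig_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        return None                      # signature mismatch: token was tampered with
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        return None                      # token has expired
    return claims["sub"]
```

Because the token itself carries the identity and expiry, any server holding the secret can verify it, so no per-user session state needs to live on one particular server.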

Stateful Application:

  • States are reused across transactions
  • Transactions are contextual
  • Requests run on the same server
  • The user session is stored/tracked/remembered on the same server, including preferences and activity

DynamoDB & Stateless Applications

  • It is a NoSQL database: schemaless, storing JSON-like documents
  • API Operations
    • Control Plane - exposes API operations that let us manage tables (create, update, delete, etc.)
    • Data Plane - CRUD operations on the data in a table
    • Streams - capture a time-ordered sequence of item-level changes made to a table
    • Transactions - support ACID (atomicity, consistency, isolation, durability) and allow operations to be rolled back
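The all-or-nothing transaction semantics noted above can be illustrated with a small in-memory sketch. This is not the DynamoDB API; the function name and snapshot/rollback mechanism are assumptions chosen purely to show the behavior (either every operation applies, or none does).

```python
def transact_write(table: dict, operations) -> None:
    """Apply every operation or none: snapshot first, roll back on failure.

    An illustrative stand-in for ACID write semantics, not a real client.
    """
    snapshot = dict(table)           # remember pre-transaction state
    try:
        for op in operations:
            op(table)                # each op mutates the table in place
    except Exception:
        table.clear()
        table.update(snapshot)       # roll back: restore the snapshot
        raise
```

If any operation raises, the table is restored to exactly its pre-transaction contents before the error propagates.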

DynamoDB Naming Rules

  • Names are case sensitive
  • Table names must be 3-255 characters
  • There are reserved words to avoid
  • Only certain special characters are allowed (underscore, hyphen, dot)

DynamoDB data types

  • Scalar - stores a single value (string, number, binary, boolean, null)
  • Document - a nested structure (list or map, e.g. a JSON object)
  • Set - a collection of unique scalar values

Read Consistency

  • Eventually consistent reads (the default)
  • Strongly consistent reads

Read/Write Capacity Modes - control how much throughput the table is allowed

  • On-demand mode - scales automatically; you are charged per request
  • Provisioned mode - you reserve read/write capacity in advance

Partitions

  • Storage on SSDs
  • Automatic replication across Availability Zones
  • No management required; AWS takes care of it

Data Distribution - all stored data is indexed by one of the following:

  • Partition Key
  • Partition Key and Sort Key (a composite key)
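The composite-key idea can be sketched with an in-memory table. This is a toy stand-in, not a DynamoDB client; the table layout and function names are assumptions for the example. The point is that the (partition key, sort key) pair uniquely identifies an item, and a query on the partition key returns its items ordered by sort key.

```python
# In-memory sketch of a table indexed by (partition key, sort key).
table = {}


def put_item(pk: str, sk: str, item: dict) -> None:
    """The composite key uniquely identifies an item; re-puts overwrite."""
    table[(pk, sk)] = item


def query(pk: str) -> list:
    """Return all items in one partition, ordered by sort key."""
    return [table[key] for key in sorted(k for k in table if k[0] == pk)]


put_item("user#1", "order#2024-01-03", {"total": 30})
put_item("user#1", "order#2024-01-01", {"total": 10})
put_item("user#2", "order#2024-01-02", {"total": 20})
```

Prefixing keys like `user#1` / `order#...` mirrors the common single-table pattern of packing entity type and id into the key strings.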

AWS ElastiCache

  • It works with one of two engines: Memcached or Redis

AWS ElastiCache for Memcached and Stateless Apps

  • Memcached is a distributed in-memory data store
    • supports automatic failure detection and recovery when running in cluster mode: failed nodes are detected and replaced

Choosing Caching

  • Speed and Cost
    • choose caching when you want faster data reads and reduced latency
    • it is less expensive than serving every read from the database, because some of the read workload is offloaded from the database to the cache
  • Data access patterns

Reasons to cache data

  • the traditional way (reading from the database) is expensive and slow
  • cache frequently accessed data
  • caching is best suited for static, key-value data


AWS ElastiCache for Redis


ElastiCache Node

  • It is the smallest building block of a cluster. A node has a fixed amount of RAM, runs in a secure network, and has its own DNS name so it is reachable on the network


Node Types

  • General Purpose - normal usage
  • Compute Optimized - CPU-intensive workloads
  • Memory Optimized - memory-intensive workloads


Shard

  • Sharding adds more nodes so that a large dataset can be split across them; it lets applications keep low read latency under higher throughput demands.
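The key-to-shard mapping behind this can be sketched by hashing each key and taking it modulo the shard count. This is a simplified illustration, not ElastiCache's actual scheme (Redis Cluster, for instance, maps keys to 16384 hash slots using CRC16); the shard count and function names here are assumptions.

```python
import hashlib

NUM_SHARDS = 3  # assumed shard count for the sketch
shards = {i: {} for i in range(NUM_SHARDS)}


def shard_for(key: str) -> int:
    """Deterministically map a key to one shard by hashing it."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put(key: str, value) -> None:
    shards[shard_for(key)][key] = value


def get(key: str):
    return shards[shard_for(key)].get(key)
```

Because the mapping is deterministic, reads and writes for the same key always land on the same shard, so the dataset (and its throughput) is divided across nodes without any central lookup table.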

ElastiCache Clusters

  • A cluster is a collection of one or more ElastiCache nodes; each node runs an instance of the cache engine

Setting up

Selecting Node Size

  • Memory requirements for the data

  • Software version of Redis or Memcached

  • Application write load (a write-heavy workload needs more compute and RAM than a read-heavy one)

  • Selecting node size also depends on:
    • Sharding - standalone or multiple shards
    • Local zones - location within the Availability Zones/regions

Caching Strategies

  • A cache is a high-speed data store in RAM; it holds a subset of data that is transient in nature (i.e., not permanent/transactional data).
  • It increases the overall performance of the application
  • It lowers database cost by reducing the number of read (I/O) operations against the main database, especially since the primary database charges based on throughput
  • It reduces the load on the backend database under heavy traffic, helping avoid crashes

The ElastiCache service makes it easy to deploy and operate the data store in the cloud

Considerations

  • Use appropriate design patterns
  • Data Availability
    • TTL - Expiry of data

Strategy 1 - Read-Through or Lazy Loading - load data into the cache only when necessary

Key characteristics of Lazy loading

  • On-Demand Retrieval: Data is loaded into the cache only when requested, avoiding unnecessary preloading.
  • Resource Conservation: Efficient use of memory and bandwidth as only actively used data is loaded.
  • Optimized Performance: Reduces initial load times, particularly beneficial for large datasets or non-essential content.
  • Dynamic Content Loading: Well-suited for applications with dynamic or unpredictable data access patterns.
  • Improved Responsiveness: Enhances user experience by prioritizing the loading of essential data.
  • Scalability: Adaptable to changing workloads and scalable as it focuses on what is actively needed.
  • Reduced Upfront Costs: Avoids the cost of loading entire datasets into memory at the start.
  • Appropriate for Infrequently Accessed Data: Ideal for scenarios where not all data is frequently accessed or required immediately.

Key advantages of lazy loading:

  • Resource Efficiency: Conserves resources by loading data into memory only when it is needed.
  • Faster Initial Load Times: Reduces the time required for an application or webpage to become initially responsive.
  • Improved User Experience: Enhances user experience by prioritizing the loading of essential content first.
  • Adaptability to User Behavior: Well-suited for applications with dynamic or unpredictable data access patterns.
  • Reduced Bandwidth Usage: Minimizes the amount of data transferred over the network, improving performance.
  • Scalability: Adaptable to changing workloads, making it suitable for scalable applications.

Key disadvantages of Lazy loading are :

  • Potential Latency: Introduces latency as data is loaded on demand, impacting real-time responsiveness.
  • Complex Implementation: Implementation can be complex, especially for large or intricate systems.
  • Increased Code Complexity: May require additional code to manage and handle on-demand loading effectively.
  • Challenges in Predicting User Behavior: Requires a good understanding of user behavior to effectively optimize data loading.
  • Dependency on Network Conditions: Performance may be affected by slow or unreliable network conditions.
  • Potential for Overhead: In certain cases, the overhead of on-demand loading may outweigh the benefits.
  • Advantages

    • The read handler returns data from the cache if it exists and has not expired (cache hit); if not (cache miss), it fetches the data from the source, writes it to the cache, and returns it to the caller.
    • This way, only requested data is cached; a cache miss adds a little latency, but subsequent calls are faster.
    • This strategy can serve stale data, but it does not fail with empty nodes.
  • Disadvantages

    • Cache miss penalty - each cache miss triggers three steps: the application reads the cache (miss), then reads from the source, then writes to the cache and returns the value to the caller
    • Stale data - data is written to the cache only on a cache miss, so the cache is not updated when the source is modified; a workaround is the "Write-Through" strategy
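The lazy-loading flow can be condensed into a few lines. This is a minimal in-memory sketch: the `db` dict stands in for the backing database and `cache` for ElastiCache, and the counter exists only to make the cache-miss penalty visible.

```python
db = {"greeting": "hello"}      # stands in for the backing database
cache = {}                      # stands in for the cache layer
calls = {"db_reads": 0}         # instrumentation for the sketch


def read_through(key: str):
    """Lazy loading: serve from cache on a hit; on a miss, read the
    source, populate the cache, then return the value."""
    if key in cache:
        return cache[key]       # cache hit: one fast read
    calls["db_reads"] += 1      # cache miss penalty: extra round trip
    value = db[key]             # read from the source
    cache[key] = value          # populate the cache for next time
    return value
```

The first read for a key pays the miss penalty; every subsequent read for the same key is served from the cache without touching the database.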

Write-Through Strategy - add or update data in the cache whenever data in the source is updated

  • Advantages
    • No stale data - the cache is never stale because it is updated in step with the source
    • This strategy ensures the data is fresh, but it fails with empty nodes
  • Disadvantages
    • Write penalty - every write goes to both the cache and the source, adding latency to write operations in the application
    • Missing data - when a new node is spun up due to a node failure or scaling, data is missing from the cache until it is written again
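Write-through is even shorter to sketch: every write updates both stores together. As above, the `db` and `cache` dicts are illustrative stand-ins, not real clients.

```python
db = {}      # stands in for the backing database
cache = {}   # stands in for the cache layer


def write_through(key: str, value) -> None:
    """Write-through: update the source and the cache in the same
    operation, so the cache never serves stale data (at the cost of
    extra latency on every write)."""
    db[key] = value
    cache[key] = value
```

The trade-off against lazy loading is visible here: reads never miss for data written this way, but each write does double work, and a freshly started (empty) cache node has nothing until writes repopulate it.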

TTL - Time to live

  • The read-through strategy allows stale data but does not fail with empty nodes.
  • The write-through strategy ensures the data is fresh but fails with empty nodes.

Adding a TTL combines the advantages of both strategies
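A TTL can be sketched by storing an expiry timestamp next to each value and evicting on read. This is a toy in-memory version of the idea; real cache engines such as Redis expire keys for you (e.g. via `SETEX`/`EXPIRE`).

```python
import time

cache = {}  # key -> (value, expires_at)


def put_with_ttl(key: str, value, ttl_seconds: float) -> None:
    """Store a value together with the time at which it expires."""
    cache[key] = (value, time.time() + ttl_seconds)


def get(key: str):
    """Return the value if present and unexpired; evict and miss otherwise."""
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() >= expires_at:
        del cache[key]      # expired: evict so the next read refreshes from source
        return None
    return value
```

Because stale entries age out automatically, a lazy-loading cache bounded by a TTL cannot serve data older than the TTL, while still tolerating empty nodes.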

Read-Through

Async strategies

Write-Behind - the inverse of write-through: the source is updated lazily/asynchronously from the cache, with a delay (for example, via a queue)

Refresh-Ahead - the cache proactively refreshes frequently used entries before they expire
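Write-behind can be sketched with an explicit queue of pending source writes. In a real system the queue would be drained by a background worker or a message broker; here a manual `flush()` stands in for that asynchronous step, and all names are assumptions for the sketch.

```python
from collections import deque

cache = {}       # stands in for the cache layer
db = {}          # stands in for the backing database
pending = deque()  # queued writes, drained asynchronously in a real system


def write_behind(key: str, value) -> None:
    """Write-behind: update the cache immediately and enqueue the
    source write to be applied later."""
    cache[key] = value
    pending.append((key, value))


def flush() -> None:
    """Stand-in for the async worker that drains queued writes to the source."""
    while pending:
        key, value = pending.popleft()
        db[key] = value
```

The write returns as soon as the cache is updated, which is why write-behind has low write latency; the cost is a window during which the source lags behind the cache (and data loss if the queue is lost before flushing).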


Quiz

Which is used to uniquely identify an entry in a DynamoDB table?


Which is an advantage offered by an ElastiCache Cluster?


Which kind of data is best used with a data storage service like Memcached?


Which command is used to connect to a Redis instance in cluster mode?


Which command is used in the AWS CLI to list tables in DynamoDB?


What is meant by a shard in ElastiCache?


What is used for efficient sorting and searching of items in a DynamoDB table?


Which port is needed to open an incoming connection to an EFS instance?


Which main advantage does caching offer? Ans: Fast Retrieval of data

Which advantages do stateless applications offer over stateful applications?


Which key feature does ElastiCache for Redis offer to deal with data availability?


Which capacity does the elastic file system store?


Which is the default port used in an ElastiCache Cluster? Ans: 6379