Distributed Cache


Overview

Caching is a mechanism that temporarily stores frequently accessed data in a fast data store and serves subsequent requests from that store instead of extracting the data from the original data source.

This process improves the performance and availability of an application.

Reading data directly from the database can be slow, especially when complex queries have to be executed.

What is distributed cache?

When a cache lives on the local server, it is usually kept in RAM (random-access memory) to take advantage of fast write/access times. It is entirely possible to store and retrieve cached data on disk, which has a significantly slower response time than memory but offers far more space. Generally, a cache is meant to be short-term storage: it deliberately keeps a small, limited amount of data, usually tracking the most recently accessed items (LRU).

A cache is called a distributed cache when its objects are spread across multiple independent machines (cache nodes), partitioned and possibly replicated among them.

Types of caching services

Caching Prerequisite

Before even deciding on a caching layer, we need to ask ourselves the following questions:

  • Which business use-cases in our system require high throughput, fast responses, or low latency?
  • Are we fine with some data inconsistency if we use a cache?
  • What level of consistency does our system require?
  • What kind of data do we want to store?
    • Objects
    • Static data
    • Simple key-value pairs
  • Do we need to maintain the cache for transactional / master data?
  • Do we need an in-process cache or a shared cache on a single node, or a distributed cache across n number of nodes?
    • In-process cache - the cache elements are local to a single instance of your application. Many medium-to-large applications, however, will not have a single application instance, as they will most likely be load-balanced. In such a setting, you end up with as many caches as application instances, each holding a different state and resulting in inconsistency.
    • Shared cache - a shared cache is a cache that can be accessed by multiple cores. Since it is shared, each block in the cache is unique and therefore yields a higher hit rate, as there are no duplicate blocks. However, data-access latency can increase as multiple cores try to access the same cache.
    • Distributed cache of n number of nodes - although deployed on a cluster of multiple nodes, it offers a single logical view (and state) of the cache. In most cases, an object stored in a distributed cache cluster resides on a single node of that cluster. By means of a hashing algorithm, the cache engine can always determine on which node a particular key-value pair resides (see the hashing sketch after this list). Since there is always a single state of the cache cluster, it is never inconsistent.
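
The node-selection step mentioned above can be sketched with a small consistent-hash ring. This is a minimal illustration, not the mechanism of any particular cache product; the class and node names are hypothetical.

```python
import hashlib
from bisect import bisect

# Minimal consistent-hash ring: maps each key to one cache node, so every
# client computes the same owner node for a given key.
class HashRing:
    def __init__(self, nodes, replicas=100):
        # Place each node at several positions ("virtual nodes") on the ring
        # so that keys spread more evenly across nodes.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node position at or after the key's hash.
        idx = bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["cache-node-1", "cache-node-2", "cache-node-3"])
print(ring.node_for("customer:42"))   # always resolves to the same node
```

With virtual nodes on the ring, adding or removing a cache node remaps only a small fraction of keys instead of reshuffling the whole cluster.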

Caching benefits

  • Decrease network costs: content can be cached at various points in the network path between the content consumer and the content origin. When content is cached close to the consumer, a request does not cause much additional network activity beyond the cache.
  • Improve responsiveness: caching enables content to be retrieved faster because an entire network round trip is not necessary. Caches maintained close to the user, like the browser cache, can make this retrieval nearly instantaneous.
  • Availability of content during network interruptions: with certain policies, caching can be used to serve content to end users even when it is unavailable from the origin servers for short periods of time.

Caching Strategy

  • Cache-aside: This is the most commonly used caching strategy. The cache sits to the side, and the application talks directly to both the cache and the database (see the read-path sketch after this list).

  • Read Through / Lazy Loading: Load data into the cache only when necessary. If the application needs data for some key x, it searches the cache first. If the data is present, it is returned; otherwise, the data is retrieved from the data source, put into the cache, and then returned.

  • Write Through: In this write strategy, data is first written to the cache and then to the database. The cache sits in line with the database, and writes always go through the cache to the main database (see the write-path sketch after this list).

  • Write-back: Here, the application writes data to the cache, which acknowledges the write immediately; after some delay, the cache writes the data back to the database asynchronously.

  • Write-Around: Here, data is written directly to the database, and only the data that is read makes its way into the cache.
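
As a minimal sketch of the cache-aside / read-through read path, assuming hypothetical `cache` and `db` dictionaries standing in for a cache client and a data source:

```python
# Cache-aside / lazy-loading read path: check the cache first, fall back to the
# data source on a miss, then populate the cache for subsequent reads.
cache = {}                          # stand-in for a cache client (e.g. Redis)
db = {"user:1": {"name": "Ada"}}    # stand-in for the primary data source

def get(key):
    value = cache.get(key)          # 1. look in the cache
    if value is not None:
        return value                # cache hit
    value = db.get(key)             # 2. cache miss: read from the data source
    if value is not None:
        cache[key] = value          # 3. populate the cache
    return value

print(get("user:1"))                # miss -> loaded from db and cached
print(get("user:1"))                # hit  -> served from the cache
```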

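The write strategies can be sketched in the same spirit; `cache`, `db`, and the background flusher below are hypothetical in-memory stand-ins, not a production implementation.

```python
import queue
import threading

cache, db = {}, {}              # hypothetical in-memory stand-ins
write_queue = queue.Queue()     # used by write-back to defer database writes

def write_through(key, value):
    # Write-through: update the cache and the database together, synchronously.
    cache[key] = value
    db[key] = value

def write_back(key, value):
    # Write-back: acknowledge after updating the cache; flush to the db later.
    cache[key] = value
    write_queue.put((key, value))

def write_around(key, value):
    # Write-around: bypass the cache; only subsequent reads populate it.
    db[key] = value

def flush_worker():
    # Background worker that drains deferred writes to the database.
    while True:
        key, value = write_queue.get()
        db[key] = value
        write_queue.task_done()

threading.Thread(target=flush_worker, daemon=True).start()
write_back("order:7", {"total": 40})
write_queue.join()              # wait until the deferred write reaches the db
print(db["order:7"])
```
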
Cache Eviction Policies

The idea here is: given a limit on the number of items to cache, how do we choose which item to evict for the best result? The following are a few commonly practiced eviction policies. From experience, however, it is best to come up with a strategy that fits your own application.

  • Least Frequently Used (LFU): This cache algorithm uses a counter to keep track of how often an entry is accessed. With the LFU cache algorithm, the entry with the lowest count is removed first. This method isn't used that often, as it does not account for an item that had an initially high access rate and then was not accessed for a long time.

  • Least Recently Used (LRU): This cache algorithm keeps recently used items near the top of the cache. Whenever an item is accessed, the LRU algorithm places it at the top of the cache. When the cache limit has been reached, items that have been accessed less recently are removed, starting from the bottom of the cache (see the sketch after this list). This can be an expensive algorithm to use, as it needs to keep "age bits" that show exactly when an item was accessed. In addition, when an LRU cache algorithm deletes an item, the "age bit" changes on all the other items.

  • Most Recently Used (MRU): This cache algorithm removes the most recently used items first. An MRU algorithm is good in situations in which the older an item is, the more likely it is to be accessed.

  • Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary (Not very useful)
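
As a concrete illustration of the LRU policy described above, here is a minimal sketch built on Python's standard collections.OrderedDict; real cache engines use more elaborate bookkeeping, but the eviction rule is the same.

```python
from collections import OrderedDict

# Minimal LRU cache: the OrderedDict keeps entries in access order, so the
# least recently used entry is always at the front and is evicted first.
class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")            # "a" becomes the most recently used entry
cache.put("c", 3)         # evicts "b", the least recently used entry
print(cache.get("b"))     # None
```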

Time to Live or TTL

Time to live (TTL) is the time that an object is stored in a caching system before it’s deleted or refreshed.
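
A minimal sketch of TTL-based expiry, storing an expiry timestamp alongside each value and evicting lazily on read (the class and field names are illustrative):

```python
import time

# Minimal TTL cache: each entry records when it expires and is treated as
# absent once that time has passed.
class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.items = {}

    def put(self, key, value):
        self.items[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.items[key]     # expired: evict lazily on access
            return None
        return value

cache = TTLCache(ttl_seconds=0.1)
cache.put("token", "abc123")
print(cache.get("token"))   # "abc123"
time.sleep(0.2)
print(cache.get("token"))   # None (expired)
```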

Terminology

  • x