Caching
A load balancer (LB) helps you scale horizontally across an ever-increasing number of servers, but caching lets you make vastly better use of the resources you already have.
Caching consists of:
- pre-calculating results (e.g. the number of visits from each referring domain for the previous day),
- pre-generating expensive indexes (e.g. suggested stories based on a user’s click history),
- and storing copies of frequently accessed data in a faster backend.
Developing a caching strategy early on is important: it ensures you don't optimize access patterns that can't be replicated by your caching mechanism, or access patterns where performance becomes unimportant once caching is added.
Caches take advantage of the locality of reference principle: recently requested data is likely to be requested again.
- A cache is like a short-term memory: it has a limited amount of space, but is typically faster than the original data source and contains the most recently accessed items.
- Caches can exist at all levels but are often found at the level nearest to the front end where they can return data quickly without taxing downstream levels.
Types of cache
Application Server Cache
- Integrate caching directly in the application code: upon receiving a request, check whether the value is cached; otherwise query the DB, write the value to the cache, and reply.
- The cache can live in RAM (faster) or on disk (still faster than accessing network storage).
- Note that if the request layer is distributed, each node can host its own, different cache.
- However, if the LB randomly distributes requests across nodes, the same request may land on different nodes, increasing cache misses.
- Two choices for overcoming this hurdle are global caches and distributed caches.
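The check-cache-then-DB flow described above is often called cache-aside. A minimal sketch in Python, using an in-process dictionary with a TTL as a stand-in for a real cache and a stub `query_db` function in place of a real database call (both names are illustrative, not part of any library):

```python
import time

class SimpleCache:
    """Tiny in-process cache with a per-entry TTL (illustrative, not production code)."""
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() > expires:
            del self.store[key]  # stale entry: treat as a miss
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

cache = SimpleCache(ttl_seconds=30)

def query_db(user_id):
    # Stand-in for a real database query
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    value = cache.get(user_id)
    if value is None:              # cache miss: fall through to the DB
        value = query_db(user_id)
        cache.set(user_id, value)  # populate the cache for later requests
    return value
```

The TTL doubles as a crude invalidation mechanism: stale entries simply expire rather than being explicitly purged.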
DB Cache
A cache placed between the application and the database. A talented DBA can uncover quite a bit of performance here without any changes to application code.
In-Memory Cache
Memcached and Redis are examples of caches that store data in memory. They are fast but have limited space, because RAM is more expensive than disk (which is in turn still faster than network access). The most straightforward eviction strategy is LRU (Least Recently Used).
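An LRU cache can be sketched in a few lines with Python's `collections.OrderedDict`, which remembers insertion order and lets us move a key to the end on each access (a sketch of the policy, not of how Memcached or Redis implement it internally):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used key
cache.put("c", 3)  # capacity exceeded: "b" is evicted
```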
Content Distribution Network (CDN)
CDNs come into play for sites serving large amounts of static media. They take the burden of serving static media off of your application servers (which are typically optimized for serving dynamic pages rather than static media), and provide geographic distribution.
In a typical CDN setup, a request will first ask the CDN for a piece of static media; the CDN will serve that content if it has it locally. Otherwise it will query your servers for the file and then cache it locally and serve it to the requesting user (in this configuration they are acting as a read-through cache).
- When a system is not yet large enough to justify its own CDN, we can ease a future transition by serving static media off a separate subdomain using a lightweight HTTP server like NGINX, then cut the DNS over from our servers to a CDN later.
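The read-through behavior described above can be sketched in a few lines: the cache itself fetches from the origin on a miss, so callers never talk to the origin directly. The names (`ReadThroughCache`, `origin`) are illustrative:

```python
class ReadThroughCache:
    """On a miss, the cache itself fetches from the origin (as a CDN edge node does)."""
    def __init__(self, fetch_from_origin):
        self.fetch = fetch_from_origin
        self.store = {}

    def get(self, path):
        if path not in self.store:           # miss: go to the origin servers
            self.store[path] = self.fetch(path)
        return self.store[path]              # hit: serve the locally cached copy

origin_hits = []
def origin(path):
    origin_hits.append(path)  # track how often the origin is contacted
    return f"contents of {path}"

cdn = ReadThroughCache(origin)
cdn.get("/img/logo.png")  # first request: fetched from the origin, then cached
cdn.get("/img/logo.png")  # second request: served from the cache, origin untouched
```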
Cache Invalidation
Cache must be kept coherent with the source of truth (e.g., DB). This process is known as cache invalidation.
There are three main schemes that are used:
- Write-through cache: write data into the cache and the corresponding DB at the same time. Strong guarantees (minimizes data loss and ensures consistency), but higher latency, since the data has to be written twice before returning success to the client.
- Write-around cache: data is written directly to permanent storage, bypassing the cache. The cache is not flooded with writes for data that may never be read, but the first read of recently written data causes a cache miss, which then populates the cache.
- Write-back cache: write to the cache alone and return success to the client. The write to permanent storage happens later, after a specified interval or under certain conditions. Lower latency and higher throughput for write-intensive applications, but an increased risk of data loss if the cache fails before the write is persisted.
Cache Eviction Policies
Cache is a limited resource. It is necessary to have eviction policies in place:
- First In First Out (FIFO): evict the block that entered the cache first, regardless of how often or how many times it was accessed before.
- Last In First Out (LIFO): evict the block that entered the cache most recently, with the same disregard for access history.
- Least Recently Used (LRU): Discard the least recently used items first.
- Most Recently Used (MRU): discard the most recently used items first.
- Least Frequently Used (LFU): count how often each item is needed; those used least often are evicted first.
- Random Replacement: Randomly select candidates to evict.
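As one worked example from the list above, LFU can be sketched with a per-key access counter; on overflow, the key with the smallest count is the eviction victim (a linear scan here for clarity, whereas real implementations use frequency buckets or heaps):

```python
from collections import Counter

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}
        self.freq = Counter()  # access count per key

    def get(self, key):
        if key not in self.items:
            return None
        self.freq[key] += 1
        return self.items[key]

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            victim = min(self.items, key=lambda k: self.freq[k])  # least frequently used
            del self.items[victim]
            del self.freq[victim]
        self.items[key] = value
        self.freq[key] += 1

cache = LFUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" has now been used more often than "b"
cache.put("c", 3)  # evicts "b", the least frequently used key
```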