brain in storm: Smart Caching - CentaurusInfra/regionless-storage-service Wiki

This post records and keeps track of brainstorm ideas smart caching of rkv.

Caching is a well understood technology in modern systems. In essence, caching keeps data that is originally served by remote servers in temporary place and serves consecutive requests without client fetching from the remote if the data is in the temporary storage.

Purpose of smart caching

First of all, we need ask the question: what problem this smart caching is to solve?

  • to optimize latency of clients???
    • what kinds of ops? read, list/watch, or write?
    • what kinds of data?
      • key(specific key, or filter-based selection),
      • revisions (specific, range, or just the latest),
      • value (or reference)

Smart caching of rkv will serve rkv clients with read-list-watch related ops, not write related. In CRUDLW terms, smart caching helps to speed up RLW, excluding CUD.

smart-caching-overall

Smart caching service positions as an external system of rkv. Its purposes to reduce RLW latency of rkv clients.

Multiple smart caching services can be deployed, ideally at places close to clients.

Features of smart caching

Smart caching cooperates with rkv client and rkv server,

  • accepts rkv client request of RLW and returns response as if it was a rkv server
    • If possible, request will be served by smart cache directly w/o involving rkv service;
    • If no where of the caching is able to fulfill the request, request will eventually forward to rkv server to fulfill.

Smart caching API

Data API (RLW set)


same as rkv RLW API.

Cache control API


not mandatory for client to work with smart caching service, but supplementary to provide hints(interests) if client knows for sure; cache service would leverage the hints to further increase cache efficiency.
Cache control API is client session based gRPC bi-directional protocol. Clients sends stream of cache hint messages, and service sends stream of status of cache hint processing. Until tear-down of a client session, cache service will strive to keep up with rkv data sources.

cache hint message type request message body response status
request to cache key path, revision range pending/ready
request to stop key-path ACK (OK)
query of cache key path, revision range pending/ready/none

Requirements of performance etc

  • Performance - caching is all about performance.With cached data in place, smart caching - if deployed at vicinity of clients - should respond read request in less than 1 ms latency on top of the network cost.
  • Consistency - cache guarantees sequential eventual consistency only.
  • Scalability - typical client has one smart cache service only; however, there exists multiple caches in whole system to serve clients of diverse locations. Smart caching should be able to scale to massive number of caches without bringing rkv system to its knee.

Technical challenge

The most critical challenge of rkv smart caching is reuse the data that already being serviced by rkv so as to minimize load pressure to rkv core. To facilitate the data reuse, data should be temporarily stored in the smart caching system (ideally in distributed manner to avoid centralization), and request to access data could be eventually serviced with the cached data, in lieu of being serviced by rkv core. Particularly, as there would be massive data (Petabytes and up) in rkv system, the caching has to have a plan to scale to a distributed caching with multiple cache nodes, efficiently and flexibly.

A Hierarchy of smart cache nodes is able to address such challenge. Each node has arbitrary sized memory as data cache; it receives interest messages (expressing what data is to access, instead how). If the requested data is already in its local cache, it responds with the cached data wrapped in data message; if not found locally, it puts the interest in pending table and forwards the request message to next hop (by looking up its interest forward information base), after the upstream node returns the content message, it responds the data back to the requester, and removes interest from the pending table - this node also can store the data in its local cache at this time. If the incoming interest is already in pending, there is an outstanding process already, the node will wait till the content message returned and responds all the sequesters. This is a CCN-inspired architecture.

cache-node-hierarchy

The nodes form a hierarchical structure, each node identifies the upstream node for a specific interest message by looking up the content name rules. For example, the simplest structure like below have node-1 has rules [*==>node-2], node-2 rule [*==>node-3], node-3 rule [*==>node-4], all use their default upstream node, respectively:

routing-hierarchy-simplest

Content name rules could be more complicated. Below exemplary hierarchy has dedicated node chain for a/b/c/e prefixed path besides the default one; nodes at a/b/c/d chain could have customized cache store and cache policy. node-1 rules are [*==>node-2, a/b/c/d==>node-5, node5 rule [a/b/c/d/e==>node-6].

routing-hierarchy-w-hotspot

Alternative distributed caching

DHT(distributed hash table)/Consistent-Hashing is also able to provide distributed cache. It is well known solution; however, the client has to get the configuration of all participating nodes so as to get hold of the proper node holding the caching, plus the loads(caching in this case) of each node is roughly evenly distributed. In rkv smart caching system, we'd like to have simplest client configuration, and there is no predictable load distribution pattern.

Interest message

Interest expresses what of the data, not how/where of it. It consists of 2 fields

field example
name a/b/c/d
range { "from": rev1, "to": rev2 }

Content Message

field example
name a/b/c/d
range { "from": rev1, "to": rev2 }
content-static base-64 encoded raw data
content-dynamic moniker of content stream (TBD)

Design proposal

Providing cache for static data( of key with known revisions), smart cache system consists of 3 types of components - leaf, internal node, and rkv adapter. For dynamic data (e.g. watch request), there should exist other components that facilitate their reuses (TBD). rkv-smart-cache-client

rkv Smart Cache Leaf

This component accepts rkv client RLW requests and serve them directly in its local cache store unless a hit miss - interest message would be composed and sent to its next hop node; after content message is returned, responds to rkv client and also put the data in local cache store. Local cache store has eviction policy (like LRU).

rkv Smart Cache Adapter

Adapter translate interest message to RLW op, interacting with rkc core to fetch data, abd wrap response in content message. It does not have local cache.

Smart cache node

Internal nodes forms a hierarchy that one is able to forward unfulfilled interest to next hop, by longest prefix matching of rules. Each node has arbitrarily sized cache, which could employes its own eviction policy (LRU, or random eviction).

Walk through: clients of different regions using smart caching to read key of certain revision

Given below structure (this may not be the ideal structure; for demo purpose only), smart-cache-clients-sample

client-1, client-2 and client-3 are in different regions, and each has its cache leaf in vicinity.

  1. client-1 at region-1 request to read a/x of revision r, leaf node decides cache miss and forward interest message to node-1. and node-1 cache miss and forward node-2, so on till rkv-adapter-1. rkv-adapter-1 issues read a/x with rev r, and gets back the value, wrapped in content message, which traverses along the reverse path back to leaf-1; some nodes(say node-2) on the path may decide to store the content in its cache store.
  2. client-2 at region-2 request to read a/x of rev r, too. Its leaf (leaf-2) decides cache miss; interest message would be forward to node-1, then node-2 - where it hit the cache and return the content back.
  3. client-1 requests to read b/y of rev k; the interest message would go via node-1, node-5 and node-6 till adapter-2; after adatoer-2 gets back the value, the wrapped content message would go back eventually to leaf-1; some nodes (say node-1, node-6) at the path store the content in their content store.
  4. client-3 at region 3 also request to read b/y of rev k. Its leaf node (leaf-3) decides cache miss, and interest message will go to node-4, where is a miss and in turn forward to node-5, where is also a miss and forwards to node-6, where is a hit - the content message is sent back via node-5 and node-4, where they may decide to cache it (say node-4 caches).
  5. client-2 later request for y/b of rev k. Its leaf knows it is a miss, and interest message go to node-4, where the content be found in store, and immediately returned back to leaf-2.

Papers that might be interesting

  1. CCN- Content Centric Network
  2. Interest-based cooperative caching in multi-hop wireless network
  3. Caching on the Move: A User Interest-Driven Caching Strategy for D2D Content Sharing
  4. wikipedia page: information-centered networking caching policies
⚠️ **GitHub.com Fallback** ⚠️