Distributed Channel Cache: Next Steps - adamcfraser/cbnotes GitHub Wiki

Multiple Cache Writers

We'll want the ability to distribute the cache writer work across multiple nodes in the SG cluster. The 2i projector/router components could be used to distribute the DCP feed to multiple nodes for cache processing. The main challenge here will be around sequence processing - with multiple cache writers, we can't rely on a single node to manage sequence buffering.

More work needs to be done to identify the right approach - options include:

  • Have all cache writers do a blind write to the cache (no sequencing). A separate bot process would consume the DCP feed for the cache bucket, and calculate the current low/high sequence information. This shares some functionality with the parallel channel cache write enhancement that's been added for the in-memory node - the bot would update the safe high sequence clock, and cache readers would use that sequence as the endkey for cache queries.
  • Use the Couchbase Server vbucket sequence information to construct a logical timestamp, and use that in place of our current sequence number. A similar calculation is used by 2i, although this timestamp approach has some aspects that may not be ideal for replication (in particular the size of the full timestamp - 1024 {vbucket,sequence} tuples).
  • Use projector to shard specified sequence sets to each cache writer, so that each writer knows which sequences it expects to see, and can buffer accordingly (e.g. for nodes 0..N-1, send each node the sequences where (sequence mod N) matches its node index). Each node could then do its own buffering, and perform shared updates to an overall clock value.

Cache writer election, failover

Ideally we'd want any SG node to be able to act as a cache writer, and use a standard leader election approach to handle the scenario where an individual cache writer fails, without a serious interruption to cache processing. This has a few additional requirements beyond the leader election itself:

  • Cache writers coming online need to be able to register themselves with the projector, and define (or have defined for them by the projector) the set of keys they will be listening to
  • Cache writers need to be able to identify the last checkpoint/snapshot written to the cache by the previous writer, and resume from that point
  • Need to handle the scenario where the number of cache writers changes (and by extension the keyset for each of the cache writers may change)
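The checkpoint/resume requirement above might look roughly like the following. This is a sketch under assumptions: the `_sync:cp:<shard>` key scheme, the `checkpoint` fields, and the map-backed `bucket` stub are all hypothetical stand-ins for the real cache bucket and checkpoint format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkpoint is the snapshot a cache writer persists periodically so an
// elected successor can resume. Field names are illustrative.
type checkpoint struct {
	Shard   int    `json:"shard"`
	LastSeq uint64 `json:"last_seq"`
}

// bucket stands in for the cache bucket; a real implementation would
// read and write a Couchbase bucket.
type bucket map[string][]byte

func saveCheckpoint(b bucket, cp checkpoint) {
	data, _ := json.Marshal(cp)
	b[fmt.Sprintf("_sync:cp:%d", cp.Shard)] = data
}

// resumeFrom returns the sequence a newly elected writer should restart
// its feed from: one past the last checkpointed sequence, or 0 if the
// previous writer never checkpointed (i.e. start from the beginning).
func resumeFrom(b bucket, shard int) uint64 {
	data, ok := b[fmt.Sprintf("_sync:cp:%d", shard)]
	if !ok {
		return 0
	}
	var cp checkpoint
	if err := json.Unmarshal(data, &cp); err != nil {
		return 0
	}
	return cp.LastSeq + 1
}

func main() {
	b := bucket{}
	saveCheckpoint(b, checkpoint{Shard: 2, LastSeq: 1041})
	fmt.Println(resumeFrom(b, 2)) // 1042
	fmt.Println(resumeFrom(b, 3)) // 0 - no prior writer for this shard
}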

Push-based notification to Cache Readers

Instead of cache readers polling the channel clock data to manage notification, use Channel Cache DCP -> projector -> Cache Readers to monitor updates to the channel clocks. Ideally we'd want cache readers to be able to modify their projector filter at runtime based on the channels they need (with some latency built in - we don't want to be constantly dropping/adding channels as client _changes requests come in).