Distributed Channel Cache: Initial POC - adamcfraser/cbnotes GitHub Wiki

There are two main goals for the initial POC:

  • Performance profiling for SG nodes using memcached bucket
  • Validate that the existing in-memory cache can be swapped out

High-level design

Note: The following describes an initial implemention, intended to evaluate the above criteria. See next steps for more discussion on the full set of functionality, including multiple cache writers, advanced sequence processing, and improved notification.

The core change is to maintain the change cache in a memcached bucket, instead of in memory. Cache entries will define a key based on channel name and sequence number to allow fast, key-based retrieval of sequences for a given channel. All sync gateway nodes will do their cache reads from this bucket.

Initially, assign a single node in the SG cluster as the cache writer. Cache writer works the TAP feed as today, and writes to the channel cache bucket for each sequence/channel combination. The cache writer does the same sequence buffering that's done today, to ensure that contents of the channel cache are complete . The cache writer periodically writes channel clock information to the channel cache bucket (count of writes or high sequence per channel), for use in notification.

Simplifications made for Initial POC

  • Single cache writer, defined based on server config
    • no election of cache writer
    • no failover/recovery for cache writer
  • Cache writer always starts at zero with empty cache bucket. No handling for stop/start of cache writer, appending to existing bucket
  • No cache expiry/pruning
  • No view-based retrieval/cache backfill by cache readers. (not clear at this point whether this is a 'feature' or 'limitation' of the POC)
  • Cache readers poll the cache for change notification - no push notification

Cache Writer

One SG node will be identified as the cache writer, based on entries in the Sync Gateway config (similar to the way shadowing is defined today). The config will define the memcached bucket connection information for the cache writer.

The cache writer will buffer the tap feed in the same way the current change cache does.

One doc vs multiple doc channel caches

There have been discussions debating whether the representation of the channel cache in the memcached bucket should be a single document per channel (containing the cache contents), or multiple documents per channel.

At this point the benefits of the multiple doc approach outweigh the benefits of the single-doc approach. While the single doc simplifies the notification handling, it has a few potential pitfalls:

  • Cache readers need to parse the single document to evaluate the 'get changes since sequence' operation
  • Potential for contention on channel cache documents for high traffic channels when moving to multiple cache writers
  • Limit on cache size due to document size limit (1 MB for memcached bucket)

Channel Clocks and Notification

The current changes cache does a callback to the _changes feed handler whenever a channel cache is updated, to notify any continuous or longpoll _changes feeds listening to that channel that they should poll the cache for updates.

For the initial POC, the notification processing will be simplified:

  • The cache writer will update a 'channel clock' document whenever it writes an entry to the cache for the channel.
  • Cache readers will run a process to intermittently (every 500ms?) poll the channel clocks, and notify any active _changes feeds.

See next steps for discussion of later enhancements to notification.

Cache Readers

For cache readers, we should be able to replicate the current cache read API with equivalent calls to return data from the memcached bucket. The only exception would be the late sequence feed processing - those calls would be a no-op in the POC.

The main change for readers is the periodic polling of the channel cache clocks, and using the results to drive the existing Wait/Notify functionality.

Open issues

1. Defining the key for channel cache entries

My initial thought was that the key would just be a simple concatenation of channelname-sequence. However, we need to do a more detailed investigation of how best to optimize the keys for query performance.

2. Doc ID de-duplication Currently we're doing de-duplication based on Doc ID whenever we write to the channel caches, so that one volatile doc doesn't fill the entire cache. I assume this would be problematic in a few ways for the distributed cache:

  • For the in-memory cache, we obtain a lock on the cache before doing de-duplication. This isn't an option for the distributed cache
  • The query to identify duplicate DocIDs in the channel cache would add significant performance overhead.

Propose moving the deduplication work to cache readers - allow duplicates in the cache, but do DocID deduplication per iteration of the _changes feed loop. A potential concern is how this might require refactoring of the 'limit' handling, to ensure we're not deduplicating down to a value far below the limit when processing an older _since value.

3. Handling late-arriving sequences (recently fixed in issue #525) A distributed channel cache will need to handle late-arriving sequences in the same way the current cache does. The in-memory approach for late sequence handling may not be a good fit for a distributed cache, however. Currently each channel cache maintains a set of late-arriving sequences, for use by continuous _changes feeds. In a nutshell this means that each channel cache maintains two caches - the regular cache and the late-arriving cache. My feeling is that this functionality doesn't need to be part of the initial POC (as it adds a decent amount of scope), and should be addressed in a subsequent phase, once we've finalized how we're going to handle multiple cache writers.