Cache Writer optimization

We want to optimize the cache writer to support parallel cache writes for tap feed entries. This could be either parallel goroutines writing entries on a single cache writer node, or multiple cache writer nodes.

The current cache design assumes serialized, ordered sequence processing (see Non-sparse Cache Design, below) - when a cache reader sees an update on a cache clock, it can assume that everything written to the cache up to that point is complete (no gaps).

To support concurrent writes, we need to define a stable sequence value for cache readers. The stable sequence value is the guarantee to readers that all sequences prior to the stable sequence have been written to the cache. For example, if everything through sequence 103 has been cached but sequence 104 is still in flight, the stable sequence is 103, even if some later sequences have already been written.

Implementation Notes

Cache Writer

  • Reads the tap feed and buffers a set of n sequences
  • Starts a process to cache these sequences in parallel (see the sketch below)
    • Groups the sequences by channel
    • Starts a goroutine for each channel, passing that channel's set of sequences; each goroutine updates the channel's cache blocks and cache clock
    • Starts a goroutine to write each sequence entry document
  • When the process completes, updates a _cache:stableSequence document with the highest sequence value written by the process, then repeats

While the cache writing process is running, the cache writer is filling the buffer for the next iteration. In an initial implementation we may not need any handling for minimum or maximum buffer size - each iteration can just take whatever is waiting in the queue.
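A minimal sketch of this loop in Go, assuming a hypothetical LogEntry shape and writeEntry / updateChannelCache / setStableSequence helpers (these names are illustrative, not the actual Sync Gateway API):

```go
package cache

import "sync"

// LogEntry holds the per-sequence information that gets cached
// (field names assumed from the data model described below).
type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
	Flags    uint8
	Channels []string
}

// Hypothetical storage helpers, standing in for the real bucket operations.
var (
	writeEntry         func(LogEntry)         // writes _cache_entry:[sequence]
	updateChannelCache func(string, []uint64) // updates one channel's cache blocks and clock
	setStableSequence  func(uint64)           // updates the _cache:stableSequence document
)

// cacheWriterLoop buffers sequences from the tap feed, caches each batch in
// parallel, then advances the stable sequence and repeats.
func cacheWriterLoop(feed <-chan LogEntry, maxBatch int) {
	for {
		// Buffer a set of n sequences: block for the first entry, then
		// take whatever else is already waiting in the queue.
		batch := []LogEntry{<-feed}
	fill:
		for len(batch) < maxBatch {
			select {
			case e := <-feed:
				batch = append(batch, e)
			default:
				break fill
			}
		}

		var wg sync.WaitGroup

		// One goroutine per sequence entry document.
		for _, e := range batch {
			wg.Add(1)
			go func(e LogEntry) {
				defer wg.Done()
				writeEntry(e)
			}(e)
		}

		// Group the sequences by channel, then one goroutine per channel
		// to update that channel's cache blocks and cache clock.
		byChannel := make(map[string][]uint64)
		var high uint64
		for _, e := range batch {
			if e.Sequence > high {
				high = e.Sequence
			}
			for _, ch := range e.Channels {
				byChannel[ch] = append(byChannel[ch], e.Sequence)
			}
		}
		for ch, seqs := range byChannel {
			wg.Add(1)
			go func(ch string, seqs []uint64) {
				defer wg.Done()
				updateChannelCache(ch, seqs)
			}(ch, seqs)
		}

		wg.Wait()

		// Every sequence in the batch is now cached, so readers can
		// safely advance to the batch's high sequence.
		setStableSequence(high)
	}
}
```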

Cache Reader

  • Can switch to using _cache:stableSequence as the initial notify trigger (via polling for now)
  • On notify, use the channel clocks to identify which _changes feeds to wake up (see the sketch below)
  • When processing channel blocks, ignore any sequences greater than the stable sequence
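A rough sketch of that reader flow, again with hypothetical helpers (getStableSequence, getChannelClock, notifyChanges):

```go
package cache

import "time"

// Hypothetical read helpers.
var (
	getStableSequence func() uint64       // reads _cache:stableSequence
	getChannelClock   func(string) uint64 // reads _cache_clock:[channelname]
	notifyChanges     func(string)        // wakes _changes requests on a channel
)

// pollStableSequence polls the stable sequence document and, when it moves,
// compares channel clocks against their last-seen values to decide which
// _changes feeds to wake up.
func pollStableSequence(channels []string, interval time.Duration) {
	var lastStable uint64
	lastClocks := make(map[string]uint64, len(channels))

	for range time.Tick(interval) {
		stable := getStableSequence()
		if stable == lastStable {
			continue // nothing new is guaranteed complete yet
		}
		lastStable = stable

		for _, ch := range channels {
			if clock := getChannelClock(ch); clock != lastClocks[ch] {
				lastClocks[ch] = clock
				notifyChanges(ch)
			}
		}
	}
}
```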

Potential issues:

Reader processing:

  1. Reads an updated stableSequence value
  2. Checks whether the channel clocks have been updated
  3. Wakes up _changes requests listening for those channels

If the cache writer updates the clock between steps 1 and 2, the reader can see an empty iteration of the _changes loop. For _longpoll requests, that could result in an empty response. No impact on correctness - just an additional round trip.


Non-sparse Cache Design

#### Data Model

Cache Block

Key format: _cache_block:[channelName]:[blockIndex]

Raw storage, with the format:

  • Sequence offset - 8 bytes. Starting sequence for the block.
  • Sequence array - 1 byte per sequence, indexed as (sequence - sequenceOffset). Currently only two bits of each byte carry information:
    • bit 0 - flag for whether the sequence is in this channel
    • bit 1 - flag for whether the sequence indicates a removal from the channel
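A sketch of this layout in Go; the block size of 10000 sequences per block is an assumption (any fixed range works the same way):

```go
package cache

import (
	"encoding/binary"
	"fmt"
)

const blockSize = 10000 // sequences per block - value assumed

const (
	flagInChannel = 0x01 // bit 0: sequence is in this channel
	flagRemoval   = 0x02 // bit 1: sequence is a removal from the channel
)

// blockKey derives the target block directly from the sequence number,
// so no extra call is needed to find which block a sequence lives in.
func blockKey(channelName string, sequence uint64) string {
	return fmt.Sprintf("_cache_block:%s:%d", channelName, sequence/blockSize)
}

// setSequence marks one sequence in a raw block. Bytes 0-7 hold the
// big-endian sequence offset; one byte per sequence follows, indexed
// as (sequence - sequenceOffset).
func setSequence(block []byte, sequence uint64, removal bool) {
	offset := binary.BigEndian.Uint64(block[:8])
	flags := byte(flagInChannel)
	if removal {
		flags |= flagRemoval
	}
	block[8+(sequence-offset)] = flags
}
```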

Cache clock

Key = _cache_clock:[channelname]

Integer value - incremented whenever a sequence is added to the channel

Cache entry

Key: _cache_entry:[sequence]

JSON document containing LogEntry information for a sequence returned by the _changes feed: DocID, RevID, Sequence, Flags

#### Cache Write Operations

For each tap feed entry:

  • 1 write for the _cache_entry document
    • stores the information sent by the _changes feed (DocID, RevID, Sequence, Flags)
  • per channel:
    • 1 read for the cache block
    • 1 CAS write for the cache block
    • 1 incr for the cache clock

Total ops per feed entry: 1 + (# channels * 3)
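Put together, the write path looks roughly like the following. The Bucket interface is an assumed shape rather than a specific client API; LogEntry, blockKey, and setSequence come from the sketches above:

```go
package cache

import "fmt"

// Bucket is an assumed interface over the operations used here.
type Bucket interface {
	Set(key string, value []byte) error
	Get(key string) (value []byte, cas uint64, err error)
	SetWithCAS(key string, cas uint64, value []byte) error // fails on CAS mismatch
	Incr(key string, amount uint64) (uint64, error)
}

// writeFeedEntry performs the 1 + (#channels * 3) operations for one
// tap feed entry under the non-sparse design.
func writeFeedEntry(b Bucket, entry LogEntry, entryJSON []byte, removal bool) error {
	// 1 write: the _cache_entry document.
	if err := b.Set(fmt.Sprintf("_cache_entry:%d", entry.Sequence), entryJSON); err != nil {
		return err
	}

	for _, ch := range entry.Channels {
		bk := blockKey(ch, entry.Sequence)

		// 1 read + 1 CAS write per channel; loop again if another
		// writer updated the block in between (CAS contention).
		for {
			block, cas, err := b.Get(bk)
			if err != nil {
				return err
			}
			setSequence(block, entry.Sequence, removal)
			if err := b.SetWithCAS(bk, cas, block); err == nil {
				break
			}
		}

		// 1 incr of the cache clock.
		if _, err := b.Incr("_cache_clock:"+ch, 1); err != nil {
			return err
		}
	}
	return nil
}
```

The retry loop here is where the contention cost noted under Cons shows up: busy channels with multiple writers will repeat the read/CAS pair.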

#### Cache Read Operations

For each _changes feed iteration:

  • per channel:
    • 1 read of the cache clock
    • if clock shows changes
      • 1 read of cache block (low potential for additional cache block read):
        • per sequence:
          • read of the _cache_entry document

Total ops per iteration: (# channels) * (2 + (#changes))
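The per-block scan shows where the read path gets its ordering from, and where the reader's stable sequence cutoff applies (flag constants from the block sketch above):

```go
package cache

import "encoding/binary"

// scanBlock walks a non-sparse block; position in the byte array encodes
// the sequence number, so results come back in sequence order for free.
// Sequences beyond the stable sequence are ignored, per the reader rules
// above.
func scanBlock(block []byte, stableSequence uint64) (active, removed []uint64) {
	offset := binary.BigEndian.Uint64(block[:8])
	for i, flags := range block[8:] {
		seq := offset + uint64(i)
		if seq > stableSequence {
			break // everything past here is not yet guaranteed complete
		}
		if flags&flagInChannel == 0 {
			continue // sequence not in this channel
		}
		if flags&flagRemoval != 0 {
			removed = append(removed, seq)
		} else {
			active = append(active, seq)
		}
	}
	return active, removed
}
```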

Benefits:

  • automatic sequence ordering in cache blocks
  • identify target block for sequence by key - no additional call to look up which block a sequence is in

Cons:

  • Read/CAS write needed for update
    • for a single streamed writer, it's slow
    • for multi-streamed or multiple writers, high contention on busy channels
  • unused storage for sparse channels - in the worst case a block ends up holding only a single sequence

Sparse Cache Design

#### Data Model

Cache Block

Key format: _cache_block:[channelName]:[blockIndex]

Raw storage, with the format:

  • Block low sequence - 8 bytes. Low sequence value for the block.
  • Sequence list - 8 bytes per sequence, for the sequence ID. Uses the sign bit to indicate removal. Sequences are not guaranteed to be ordered within the block.
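One plausible encoding for these 8-byte entries; storing removals as negated values is an assumption about how the sign bit is used:

```go
package cache

import "encoding/binary"

// encodeSparseEntry packs a sequence ID into 8 bytes, using the sign bit
// to flag a removal (removals stored as negated int64 values - assumed).
func encodeSparseEntry(sequence uint64, removal bool) []byte {
	v := int64(sequence)
	if removal {
		v = -v
	}
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(v))
	return buf
}

// decodeSparseEntry reverses encodeSparseEntry.
func decodeSparseEntry(buf []byte) (sequence uint64, removal bool) {
	v := int64(binary.BigEndian.Uint64(buf))
	if v < 0 {
		return uint64(-v), true
	}
	return uint64(v), false
}
```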

Cache clock

Key = _cache_clock:[channelname]

Integer value - incremented whenever a sequence is added to the channel

Cache entry

Key: _cache_entry:[sequence]

JSON document containing LogEntry information for a sequence returned by the _changes feed: DocID, RevID, Sequence, Flags

#### Cache Write Operations

For each tap feed entry:

  • 1 write for the _cache_entry document
    • stores the information sent by the _changes feed (DocID, RevID, Sequence, Flags)
  • per channel:
    • 1 append to the current cache block
    • 1 incr for the cache clock

Total ops per feed entry: 1 + (# channels * 2)
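A sketch of the sparse write path. AppendBucket is an assumed interface (Append stands for the memcached-style APPEND primitive), and currentBlockKey is hypothetical, since identifying the target block is the open question noted under Cons below:

```go
package cache

import "fmt"

// AppendBucket is an assumed interface over the operations used here.
type AppendBucket interface {
	Set(key string, value []byte) error
	Append(key string, data []byte) error // memcached-style APPEND
	Incr(key string, amount uint64) (uint64, error)
}

// currentBlockKey is hypothetical - picking the target block for a
// sequence is the open question noted under Cons below.
var currentBlockKey func(channelName string, sequence uint64) string

// writeFeedEntrySparse performs the 1 + (#channels * 2) operations for
// one tap feed entry under the sparse design.
func writeFeedEntrySparse(b AppendBucket, entry LogEntry, entryJSON []byte, removal bool) error {
	// 1 write: the _cache_entry document.
	if err := b.Set(fmt.Sprintf("_cache_entry:%d", entry.Sequence), entryJSON); err != nil {
		return err
	}

	for _, ch := range entry.Channels {
		// 1 append to the current cache block - no read, no CAS.
		bk := currentBlockKey(ch, entry.Sequence)
		if err := b.Append(bk, encodeSparseEntry(entry.Sequence, removal)); err != nil {
			return err
		}
		// 1 incr of the cache clock.
		if _, err := b.Incr("_cache_clock:"+ch, 1); err != nil {
			return err
		}
	}
	return nil
}
```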

#### Cache Read Operations

For each _changes feed iteration:

  • per channel:
    • 1 read of the cache clock
    • if clock shows changes
      • 1 read of cache block (low potential for additional cache block read):
        • read entire block and order sequences in memory
        • per sequence:
          • read of the _cache_entry document

Total ops per iteration: (# channels) * (2 + (#changes))
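And the corresponding read-side decode, ordering sequences in memory as described above (decodeSparseEntry from the earlier sketch):

```go
package cache

import "sort"

// scanSparseBlock decodes a sparse block and orders its sequences in
// memory, since entries are not guaranteed to be ordered within the block.
func scanSparseBlock(block []byte, stableSequence uint64) (active, removed []uint64) {
	for i := 8; i+8 <= len(block); i += 8 { // skip the 8-byte block low sequence
		seq, isRemoval := decodeSparseEntry(block[i : i+8])
		if seq > stableSequence {
			continue // not yet guaranteed complete
		}
		if isRemoval {
			removed = append(removed, seq)
		} else {
			active = append(active, seq)
		}
	}
	sort.Slice(active, func(i, j int) bool { return active[i] < active[j] })
	sort.Slice(removed, func(i, j int) bool { return removed[i] < removed[j] })
	return active, removed
}
```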

Benefits:

  • no contention on write

Cons:

  • more storage needed per sequence in the block (fewer sequences per block)
  • need to identify which block to write a particular sequence to. Could continue with fixed sequence ranges for blocks, but that has the same potential for underused storage