Parallel channel cache writes

Currently, channel cache writes are sequential: processEntry holds the changeCache write lock until it has either (a) added the sequence to the cache, (b) added the sequence to pending, or (c) added the sequence to skipped.

In case (a), after adding the new sequence it also sequentially runs addToCache for every contiguous sequence waiting in pending.

This means channel cache writes are fully serialized: when TAP inflow exceeds the rate at which we can write to the channel caches, latency builds up.
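
To make the later discussion concrete, here's a minimal sketch of the current serialized flow. The names changeCache, processEntry, addToCache, nextSequence, pending, and skipped come from the description above; the LogEntry type, the field layout, and the map-based pending structure are simplifications assumed for illustration, not the real implementation.

```go
package cache

import "sync"

// Assumed minimal types for illustration; the real structures differ.
type LogEntry struct {
	Sequence uint64
	DocID    string
}

type changeCache struct {
	lock         sync.Mutex
	nextSequence uint64               // next expected (contiguous) sequence
	pending      map[uint64]*LogEntry // simplified stand-in for the pending structure
	skipped      []*LogEntry
}

// addToCache writes the entry to each matching channel cache (elided here).
func (c *changeCache) addToCache(entry *LogEntry) {}

// processEntry: the current serialized flow. The changeCache lock is held for
// the entire call, including every addToCache for contiguous pending
// sequences, so channel cache writes cannot overlap.
func (c *changeCache) processEntry(entry *LogEntry) {
	c.lock.Lock()
	defer c.lock.Unlock()

	switch {
	case entry.Sequence == c.nextSequence:
		// (a) Expected sequence: cache it, then drain contiguous pending entries.
		c.addToCache(entry)
		c.nextSequence++
		for {
			next, ok := c.pending[c.nextSequence]
			if !ok {
				break
			}
			c.addToCache(next)
			delete(c.pending, c.nextSequence)
			c.nextSequence++
		}
	case entry.Sequence > c.nextSequence:
		// (b) Arrived early: park it in pending until the gap fills.
		c.pending[entry.Sequence] = entry
	default:
		// (c) Arrived late (previously given up on): track it as skipped.
		c.skipped = append(c.skipped, entry)
	}
}
```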

Proposal

The proposal is to write sequences to the channel caches in parallel, as soon as we get the item from the TAP feed, and to do the sequence-ordering arithmetic after the entry has been written to the cache. We still maintain nextSequence - effectively one past the last contiguous sequence we've seen - but we only update it after the cache writes. The nextSequence calculation would still be serialized, but it would be much less work: just advancing the value or pushing the sequence onto the pending list.

  • Have processEntry spawn a goroutine to do addToCache, regardless of sequence.
  • When that addToCache call completes, do the next/pending/skipped bookkeeping to decide when to advance c.nextSequence (see the sketch after this list).
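
A hedged sketch of the proposed flow, reusing the simplified types from the previous snippet. The function name processEntryParallel is hypothetical; the points being illustrated are spawning the cache write immediately and doing the (much cheaper) nextSequence bookkeeping afterwards. The locking granularity is an assumption.

```go
// Proposed flow: write to the channel caches immediately, in parallel, then
// do the serialized (but cheap) nextSequence bookkeeping once the write lands.
func (c *changeCache) processEntryParallel(entry *LogEntry) {
	go func() {
		// Channel cache write no longer waits on sequence ordering.
		c.addToCache(entry)

		// The serialized section is now just ordering arithmetic.
		c.lock.Lock()
		defer c.lock.Unlock()
		switch {
		case entry.Sequence == c.nextSequence:
			// Advance past everything contiguous that has already been cached.
			c.nextSequence++
			for {
				if _, ok := c.pending[c.nextSequence]; !ok {
					break
				}
				delete(c.pending, c.nextSequence)
				c.nextSequence++
			}
		case entry.Sequence > c.nextSequence:
			// Already written to the cache; just remember the gap.
			c.pending[entry.Sequence] = entry
		default:
			c.skipped = append(c.skipped, entry)
		}
	}()
}
```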

On the changes feed side, the key change would be to only return entries earlier than c.nextSequence and ignore anything else found in the channel caches. In other words, the changes feeds treat anything later than c.nextSequence as not yet visible, which preserves today's behaviour of never returning sequences beyond the contiguous point.
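
A sketch of what that filter might look like on the changes side, assuming the channel cache can hand back its entries as a sequence-ordered slice. This helper is illustrative, not the real changes-processing code.

```go
// visibleEntries: only return entries below the contiguous point.
// channelEntries is assumed to be a sequence-ordered snapshot of one channel cache.
func (c *changeCache) visibleEntries(channelEntries []*LogEntry) []*LogEntry {
	c.lock.Lock()
	stable := c.nextSequence
	c.lock.Unlock()

	visible := make([]*LogEntry, 0, len(channelEntries))
	for _, entry := range channelEntries {
		if entry.Sequence >= stable {
			// Written out of order and not yet contiguous: invisible for now.
			continue
		}
		visible = append(visible, entry)
	}
	return visible
}
```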

The channel caches would need to ensure they don't purge anything later than c.nextSequence, regardless of the cache limits. They probably also want to keep at least channelCacheMinLength entries earlier than c.nextSequence.
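
A rough sketch of pruning under that constraint. channelCacheMinLength is taken from the text above, but its value, the maxLength parameter, and the sorted-slice representation are assumptions.

```go
// Pruning that never drops entries at or above nextSequence, and keeps at
// least channelCacheMinLength stable entries below it.
const channelCacheMinLength = 50

func pruneChannelCache(entries []*LogEntry, maxLength int, nextSequence uint64) []*LogEntry {
	if len(entries) <= maxLength {
		return entries
	}
	// Count stable entries (sequence < nextSequence); everything at or above
	// nextSequence must be retained regardless of cache limits.
	stable := 0
	for stable < len(entries) && entries[stable].Sequence < nextSequence {
		stable++
	}
	removable := stable - channelCacheMinLength
	if removable <= 0 {
		return entries
	}
	excess := len(entries) - maxLength
	if excess > removable {
		excess = removable
	}
	// Purge only the oldest stable entries.
	return entries[excess:]
}
```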

Benefits

  • Multiple goroutines (and CPU cores) write to the channel caches concurrently, instead of a single serialized writer

Implications

  • More inserts into channel caches (vs. appends), since more sequences will be written out of order
  • Concurrent writes to the same channel cache will still be blocking (based on the channel cache lock)
  • More complexity in channel cache pruning (to distinguish between sequences > c.nextSequence and sequences < c.nextSequence)
    • Deduplication based on DocID in particular will change - it needs to be done at pruning time rather than at write time (sketched after this list). That may mean sending more redundant DocIDs to clients (e.g. revisions they don't care about)
  • Additional compare in changes processing to ignore cache entries greater than c.nextSequence
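
On the DocID point, here is a sketch of what prune-time deduplication might look like, reusing the simplified LogEntry from the earlier snippets. The DocID field and the in-place filtering are illustrative assumptions.

```go
// dedupeByDocID: keep only the highest sequence per DocID, run at prune time
// (assumed to execute while holding the channel cache lock).
func dedupeByDocID(entries []*LogEntry) []*LogEntry {
	latest := make(map[string]uint64, len(entries))
	for _, e := range entries {
		if e.Sequence > latest[e.DocID] {
			latest[e.DocID] = e.Sequence
		}
	}
	deduped := entries[:0]
	for _, e := range entries {
		// Keep an entry only if it carries the newest sequence for its DocID.
		if e.Sequence == latest[e.DocID] {
			deduped = append(deduped, e)
		}
	}
	return deduped
}
```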

It's basically one very large benefit against several small drawbacks; it needs perf testing to fully evaluate the tradeoff.