Sequence Management - adamcfraser/cbnotes GitHub Wiki

##Sequence Handling in Sync Gateway

NOTE: All of the information below describes the current (June 2015) Sync Gateway approach to sequence management.

The replication protocol used by Sync Gateway and Couchbase Lite is based on documents being associated with unique, monotonically increasing sequence values. When a client initiates a pull replication, it asks Sync Gateway for all changes since a specified sequence. Sync Gateway responds with the list of changed documents since this sequence, and also returns a last sequence value (lastseq).

With this response, Sync Gateway is providing a consistency guarantee to the client: that the client has been sent the complete set of documents (filtered for access control) between since and lastseq. The next time that client replicates, it sends lastseq as the new since value.
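
The since/lastseq exchange can be illustrated with a toy in-memory model. This is only a sketch: `changes_since` and the list-based change log are hypothetical names invented for this example, not Sync Gateway's API, and access-control filtering is left out.

```python
from typing import List, Tuple

# Hypothetical change log: (sequence, doc_id) pairs kept in sequence order.
Change = Tuple[int, str]


def changes_since(log: List[Change], since: int) -> Tuple[List[Change], int]:
    """Return every change with sequence > since, plus the lastseq value
    the client should send as its next since."""
    changes = [(seq, doc_id) for seq, doc_id in log if seq > since]
    # With no new changes, hand the client's own since value back.
    last_seq = changes[-1][0] if changes else since
    return changes, last_seq


log = [(1, "doc_a"), (2, "doc_b"), (3, "doc_c")]

# First pull: the client has nothing yet, so it asks for everything since 0.
first_pull, checkpoint = changes_since(log, 0)

# A new document is written; the next pull resumes from the saved lastseq.
log.append((4, "doc_d"))
second_pull, checkpoint2 = changes_since(log, checkpoint)
```

The second pull returns only the document written after the first checkpoint, which is the guarantee described above: nothing between since and lastseq is skipped.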

The sequence value itself is intended to be opaque to the client, and can be defined as any JSON element.

###Sequence Storage

Each document written by Sync Gateway includes a sequence value, stored in the document's _sync metadata:

```json
{
  "_sync": {
    "rev": "1-480a0a76c43f80e572405c164ffc7e3d",
    "sequence": 181,
    "recent_sequences": [
      181
    ],
    "history": {
      "revs": [
        "1-480a0a76c43f80e572405c164ffc7e3d"
      ],
      "parents": [
        -1
      ],
      "channels": [
        null
      ]
    },
    "time_saved": "2015-06-18T14:34:56.349529424-07:00"
  },
  "value": "1"
}
```

###Sequence Generation

Sync Gateway uses a counter document in the bucket (_sync:seq) to generate new sequence values. Whenever Sync Gateway writes a new document to the bucket, it first performs an atomic increment on _sync:seq to obtain a new sequence value.
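
As a rough illustration of why the increment has to be atomic, the sketch below models the counter with a lock-protected integer. `SequenceCounter` is a hypothetical stand-in for the _sync:seq document; in the real system the atomicity comes from the server-side increment, not from a Python lock.

```python
import threading


class SequenceCounter:
    """Toy stand-in for the _sync:seq counter document (hypothetical)."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def incr(self) -> int:
        # The lock models the atomicity of the server-side increment:
        # no two callers can ever be handed the same value.
        with self._lock:
            self._value += 1
            return self._value


counter = SequenceCounter()
allocated = []


def writer() -> None:
    # Each simulated Sync Gateway writer allocates 100 sequences.
    for _ in range(100):
        allocated.append(counter.incr())


threads = [threading.Thread(target=writer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every writer receives a distinct value, but nothing forces the writers to finish their document writes in allocation order.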

###Sequence Processing on Write

  1. Validate the document (using the Sync Function).
  2. If valid, obtain a new sequence for the document by incrementing _sync:seq, and insert it as the sequence property in the document's _sync metadata.
  3. Do a CAS update of the document.
     - If the CAS fails, steps 1 and 2 are repeated. On each retry, the previously allocated sequence(s) are stored in the document's _sync metadata (as unusedSequences), for use during sequence buffering (see below).
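
The write steps above can be sketched against a toy bucket. Everything here is an assumption made for illustration: `ToyBucket`, `write_document`, and the `before_cas` hook (used only to simulate a concurrent writer) are hypothetical and do not mirror Sync Gateway's actual code.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple


class ToyBucket:
    """Hypothetical in-memory bucket with CAS semantics."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}  # key -> (cas, body)
        self._seq = itertools.count(1)                # stands in for _sync:seq

    def next_sequence(self) -> int:
        return next(self._seq)

    def get(self, key: str) -> Tuple[int, Optional[dict]]:
        return self._docs.get(key, (0, None))

    def cas_write(self, key: str, expected_cas: int, body: dict) -> bool:
        current_cas, _ = self._docs.get(key, (0, None))
        if current_cas != expected_cas:
            return False  # someone else updated the document first
        self._docs[key] = (current_cas + 1, body)
        return True


def write_document(bucket: ToyBucket, key: str, value: str,
                   validate: Callable[[str], bool],
                   before_cas: Optional[Callable[[], None]] = None) -> dict:
    unused: List[int] = []
    while True:
        cas, _ = bucket.get(key)
        if not validate(value):                 # step 1: sync function
            raise ValueError("rejected by sync function")
        seq = bucket.next_sequence()            # step 2: allocate a sequence
        sync = {"sequence": seq}
        if unused:
            sync["unusedSequences"] = list(unused)
        body = {"_sync": sync, "value": value}
        if before_cas is not None:
            before_cas()                        # test hook only
        if bucket.cas_write(key, cas, body):    # step 3: CAS update
            return body
        unused.append(seq)                      # CAS failed: remember and retry


bucket = ToyBucket()
conflicts = iter([True])


def racing_writer() -> None:
    # Simulate one concurrent update landing between our read and our CAS.
    if next(conflicts, False):
        cas, _ = bucket.get("doc1")
        other = {"_sync": {"sequence": bucket.next_sequence()}, "value": "other"}
        bucket.cas_write("doc1", cas, other)


result = write_document(bucket, "doc1", "1", lambda v: True, racing_writer)
```

In this run the first attempt allocates sequence 1, loses the race to a writer that takes sequence 2, and succeeds on retry with sequence 3 while recording 1 in unusedSequences.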

###Sequence Buffering

Sync Gateway listens to the mutation feed (TAP or DCP) from the Couchbase Server cluster to build an in-memory cache of recent mutations. This cache is used (in part) to handle replication requests. There is no ordering guarantee for sequence values arriving on the mutation feed: latency can vary between a Sync Gateway node incrementing _sync:seq, Sync Gateway writing the document to a Server node, and that document showing up on the feed.

To deliver the client consistency guarantee, Sync Gateway buffers the sequences seen on the feed, and doesn't replicate data to the client until it has a contiguous set of sequence values. Sync Gateway tracks the stable sequence value, the highest sequence for which every lower sequence has also been seen on the feed, and only replicates documents up to that value.
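
A minimal sketch of the buffering idea follows, assuming sequences start at 1 and that a document's unused sequences count as "seen" when it arrives. That is one plausible reading of the unusedSequences note in the write steps. `SequenceBuffer` is a hypothetical name, not the actual cache implementation.

```python
from typing import Iterable, List, Set


class SequenceBuffer:
    """Tracks the stable sequence over an out-of-order feed (toy model)."""

    def __init__(self, stable: int = 0) -> None:
        self.stable = stable              # every sequence <= stable was seen
        self._pending: Set[int] = set()   # seen, but sitting above a gap

    def receive(self, seqs: Iterable[int]) -> int:
        """Record the sequences carried by one feed entry (the document's
        sequence plus any unused sequences) and return the stable value."""
        self._pending.update(s for s in seqs if s > self.stable)
        # Advance while the next expected sequence has already arrived.
        while self.stable + 1 in self._pending:
            self.stable += 1
            self._pending.remove(self.stable)
        return self.stable

    def pending(self) -> List[int]:
        return sorted(self._pending)


buf = SequenceBuffer()
after_1 = buf.receive([1])        # contiguous: stable advances to 1
after_3 = buf.receive([3])        # 2 is missing, so 3 is buffered
after_5 = buf.receive([5, 4])     # a document with sequence 5, unused 4
held = buf.pending()              # still waiting on sequence 2
after_2 = buf.receive([2])        # gap filled: stable jumps to 5
```

Until sequence 2 arrives, sequences 3 to 5 are held back even though their documents are already in the cache.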

###Compound Sequences

There are some optimizations in place to ensure that slow-arriving sequences don't block replication indefinitely. If sequences are pending buffering for more than a fixed interval (defaulting to 5s), Sync Gateway can send these to clients with a compound sequence number of the form stable_seq::seq. Clients that subsequently send a since value that's a compound sequence will receive all mutations since stable_seq, and deduplicate any previously seen revisions using the standard revs_diff replication processing.
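
The stable_seq::seq form can be handled on the server side with a small parser. The sketch below assumes both parts are plain integers; `SequenceID`, `parse_sequence`, and `resume_from` are hypothetical names chosen for this example.

```python
from typing import NamedTuple, Optional


class SequenceID(NamedTuple):
    seq: int
    stable_seq: Optional[int] = None  # set only for compound sequences

    def resume_from(self) -> int:
        """Sequence the server should replay changes from."""
        return self.seq if self.stable_seq is None else self.stable_seq

    def __str__(self) -> str:
        if self.stable_seq is None:
            return str(self.seq)
        return f"{self.stable_seq}::{self.seq}"


def parse_sequence(text: str) -> SequenceID:
    """Parse either a simple sequence ("181") or a compound one ("178::181")."""
    stable, sep, seq = text.partition("::")
    if sep:
        return SequenceID(seq=int(seq), stable_seq=int(stable))
    return SequenceID(seq=int(text))


simple = parse_sequence("181")
compound = parse_sequence("178::181")
```

Here `compound.resume_from()` is 178, so a client checkpointed at 178::181 is sent everything after 178 again and relies on revs_diff to drop revisions it already has.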