Staging schema - cockroachdb/replicator GitHub Wiki

Staging DataBase Schema

Replicator will automatically create a number of staging and metadata tables in a database named _replicator within the staging CockroachDB cluster. This staging database must be manually created, using CREATE DATABASE _replicator. It is recommended that you also ALTER DATABASE _replicator CONFIGURE ZONE USING gc.ttlseconds=300, since Replicator is, essentially, a queue-like workload that produces a relatively large number of MVCC tombstones.

Mutations

Replicator automatically creates staging tables for each target table that act as temporary storage for un-applied mutations. Key and indexes have been omitted here for clarity. See internal/staging/stage package.

-- These tables are created automatically by Replicator and are documented
-- here for operator convenience.
CREATE TABLE _targetDB_targetSchema_targetTable
(
       nanos INT NOT NULL,                 -- Derived from changefeed updated timestamp.
     logical INT NOT NULL,                 -- Derived from changefeed updated timestamp.
         key STRING NOT NULL,              -- A JSON representation of the mutation's primary-key columns.
         mut BYTES NOT NULL,               -- The complete JSON blob of the mutation.
      before BYTES NULL,                   -- Supports conflict resolution
     applied BOOL NOT NULL DEFAULT false,  -- Improves idempotency
  applied_at TIMESTAMPTZ NULL              -- The time at which the mutation was applied.
)

Checkpoints

Incoming checkpoints (e.g. resolved timestamps) are written to a table that effectively forms a queue. See internal/stage/checkpoint for additional details.

CREATE TABLE public.checkpoints
(
  group_name        STRING      NOT NULL, -- Name of a schema within the target.
  source_hlc        DECIMAL     NOT NULL, -- Derived from changefeed updated timestamp.
  partition         STRING      NOT NULL, -- Supports partitioned (e.g. Kafka) delivery
  first_seen        TIMESTAMPTZ NOT NULL DEFAULT now(),
  target_applied_at TIMESTAMPTZ,          -- Set once all mutations with lesser timestamps have been applied.

-- Virtual columns for DBA convenience
  source_nanos      INT8        AS (floor(source_hlc)::INT8) VIRTUAL,
  source_logical    INT8        AS (((source_hlc-floor(source_hlc))*1e10)::INT8) VIRTUAL,
  source_wall_time  TIMESTAMPTZ AS (to_timestamp(floor(source_hlc)::FLOAT8 / 1e9)) VIRTUAL,
)

Auxiliary tables

There are several other auxiliary tables used for cdc-internal coordination:

leases ensures that only a single instance of Replicator will resolve timestamps for any particular target schema.
memo is a catch-all for managing transient state.