Staging schema - cockroachdb/replicator GitHub Wiki
Staging DataBase Schema
Replicator will automatically create a number of staging and metadata tables in a database named
_replicator
within the staging CockroachDB cluster. This staging database must be manually created,
using CREATE DATABASE _replicator
. It is recommended that you also
ALTER DATABASE _replicator CONFIGURE ZONE USING gc.ttlseconds=300
, since Replicator is, essentially,
a queue-like workload that produces a relatively large number of MVCC tombstones.
Mutations
Replicator automatically creates staging tables for each target table that act as temporary storage
for un-applied mutations. Key and indexes have been omitted here for clarity. See
internal/staging/stage
package.
-- These tables are created automatically by Replicator and are documented
-- here for operator convenience.
CREATE TABLE _targetDB_targetSchema_targetTable
(
nanos INT NOT NULL, -- Derived from changefeed updated timestamp.
logical INT NOT NULL, -- Derived from changefeed updated timestamp.
key STRING NOT NULL, -- A JSON representation of the mutation's primary-key columns.
mut BYTES NOT NULL, -- The complete JSON blob of the mutation.
before BYTES NULL, -- Supports conflict resolution
applied BOOL NOT NULL DEFAULT false, -- Improves idempotency
applied_at TIMESTAMPTZ NULL -- The time at which the mutation was applied.
)
Checkpoints
Incoming checkpoints (e.g. resolved timestamps) are written to a table that effectively forms a queue. See
internal/stage/checkpoint
for additional details.
CREATE TABLE public.checkpoints
(
group_name STRING NOT NULL, -- Name of a schema within the target.
source_hlc DECIMAL NOT NULL, -- Derived from changefeed updated timestamp.
partition STRING NOT NULL, -- Supports partitioned (e.g. Kafka) delivery
first_seen TIMESTAMPTZ NOT NULL DEFAULT now(),
target_applied_at TIMESTAMPTZ, -- Set once all mutations with lesser timestamps have been applied.
-- Virtual columns for DBA convenience
source_nanos INT8 AS (floor(source_hlc)::INT8) VIRTUAL,
source_logical INT8 AS (((source_hlc-floor(source_hlc))*1e10)::INT8) VIRTUAL,
source_wall_time TIMESTAMPTZ AS (to_timestamp(floor(source_hlc)::FLOAT8 / 1e9)) VIRTUAL,
)
Auxiliary tables
There are several other auxiliary tables used for cdc-internal coordination:
leases
ensures that only a single instance of Replicator will resolve timestamps for any particular target schema.memo
is a catch-all for managing transient state.