Staging schema - cockroachdb/cdc-sink GitHub Wiki
cdc-sink automatically creates a number of staging and metadata tables
in a database named _cdc_sink within the staging CockroachDB cluster.
This staging database must be created manually, using
CREATE DATABASE _cdc_sink. It is also recommended that you run
ALTER DATABASE _cdc_sink CONFIGURE ZONE USING gc.ttlseconds=300, since
the staging tables experience what is essentially a queue-like workload
that produces a relatively large number of MVCC tombstones. A short GC
TTL lets CockroachDB garbage-collect them sooner.
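As a sketch, the one-time setup described above could be run from a SQL shell connected to the staging cluster. The IF NOT EXISTS clause and the final SHOW statement are conveniences added here. cdc-sink does not require them.

```sql
-- One-time setup of the staging database used by cdc-sink.
CREATE DATABASE IF NOT EXISTS _cdc_sink;

-- Shorten the MVCC garbage-collection TTL to 300 seconds (5 minutes),
-- because the staging tables churn like a queue and accumulate tombstones.
ALTER DATABASE _cdc_sink CONFIGURE ZONE USING gc.ttlseconds = 300;

-- Optional: confirm that the zone configuration took effect.
SHOW ZONE CONFIGURATION FROM DATABASE _cdc_sink;
```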
For each target table, cdc-sink automatically creates a staging table that acts as temporary storage for mutations that have not yet been applied. Keys and indexes have been omitted here for clarity. See
-- These tables are created automatically by cdc-sink and are documented
-- here for operator convenience.
CREATE TABLE _targetDB_targetSchema_targetTable (
  nanos   INT         NOT NULL,               -- Derived from the changefeed's updated timestamp (wall time, in nanoseconds).
  logical INT         NOT NULL,               -- Derived from the changefeed's updated timestamp (logical counter).
  key     STRING      NOT NULL,               -- A JSON representation of the mutation's primary-key columns.
  mut     JSONB       NOT NULL,               -- The complete JSON blob of the mutation.
  before  BYTES       NULL,                   -- Supports conflict resolution.
  applied BOOL        NOT NULL DEFAULT false, -- Improves idempotency.
  lease   TIMESTAMPTZ NULL                    -- Supports best-effort modes.
);
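The following read-only sketch shows how the nanos, logical, and applied columns fit together. The first query reads pending mutations up to a given timestamp in timestamp order. The second summarizes how much work is still queued. Several details are assumptions:

- The target table is a hypothetical app.public.users, so its staging table is named _app_public_users, following the pattern above.
- The staging tables live in the public schema of _cdc_sink.
- The timestamp literal is an arbitrary placeholder.

cdc-sink's own internal queries may differ. These queries are only for inspection.

```sql
-- Hypothetical target table app.public.users; its staging table follows the
-- _targetDB_targetSchema_targetTable naming pattern.
-- Assumption: staging tables live in the public schema of _cdc_sink.

-- Pending (un-applied) mutations at or before a placeholder HLC timestamp,
-- ordered by (nanos, logical) and then by key.
SELECT nanos, logical, key, mut
  FROM _cdc_sink.public._app_public_users
 WHERE NOT applied
   AND (nanos, logical) <= (1700000000000000000, 0)
 ORDER BY nanos, logical, key;

-- How many staged mutations are still pending versus already applied.
SELECT applied, count(*) AS mutations
  FROM _cdc_sink.public._app_public_users
 GROUP BY applied;
```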
Incoming resolved timestamps are written to a table that effectively forms a queue. See
internal/stage/checkpoint for additional details.
CREATE TABLE public.resolved_timestamps (
  target_schema     STRING    NOT NULL, -- Name of a schema within the target.
  source_nanos      INT8      NOT NULL, -- Derived from the changefeed's resolved timestamp (wall time, in nanoseconds).
  source_logical    INT8      NOT NULL, -- Derived from the changefeed's resolved timestamp (logical counter).
  target_applied_at TIMESTAMP NULL      -- Set once all mutations with earlier timestamps have been applied.
);
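Because target_applied_at stays NULL until a resolved timestamp has been fully applied, the queue can be inspected with read-only queries like the sketch below. It assumes that resolved_timestamps lives in _cdc_sink.public. The value 'app.public' is a hypothetical placeholder; the actual format cdc-sink uses for target_schema is not documented here.

```sql
-- Assumption: resolved_timestamps lives in _cdc_sink.public, and
-- 'app.public' is a hypothetical target_schema value.

-- Head of the queue: the earliest resolved timestamp not yet applied.
SELECT source_nanos, source_logical
  FROM _cdc_sink.public.resolved_timestamps
 WHERE target_schema = 'app.public'
   AND target_applied_at IS NULL
 ORDER BY source_nanos, source_logical
 LIMIT 1;

-- Most recent resolved timestamp that has been fully applied,
-- a rough indicator of replication progress for the schema.
SELECT source_nanos, source_logical, target_applied_at
  FROM _cdc_sink.public.resolved_timestamps
 WHERE target_schema = 'app.public'
   AND target_applied_at IS NOT NULL
 ORDER BY source_nanos DESC, source_logical DESC
 LIMIT 1;
```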
There are several other auxiliary tables used for cdc-sink's internal coordination:
leases: ensures that only a single instance of cdc-sink resolves timestamps for any particular target schema.
memo: a catch-all for managing transient state.
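The schemas of the auxiliary tables are not shown on this page. An operator can list everything cdc-sink has created, and view the exact definitions, with standard CockroachDB statements. The schema-qualified names below assume that the tables live in the public schema of _cdc_sink.

```sql
-- List the staging, metadata, and auxiliary tables in the staging database.
SHOW TABLES FROM _cdc_sink;

-- Show the full definitions, including the keys and indexes omitted above.
-- Assumption: these tables live in the public schema of _cdc_sink.
SHOW CREATE TABLE _cdc_sink.public.leases;
SHOW CREATE TABLE _cdc_sink.public.memo;
SHOW CREATE TABLE _cdc_sink.public.resolved_timestamps;
```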