Connector Configuration - michaelpidde/datastreaming GitHub Wiki

Debezium uses connectors to define the connection details for each resource it captures changes from. In this example application, I am connecting to SQL Server using a Kafka Connect connector.

Considerations

My connector lists the tables that I want to capture events for: "table.include.list": "dbo.customer,dbo.product,dbo.order"

You can also define a separate connector per table, depending on the requirements of your use case. I am also setting "snapshot.mode": "schema_only", which prevents Debezium from sending events containing the initial state of the tables. Without that setting, if you have a table with 100 rows and you start the Debezium service, it will send events for the starting state of those 100 rows if it has not already done so.

Other use cases may want that initial snapshot, for example to seed or true up other resources from the events. In my case, I only want to process new events that come in after the Debezium service starts.
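To make the two options above concrete (one connector for all tables versus one connector per table), here is a minimal Python sketch that assembles the registration payloads. The function names and connector names are hypothetical, and the connection properties (host, credentials, topic prefix, and so on) are left out on purpose, so these payloads are not complete enough to register as-is. The {"name": ..., "config": {...}} shape is the one the Kafka Connect REST API accepts when creating a connector.

```python
import json

# Connector class for the Debezium SQL Server connector.
SQLSERVER_CONNECTOR = "io.debezium.connector.sqlserver.SqlServerConnector"

TABLES = ["dbo.customer", "dbo.product", "dbo.order"]


def build_connector(name, tables, snapshot_mode="schema_only"):
    """Return a registration payload for one connector (hypothetical helper).

    Only the settings discussed on this page are included; a real connector
    also needs its database connection properties.
    """
    return {
        "name": name,
        "config": {
            "connector.class": SQLSERVER_CONNECTOR,
            "table.include.list": ",".join(tables),
            "snapshot.mode": snapshot_mode,
        },
    }


def build_connector_per_table(prefix, tables, snapshot_mode="schema_only"):
    """Return one payload per table so each can be configured separately."""
    return [
        build_connector(f"{prefix}-{table.replace('.', '-')}", [table], snapshot_mode)
        for table in tables
    ]


if __name__ == "__main__":
    # One connector covering all three tables.
    print(json.dumps(build_connector("sqlserver-all", TABLES), indent=2))
    # Three connectors, one per table.
    for payload in build_connector_per_table("sqlserver", TABLES):
        print(payload["name"], "->", payload["config"]["table.include.list"])
```

Splitting per table means each connector can carry its own snapshot mode, for example seeding one table with an initial snapshot while the others only stream new changes.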

Transforms

There are various transforms that you can use to change the structure and verbosity of the events as they flow from Debezium into Kafka. Default Debezium events have a structure like this:

{
  "before": { "id": 1, "name": "Alice" },
  "after": { "id": 1, "name": "Alicia" },
  "op": "u",
  "source": {
    "version": "2.5.1.Final",
    "connector": "sqlserver",
    ...
  },
  "ts_ms": 1723661000000
}

Some transforms can greatly reduce the message overhead. Using this transform:

"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"

will yield a flattened message that only contains the after state:

{ "id": 1, "name": "Alicia" }

For my application, I need both the before and after states in order to determine what actually changed during the update. See more about transforms here.
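As an illustration of why both states are useful, here is a minimal Python sketch of a consumer-side helper that compares before and after. The helper name changed_fields is hypothetical, and the sketch assumes the event has already been deserialized into a dict with the structure shown earlier (if your converter wraps events in a schema/payload envelope, pass the payload part).

```python
def changed_fields(event):
    """Return {column: (old_value, new_value)} for columns that differ.

    "before" is null for inserts and "after" is null for deletes, so both
    are treated as empty rows when missing.
    """
    before = event.get("before") or {}
    after = event.get("after") or {}
    changes = {}
    for column in sorted(set(before) | set(after)):
        old, new = before.get(column), after.get(column)
        if old != new:
            changes[column] = (old, new)
    return changes


# The update event from the example above; remaining source fields omitted.
EVENT = {
    "before": {"id": 1, "name": "Alice"},
    "after": {"id": 1, "name": "Alicia"},
    "op": "u",
    "source": {"version": "2.5.1.Final", "connector": "sqlserver"},
    "ts_ms": 1723661000000,
}

if __name__ == "__main__":
    print(changed_fields(EVENT))
```

With the flattened message from the unwrap transform there is no before state to compare against, so a diff like this is not possible on the consumer side.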