How to run Debezium locally - cniackz/public GitHub Wiki

Objective:

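Run Debezium locally. Debezium is a distributed platform that turns your existing databases into event streams. This page builds and configures the debezium-server-iceberg distribution, which writes those events to Iceberg tables.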

Pre-requisites:

  • Install Java
  • Install Maven
# Apache Maven is installed under:
cd /Users/cniackz/apache-maven/apache-maven-3.8.6
  • Add Maven to your PATH in ~/.bash_profile:
# Apache Maven
export M2_HOME="/Users/cniackz/apache-maven/apache-maven-3.8.6"
PATH="${M2_HOME}/bin:${PATH}"
export PATH

You should see:

$ mvn -version 
Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /Users/cniackz/apache-maven/apache-maven-3.8.6
Java version: 18, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk-18.jdk/Contents/Home
Default locale: en_CA, platform encoding: UTF-8
OS name: "mac os x", version: "13.0.1", arch: "aarch64", family: "mac"
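
A quick way to confirm the prerequisites before building is a small shell check. This is a minimal sketch and not part of the original setup: the helper name check_tool is hypothetical, and the tool list (java, mvn, git, unzip) is inferred from the commands used on this page.

```shell
#!/usr/bin/env bash
# Report whether a required tool is available on PATH.
# check_tool is a hypothetical helper name used only for this sketch.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1 -> $(command -v "$1")"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Tools that the steps on this page call; a missing one is reported
# but does not stop the loop.
for tool in java mvn git unzip; do
  check_tool "$tool" || true
done
```

Any line starting with "missing:" points at a tool to install or add to PATH before continuing.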

Steps:

  1. Install from source. Remove any previous checkout, then clone the repository:
cd ~
rm -rf ~/debezium-server-iceberg
git clone https://github.com/memiiso/debezium-server-iceberg.git ~/debezium-server-iceberg
  1. From the root of the project, build and package Debezium Server (tests are skipped):
cd ~/debezium-server-iceberg
mvn -Passembly -Dmaven.test.skip package
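
If the build finished without errors, a distribution zip should exist under debezium-server-iceberg-dist/target. The sketch below checks for it, using the same glob as the unzip command on this page. The function name find_dist_zip and the PROJECT_ROOT variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Print the path of the first distribution zip found under a project root,
# or fail with a message if the build did not produce one.
# find_dist_zip is a hypothetical helper name used only for this sketch.
find_dist_zip() {
  local root="$1"
  local zip
  for zip in "$root"/debezium-server-iceberg-dist/target/debezium-server-iceberg-dist*.zip; do
    # When the glob matches nothing it stays literal, so test for a real file.
    if [ -f "$zip" ]; then
      echo "$zip"
      return 0
    fi
  done
  echo "no distribution zip under $root; did the mvn build succeed?" >&2
  return 1
}

# PROJECT_ROOT is an assumed override; the default matches the clone path above.
find_dist_zip "${PROJECT_ROOT:-$HOME/debezium-server-iceberg}" || true
```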
  1. After building, unzip your server distribution:
cd ~/debezium-server-iceberg
unzip debezium-server-iceberg-dist/target/debezium-server-iceberg-dist*.zip -d appdist
  1. Change into the unzipped folder:
cd ~/debezium-server-iceberg/appdist
  1. Create the conf/application.properties file and configure it (subl opens it in Sublime Text; any editor works):
subl conf/application.properties
# Use iceberg sink
debezium.sink.type=iceberg

# Run without Kafka, use local file to store checkpoints
debezium.source.database.history=io.debezium.relational.history.FileDatabaseHistory
debezium.source.database.history.file.filename=data/status.dat

# Iceberg sink config
debezium.sink.iceberg.table-prefix=debeziumcdc_
debezium.sink.iceberg.upsert=true
debezium.sink.iceberg.upsert-keep-deletes=true
debezium.sink.iceberg.write.format.default=parquet
debezium.sink.iceberg.catalog-name=mycatalog
# Hadoop catalog, you can use other catalog supported by iceberg as well
debezium.sink.iceberg.type=hadoop
debezium.sink.iceberg.warehouse=s3a://my-bucket/iceberg_warehouse
debezium.sink.iceberg.table-namespace=debeziumevents

# S3 config
debezium.sink.iceberg.fs.defaultFS=s3a://my-bucket
debezium.sink.iceberg.com.amazonaws.services.s3.enableV4=true
debezium.sink.iceberg.com.amazonaws.services.s3a.enableV4=true
debezium.sink.iceberg.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain
debezium.sink.iceberg.fs.s3a.access.key=AWS_ACCESS_KEY
debezium.sink.iceberg.fs.s3a.secret.key=AWS_SECRET_ACCESS_KEY
debezium.sink.iceberg.fs.s3a.path.style.access=true
debezium.sink.iceberg.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

# enable event schemas - mandatory
debezium.format.value.schemas.enable=true
debezium.format.key.schemas.enable=true
debezium.format.value=json
debezium.format.key=json

# postgres source
#debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
#debezium.source.offset.storage.file.filename=data/offsets.dat
#debezium.source.offset.flush.interval.ms=0
#debezium.source.database.hostname=localhost
#debezium.source.database.port=5432
#debezium.source.database.user=postgres
#debezium.source.database.password=postgres
#debezium.source.database.dbname=postgres
#debezium.source.database.server.name=tutorial
#debezium.source.schema.include.list=inventory

# sql server source
#debezium.source.connector.class=io.debezium.connector.sqlserver.SqlServerConnector
#debezium.source.offset.storage.file.filename=data/offsets.dat
#debezium.source.offset.flush.interval.ms=0
#debezium.source.database.hostname=localhost
#debezium.source.database.port=1433
#debezium.source.database.user=debezium
#debezium.source.database.password=debezium
#debezium.source.database.dbname=debezium
#debezium.source.database.server.name=tutorial
#debezium.source.schema.include.list=inventory
# mandatory for sql server source; avoids errors during snapshot and schema changes
#debezium.source.include.schema.changes=false

# do event flattening. unwrap message!
debezium.transforms=unwrap
debezium.transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
debezium.transforms.unwrap.add.fields=op,table,source.ts_ms,db
debezium.transforms.unwrap.delete.handling.mode=rewrite
debezium.transforms.unwrap.drop.tombstones=true

# ############ SET LOG LEVELS ############
quarkus.log.level=INFO
quarkus.log.console.json=false
# hadoop, parquet
quarkus.log.category."org.apache.hadoop".level=WARN
quarkus.log.category."org.apache.parquet".level=WARN
# Ignore messages below warning level from Jetty, because it's a bit verbose
quarkus.log.category."org.eclipse.jetty".level=WARN
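
In the configuration as shown, both source blocks (postgres and sql server) are commented out, so no debezium.source.connector.class is set until one of them is uncommented. The sketch below checks a properties file for a few keys this configuration relies on. It is a minimal illustration, not part of the project: get_prop and check_config are hypothetical helper names, and the list of checked keys is an assumption based on the file above rather than an exhaustive list of required settings.

```shell
#!/usr/bin/env bash
# Print the value of an uncommented key in a properties file (last match wins);
# exit non-zero if the key is absent or only present in a commented line.
get_prop() {
  awk -F= -v key="$2" '
    $1 == key { sub(/^[^=]*=/, ""); value = $0; found = 1 }
    END { if (found) print value; else exit 1 }
  ' "$1"
}

# Check that each key this page's configuration depends on is set.
check_config() {
  local file="$1"
  local key
  local missing=0
  for key in debezium.sink.type \
             debezium.source.connector.class \
             debezium.sink.iceberg.warehouse; do
    if ! get_prop "$file" "$key" >/dev/null; then
      echo "missing or commented out: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run the check only if the file exists in the current folder (appdist).
if [ -f conf/application.properties ]; then
  check_config conf/application.properties || true
fi
```

Once a source block is uncommented and the check passes, the server can be started from the distribution folder. Debezium Server distributions normally include a run.sh launcher for this; treat that as an assumption if your build differs.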