Publish from Kafka, Persist on MinIO - cniackz/public GitHub Wiki

Diagram:

(architecture diagram: Kafka events → Kafka Connect S3 Sink Connector → MinIO bucket)

Last tested:

  • Thu Dec 1st 2022: PASSED!

Objective:

To save Kafka events to a MinIO bucket.

Inspired by:

Pre-Steps:

  1. Install the Confluent Hub Client as described in: Installing Confluent Hub Client. Then install the S3 sink connector plugin:
# Window 1:
confluent-hub install confluentinc/kafka-connect-s3:latest \
  --component-dir /Users/cniackz/confluent-plugins \
  --worker-configs /Users/cniackz/kafka/kafka_2.13-3.3.1/config/connect-distributed.properties

You should see:

$ confluent-hub install confluentinc/kafka-connect-s3:latest \
>    --component-dir /Users/cniackz/confluent-plugins \
>    --worker-configs /Users/cniackz/kafka/kafka_2.13-3.3.1/config/connect-distributed.properties
 
Component's license: 
Confluent Community License 
http://www.confluent.io/confluent-community-license 
I agree to the software license agreement (yN) y

Downloading component Kafka Connect S3 10.3.0, provided by Confluent, Inc. from Confluent Hub and installing into /Users/cniackz/confluent-plugins 
Do you want to uninstall existing version 10.3.0? (yN) y

Adding installation directory to plugin path in the following files: 
  /Users/cniackz/kafka/kafka_2.13-3.3.1/config/connect-distributed.properties 
 
Completed 
  1. Clean up the MinIO drives to start fresh:
# Window 1:
cd /Volumes/data1
rm -rf *
rm -rf .minio.sys/

cd /Volumes/data2
rm -rf *
rm -rf .minio.sys/

cd /Volumes/data3
rm -rf *
rm -rf .minio.sys/

cd /Volumes/data4
rm -rf *
rm -rf .minio.sys/
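The four per-drive blocks above can be collapsed into a single loop. A sketch; the `DRIVES` list assumes the same /Volumes/data1..4 layout used in this guide:

```shell
# Wipe every MinIO drive, including the hidden .minio.sys metadata directory.
# DRIVES assumes the /Volumes/data1..4 layout used above; adjust for other setups.
DRIVES=${DRIVES:-"/Volumes/data1 /Volumes/data2 /Volumes/data3 /Volumes/data4"}
for d in $DRIVES; do
  rm -rf "$d"/* "$d"/.minio.sys
done
```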

Steps:

  1. Get Kafka Running:
# Window 1:
# Download files:
rm -rf ~/kafka
mkdir ~/kafka
cd ~/kafka; wget https://dlcdn.apache.org/kafka/3.3.1/kafka_2.13-3.3.1.tgz
tar -xzf kafka_2.13-3.3.1.tgz
cd kafka_2.13-3.3.1;
pwd;


# Window 1:
# Kafka with ZooKeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
# Window 2:
# Start the Kafka broker service:
bin/kafka-server-start.sh config/server.properties
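This guide relies on the broker auto-creating `minio-topic-5` when the console producer first writes to it. If auto-creation is disabled on your broker, the topic can be created explicitly first (a sketch using the stock Kafka tooling, run from the Kafka directory):

```shell
# Create the topic the S3 sink will drain; one partition is enough for this walkthrough.
bin/kafka-topics.sh --create --topic minio-topic-5 --partitions 1 \
  --replication-factor 1 --bootstrap-server localhost:9092
```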
  1. Get MinIO up and running:
# Window 3:
# Execute the MinIO Server:
MINIO_ROOT_USER=minio MINIO_ROOT_PASSWORD=minio123 minio server /Volumes/data{1...4} --address :9000 --console-address :9001
# Window 4:
# Get mc client ready
mc alias set myminio http://192.168.1.151:9000 minio minio123
mc mb myminio/kafka-bucket
  1. Connect them with the S3 Sink Connector:
  • File: /Users/cniackz/kafka/kafka_2.13-3.3.1/config/connect-distributed.properties
# Window 4:
subl /Users/cniackz/kafka/kafka_2.13-3.3.1/config/connect-distributed.properties

You should have:

bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
config.storage.topic=connect-configs
config.storage.replication.factor=1
status.storage.topic=connect-status
status.storage.replication.factor=1
offset.flush.interval.ms=10000
plugin.path=/Users/cniackz/confluent-plugins
offset.storage.file.filename=/tmp/connect.offsets
  • File: /Users/cniackz/kafka/kafka_2.13-3.3.1/config/s3-sink.properties
# Window 4:
subl /Users/cniackz/kafka/kafka_2.13-3.3.1/config/s3-sink.properties
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=minio-topic-5
s3.region=us-east-1
s3.bucket.name=kafka-bucket
s3.part.size=5242880
flush.size=3
store.url=http://127.0.0.1:9000
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
schema.compatibility=NONE
behavior.on.null.values=ignore
# Window 5:
# Provide credentials and connect Kafka with MinIO:
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
unset AWS_SESSION_TOKEN
cd /Users/cniackz/kafka/kafka_2.13-3.3.1
./bin/connect-standalone.sh   \
  /Users/cniackz/kafka/kafka_2.13-3.3.1/config/connect-distributed.properties   \
  /Users/cniackz/kafka/kafka_2.13-3.3.1/config/s3-sink.properties
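Once the standalone worker is up, its REST interface can confirm the sink is running. A sketch; 8083 is Kafka Connect's default REST port, and `s3-sink` matches the `name=` property set above:

```shell
# Ask the Connect worker for the connector's status; prints a hint
# instead if the worker is not reachable (e.g. still starting up).
curl -s http://localhost:8083/connectors/s3-sink/status || echo "Connect worker not reachable yet"
```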
  1. Publish events to the Kafka topic:
# Window 6:
cd /Users/cniackz/kafka/kafka_2.13-3.3.1
bin/kafka-console-producer.sh --topic minio-topic-5 --bootstrap-server localhost:9092

You should see:

$ bin/kafka-console-producer.sh --topic minio-topic-5 --bootstrap-server localhost:9092
>1
>2
>3
>
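The producer can also be fed non-interactively. The `{"id":N}` payload shape below is just an illustration (any JSON lines work with JsonFormat), and with `flush.size=3` these three records are enough to trigger one object write:

```shell
# From the Kafka distribution directory: pipe three JSON records into the topic.
# The {"id":N} payloads are illustrative, not required by the sink.
printf '{"id":%d}\n' 1 2 3 \
  | bin/kafka-console-producer.sh --topic minio-topic-5 --bootstrap-server localhost:9092
```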
  1. Look at the saved topics and events in the MinIO bucket:
# Window 4:
mc ls myminio/kafka-bucket/topics

You should see:

$ mc ls myminio/kafka-bucket/topics
[2022-12-01 16:23:08 CST]     0B minio-topic-5/
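Each flush commits one object. With the DefaultPartitioner, the S3 sink names objects `topics/<topic>/partition=<partition>/<topic>+<partition>+<startOffset>.json`, so the three events above can be read back with mc. The exact object name below is what I'd expect for partition 0's first flush; check the recursive listing for the real one:

```shell
# Recurse into the topic prefix to find the committed object, then print it.
mc ls --recursive myminio/kafka-bucket/topics/minio-topic-5/
mc cat myminio/kafka-bucket/topics/minio-topic-5/partition=0/minio-topic-5+0+0000000000.json
```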