Apache Kafka - davidkhala/mq GitHub Wiki
an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation
- written in Scala and Java
No animal, please.
ZK design
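- ZooKeeper stores Kafka's metadata, including
- Topic: partition count, replicas
- Broker: address, health
- Consumer group: registration, offsets
- When a new broker joins the cluster or a broker fails, the other nodes can fetch the latest broker information from ZooKeeper and adjust accordingly
- Strict consistency: through its Watch mechanism, ZooKeeper ensures that all nodes in the cluster see consistent metadata.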
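The watch idea can be sketched in plain Python, with no ZooKeeper client involved. `ToyRegistry` and its methods are hypothetical names made up for this illustration. It assumes classic ZooKeeper behaviour, where a watch fires once and must be re-registered to see later changes.

```python
class ToyRegistry:
    """Toy stand-in for a broker registry with one-shot watches (not a real client)."""

    def __init__(self):
        self.brokers = {}      # broker id -> address
        self._watchers = []    # callbacks waiting for the next change

    def watch(self, callback):
        """Register a callback for the next change only."""
        self._watchers.append(callback)

    def _fire(self):
        # Clear the watcher list before notifying: each watch fires at most once.
        watchers, self._watchers = self._watchers, []
        for callback in watchers:
            callback(dict(self.brokers))

    def register(self, broker_id, address):
        self.brokers[broker_id] = address
        self._fire()

    def unregister(self, broker_id):
        self.brokers.pop(broker_id, None)
        self._fire()


seen = []
registry = ToyRegistry()
registry.watch(seen.append)
registry.register(1, "host1:9092")   # the watcher is notified of this change
registry.register(2, "host2:9092")   # no watcher is left, so nobody is notified
print(seen)
```

A node that wants to keep following changes would call `watch` again inside its callback and re-read the current state at that point.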
KRaft
- Metadata Log
- where the metadata is stored
- like a Kafka message log, the metadata log is persistent and sequential
- Offsets are stored in Kafka directly
- Consumer group coordination is still managed by the controller node, but metadata changes are replicated through the Raft protocol
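To make "persistent and sequential" concrete, here is a minimal sketch of an append-only log whose current state is rebuilt by replaying records in order. The class and the record types (`register_broker`, `unregister_broker`, `create_topic`) are hypothetical and do not mirror KRaft's real record schema. Raft replication is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataLog:
    """Toy append-only metadata log; record shapes are hypothetical."""
    records: list = field(default_factory=list)

    def append(self, record: dict) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def replay(self) -> dict:
        """Rebuild the current cluster view by applying every record in order."""
        state = {"brokers": {}, "topics": {}}
        for record in self.records:
            kind = record["type"]
            if kind == "register_broker":
                state["brokers"][record["id"]] = record["address"]
            elif kind == "unregister_broker":
                state["brokers"].pop(record["id"], None)
            elif kind == "create_topic":
                state["topics"][record["name"]] = record["partitions"]
        return state


log = MetadataLog()
log.append({"type": "register_broker", "id": 1, "address": "host1:9092"})
log.append({"type": "create_topic", "name": "orders", "partitions": 3})
log.append({"type": "unregister_broker", "id": 1})
print(log.replay())
```

Because records are only ever appended, any node that replays the same prefix of the log arrives at the same state.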
Kafka Streams
A fluent, functional Java API (shipped as a library) for complex operations such as
- grouping a stream by a key
- joining a stream
- turning a compacted topic into a table
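The semantics of these three operations can be illustrated in plain Python. This is not the Kafka Streams API (which is Java); the function names are hypothetical, and streams are modelled as simple lists of `(key, value)` pairs. One simplification to note: the join below looks up the final table, whereas a real stream-table join would use the table's state at the moment each stream record arrives.

```python
from collections import defaultdict


def group_by_key(events):
    """Collect the values of a stream under their key, preserving arrival order."""
    groups = defaultdict(list)
    for key, value in events:
        groups[key].append(value)
    return dict(groups)


def to_table(changelog):
    """Fold a changelog into a table: the last value per key wins.

    A value of None is treated as a tombstone and deletes the key,
    which mirrors how a compacted topic is read.
    """
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def join_stream_with_table(stream, table):
    """Inner join: enrich each stream record with the table row for its key."""
    return [(key, value, table[key]) for key, value in stream if key in table]


orders = [("alice", 5), ("bob", 7), ("alice", 1)]
countries = [("alice", "DE"), ("bob", "FR"), ("alice", "US"), ("bob", None)]

print(group_by_key(orders))
print(to_table(countries))
print(join_stream_with_table(orders, to_table(countries)))
```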
Kafka Connect
Sink
Provided by
- Aiven OpenSource: https://github.com/Aiven-Open/gcs-connector-for-apache-kafka
- Confluent: https://docs.confluent.io/kafka-connectors/gcs-sink/current/overview.html
Kafka broker
A Kafka broker is a server in the cluster that receives and sends data; it is also known as a node.
- A Kafka cluster is a group of multiple Kafka brokers.
- Each Kafka broker is identified with an ID (integer).
- Topic partition data is distributed (load-balanced) across the brokers; each broker holds a subset of the topic partitions.
- After connecting to any broker (the bootstrap broker), a client can reach the entire cluster.
- Three brokers is a good number to start with. There is no fixed limit on how many brokers you can add.
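As a rough picture of how partitions end up spread over brokers, the sketch below assigns replicas round-robin. `assign_partitions` is a hypothetical helper written for this page. Kafka's actual assignment logic is more involved (for example, it can take racks into account), so treat this only as an illustration of the load-balancing idea.

```python
def assign_partitions(broker_ids, num_partitions, replication_factor):
    """Map each partition to a list of broker ids; the first entry acts as leader."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Shift the starting broker by one per partition so leaders rotate evenly.
        assignment[partition] = [
            broker_ids[(partition + replica) % len(broker_ids)]
            for replica in range(replication_factor)
        ]
    return assignment


for partition, replicas in assign_partitions([1, 2, 3], 6, 2).items():
    print(f"partition {partition}: leader={replicas[0]} replicas={replicas}")
```

With three brokers and six partitions, each broker leads two partitions, and losing one broker still leaves a replica of every partition on another broker.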
Vendors
Confluent
ksqlDB is well worth checking out for developers looking to build streaming applications while taking advantage of their familiarity with relational databases.
strimzi
Strimzi provides a way to run an Apache Kafka cluster on Kubernetes in various deployment configurations.
bitnami
provectus
IBM Event Streams
- IBM Event Streams is a high-throughput message bus built with Apache Kafka.
- The Lite plan (free) is available in the Dallas region (us-south)
- Offers access to 1 partition in a multi-tenant Event Streams cluster.
Oracle
- OCI Streaming Service
- Transactional Event Queues in Oracle Database
AWS
- Amazon Managed Streaming for Apache Kafka (MSK)