Kafka Architecture - rnakidi/dsa GitHub Wiki

Apache Kafka

Apache Kafka is a distributed event streaming platform designed to handle real-time data feeds with high throughput and low latency. Its architecture consists of several key components that work together to provide a robust messaging system.

  1. Producers
     Producers are applications that publish (write) data to Kafka topics. They send records (messages) to specific topics, and they can choose which partition of a topic each record goes to. Producers can also configure acknowledgment settings (the acks setting) to trade latency for delivery guarantees.

  2. Topics
     A topic is a category or feed name to which records are published. It is a logical channel for data and serves as the primary abstraction in Kafka. Topics are split into partitions, which allows Kafka to achieve parallelism and scalability.

  3. Partitions
     Each topic can have multiple partitions, which are ordered, immutable sequences of records that are continually appended. Each partition can be replicated across multiple brokers for fault tolerance. The order of records is guaranteed within a partition but not across partitions.

  4. Consumers
     Consumers are applications that subscribe to topics and read (consume) records from them. Consumers can be part of a consumer group, allowing them to share the workload. Each consumer in a group reads from its own exclusive set of partitions, enabling parallel processing.

  5. Consumer Groups
     A consumer group is a set of consumers that work together to consume records from one or more topics. Each partition of a topic is assigned to only one consumer in a group, so records within a partition are processed in order while the group as a whole scales out.

  6. Brokers
     A Kafka cluster consists of one or more brokers (servers). Each broker is responsible for storing partitions of topics and serving client requests. Brokers work together to distribute data and handle client requests efficiently.

  7. ZooKeeper
     Historically, Apache ZooKeeper was used for managing and coordinating Kafka brokers. It handled leader election for partitions and maintained metadata about topics, partitions, and consumer groups. Starting with version 2.8, Kafka introduced KRaft (Kafka Raft metadata mode), which lets Kafka operate without ZooKeeper and is intended to improve scalability and simplify operations. KRaft was declared production-ready in 3.3, and ZooKeeper support was removed in 4.0.

  8. Replication
     Kafka ensures data durability and availability through replication. Each partition can have multiple replicas on different brokers. One replica acts as the leader for the partition, while the others are followers. By default, producers and consumers interact with the leader, and the followers replicate data from it.
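The leader and follower roles described under Replication can be illustrated with a small in-memory sketch. This is a toy model, not Kafka's replication protocol: the names Replica, ReplicatedPartition, produce, replicate, and fail_leader are hypothetical, and the rule "promote the most caught-up follower" is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    broker_id: int
    log: list = field(default_factory=list)  # records stored by this replica


class ReplicatedPartition:
    """Toy model of one partition with a leader and followers."""

    def __init__(self, broker_ids):
        self.replicas = [Replica(b) for b in broker_ids]
        self.leader = self.replicas[0]  # assumption: first replica starts as leader

    def produce(self, record):
        # Clients write to the leader only; the return value is the record's offset.
        self.leader.log.append(record)
        return len(self.leader.log) - 1

    def replicate(self):
        # Followers copy whatever records they are missing from the leader.
        for replica in self.replicas:
            if replica is not self.leader:
                replica.log.extend(self.leader.log[len(replica.log):])

    def fail_leader(self):
        # Drop the failed leader and promote the most caught-up follower.
        self.replicas.remove(self.leader)
        self.leader = max(self.replicas, key=lambda r: len(r.log))


partition = ReplicatedPartition(broker_ids=[1, 2, 3])
partition.produce("order-created")
partition.produce("order-paid")
partition.replicate()
partition.fail_leader()  # broker 1 goes away
print(partition.leader.broker_id, partition.leader.log)
```

Because the followers had already copied both records, the new leader can keep serving the full log after the original leader is lost.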

Source/Credit: https://www.linkedin.com/posts/sina-riyahi_csharp-efcore-dotnet-activity-7274343832584941568-d3Qd?utm_source=share&utm_medium=member_desktop

What you must know about Kafka?

👉 What is Apache Kafka?

Apache Kafka is a distributed event store and streaming platform.

It began as an internal project at LinkedIn.

Over time, it grew rapidly and today, some of the largest data pipelines in the world use Kafka.

Organizations like Netflix and Uber rely on it for their workflows.

👉 Kafka Messages, Topics, and Partitions

The basic unit of data in Kafka is a Message.

Think of a message like a record in a database table. It is transmitted as an array of bytes.
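To make the "array of bytes" point concrete, here is a minimal sketch of turning an application record into bytes and back. The Message class, the helper names, and the choice of JSON with UTF-8 are assumptions for illustration; Kafka itself does not prescribe a serialization format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    key: bytes    # optional in Kafka; used here to identify the record
    value: bytes  # the payload, opaque to the broker


def encode_message(key: str, payload: dict) -> Message:
    # Serialize on the producer side: both key and value become raw bytes.
    return Message(
        key=key.encode("utf-8"),
        value=json.dumps(payload, sort_keys=True).encode("utf-8"),
    )


def decode_value(message: Message) -> dict:
    # Deserialize on the consumer side using the same agreed format.
    return json.loads(message.value.decode("utf-8"))


msg = encode_message("user-42", {"action": "login"})
print(msg.value)          # raw bytes as they would travel over the wire
print(decode_value(msg))  # the original record, recovered by the consumer
```

Producers and consumers only need to agree on the format; the broker stores and forwards the bytes without interpreting them.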

Every message goes to a particular Topic.

You can compare Kafka Topics to a database table or a folder on your computer.

Topics are also made up of multiple partitions.

Partitions let a topic be spread across several brokers, which makes it horizontally scalable. Redundancy comes from replicating each partition.
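A topic made of partitions can be pictured as a set of independent append-only logs, each with its own offsets. The sketch below is an in-memory toy, not how Kafka stores data on disk; the Topic class and its append and read methods are hypothetical names.

```python
class Topic:
    """Toy in-memory topic: a fixed number of append-only partition logs."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: bytes) -> int:
        # Appending returns the message's offset within that partition.
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, from_offset: int = 0) -> list:
        # Reading never removes messages; it only returns a slice of the log.
        return self.partitions[partition][from_offset:]


orders = Topic("orders", num_partitions=2)
orders.append(0, b"a")  # offset 0 in partition 0
orders.append(1, b"b")  # offset 0 in partition 1
orders.append(0, b"c")  # offset 1 in partition 0
print(orders.read(0))
```

Offsets are counted per partition, which is why ordering holds inside one partition but not across the whole topic.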

👉 Kafka Producers and Consumers

Producers in Kafka create new messages, batch them, and send them over to a Kafka topic.

A producer also balances messages across the different partitions of a topic.

You can provide a custom partitioning strategy to control the distribution of messages.
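As a rough illustration of partitioning strategies, the sketch below shows a key-hash partitioner and a custom one. Both functions are hypothetical and are not the Kafka client API; CRC32 stands in for the hash purely as an assumption (Kafka's built-in partitioner uses a different hash), and the "vip-" prefix rule is invented for the example.

```python
import zlib


def hash_partitioner(key: bytes, num_partitions: int) -> int:
    # The same key always maps to the same partition, so per-key order is kept.
    return zlib.crc32(key) % num_partitions


def priority_partitioner(key: bytes, num_partitions: int) -> int:
    # Custom strategy (assumed rule): reserve partition 0 for "vip-" keys and
    # hash every other key across the remaining partitions.
    if num_partitions < 2:
        raise ValueError("priority_partitioner needs at least 2 partitions")
    if key.startswith(b"vip-"):
        return 0
    return 1 + zlib.crc32(key) % (num_partitions - 1)


for key in (b"user-1", b"user-2", b"vip-7"):
    print(key, hash_partitioner(key, 4), priority_partitioner(key, 4))
```

A strategy like the second one lets a dedicated consumer handle the reserved partition while the rest of the traffic is still balanced.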

Kafka Consumers read messages from a broker.

One or more consumers work as a consumer group to consume messages from a topic.

Within a group, each partition is owned by exactly one consumer instance.

A consumer instance may own several partitions, but a partition is never shared between two consumers of the same group.
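Partition ownership inside a consumer group can be sketched as a simple round-robin assignment. This is an assumption-laden toy: assign_partitions is a hypothetical helper, and real Kafka clients choose among several assignment strategies coordinated through the brokers.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by a group of three consumers.
before = assign_partitions([0, 1, 2, 3], ["c1", "c2", "c3"])
print(before)

# If c3 leaves the group, its partitions are redistributed among the rest.
after = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
print(after)
```

In both assignments every partition has a single owner, while one consumer may hold more than one partition.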

👉 Kafka Broker and Cluster

A single Kafka server is known as a Broker.

Depending on its hardware, a single broker can handle thousands of partitions and millions of messages per second.

Think of the broker as a bridge between the producer and consumer.

It receives messages from producers and handles fetch requests from the consumer.

A broker, however, operates as part of a Kafka Cluster.

A Kafka Cluster consists of several brokers and provides features like replication.

When the replication factor is greater than one, every partition is replicated across multiple brokers, ensuring high availability and redundancy.
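How replicas of each partition might be spread over the brokers of a cluster can be sketched as follows. The place_replicas function and its round-robin rule are assumptions for illustration; Kafka's actual placement logic takes more factors into account.

```python
def place_replicas(num_partitions, broker_ids, replication_factor):
    """Map each partition to the brokers that hold a copy of it."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        # Start at a different broker for each partition, then take the next ones.
        placement[partition] = [
            broker_ids[(partition + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
    return placement


placement = place_replicas(num_partitions=3, broker_ids=[1, 2, 3], replication_factor=2)
print(placement)

# Losing any single broker still leaves one copy of every partition.
survivors = {p: [b for b in brokers if b != 2] for p, brokers in placement.items()}
print(survivors)
```

With a replication factor of 2 on three brokers, each partition lives on two different brokers, so the loss of one broker does not make any partition unavailable.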

Source/Credit: https://www.linkedin.com/posts/saurabh-dashora_what-you-must-know-about-kafka-what-activity-7275398305990262784-cSl-?utm_source=share&utm_medium=member_desktop