Kafka Fundamental - Tuong-Nguyen/Angular-D3-Cometd GitHub Wiki

Example: LinkedIn Data Architecture

This image shows the data architecture of LinkedIn, which uses Kafka as its communication mechanism:

LinkedIn Data Architecture

Producer - Consumer - Topic

Producer: An application that sends messages to the Kafka system.
Consumer: An application that receives messages from the Kafka system.
Topic: Messages are categorized into topics. A producer sends messages to a topic, and a consumer receives them from that same topic.

Kafka Messaging system
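
The producer, consumer, and topic roles described above can be sketched with a toy in-memory model. This is only an illustration, not Kafka's API: the class `InMemoryBroker`, its methods `send` and `receive`, and the topic names are all hypothetical.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a Kafka system: maps each topic name to a list of messages."""

    def __init__(self):
        self._topics = defaultdict(list)

    def send(self, topic, message):
        # Producer side: add the message to the named topic.
        self._topics[topic].append(message)

    def receive(self, topic, offset=0):
        # Consumer side: read messages of the named topic, starting at `offset`.
        return list(self._topics[topic][offset:])


broker = InMemoryBroker()

# A producer sends messages to two different topics (names are made up).
broker.send("orders", b"order created")
broker.send("orders", b"order paid")
broker.send("payments", b"payment received")

# A consumer of "orders" sees only the messages sent to "orders".
orders = broker.receive("orders")
```

The point of the sketch is that the producer and the consumer never talk to each other directly; the topic name is the only thing they share.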

Messages in a Topic

Messages stored in a Topic

A topic stores messages with the following characteristics:

  • append-only (i.e., new messages are added only at the end; none can be inserted in the middle)
  • ordered by time (i.e., messages are sorted in the order they arrive)
  • immutable (i.e., a message cannot be updated after it has been sent)

In the picture, message 3 contains incorrect information. The producer cannot update message 3 with the correct information; it must instead send a new message, message 6, as the correction.
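
The correction scenario can be sketched as a small append-only log. This is a minimal illustration under stated assumptions: `TopicLog` and its methods are hypothetical names, the message contents are invented, and positions are zero-based here, so the third message sits at position 2.

```python
class TopicLog:
    """Toy append-only log; a hypothetical stand-in for the messages of one topic."""

    def __init__(self):
        self._messages = []

    def append(self, data: bytes) -> int:
        """Add a message at the end and return its zero-based position."""
        self._messages.append(data)
        return len(self._messages) - 1

    def read_from(self, position: int = 0) -> tuple:
        """Return the messages from `position` onward as a read-only tuple."""
        return tuple(self._messages[position:])

    # Deliberately no insert() or update(): the log only grows at the end.


log = TopicLog()
log.append(b"temperature=20")
log.append(b"temperature=21")
wrong = log.append(b"temperature=210")  # the third message is incorrect
log.append(b"temperature=22")
log.append(b"temperature=23")

# The wrong message stays in the log; the producer appends a sixth message instead.
fix = log.append(b"correction: third message should be temperature=21")
```

A consumer reading the whole log still sees the incorrect message first and the correction later, so it is up to the consumer to apply the correction.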

Message information

Each message carries the following information:

  • Timestamp
  • ID: the message ID, which a consumer uses to specify the offset to read from
  • Data: the message content, stored as binary data

Message
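
The three fields listed above can be sketched as a small record type. This is an illustration only, not Kafka's actual record format: the class name `Message` and the field names `timestamp_ms`, `offset`, and `data` are assumptions made for the example.

```python
import time
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)  # frozen: fields cannot be changed after creation
class Message:
    timestamp_ms: int  # when the message was created, in milliseconds
    offset: int        # message ID, used by a consumer as its read position
    data: bytes        # message content as binary data


# The content is binary, so text has to be encoded before sending ...
msg = Message(
    timestamp_ms=int(time.time() * 1000),
    offset=0,
    data="hello".encode("utf-8"),
)

# ... and decoded again by whoever reads it.
text = msg.data.decode("utf-8")
```

Because the content is plain bytes, producer and consumer have to agree on the encoding themselves (UTF-8 text in this sketch).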

Broker - Zookeeper

Broker: A Kafka server that producers and consumers connect to in order to send and receive messages.
Zookeeper: A server that manages the brokers so that the system can scale.

Apache Kafka System

Topic Partition

A topic can be split into partitions. Each partition is handled by one broker (a broker can handle multiple partitions). Messages sent to the topic are distributed across its partitions.

Partition

How messages are distributed into partitions is discussed later.

Topic with multiple Partitions

In this example, each broker manages one partition.
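
The relationship between partitions and brokers can be sketched as follows. This is a rough illustration under stated assumptions, not Kafka's real assignment logic: the function names, the broker names, and the round-robin ownership are hypothetical. Picking a partition by hashing a message key is one common scheme, and CRC32 is used here only as a stand-in hash function.

```python
import zlib


def assign_partitions(num_partitions, brokers):
    """Give each partition one owning broker, round-robin (illustrative only)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a message key with a stand-in hash (CRC32)."""
    return zlib.crc32(key) % num_partitions


# Three partitions on three brokers: each broker manages one partition.
one_each = assign_partitions(3, ["broker-0", "broker-1", "broker-2"])

# Three partitions on two brokers: broker-0 manages two partitions.
shared = assign_partitions(3, ["broker-0", "broker-1"])

# The same key always maps to the same partition.
partition = choose_partition(b"user-42", 3)
```

With a key-based scheme like this, all messages that share a key end up in the same partition, while different keys spread across the partitions.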

Topic Replication

A topic can be configured to be replicated across multiple brokers. This increases availability: if one broker fails, the messages can still be served from another.

Topic Replication
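
Why replication increases availability can be sketched with a toy placement model. This is an illustration, not Kafka's actual replica assignment: `place_replicas`, `available_partitions`, the broker names, and the round-robin placement are all assumptions made for the example.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's copies on distinct brokers (illustrative round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def available_partitions(placement, failed):
    """Partitions that still have at least one copy on a broker that is not failed."""
    return [
        p for p, replicas in placement.items()
        if any(broker not in failed for broker in replicas)
    ]


brokers = ["broker-0", "broker-1", "broker-2"]

# Without replication (one copy each), losing broker-1 loses partition 1.
single = place_replicas(3, brokers, replication_factor=1)
left_single = available_partitions(single, failed={"broker-1"})

# With two copies each, every partition survives the loss of broker-1.
double = place_replicas(3, brokers, replication_factor=2)
left_double = available_partitions(double, failed={"broker-1"})
```

In this sketch the cost of the extra availability is storage: each message is kept as many times as the replication factor.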