How to configure Apache Kafka
Welcome to the ApacheKafka wiki!
About Kafka
What is Apache Kafka?
- Apache Kafka is a distributed event streaming platform.
- Kafka is used for building real-time data pipelines and streaming applications. It is horizontally scalable, fault tolerant, and very fast.
- Fault tolerant means that if one node in a multi-node cluster goes down, production is not impacted and serving requests continues seamlessly.
Kafka provides three key capabilities:
- Publish (write) and subscribe to (read) streams of events
- Store streams of events in a fault-tolerant, durable way
- Process streams of events as they occur
Kafka APIs
- The Admin API to manage and inspect topics, brokers, and other Kafka objects.
- The Producer API to publish (write) a stream of events to one or more Kafka topics.
- The Consumer API to subscribe to (read) one or more topics and to process the stream of events produced to them (a minimal sketch of the Producer and Consumer APIs follows this list).
- The Kafka Streams API to implement stream processing applications and microservices. It provides higher-level functions to process event streams, including transformations, stateful operations like aggregations and joins, windowing, processing based on event-time, and more. Input is read from one or more topics in order to generate output to one or more topics, effectively transforming the input streams to output streams.
- The Kafka Connect API to build and run reusable data import/export connectors that consume (read) or produce (write) streams of events from and to external systems and applications so they can integrate with Kafka. For example, a connector to a relational database like PostgreSQL might capture every change to a set of tables. However, in practice, you typically don't need to implement your own connectors because the Kafka community already provides hundreds of ready-to-use connectors.
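To make the Producer and Consumer APIs concrete, here is a minimal plain-Java sketch. It assumes the kafka-clients library is on the classpath, a broker running on localhost:9092, and the topic manish that is created later in this guide; the class name, key, and group id are illustrative.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerConsumerSketch {
    public static void main(String[] args) {
        // Producer API: write one event to the "manish" topic
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        // try-with-resources closes the producer and flushes pending records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("manish", "key-1", "hello kafka"));
        }

        // Consumer API: subscribe to the same topic and read events back
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "demo-group");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("manish"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(5000));
            records.forEach(r -> System.out.println(r.key() + " = " + r.value()));
        }
    }
}
```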
Architecture
Download Apache Kafka from https://www.apache.org/dyn/closer.cgi?path=/kafka/3.1.0/kafka-3.1.0-src.tgz. Always download the binary distribution (e.g. `kafka_2.12-3.1.0.tgz`), not the source code.
ZooKeeper
- ZooKeeper is used to manage the Apache Kafka cluster environment.
- ZooKeeper must be started before starting the Kafka server.
Replication Factor
- The replication factor is the number of copies of each partition that Kafka keeps on different brokers, so the streaming data survives a broker failure (this is what supports fault tolerance).
Start Services
Start ZooKeeper
*Assume that you have downloaded and extracted the archive to D:\Manish\LearnKafka\kafka_2.12-3.1.0
*Go to the Windows scripts folder D:\Manish\LearnKafka\kafka_2.12-3.1.0\bin\windows
*Run the command `zookeeper-server-start.bat {location of the ZooKeeper properties file}`. Note: `zookeeper.properties` is located under D:\Manish\LearnKafka\kafka_2.12-3.1.0\config
Once ZooKeeper has started, we need to start the Kafka server.
Start Kafka Server
*Go to D:\Manish\LearnKafka\kafka_2.12-3.1.0\bin\windows and run the command below
*`kafka-server-start.bat D:\Manish\LearnKafka\kafka_2.12-3.1.0\config\server.properties`
Create Topic in Kafka
*Go to D:\Manish\LearnKafka\kafka_2.12-3.1.0\bin\windows and run the command below
*`kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic manish`, where manish is the topic name (a programmatic alternative using the Admin API is sketched below)
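As an alternative to the CLI, the same topic can be created programmatically through the Admin API mentioned earlier. This is a minimal sketch, assuming the kafka-clients library is on the classpath and the broker is running on localhost:9092; the class name is illustrative.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic "manish" with 1 partition and replication factor 1,
            // matching the CLI command above
            NewTopic topic = new NewTopic("manish", 1, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```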
All services/servers have now been started and the topic has been created, so it is time to test Kafka. To test, we can create a microservice that publishes a message; once the message is published successfully, the consumer will show the same message.
Create one Spring Boot application:
*Publish the message; the code base has already been attached to this repository
*Once the message is published, we need to check the message as below (a minimal sketch follows)
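As a rough sketch of what such a Spring Boot test application might look like (the code base attached to this repository is the reference), the controller below publishes a message with KafkaTemplate, and a listener prints whatever arrives on the topic. It assumes the spring-kafka dependency is on the classpath and the broker runs on the default localhost:9092; the endpoint path, class name, and group id are illustrative.

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class KafkaTestController {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaTestController(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Hitting GET /publish/hello sends "hello" to the "manish" topic
    @GetMapping("/publish/{message}")
    public String publish(@PathVariable String message) {
        kafkaTemplate.send("manish", message);
        return "Published: " + message;
    }

    // Consumer side: prints every message received on the same topic
    @KafkaListener(topics = "manish", groupId = "demo-group")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}
```

Calling GET /publish/hello should print `Received: hello` from the listener, confirming the publish/consume round trip.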
**Key Points**
*If only a String is being sent, we don't need to do anything; the KafkaTemplate will take care of it. But if we send an object (JSON), then we need to configure the serialization ourselves, as sketched below.
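Here is a minimal sketch of that JSON configuration using spring-kafka's JsonSerializer. The User payload class and bean names are hypothetical, and the broker address assumes the local setup from earlier.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

// Hypothetical payload type used for illustration
class User {
    public String name;
    public int age;
}

@Configuration
public class KafkaJsonProducerConfig {

    @Bean
    public ProducerFactory<String, User> userProducerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // JsonSerializer turns the User object into JSON on the wire
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, User> userKafkaTemplate() {
        return new KafkaTemplate<>(userProducerFactory());
    }
}
```

A `KafkaTemplate<String, User>` injected from this configuration can then send objects with `userKafkaTemplate.send("manish", user)`, and the value arrives on the topic as JSON.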