How to configure Apache Kafka
Welcome to the ApacheKafka wiki!
About Kafka
What is Apache Kafka?
- Apache Kafka is a distributed event streaming platform.
- Kafka is used for building real-time data pipelines and streaming applications. It is horizontally scalable, fault tolerant, and very fast.
- Fault tolerant means that if one node in a multi-node cluster goes down, production is not impacted and serving requests continues seamlessly.
Kafka provides three key capabilities:
- Publish (write) and subscribe to (read) streams of events
- Store streams of events in a fault-tolerant, durable way
- Process streams of events as they occur
Kafka APIs
- The Admin API to manage and inspect topics, brokers, and other Kafka objects.
- The Producer API to publish (write) a stream of events to one or more Kafka topics.
- The Consumer API to subscribe to (read) one or more topics and to process the stream of events produced to them (a minimal sketch of the Producer and Consumer APIs follows this list).
- The Kafka Streams API to implement stream processing applications and microservices. It provides higher-level functions to process event streams, including transformations, stateful operations like aggregations and joins, windowing, processing based on event-time, and more. Input is read from one or more topics in order to generate output to one or more topics, effectively transforming the input streams to output streams.
- The Kafka Connect API to build and run reusable data import/export connectors that consume (read) or produce (write) streams of events from and to external systems and applications so they can integrate with Kafka. For example, a connector to a relational database like PostgreSQL might capture every change to a set of tables. However, in practice, you typically don't need to implement your own connectors because the Kafka community already provides hundreds of ready-to-use connectors.
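To make the Producer and Consumer APIs concrete, here is a minimal plain-Java sketch. It assumes the kafka-clients library is on the classpath, a broker running on localhost:9092, and the topic manish that is created later in this guide; the class name, key, and group id are illustrative.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerConsumerSketch {
    public static void main(String[] args) {
        // Producer API: write one event to the "manish" topic
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        // try-with-resources closes the producer and flushes pending records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("manish", "key-1", "hello kafka"));
        }

        // Consumer API: subscribe to the same topic and read events back
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "demo-group");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("manish"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(5000));
            records.forEach(r -> System.out.println(r.key() + " = " + r.value()));
        }
    }
}
```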
Architecture
Download Apache Kafka from https://www.apache.org/dyn/closer.cgi?path=/kafka/3.1.0/kafka-3.1.0-src.tgz. Always download the binary distribution (e.g. `kafka_2.12-3.1.0.tgz`), not the source code.
ZooKeeper
- ZooKeeper is used to manage the Apache Kafka cluster environment.
- ZooKeeper must be started before starting the Kafka server.
Replication Factor
- The replication factor is the number of copies of each partition that Kafka keeps on different brokers, so the streaming data survives a broker failure (this is what supports fault tolerance).
Start Services
Start ZooKeeper
*Assume that you have downloaded and extracted the archive to D:\Manish\LearnKafka\kafka_2.12-3.1.0
*Go to the Windows scripts folder D:\Manish\LearnKafka\kafka_2.12-3.1.0\bin\windows
*Run the command `zookeeper-server-start.bat {location of the ZooKeeper properties file}`. Note: `zookeeper.properties` is located under D:\Manish\LearnKafka\kafka_2.12-3.1.0\config
Once ZooKeeper has started, we need to start the Kafka server.
Start Kafka Server
*Go to D:\Manish\LearnKafka\kafka_2.12-3.1.0\bin\windows and run the command below
*`kafka-server-start.bat D:\Manish\LearnKafka\kafka_2.12-3.1.0\config\server.properties`
Create Topic in Kafka
*Go to D:\Manish\LearnKafka\kafka_2.12-3.1.0\bin\windows and run the command below
*`kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic manish`, where manish is the topic name (a programmatic alternative using the Admin API is sketched below)
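As an alternative to the CLI, the same topic can be created programmatically through the Admin API mentioned earlier. This is a minimal sketch, assuming the kafka-clients library is on the classpath and the broker is running on localhost:9092; the class name is illustrative.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic "manish" with 1 partition and replication factor 1,
            // matching the CLI command above
            NewTopic topic = new NewTopic("manish", 1, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```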
All services/servers have now been started and the topic has been created, so it is time to test Kafka. To test, we can create a microservice that publishes a message; once the message is published successfully, the consumer will show the same message.
Create one Spring Boot application:
*Publish the message; the code base has already been attached to this repository
*Once the message is published, we need to check the message as below (a minimal sketch follows)
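As a rough sketch of what such a Spring Boot test application might look like (the code base attached to this repository is the reference), the controller below publishes a message with KafkaTemplate, and a listener prints whatever arrives on the topic. It assumes the spring-kafka dependency is on the classpath and the broker runs on the default localhost:9092; the endpoint path, class name, and group id are illustrative.

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class KafkaTestController {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaTestController(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Hitting GET /publish/hello sends "hello" to the "manish" topic
    @GetMapping("/publish/{message}")
    public String publish(@PathVariable String message) {
        kafkaTemplate.send("manish", message);
        return "Published: " + message;
    }

    // Consumer side: prints every message received on the same topic
    @KafkaListener(topics = "manish", groupId = "demo-group")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}
```

Calling GET /publish/hello should print `Received: hello` from the listener, confirming the publish/consume round trip.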
**Key Points**
*If only a String is being sent, we don't need to do anything; the KafkaTemplate will take care of it. But if we send an object (JSON), then we need to configure the serialization ourselves, as sketched below.
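Here is a minimal sketch of that JSON configuration using spring-kafka's JsonSerializer. The User payload class and bean names are hypothetical, and the broker address assumes the local setup from earlier.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

// Hypothetical payload type used for illustration
class User {
    public String name;
    public int age;
}

@Configuration
public class KafkaJsonProducerConfig {

    @Bean
    public ProducerFactory<String, User> userProducerFactory() {
        Map<String, Object> config = new HashMap<>();
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // JsonSerializer turns the User object into JSON on the wire
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, User> userKafkaTemplate() {
        return new KafkaTemplate<>(userProducerFactory());
    }
}
```

A `KafkaTemplate<String, User>` injected from this configuration can then send objects with `userKafkaTemplate.send("manish", user)`, and the value arrives on the topic as JSON.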