0.11 Kafka and Kafka Streams
Confluent Kafka / Amazon MSK (Managed Streaming for Apache Kafka)
In Kafka, everything starts with a topic. You can think of a topic as the equivalent of a table in a database. A record in a topic has two parts: a key and a value. The key is comparable to the primary key of a table.
But if you are working with billions of records, a database keeps all the data and tables together like one big suitcase. With little data the database is fast, but with billions of records it becomes very slow, because a plain database has no concept of load balancing.
Kafka reduces this limitation of the database because it has a concept called partitioning: a topic is divided into partitions.
Now the question is: which partition does a given record go to? That is decided based on the key. In a database, by contrast, all keys go into the same table; even with one billion records, everything sits in a single table, and fetching from that billion-row table takes time.
In Kafka, however, we can create partitions, meaning one topic can have multiple partitions.
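To make this concrete, here is a minimal producer sketch in Java (the broker address, topic name, key, and value are all made up for the example): each record carries a key and a value, and records with the same key land in the same partition of the topic.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key "order-42" plays the role of the primary key; the value is the payload.
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"amount\": 100}"));
        }
    }
}
```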
Another concept in Kafka is the partitioner. It is responsible for selecting the partition for each key, i.e. it decides which key goes to which partition. The default partitioner internally uses an algorithm called murmur2, which works roughly like the formula below:

hash(key) % number of partitions
It is similar to the hashing mechanism in a HashMap.
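As a rough illustration of the formula above (a minimal sketch, not the exact producer code path), the kafka-clients Utils class exposes the murmur2 hash, so the partition choice for a key can be reproduced like this; the key and the partition count are invented for the example:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionForKey {
    public static void main(String[] args) {
        int numPartitions = 3;           // assumed partition count for the example
        String key = "order-42";         // hypothetical record key

        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Same idea as the default partitioner: murmur2 hash of the key bytes,
        // made non-negative, then modulo the number of partitions.
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;

        System.out.println("Key " + key + " goes to partition " + partition);
    }
}
```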
In Kafka, each partition can be treated as an individual entity, meaning every partition can be handled by a separate system; in other words, each partition can be given its own server (consumer instance). How many servers can we assign? At most one per partition: if you add more servers than there are partitions, the extra servers sit idle and are not used. We can also assign fewer servers than partitions.
To control this, there is another concept called the consumer group. A consumer group is nothing but a group of consumer instances (servers) working together. We can create one consumer group per microservice, for example:
- Some services, such as batch jobs or scheduled jobs, can take more time, while other microservices need to process messages quickly.
- For example, take two microservices called order and transport.
- The order service must deliver a message within 5 seconds, while the transport service only needs to run once a day.
The consumer group therefore corresponds to each microservice. The same data can be consumed by multiple consumers, and each independent set of consumers reading the data is a consumer group.
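As a minimal sketch (the topic name, group id, and broker address are assumptions), each microservice runs consumers configured with its own group.id, so the order service and the transport service each receive their own copy of the data:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderServiceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-service");           // each microservice uses its own group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));         // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d key=%s value=%s%n",
                            record.partition(), record.key(), record.value());
                }
            }
        }
    }
}
```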
In Kafka Streams, another concept is the topology. A topology is nothing but a group of processing steps: input, processing, and output combined together are called a topology. In other words, the topology is the entire flow.
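Here is a minimal sketch of a topology built with the Kafka Streams DSL (the topic names, application id, and broker address are assumptions for the example); the input, the processing step, and the output together form the topology:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class SimpleTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Input -> processing -> output: this whole flow is the topology.
        builder.<String, String>stream("orders-input")        // input topic
               .mapValues(value -> value.toUpperCase())       // processing step
               .to("orders-output");                          // output topic

        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "simple-topology-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(topology, props).start();
    }
}
```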
Another concept in Kafka Streams is the state store. It stores data for stateful processing. To understand it, suppose we need to calculate the total salary of all employees. While we are calculating, the running result has to be kept somewhere; that storage is the state store. The state store lives outside the stream of records flowing through the topology, as a local store attached to the processing. Here, to compute the total salary, we fetch the current total from the state store, add the next salary, and store the result back, until every salary has been accumulated.
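As a sketch of how this could look with the Streams DSL (the topic names, the store name "salary-store", and the key layout are all invented for the example), a running total per key is kept in a state store via aggregate:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class SalaryTotals {
    public static void build(StreamsBuilder builder) {
        // Key: department, value: one employee's salary.
        KTable<String, Long> totalSalary = builder
                .stream("employee-salaries", Consumed.with(Serdes.String(), Serdes.Long()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .aggregate(
                        () -> 0L,                                              // initial running total
                        (key, salary, runningTotal) -> runningTotal + salary,  // fetch, add, store back
                        Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("salary-store")
                                .withKeySerde(Serdes.String())
                                .withValueSerde(Serdes.Long()));

        // The latest totals can be written out to another topic.
        totalSalary.toStream().to("salary-totals", Produced.with(Serdes.String(), Serdes.Long()));
    }
}
```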
In Kafka Streams there is another distinction: how a processor does its processing, through either a low-level API or a high-level API. With the low-level Processor API you write everything yourself, much as in Kafka's early days: you create the topics, decide the partitioning, define each processing step, and wire it all together by hand. The high-level Streams DSL is just like a forEach loop over a stream: you declare what should happen to each record (map, filter, aggregate) and the library handles the wiring.
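For contrast with the DSL topology shown earlier, here is a minimal sketch of a similar flow written with the low-level Processor API (the processor, node, and topic names are made up; serdes are assumed to come from the Streams config):

```java
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class LowLevelTopology {

    // A processor you wire up yourself: receives each record and forwards a transformed one.
    static class UppercaseProcessor implements Processor<String, String, String, String> {
        private ProcessorContext<String, String> context;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
        }

        @Override
        public void process(Record<String, String> record) {
            context.forward(record.withValue(record.value().toUpperCase()));
        }
    }

    public static Topology build() {
        Topology topology = new Topology();
        topology.addSource("source", "orders-input");                    // read from the input topic
        topology.addProcessor("uppercase", UppercaseProcessor::new, "source");
        topology.addSink("sink", "orders-output", "uppercase");          // write to the output topic
        return topology;
    }
}
```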
In Kafka Streams there is another concept called windowing. With a table, if you want data from 9am to 6pm, you use a WHERE condition. In Kafka, if we need to perform a time-based operation, we use windowing: for example, every 15 minutes you get one chunk of data. Because a topic contains the whole stream of data, you can think of windowing as Kafka's way of doing batch-style processing over time slices.
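A minimal windowed count as a sketch (the 15-minute window size and the topic name are assumptions, and a recent Kafka Streams version is assumed for the TimeWindows API):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class OrderCountsPer15Minutes {
    public static void build(StreamsBuilder builder) {
        // Count how many orders each key receives in every 15-minute window.
        KTable<Windowed<String>, Long> counts = builder
                .stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(15)))
                .count();

        // Each window behaves like one "batch": key + window start -> count for that time slice.
        counts.toStream().foreach((windowedKey, count) ->
                System.out.println(windowedKey.key() + " @ " + windowedKey.window().startTime()
                        + " -> " + count));
    }
}
```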
If you want to use your own partitioner, you have to drop down to the lower-level APIs and plug it in yourself, for example by implementing the producer Partitioner interface; if the default partitioner is good enough, the high-level APIs work with no extra code.
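A sketch of such a custom producer partitioner (the routing rule, class name, and "priority-" key prefix are invented for illustration); it would be enabled on the producer with partitioner.class:

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

/**
 * Hypothetical partitioner: keys starting with "priority-" go to partition 0,
 * everything else is spread across the remaining partitions by key hash.
 * Enable with: props.put("partitioner.class", PriorityPartitioner.class.getName());
 */
public class PriorityPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        String k = (key == null) ? "" : key.toString();
        if (numPartitions <= 1 || k.startsWith("priority-")) {
            return 0;                                   // reserved partition for priority keys
        }
        // Non-negative hash modulo over the remaining partitions.
        return 1 + ((k.hashCode() & 0x7fffffff) % (numPartitions - 1));
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```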
- Kafka Evaluation
- Kafka Component
- Kafka partition offset
- Kafka consumer group
- Kafka Connector
- Kafka Stream
- Kafka Stream Architecture
- KSQL
https://github.com/LearningJournal/Apache-Kafka-For-Absolute-Beginners
Server and Cluster Management
Kafka Command | Description | Usage Example |
---|---|---|
kafka-server-start.sh | Starts a Kafka broker using a given configuration file. | ./kafka-server-start.sh config/server.properties |
kafka-server-stop.sh | Stops a running Kafka broker. | ./kafka-server-stop.sh |
zookeeper-server-start.sh | Starts a ZooKeeper server (for Kafka versions that use ZooKeeper). | ./zookeeper-server-start.sh config/zookeeper.properties |
zookeeper-server-stop.sh | Stops a running ZooKeeper server. | ./zookeeper-server-stop.sh |
kafka-storage.sh | Generates a random cluster UUID and formats storage when running Kafka in KRaft mode. | KAFKA_CLUSTER_ID=$(./kafka-storage.sh random-uuid) && ./kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties |
Topic Management
Kafka Command | Description | Usage Example |
---|---|---|
kafka-topics.sh | Manages topics – create, list, describe, alter (e.g., add partitions) or delete topics. | Create: ./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic test-topic --partitions 3 --replication-factor 1 |
Producer and Consumer Tools
Kafka Command | Description | Usage Example |
---|---|---|
kafka-console-producer.sh | Reads messages from standard input and publishes them to a specified topic. | ./kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic |
kafka-console-consumer.sh | Consumes messages from a topic and prints them to the console; supports options like starting from the beginning. | ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning |
Consumer Group and Offset Management
Kafka Command | Description | Usage Example |
---|---|---|
kafka-consumer-groups.sh | Manages consumer groups – list groups, describe group details (offsets, lag, partition assignments) and reset offsets. | ./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group |
Additional Tools and Utilities
Kafka Command | Description | Usage Example |
---|---|---|
kafka-configs.sh | Alters or describes configurations for topics, brokers, or client entities. | ./kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type topics --entity-name test-topic |
kafka-features.sh | Manages Kafka feature flags; for example, you can describe, upgrade, downgrade, or disable features. | ./kafka-features.sh --bootstrap-server localhost:9092 describe |
kafka-delete-records.sh | Deletes records from topic partitions up to a specified offset using a JSON file. | ./kafka-delete-records.sh --bootstrap-server localhost:9092 --offset-json-file deleterecords.json |
kafka-dump-log.sh | Dumps and inspects Kafka log segments for troubleshooting or analysis. | ./kafka-dump-log.sh --files /tmp/kafka-logs/test-topic/00000000000000000000.log |
kafka-leader-election.sh | Manually triggers a leader election for specific topic partitions. | ./kafka-leader-election.sh --bootstrap-server localhost:9092 --topic test-topic --partition 0 --election-type preferred |
kafka-transactions.sh | Lists, describes, or aborts ongoing transactions to help manage transactional producers. | ./kafka-transactions.sh --bootstrap-server localhost:9092 list |
kafka-reassign-partitions.sh | Generates or executes partition reassignment plans to balance load or migrate data. | ./kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --generate --topics-to-move-json-file topics.json |
kafka-log-dirs.sh | Queries the log directories on brokers to see replica placement and disk usage statistics. | ./kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe |
kafka-replica-verification.sh | Verifies that all replicas for a topic have the same data for consistency checks. | ./kafka-replica-verification.sh --broker-list localhost:9092 --topics-include "test-topic" |
connect-mirror-maker.sh | Uses Kafka Connect to replicate topics between clusters (the newer MirrorMaker 2.0 approach). | ./connect-mirror-maker.sh --clusters cluster1 cluster2 mm2.properties |
kafka-streams-application-reset.sh | Resets offsets for a Kafka Streams application, enabling reprocessing of data. | ./kafka-streams-application-reset.sh --application-id my-streams-app --input-topics my-input-topic |
kafka-acls.sh | Manages access control lists (ACLs) to grant or remove permissions on Kafka resources. | ./kafka-acls.sh --bootstrap-server localhost:9092 --add --allow-principal User:alice --operation Read --topic test-topic |
kafka-delegation-tokens.sh | Creates, renews, expires, or describes delegation tokens used for authentication between Kafka clients and brokers. | ./kafka-delegation-tokens.sh --bootstrap-server localhost:9092 --command-config config.properties --create |
zookeeper-security-migration.sh | Updates ZooKeeper ACLs during migration to a secure setup (used with legacy setups using ZooKeeper). | ./zookeeper-security-migration.sh --zookeeper.connect localhost:2181 --zookeeper.acl secure |
kafka-producer-perf-test.sh | Benchmarks Kafka producer performance by sending large volumes of messages with configurable throughput and batch sizes. | ./kafka-producer-perf-test.sh --topic test-topic --num-records 100000 --record-size 1000 --throughput 100000 |
kafka-consumer-perf-test.sh | Benchmarks Kafka consumer performance by measuring consumption speed and processing time for a given number of messages. | ./kafka-consumer-perf-test.sh --topic test-topic --messages 100000 --bootstrap-server localhost:9092 |
kafka-e2e-latency.sh | Measures end-to-end latency from message production to consumption to help assess performance across the cluster. | ./kafka-e2e-latency.sh localhost:9092 test-topic 10000 1 20 |
kafka-jmx.sh | Retrieves JMX metrics from Kafka brokers for monitoring and troubleshooting purposes. | ./kafka-jmx.sh --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi --object-name "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec" |
kafka-run-class.sh | A utility used to execute Kafka Java classes directly; often used internally by other scripts (e.g., GetOffsetShell). | ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-topic --time -1 |