0.11 Kafka and Kafka Streams
Confluent Kafka / Amazon MSK (Managed Streaming for Apache Kafka)
In Kafka, everything starts with a topic. You can think of a topic as the equivalent of a table in a database. A record in a topic has two parts: a key and a value. The key is comparable to the primary key of a table.
But if you are working with billions of records, a database keeps all the data and tables together like one big suitcase. With little data the database is fast, but with billions of records it becomes very slow, because a plain database has no concept of load balancing.
Kafka reduces this limitation of the database because it has a concept called partitioning: a topic is divided into partitions.
Now the question is: which partition does a given record go to? That is decided based on the key. In a database, by contrast, all keys go into the same table; even with one billion records, everything sits in a single table, and fetching from that billion-row table takes time.
In Kafka, however, we can create partitions, meaning one topic can have multiple partitions.
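To make this concrete, here is a minimal producer sketch in Java (the broker address, topic name, key, and value are all made up for the example): each record carries a key and a value, and records with the same key land in the same partition of the topic.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key "order-42" plays the role of the primary key; the value is the payload.
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"amount\": 100}"));
        }
    }
}
```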
Another concept in Kafka is the partitioner. It is responsible for selecting the partition for each key, i.e. it decides which key goes to which partition. The default partitioner internally uses an algorithm called murmur2, which works roughly like the formula below:

hash(key) % number of partitions
It is similar to the hashing mechanism in a HashMap.
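As a rough illustration of the formula above (a minimal sketch, not the exact producer code path), the kafka-clients Utils class exposes the murmur2 hash, so the partition choice for a key can be reproduced like this; the key and the partition count are invented for the example:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionForKey {
    public static void main(String[] args) {
        int numPartitions = 3;           // assumed partition count for the example
        String key = "order-42";         // hypothetical record key

        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Same idea as the default partitioner: murmur2 hash of the key bytes,
        // made non-negative, then modulo the number of partitions.
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;

        System.out.println("Key " + key + " goes to partition " + partition);
    }
}
```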
In Kafka, each partition can be treated as an individual entity, meaning every partition can be handled by a separate system; in other words, each partition can be given its own server (consumer instance). How many servers can we assign? At most one per partition: if you add more servers than there are partitions, the extra servers sit idle and are not used. We can also assign fewer servers than partitions.
To control this, there is another concept called the consumer group. A consumer group is nothing but a group of consumer instances (servers) working together. We can create one consumer group per microservice, for example:
- Some services, such as batch jobs or scheduled jobs, can take more time, while other microservices need to process messages quickly.
- For example, take two microservices called order and transport.
- The order service must deliver a message within 5 seconds, while the transport service only needs to run once a day.
The consumer group therefore corresponds to each microservice. The same data can be consumed by multiple consumers, and each independent set of consumers reading the data is a consumer group.
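As a minimal sketch (the topic name, group id, and broker address are assumptions), each microservice runs consumers configured with its own group.id, so the order service and the transport service each receive their own copy of the data:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderServiceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-service");           // each microservice uses its own group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));         // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d key=%s value=%s%n",
                            record.partition(), record.key(), record.value());
                }
            }
        }
    }
}
```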
In Kafka Streams, another concept is the topology. A topology is nothing but a group of processing steps: input, processing, and output combined together are called a topology. In other words, the topology is the entire flow.
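Here is a minimal sketch of a topology built with the Kafka Streams DSL (the topic names, application id, and broker address are assumptions for the example); the input, the processing step, and the output together form the topology:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class SimpleTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Input -> processing -> output: this whole flow is the topology.
        builder.<String, String>stream("orders-input")        // input topic
               .mapValues(value -> value.toUpperCase())       // processing step
               .to("orders-output");                          // output topic

        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "simple-topology-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(topology, props).start();
    }
}
```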
Another concept in Kafka Streams is the state store. It stores data for stateful processing. To understand it, suppose we need to calculate the total salary of all employees. While we are calculating, the running result has to be kept somewhere; that storage is the state store. The state store lives outside the stream of records flowing through the topology, as a local store attached to the processing. Here, to compute the total salary, we fetch the current total from the state store, add the next salary, and store the result back, until every salary has been accumulated.
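As a sketch of how this could look with the Streams DSL (the topic names, the store name "salary-store", and the key layout are all invented for the example), a running total per key is kept in a state store via aggregate:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class SalaryTotals {
    public static void build(StreamsBuilder builder) {
        // Key: department, value: one employee's salary.
        KTable<String, Long> totalSalary = builder
                .stream("employee-salaries", Consumed.with(Serdes.String(), Serdes.Long()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .aggregate(
                        () -> 0L,                                              // initial running total
                        (key, salary, runningTotal) -> runningTotal + salary,  // fetch, add, store back
                        Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("salary-store")
                                .withKeySerde(Serdes.String())
                                .withValueSerde(Serdes.Long()));

        // The latest totals can be written out to another topic.
        totalSalary.toStream().to("salary-totals", Produced.with(Serdes.String(), Serdes.Long()));
    }
}
```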
In Kafka Streams there is another distinction: how a processor does its processing, through either a low-level API or a high-level API. With the low-level Processor API you write everything yourself, much as in Kafka's early days: you create the topics, decide the partitioning, define each processing step, and wire it all together by hand. The high-level Streams DSL is just like a forEach loop over a stream: you declare what should happen to each record (map, filter, aggregate) and the library handles the wiring.
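For contrast with the DSL topology shown earlier, here is a minimal sketch of a similar flow written with the low-level Processor API (the processor, node, and topic names are made up; serdes are assumed to come from the Streams config):

```java
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class LowLevelTopology {

    // A processor you wire up yourself: receives each record and forwards a transformed one.
    static class UppercaseProcessor implements Processor<String, String, String, String> {
        private ProcessorContext<String, String> context;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
        }

        @Override
        public void process(Record<String, String> record) {
            context.forward(record.withValue(record.value().toUpperCase()));
        }
    }

    public static Topology build() {
        Topology topology = new Topology();
        topology.addSource("source", "orders-input");                    // read from the input topic
        topology.addProcessor("uppercase", UppercaseProcessor::new, "source");
        topology.addSink("sink", "orders-output", "uppercase");          // write to the output topic
        return topology;
    }
}
```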
In Kafka Streams there is another concept called windowing. With a table, if you want data from 9am to 6pm, you use a WHERE condition. In Kafka, if we need to perform a time-based operation, we use windowing: for example, every 15 minutes you get one chunk of data. Because a topic contains the whole stream of data, you can think of windowing as Kafka's way of doing batch-style processing over time slices.
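A minimal windowed count as a sketch (the 15-minute window size and the topic name are assumptions, and a recent Kafka Streams version is assumed for the TimeWindows API):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class OrderCountsPer15Minutes {
    public static void build(StreamsBuilder builder) {
        // Count how many orders each key receives in every 15-minute window.
        KTable<Windowed<String>, Long> counts = builder
                .stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(15)))
                .count();

        // Each window behaves like one "batch": key + window start -> count for that time slice.
        counts.toStream().foreach((windowedKey, count) ->
                System.out.println(windowedKey.key() + " @ " + windowedKey.window().startTime()
                        + " -> " + count));
    }
}
```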
If you want to use your own partitioner, you have to drop down to the lower-level APIs and plug it in yourself, for example by implementing the producer Partitioner interface; if the default partitioner is good enough, the high-level APIs work with no extra code.
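A sketch of such a custom producer partitioner (the routing rule, class name, and "priority-" key prefix are invented for illustration); it would be enabled on the producer with partitioner.class:

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

/**
 * Hypothetical partitioner: keys starting with "priority-" go to partition 0,
 * everything else is spread across the remaining partitions by key hash.
 * Enable with: props.put("partitioner.class", PriorityPartitioner.class.getName());
 */
public class PriorityPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        String k = (key == null) ? "" : key.toString();
        if (numPartitions <= 1 || k.startsWith("priority-")) {
            return 0;                                   // reserved partition for priority keys
        }
        // Non-negative hash modulo over the remaining partitions.
        return 1 + ((k.hashCode() & 0x7fffffff) % (numPartitions - 1));
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```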
- Kafka Evaluation
- Kafka Component
- Kafka partition offset
- Kafka consumer group
- Kafka Connector
- Kafka Stream
- Kafka Stream Architecture
- KSQL
https://github.com/LearningJournal/Apache-Kafka-For-Absolute-Beginners
Server and Cluster Management
Kafka Command | Description | Usage Example |
---|---|---|
kafka-server-start.sh | Starts a Kafka broker using a given configuration file. | ./kafka-server-start.sh config/server.properties |
kafka-server-stop.sh | Stops a running Kafka broker. | ./kafka-server-stop.sh |
zookeeper-server-start.sh | Starts a ZooKeeper server (for Kafka versions that use ZooKeeper). | ./zookeeper-server-start.sh config/zookeeper.properties |
zookeeper-server-stop.sh | Stops a running ZooKeeper server. | ./zookeeper-server-stop.sh |
kafka-storage.sh | Generates a random cluster UUID and formats storage when running Kafka in KRaft mode. | KAFKA_CLUSTER_ID=$(./kafka-storage.sh random-uuid) && ./kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties |
Topic Management
Kafka Command | Description | Usage Example |
---|---|---|
kafka-topics.sh | Manages topics – create, list, describe, alter (e.g., add partitions) or delete topics. | Create: ./kafka-topics.sh --bootstrap-server localhost:9092 --create --topic test-topic --partitions 3 --replication-factor 1 |
Producer and Consumer Tools
Kafka Command | Description | Usage Example |
---|---|---|
kafka-console-producer.sh | Reads messages from standard input and publishes them to a specified topic. | ./kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic |
kafka-console-consumer.sh | Consumes messages from a topic and prints them to the console; supports options like starting from the beginning. | ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning |
Consumer Group and Offset Management
Kafka Command | Description | Usage Example |
---|---|---|
kafka-consumer-groups.sh | Manages consumer groups – list groups, describe group details (offsets, lag, partition assignments) and reset offsets. | ./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group |
Additional Tools and Utilities
Kafka Command | Description | Usage Example |
---|---|---|
kafka-configs.sh | Alters or describes configurations for topics, brokers, or client entities. | ./kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type topics --entity-name test-topic |
kafka-features.sh | Manages Kafka feature flags; for example, you can describe, upgrade, downgrade, or disable features. | ./kafka-features.sh --bootstrap-server localhost:9092 describe |
kafka-delete-records.sh | Deletes records from topic partitions up to a specified offset using a JSON file. | ./kafka-delete-records.sh --bootstrap-server localhost:9092 --offset-json-file deleterecords.json |
kafka-dump-log.sh | Dumps and inspects Kafka log segments for troubleshooting or analysis. | ./kafka-dump-log.sh --files /tmp/kafka-logs/test-topic/00000000000000000000.log |
kafka-leader-election.sh | Manually triggers a leader election for specific topic partitions. | ./kafka-leader-election.sh --bootstrap-server localhost:9092 --topic test-topic --partition 0 --election-type preferred |
kafka-transactions.sh | Lists, describes, or aborts ongoing transactions to help manage transactional producers. | ./kafka-transactions.sh --bootstrap-server localhost:9092 list |
kafka-reassign-partitions.sh | Generates or executes partition reassignment plans to balance load or migrate data. | ./kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --generate --topics-to-move-json-file topics.json |
kafka-log-dirs.sh | Queries the log directories on brokers to see replica placement and disk usage statistics. | ./kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe |
kafka-replica-verification.sh | Verifies that all replicas for a topic have the same data for consistency checks. | ./kafka-replica-verification.sh --broker-list localhost:9092 --topics-include "test-topic" |
connect-mirror-maker.sh | Uses Kafka Connect to replicate topics between clusters (the newer MirrorMaker 2.0 approach). | ./connect-mirror-maker.sh --clusters cluster1 cluster2 mm2.properties |
kafka-streams-application-reset.sh | Resets offsets for a Kafka Streams application, enabling reprocessing of data. | ./kafka-streams-application-reset.sh --application-id my-streams-app --input-topics my-input-topic |
kafka-acls.sh | Manages access control lists (ACLs) to grant or remove permissions on Kafka resources. | ./kafka-acls.sh --bootstrap-server localhost:9092 --add --allow-principal User:alice --operation Read --topic test-topic |
kafka-delegation-tokens.sh | Creates, renews, expires, or describes delegation tokens used for authentication between Kafka clients and brokers. | ./kafka-delegation-tokens.sh --bootstrap-server localhost:9092 --command-config config.properties --create |
zookeeper-security-migration.sh | Updates ZooKeeper ACLs during migration to a secure setup (used with legacy setups using ZooKeeper). | ./zookeeper-security-migration.sh --zookeeper.connect localhost:2181 --zookeeper.acl secure |
kafka-producer-perf-test.sh | Benchmarks Kafka producer performance by sending large volumes of messages with configurable throughput and batch sizes. | ./kafka-producer-perf-test.sh --topic test-topic --num-records 100000 --record-size 1000 --throughput 100000 |
kafka-consumer-perf-test.sh | Benchmarks Kafka consumer performance by measuring consumption speed and processing time for a given number of messages. | ./kafka-consumer-perf-test.sh --topic test-topic --messages 100000 --bootstrap-server localhost:9092 |
kafka-e2e-latency.sh | Measures end-to-end latency from message production to consumption to help assess performance across the cluster. | ./kafka-e2e-latency.sh localhost:9092 test-topic 10000 1 20 |
kafka-jmx.sh | Retrieves JMX metrics from Kafka brokers for monitoring and troubleshooting purposes. | ./kafka-jmx.sh --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi --object-name "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec" |
kafka-run-class.sh | A utility used to execute Kafka Java classes directly; often used internally by other scripts (e.g., GetOffsetShell). | ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-topic --time -1 |