Kafka Performance Optimization - keshavbaweja-git/guides GitHub Wiki

Performance Optimization

Performance dimensions

  • Throughput (messages/bytes per sec)
  • Latency (end to end time: producer => broker => consumer)
  • Durability
  • Availability

Throughput

  • Number of partitions

Partition is the unit of parallelism in a Kafka set up. Both the producer and broker can producer to & persist messages respectively from multiple partitions in parallel. On the consumer side, each partition is processed by a single thread in a consumer group. Higher number of partitions will provide higher throughput, however a very large number of partitions can result in longer recovery times in case of leader failures. Also with higher number of partitions it takes longer for a message to be replicated to followers. Follower brokers by default use only one thread per source broker for partition replication.Guidelines

  • Batch size batch.size (Producer)

Batch size in bytes. Produce requests contain one batch per partition. Higher batch size results in higher throughput. However it can adversely impact latency when records are produced at a slow rate, with larger time required to fill up the batch. Use linger.ms to cap the amount of time producer will wait for batch to fill up.

  • Compression compression.type (Producer, Broker)

Compression is applied at batch level, efficacy of batching will determine efficacy of compression. Batch compression will result in higher throughput, it will add to the message latency though. When a compressed batch arrives at broker, the broker decompresses the batch to validate messages. If a compression codec is specified on the topic and it is different from the codec used by producer, broker will have to recompress the batch, this adds to overall processing time and end to end latency.

  • Number of acknowledgements acks (Producer)

Possible values - 0 (no durability guarantees), 1 (leader acknowledgement), all (all ISR)

  • fetch.min.bytes (Consumer)

Use fetch.max.wait.ms together with this configuration to set an upper bound on wait time for read request.


Latency

  • Number of fetcher threads num.replica.fetchers (Broker)

The number of threads a follower broker uses to replicate partitions from a source broker. A higher number increases degree of rate of replication and reduces latency.

Producer

  • linger.ms Recommended 0, default 0

  • compression.type Recommended none, default none

  • acks Recommended 1, default 1

Consumer

  • fetch.min.bytes, default 1

  • fetch.max.wait.ms

Durability

The most important feature that enables Durability in Kafka is Replication, it ensures that messages are available on more than one broker.

Broker

  • replication.factor, recommended 3
  • default.replication.factor=3 (default 1)
  • auto.create.topics.enable=false (default true)
  • min.insync.replicas=2 (default 1); topic override available
  • unclean.leader.election.enable=false (default false); topic override available
  • broker.rack: rack of the broker (default null)
  • log.flush.interval.messages, log.flush.interval.ms: for topics with very low throughput

Producer

  • acks - number of ISRs that need to replicate and ack a message for the leader broker to mark the message as committed. Recommeded all
  • Configure Producers to resend messages in case of send failures. retries - number of retries, recommended MAX_INT. delivery.timeout.ms - retry duration.
  • enable.idempotence=true (default false), to handle message duplication and ordering
  • max.in.flight.requests.per.connection=1 (default 5), to prevent out of order messages when not using an idempotent producer