AmazonMSK

M5 brokers have higher baseline throughput performance than T3 brokers and are recommended for production workloads. M5 brokers can also have more partitions per broker than T3 brokers. Use M5 brokers if you are running larger production-grade workloads or require a greater number of partitions.

T3 brokers can use CPU credits to temporarily burst performance. Use T3 brokers for low-cost development, for testing small to medium streaming workloads, or for low-throughput streaming workloads that experience temporary spikes in throughput. We recommend that you run a proof-of-concept test to determine whether T3 brokers are sufficient for production or critical workloads.

Specify up to 3 subnets, each hosted in a different AZ.

Maximum number of brokers per cluster: 30

After you create the cluster, you can increase the storage volume per broker but you can't decrease it. If you need to reduce the size of your cluster storage, you must migrate your existing cluster to a cluster with smaller storage.
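
As an illustration, here is a minimal boto3 sketch of increasing broker storage on an existing cluster; the cluster ARN and target volume size are placeholder assumptions, and the change only works in the increasing direction.

```python
import boto3

kafka = boto3.client("kafka")

# Placeholder cluster ARN for illustration.
cluster_arn = "arn:aws:kafka:us-east-1:123456789012:cluster/demo/abc"
current_version = kafka.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]["CurrentVersion"]

# Storage can only grow: request a size larger than the current volume size.
response = kafka.update_broker_storage(
    ClusterArn=cluster_arn,
    CurrentVersion=current_version,
    TargetBrokerEBSVolumeInfo=[{"KafkaBrokerNodeId": "All", "VolumeSizeGB": 1100}],
)
print(response["ClusterOperationArn"])
```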

By default, MSK encrypts data as it transits between brokers within a cluster.

Recommended Storage Utilization Target: between 50% and 60%. Maximum storage capacity per broker: 16 TiB.

When the service detects that your Maximum Disk Utilization metric is equal to or greater than the Storage Utilization Target setting, it will increase your storage capacity automatically. Amazon MSK first expands your cluster storage by an amount equal to the larger of two numbers: 10 GiB and 10% of current storage. For example, if you have 1000 GiB, that amount is 100 GiB.
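
A quick sketch of that sizing rule (an illustrative helper, not an MSK API):

```python
def storage_expansion_gib(current_storage_gib: float) -> float:
    """Amount MSK adds on an auto-scaling event: the larger of
    10 GiB and 10% of the current broker storage."""
    return max(10.0, 0.10 * current_storage_gib)

print(storage_expansion_gib(1000))  # 100.0 GiB, matching the example above
print(storage_expansion_gib(50))    # 10.0 GiB, because the 10 GiB floor applies
```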

Storage scaling has a cool-down period of at least six hours between events. Although the operation makes additional storage available right away, the service performs optimizations on your cluster that can take 24 hours or more. The duration of these optimizations is proportional to your storage size.

You can scale your MSK cluster on demand by changing the type (the size or family) of your brokers without reassigning Apache Kafka partitions. Changing the type of your brokers gives you the flexibility to adjust your MSK cluster’s compute capacity based on changes in your workloads, without interrupting your cluster I/O. Amazon MSK uses the same broker type for all the brokers in a given cluster.

The broker-type update happens in a rolling fashion while the cluster is up and running. This means that Amazon MSK takes down one broker at a time to perform the broker-type update.

During a broker-type update, you can continue to produce and consume data. However, you must wait until the update is done before you can reboot brokers or invoke any of the update operations listed under Amazon MSK operations.
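
A minimal boto3 sketch of such a broker-type change; the cluster ARN and target instance type are placeholder values.

```python
import boto3

kafka = boto3.client("kafka")

# Placeholder cluster ARN for illustration.
cluster_arn = "arn:aws:kafka:us-east-1:123456789012:cluster/demo/abc"
current_version = kafka.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]["CurrentVersion"]

# Rolling update to a larger broker type; the cluster keeps serving traffic
# while MSK takes brokers down one at a time.
response = kafka.update_broker_type(
    ClusterArn=cluster_arn,
    CurrentVersion=current_version,
    TargetInstanceType="kafka.m5.xlarge",
)
print(response["ClusterOperationArn"])
```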

| Authentication | Client-Broker Encryption | Broker-Broker Encryption |
|----------------|--------------------------|--------------------------|
| Unauthenticated | TLS, PLAINTEXT, TLS_PLAINTEXT | Can be on or off |
| mTLS | TLS, TLS_PLAINTEXT | Must be on |
| SASL/SCRAM | TLS | Must be on |
| SASL/IAM | TLS | Must be on |
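
For example, a boto3 sketch of creating a cluster that follows the SASL/IAM row above (TLS to clients, broker-broker encryption on); the cluster name, Kafka version, subnets, security group, and sizes are placeholder assumptions.

```python
import boto3

kafka = boto3.client("kafka")

response = kafka.create_cluster(
    ClusterName="demo-msk",          # placeholder
    KafkaVersion="3.6.0",            # placeholder
    NumberOfBrokerNodes=3,           # one per subnet/AZ
    BrokerNodeGroupInfo={
        "InstanceType": "kafka.m5.large",
        # Up to three subnets, each in a different AZ (placeholders).
        "ClientSubnets": ["subnet-aaaa", "subnet-bbbb", "subnet-cccc"],
        "SecurityGroups": ["sg-0123456789abcdef0"],
        "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 100}},
    },
    ClientAuthentication={"Sasl": {"Iam": {"Enabled": True}}},
    EncryptionInfo={
        # SASL/IAM requires TLS client-broker encryption and
        # in-cluster (broker-broker) encryption turned on.
        "EncryptionInTransit": {"ClientBroker": "TLS", "InCluster": True}
    },
)
print(response["ClusterArn"])
```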

MSK Serverless is a cluster type for Amazon MSK that makes it possible for you to run Apache Kafka without having to manage and scale cluster capacity. It automatically provisions and scales capacity while managing the partitions in your topic, so you can stream data without thinking about right-sizing or scaling clusters. MSK Serverless offers a throughput-based pricing model, so you pay only for what you use. Consider using a serverless cluster if your applications need on-demand streaming capacity that scales up and down automatically.

MSK Connect is a feature of Amazon MSK that makes it easy for developers to stream data to and from their Apache Kafka clusters. MSK Connect uses Kafka Connect 2.7.1, an open-source framework for connecting Apache Kafka clusters with external systems such as databases, search indexes, and file systems. With MSK Connect, you can deploy fully managed connectors built for Kafka Connect that move data into or pull data from popular data stores like Amazon S3 and Amazon OpenSearch Service. You can deploy connectors developed by third parties, such as Debezium for streaming change logs from databases into an Apache Kafka cluster, or deploy an existing connector with no code changes. Connectors automatically scale to adjust for changes in load, and you pay only for the resources that you use.

Use source connectors to import data from external systems into your topics. With sink connectors, you can export data from your topics to external systems.

MSK Connect supports connectors for any Apache Kafka cluster with connectivity to an Amazon VPC, whether it is an MSK cluster or an independently hosted Apache Kafka cluster.

MSK Connect continuously monitors connector health and delivery state, patches and manages the underlying hardware, and autoscales the connectors to match changes in throughput.

Best practices: https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html

Ensure that the replication factor (RF) is at least 2 for two-AZ clusters and at least 3 for three-AZ clusters. An RF of 1 can lead to offline partitions during a rolling update.

Set minimum in-sync replicas (minISR) to at most RF - 1. A minISR that is equal to the RF can prevent producing to the cluster during a rolling update. A minISR of 2 allows three-way replicated topics to be available when one replica is offline.
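
As a sketch, creating such a topic with the kafka-python client (the bootstrap address and topic name are placeholders):

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Placeholder bootstrap broker address.
admin = KafkaAdminClient(bootstrap_servers="b-1.demo.kafka.us-east-1.amazonaws.com:9092")

# RF = 3 for a three-AZ cluster; minISR = RF - 1 = 2 keeps the topic
# writable while one replica is offline during a rolling update.
topic = NewTopic(
    name="orders",
    num_partitions=6,
    replication_factor=3,
    topic_configs={"min.insync.replicas": "2"},
)
admin.create_topics(new_topics=[topic])
```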

Ensure client connection strings include multiple brokers. Having multiple brokers in a client’s connection string allows for failover when a specific broker is offline for an update.
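
A minimal kafka-python example of a multi-broker connection string, using placeholder broker hostnames:

```python
from kafka import KafkaProducer

# List every bootstrap broker (placeholders shown); if one broker is down
# for a rolling update, the client can still bootstrap from the others.
producer = KafkaProducer(
    bootstrap_servers=[
        "b-1.demo.kafka.us-east-1.amazonaws.com:9094",
        "b-2.demo.kafka.us-east-1.amazonaws.com:9094",
        "b-3.demo.kafka.us-east-1.amazonaws.com:9094",
    ],
    security_protocol="SSL",  # TLS listener on port 9094
)
producer.send("orders", b"hello")
producer.flush()
```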

Amazon MSK strongly recommends that you maintain the total CPU utilization for your brokers under 60%. Total CPU utilization is the sum of the CpuUser and CpuSystem metrics. When you have at least 40% of your cluster’s total CPU available, Apache Kafka can redistribute CPU load across brokers in the cluster when necessary. One example of when this is necessary is when Amazon MSK detects and recovers from a broker fault; in this case, Amazon MSK performs automatic maintenance, like patching. Another example is when a user requests a broker-type change or version upgrade; in these two cases, Amazon MSK deploys rolling workflows that take one broker offline at a time. When brokers with lead partitions go offline, Apache Kafka reassigns partition leadership to redistribute work to other brokers in the cluster.
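
A sketch of checking that threshold from CloudWatch with boto3; the cluster name and broker ID dimension values are placeholders.

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

def broker_avg(metric_name: str) -> float:
    """Average of an AWS/Kafka broker metric over the last hour."""
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kafka",
        MetricName=metric_name,
        Dimensions=[
            {"Name": "Cluster Name", "Value": "demo-msk"},  # placeholder
            {"Name": "Broker ID", "Value": "1"},            # placeholder
        ],
        StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
        EndTime=datetime.now(timezone.utc),
        Period=300,
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else 0.0

total_cpu = broker_avg("CpuUser") + broker_avg("CpuSystem")
print(f"Total CPU utilization: {total_cpu:.1f}% (keep under 60%)")
```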

Amazon EMR Serverless provides a serverless runtime environment that simplifies running analytics applications using the latest open-source frameworks such as Apache Spark, Hive, and Presto. With Amazon EMR Serverless, customers do not have to configure, optimize, secure, or operate clusters to run applications with these frameworks. Additionally, customers do not have to worry about over- or under-provisioning resources for their data processing jobs. Amazon EMR Serverless automatically determines the resources required by the application, acquires the resources to process jobs, and relinquishes the resources when jobs finish. For use cases where applications require a sub-second response time, such as interactive data analysis, customers can pre-initialize the resources required by their application when the application starts up. With Amazon EMR Serverless, customers continue to get the benefits of EMR, such as open-source compatibility and currency, a performance-optimized runtime for popular open-source frameworks such as Spark and Presto, integration with S3 data lakes, and the EMR Studio IDE for developing and debugging applications. Compared to standard EMR, where customers manage their own clusters, choose the EC2 instance type, and customize the AMI, instance configuration, and so on, Amazon EMR Serverless is suitable for customers that want to avoid managing and operating clusters and simply want to run applications using open-source frameworks.