Configurations - chandradeepak/cruise-control GitHub Wiki
Configurations inherited from Kafka clients
The following configurations are inherited from the open source Kafka client configurations. They will be used by all the clients in Cruise Control to communicate with the Kafka cluster.
Name | Type | Required? | Default Value | Descriptions |
---|---|---|---|---|
bootstrap.servers | String | Y | | The bootstrap.servers of the Kafka cluster that Cruise Control should manage. This configuration is also used by the SampleStore and CruiseControlMetricsReporterSampler if they are used. |
metadata.max.age.ms | Long | N | 300,000 | The maximum time to cache the metadata of the Kafka cluster before it has to be refreshed. This configuration is used by all the clients communicating with the Kafka cluster. |
client.id | String | N | kafka-cruise-control | The client id to use when communicating with brokers for metadata refresh. |
send.buffer.bytes | Integer | N | 128K | The socket send buffer size. This configuration is used by all the clients communicating with the Kafka cluster. |
receive.buffer.bytes | Integer | N | 32K | The socket receive buffer size. This configuration is used by all the clients communicating with the Kafka cluster. |
reconnect.backoff.ms | Integer | N | 50 | The amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all requests sent by the client. This configuration is used by all the clients communicating with the Kafka cluster. |
connection.max.idle.ms | Integer | N | 540,000 | Close idle connections after the number of milliseconds specified by this config. This configuration is used by all the clients communicating with the Kafka cluster. |
request.timeout.ms | Integer | N | 30,000 | The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. This configuration is used by all the clients communicating with the Kafka cluster. |
security.protocol | String | N | PLAINTEXT | Security protocol used to communicate with brokers. |
ssl.protocol | String | N | TLS | The SSL protocol used to generate the SSLContext. Default setting is TLS, which is fine for most cases. Allowed values in recent JVMs are TLS, TLSv1.1 and TLSv1.2. SSL, SSLv2 and SSLv3 may be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities. |
ssl.provider | String | N | | The name of the security provider used for SSL connections. Default value is the default security provider of the JVM. |
ssl.cipher.suites | String | N | | A list of cipher suites. A cipher suite is a named combination of authentication, encryption, MAC and key exchange algorithms used to negotiate the security settings for a network connection using the TLS or SSL network protocol. By default all the available cipher suites are supported. |
ssl.enabled.protocols | String | N | TLSv1.2,TLSv1.1,TLSv1 | The list of protocols enabled for SSL connections. |
ssl.keystore.type | String | N | JKS | The file format of the key store file. This is optional for client. |
ssl.keystore.location | String | N | | The location of the key store file. This is optional for the client and can be used for two-way authentication of the client. |
ssl.keystore.password | String | N | | The store password for the key store file. This is optional for the client and only needed if ssl.keystore.location is configured. |
ssl.key.password | String | N | | The password of the private key in the key store file. This is optional for the client. |
ssl.truststore.type | String | N | JKS | The file format of the trust store file. |
ssl.truststore.location | String | N | | The location of the trust store file. |
ssl.truststore.password | String | N | | The password for the trust store file. |
ssl.keymanager.algorithm | String | N | SunX509 | The algorithm used by key manager factory for SSL connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine. |
ssl.trustmanager.algorithm | String | N | SunX509 | The algorithm used by trust manager factory for SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine. |
ssl.endpoint.identification.algorithm | String | N | | The endpoint identification algorithm to validate server hostname using server certificate. |
ssl.secure.random.implementation | String | N | | The SecureRandom PRNG implementation to use for SSL cryptography operations. |
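
As a rough illustration of how the client settings above fit together, the following Python sketch renders a small set of them as Java-style properties text and checks that the one required key, bootstrap.servers, is present. The host names, file path, and password are placeholders, and `render_properties` is a hypothetical helper written for this example, not part of Cruise Control.

```python
# Only bootstrap.servers is marked as required in the table above.
REQUIRED_KEYS = {"bootstrap.servers"}


def render_properties(config):
    """Render a dict as key=value lines after checking the required keys.

    Hypothetical helper: it does not escape special characters the way
    java.util.Properties would.
    """
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"missing required configuration(s): {sorted(missing)}")
    return "\n".join(f"{key}={value}" for key, value in config.items())


# Placeholder values for an SSL-enabled client; adjust to the real cluster.
client_config = {
    "bootstrap.servers": "broker1.example.com:9093,broker2.example.com:9093",
    "client.id": "kafka-cruise-control",
    "security.protocol": "SSL",
    "ssl.truststore.location": "/path/to/truststore.jks",
    "ssl.truststore.password": "changeit",
}

properties_text = render_properties(client_config)
print(properties_text)
```

Any optional key that is left out falls back to the default listed in the table.
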
Cruise Control Configurations
Load Monitor Configurations
Name | Type | Required? | Default Value | Descriptions |
---|---|---|---|---|
num.metric.fetchers | Integer | N | 1 | The number of metric fetchers to fetch from the Kafka cluster. |
metric.sampler.class | Class | N | com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler | The class name of the metric sampler |
metric.sampler.partition.assignor.class | Class | N | com.linkedin.kafka.cruisecontrol.monitor.sampling.DefaultMetricSamplerPartitionAssignor | The class used to assign the partitions to the metric samplers. |
metric.sampling.interval.ms | Integer | N | 60,000 | The interval of metric sampling. |
load.snapshot.window.ms | Integer | N | 3,600,000 | The interval in milliseconds that is covered by each load snapshot. The load snapshot will aggregate all the metric samples whose timestamps fall into its window. The load snapshot window must be greater than metric.sampling.interval.ms. |
num.load.snapshots | Integer | N | 5 | The maximum number of load snapshots the load monitor will keep. Each snapshot covers a time window defined by load.snapshot.window.ms. |
min.samples.per.load.snapshot | Integer | N | 3 | The minimum number of metric samples a valid load snapshot should have. If a partition does not have enough samples in a snapshot window, the topic of the partition will be removed from the snapshot due to insufficient data. |
broker.capacity.config.resolver.class | Class | N | com.linkedin.kafka.cruisecontrol.config.BrokerCapacityConfigFileResolver | The broker capacity configuration resolver class name. The broker capacity configuration resolver is responsible for getting the broker capacity. The default implementation is a file based solution. |
min.monitored.partition.percentage | Double | N | 0.995 | The minimum percentage of the total partitions required to be monitored in order to generate a valid load model. Because the topics and partitions in a Kafka cluster change dynamically, the load monitor will exclude topics that do not have sufficient metric samples. This configuration defines the minimum required percentage of the partitions that must be included in the load model. |
leader.network.inbound.weight.for.cpu.util | Double | N | 0.6 | Kafka Cruise Control uses the following model to derive replica level CPU utilization: REPLICA_CPU_UTIL = a * LEADER_BYTES_IN_RATE + b * LEADER_BYTES_OUT_RATE + c * FOLLOWER_BYTES_IN_RATE. This configuration will be used as the weight for LEADER_BYTES_IN_RATE. |
leader.network.outbound.weight.for.cpu.util | Double | N | 0.1 | Kafka Cruise Control uses the following model to derive replica level CPU utilization: REPLICA_CPU_UTIL = a * LEADER_BYTES_IN_RATE + b * LEADER_BYTES_OUT_RATE + c * FOLLOWER_BYTES_IN_RATE. This configuration will be used as the weight for LEADER_BYTES_OUT_RATE. |
follower.network.inbound.weight.for.cpu.util | Double | N | 0.3 | Kafka Cruise Control uses the following model to derive replica level CPU utilization: REPLICA_CPU_UTIL = a * LEADER_BYTES_IN_RATE + b * LEADER_BYTES_OUT_RATE + c * FOLLOWER_BYTES_IN_RATE. This configuration will be used as the weight for FOLLOWER_BYTES_IN_RATE. |
sample.store.class | Class | N | com.linkedin.kafka.cruisecontrol.monitor.sampling.KafkaSampleStore | The sample store class name. Users may configure a sample store that persists the metric samples that have already been aggregated into Kafka Cruise Control. Later on, the persisted samples can be reloaded from the sample store into Kafka Cruise Control. |
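
The CPU model and the snapshot settings in the table above can be sketched in a few lines of Python. This is only an illustration of the arithmetic with the documented defaults: the function names are made up for this example, the input rates are arbitrary numbers, and the sketch assumes one metric sample per partition per sampling interval. How Cruise Control normalizes the resulting CPU figure is not covered here.

```python
def replica_cpu_util(leader_bytes_in_rate, leader_bytes_out_rate,
                     follower_bytes_in_rate, a=0.6, b=0.1, c=0.3):
    """Weighted sum from the table: a, b, c are the three *.weight.for.cpu.util defaults."""
    return (a * leader_bytes_in_rate
            + b * leader_bytes_out_rate
            + c * follower_bytes_in_rate)


def snapshot_summary(sampling_interval_ms=60_000, window_ms=3_600_000,
                     num_snapshots=5, min_samples=3):
    """Derive what the snapshot settings imply (defaults taken from the table)."""
    if window_ms <= sampling_interval_ms:
        raise ValueError("load.snapshot.window.ms must exceed metric.sampling.interval.ms")
    # Assumption: exactly one sample arrives per sampling interval.
    expected_samples = window_ms // sampling_interval_ms
    return {
        "expected_samples_per_window": expected_samples,
        "history_covered_ms": num_snapshots * window_ms,
        "can_meet_min_samples": expected_samples >= min_samples,
    }


cpu = replica_cpu_util(100.0, 200.0, 50.0)
summary = snapshot_summary()
print(cpu, summary)
```

With the defaults, each one-hour window is expected to hold about 60 samples, and five windows cover five hours of history.
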
Analyzer Configurations
Name | Type | Required? | Default Value | Descriptions |
---|---|---|---|---|
cpu.balance.threshold | Double | N | 1.1 | The maximum allowed extent of unbalance for CPU utilization. For example, 1.10 means the highest CPU usage of a broker should not be above 1.10x of average CPU utilization of all the brokers. |
disk.balance.threshold | Double | N | 1.1 | The maximum allowed extent of unbalance for disk utilization. For example, 1.10 means the highest disk usage of a broker should not be above 1.10x of average disk utilization of all the brokers. |
network.inbound.balance.threshold | Double | N | 1.1 | The maximum allowed extent of unbalance for network inbound usage. For example, 1.10 means the highest network inbound usage of a broker should not be above 1.10x of average network inbound usage of all the brokers. |
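network.outbound.balance.threshold | Double | N | 1.1 | The maximum allowed extent of unbalance for network outbound usage. For example, 1.10 means the highest network outbound usage of a broker should not be above 1.10x of average network outbound usage of all the brokers. |
cpu.capacity.threshold | Double | N | | The maximum percentage of the total broker.cpu.capacity that is allowed to be used on a broker. The analyzer will enforce a hard goal that the CPU utilization of a broker cannot be higher than (broker.cpu.capacity * cpu.capacity.threshold). |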
disk.capacity.threshold | Double | N | 0.8 | The maximum percentage of the total broker.disk.capacity that is allowed to be used on a broker. The analyzer will enforce a hard goal that the disk usage of a broker cannot be higher than (broker.disk.capacity * disk.capacity.threshold). |
network.inbound.capacity.threshold | Double | N | 0.8 | The maximum percentage of the total broker.network.inbound.capacity that is allowed to be used on a broker. The analyzer will enforce a hard goal that the network inbound usage of a broker cannot be higher than (broker.network.inbound.capacity * network.inbound.capacity.threshold). |
network.outbound.capacity.threshold | Double | N | 0.8 | The maximum percentage of the total broker.network.outbound.capacity that is allowed to be used on a broker. The analyzer will enforce a hard goal that the network outbound usage of a broker cannot be higher than (broker.network.outbound.capacity * network.outbound.capacity.threshold). |
cpu.low.utilization.threshold | Double | N | 0.3 | The CPU utilization threshold below which Kafka Cruise Control considers a rebalance not worthwhile. The cluster will only be in a low utilization state when all the brokers are below the low utilization threshold. The threshold is a percentage expressed as a fraction (0.3 means 30%). |
disk.low.utilization.threshold | Double | N | 0.3 | The disk utilization threshold below which Kafka Cruise Control considers a rebalance not worthwhile. The cluster will only be in a low utilization state when all the brokers are below the low utilization threshold. The threshold is a percentage expressed as a fraction (0.3 means 30%). |
network.inbound.low.utilization.threshold | Double | N | 0.3 | The network inbound utilization threshold below which Kafka Cruise Control considers a rebalance not worthwhile. The cluster will only be in a low utilization state when all the brokers are below the low utilization threshold. The threshold is a percentage expressed as a fraction (0.3 means 30%). |
network.outbound.low.utilization.threshold | Double | N | 0.3 | The network outbound utilization threshold below which Kafka Cruise Control considers a rebalance not worthwhile. The cluster will only be in a low utilization state when all the brokers are below the low utilization threshold. The threshold is a percentage expressed as a fraction (0.3 means 30%). |
max.proposal.candidates | Integer | N | 10 | Kafka Cruise Control precomputes the optimization proposal candidates continuously in the background. This config sets the maximum number of candidate proposals to precompute for each cluster workload model. The more proposal candidates are generated, the more likely a better optimization proposal will be found, but more CPU will be used as well. |
proposal.expiration.ms | Integer | N | 900,000 | Kafka Cruise Control caches the best proposal among all the optimization proposal candidates it recently computed. This configuration defines when the cached proposal is invalidated and must be recomputed. If proposal.expiration.ms is set to 0, Cruise Control will continuously compute the proposal candidates. |
num.proposal.precompute.threads | Integer | N | 1 | The number of threads used to precompute the optimization proposal candidates. The more threads are used, the more memory and CPU resources will be used. |
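
The three kinds of analyzer thresholds above (balance, capacity, and low utilization) can be read as simple comparisons. The Python sketch below restates them with made-up broker numbers; the function names are hypothetical and the code is only a reading of the descriptions in the table, not the analyzer's actual implementation.

```python
def brokers_over_balance_limit(usage_by_broker, balance_threshold=1.1):
    """Return brokers whose usage exceeds balance_threshold x the cluster average."""
    average = sum(usage_by_broker.values()) / len(usage_by_broker)
    limit = average * balance_threshold
    return sorted(b for b, usage in usage_by_broker.items() if usage > limit)


def exceeds_capacity(usage, capacity, capacity_threshold=0.8):
    """Hard-goal check: usage may not exceed capacity * capacity_threshold."""
    return usage > capacity * capacity_threshold


def cluster_in_low_utilization(utilization_ratios, low_threshold=0.3):
    """Low utilization holds only when every broker is below the threshold."""
    return all(ratio < low_threshold for ratio in utilization_ratios)


# Hypothetical CPU usage per broker id.
cpu_usage = {0: 50.0, 1: 52.0, 2: 70.0}
unbalanced = brokers_over_balance_limit(cpu_usage)
over_disk = exceeds_capacity(usage=850.0, capacity=1000.0)
low = cluster_in_low_utilization([0.10, 0.20, 0.25])
print(unbalanced, over_disk, low)
```

In this example the average CPU usage is about 57.3, so the limit is about 63.1 and only broker 2 is above it.
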
Executor Configurations
Name | Type | Required? | Default Value | Descriptions |
---|---|---|---|---|
zookeeper.connect | String | Y | | The zookeeper path used by the Kafka cluster. |
num.concurrent.partition.movements.per.broker | Integer | N | 10 | The maximum number of partitions the executor will move into or out of a broker at the same time. For example, setting the value to 10 means that the executor will allow at most 10 partitions to move out of a broker and 10 partitions to move into a broker at any given point. This is to avoid overwhelming the cluster with partition movements. |
num.concurrent.leader.movements | Integer | N | 1000 | The maximum number of leader movements the executor will take as one batch. This limit exists mainly because a ZNode has a 1 MB size limit; it also reduces the burden on the controller. |
execution.progress.check.interval.ms | Integer | N | 10,000 | The interval in milliseconds at which the executor will check on the execution progress. |
goals | List | N | com.linkedin.kafka.cruisecontrol.analyzer.RackAwareCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.PotentialNwOutGoal, com.linkedin.kafka.cruisecontrol.analyzer.ResourceDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.LeaderBytesInDistributionGoals, com.linkedin.kafka.cruisecontrol.analyzer.TopicReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.ReplicaDistributionGoal | A list of goals in the order of priority. The high priority goals will be executed first. |
anomaly.notifier.class | Class | N | com.linkedin.kafka.cruisecontrol.detector.notifier.NoopNotifier | The notifier class to trigger an alert when an anomaly is detected. The anomaly could be either a goal violation or a broker failure. |
anomaly.detection.interval.ms | Long | N | 300,000 | The interval in milliseconds at which the detectors will run to detect anomalies. |
anomaly.detection.goals | List | N | com.linkedin.kafka.cruisecontrol.analyzer.RackAwareCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.PotentialNwOutGoal, com.linkedin.kafka.cruisecontrol.analyzer.ResourceDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.LeaderBytesInDistributionGoals, com.linkedin.kafka.cruisecontrol.analyzer.TopicReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.ReplicaDistributionGoal | The goals that anomaly detector should detect if they are violated. |
failed.brokers.zk.path | String | N | /CruiseControlBrokerList | The zk path to store the failed broker list. This is to persist the broker failure time in case Cruise Control failed and restarted when some brokers are down. |
topics.excluded.from.partition.movement | String | N | "" | The topics that should be excluded from partition movement. It is a regex. Notice that this regex is ignored when a broker decommission is invoked. |
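
To make the batching described for num.concurrent.leader.movements concrete, here is a minimal Python sketch that splits a list of pending leader movements into batches no larger than the configured limit. The `(topic, partition)` tuples and the function name are assumptions made for this example; the executor's real data structures are not described on this page.

```python
def batch_leader_movements(movements, max_batch_size=1000):
    """Split movements into consecutive batches of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [movements[i:i + max_batch_size]
            for i in range(0, len(movements), max_batch_size)]


# Hypothetical pending leader movements, one (topic, partition) pair each.
pending = [("example-topic", partition) for partition in range(2500)]
batches = batch_leader_movements(pending)
print([len(batch) for batch in batches])
```

With the default of 1000, the 2500 movements end up in three batches.
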
Configurations under development and testing
We are still working to improve Cruise Control. The following configurations are for development and experimentation.
Name | Type | Required? | Default Value | Description |
---|---|---|---|---|
use.linear.regression.model | Boolean | N | false | Whether to use the linear regression model to predict the broker CPU utilization. |
linear.regression.model.cpu.util.bucket.size | Integer | N | 5 | The CPU utilization bucket size for linear regression model training data. The unit is percent. |
linear.regression.model.required.samples.per.bucket | Integer | N | 100 | The number of training samples required in each CPU utilization bucket specified by linear.regression.model.cpu.util.bucket.size. |
linear.regression.model.min.num.cpu.util.buckets | Integer | N | 5 | The minimum number of full CPU utilization buckets required to generate a linear regression model. |
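
One way to read the three bucket settings above is sketched below in Python. The sketch assumes that a sample with CPU utilization `u` (in percent) falls into bucket `floor(u / bucket_size)` and that a bucket is "full" once it holds the required number of samples. Both the bucketing rule and the function names are assumptions for illustration; the actual training logic may differ.

```python
from collections import Counter


def full_buckets(cpu_utils_percent, bucket_size=5, required_samples=100):
    """Return the indices of buckets that hold at least required_samples samples."""
    counts = Counter(int(u // bucket_size)
                     for u in cpu_utils_percent if 0 <= u <= 100)
    return sorted(b for b, n in counts.items() if n >= required_samples)


def enough_data_for_model(cpu_utils_percent, bucket_size=5,
                          required_samples=100, min_full_buckets=5):
    """True when the number of full buckets reaches the configured minimum."""
    full = full_buckets(cpu_utils_percent, bucket_size, required_samples)
    return len(full) >= min_full_buckets


# Made-up training data: 100 samples in each of buckets 2..6, plus 50 in bucket 8.
samples = [bucket * 5 + 1.0 for bucket in range(2, 7) for _ in range(100)]
samples += [42.0] * 50

filled = full_buckets(samples)
ready = enough_data_for_model(samples)
print(filled, ready)
```

Here five buckets are full, which just meets the default minimum, while the bucket with only 50 samples does not count.
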
Configurations of pluggable classes
CruiseControlMetricsReporterSampler configurations
Name | Type | Required? | Default Value | Description |
---|---|---|---|---|
metric.reporter.sampler.bootstrap.servers | String | N | The same as bootstrap.servers config from Cruise Control | The Kafka cluster from which to consume the metrics of interest collected by CruiseControlMetricsReporter. |
metric.reporter.topic.pattern | String | N | "__CruiseControlMetrics" | The regex that lets users specify multiple matching topics from which the sampler should consume the metrics of interest. |
metric.reporter.sampler.group.id | String | N | 60,000 | The consumer group id to use for the consumers to consume from the Kafka cluster. |
KafkaSampleStore configurations
Name | Type | Required? | Default Value | Description |
---|---|---|---|---|
partition.metric.sample.store.topic | String | Y | | The topic in which Cruise Control will store its processed metric samples as a backup. When Cruise Control is rebooted, it will load the metrics from this topic to populate the load monitor. |
broker.metric.sample.store.topic | String | Y | | The topic in which Cruise Control will store its broker metric samples as a backup. When Cruise Control is rebooted, it will load the broker metric samples from this topic to train its cluster model. |
num.sample.loading.threads | Integer | N | 8 | The number of threads used to load samples from the sample store topics. |
BrokerCapacityConfigFileResolver configurations
Name | Type | Required? | Default Value | Description |
---|---|---|---|---|
capacity.config.file | String | Y | | The path to the configuration JSON file that provides the capacity of the brokers. |
SelfHealingNotifier configurations
Name | Type | Required? | Default Value | Description |
---|---|---|---|---|
broker.failure.alert.threshold.ms | Long | N | 900,000 | Defines the threshold to mark a broker as dead. If a non-empty broker leaves the cluster at time T and does not rejoin the cluster before T + broker.failure.alert.threshold.ms, the broker is considered dead since T. An alert will be triggered in this case. |
broker.failure.self.healing.threshold.ms | Long | N | 1,800,000 | If self-healing is enabled and a broker is dead at T, self-healing will be triggered at T + broker.failure.self.healing.threshold.ms. |
self.healing.enabled | Boolean | N | true | Whether to enable self-healing. If disabled, the SelfHealingNotifier will only log the broker failure or goal violation and will not take any corrective action. |
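
The timing rules above can be summarized as a small decision function. The Python sketch below is one reading of the table, assuming the thresholds are inclusive (the action applies once the current time reaches T plus the threshold); the function name and the returned labels are invented for this example.

```python
def broker_failure_action(now_ms, failure_time_ms,
                          alert_threshold_ms=900_000,
                          self_healing_threshold_ms=1_800_000,
                          self_healing_enabled=True):
    """Return 'wait', 'alert', or 'self-heal' for a broker that left at failure_time_ms."""
    elapsed = now_ms - failure_time_ms
    if self_healing_enabled and elapsed >= self_healing_threshold_ms:
        return "self-heal"
    if elapsed >= alert_threshold_ms:
        return "alert"
    return "wait"


# A broker leaves the cluster at T = 0 and never comes back.
timeline = [broker_failure_action(now, 0) for now in (600_000, 1_000_000, 2_000_000)]
print(timeline)
```

With the defaults, the broker is only reported after 15 minutes and self-healing starts after 30 minutes; with self.healing.enabled set to false the function never goes beyond "alert".
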
CruiseControlMetricsReporter Configurations
Name | Type | Required? | Default Value | Description |
---|---|---|---|---|
cruise.control.metrics.topic | String | N | "__CruiseControlMetrics" | The topic to which CruiseControlMetricsReporter will produce the metrics of interest. The metrics can be consumed by com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler to derive the partition level workload. |
cruise.control.metrics.reporter.bootstrap.servers | String | Y | | The Kafka cluster to which CruiseControlMetricsReporter should produce the metrics of interest. It is usually just the hosting Kafka cluster where the metrics reporter is running, but users can choose to produce to another cluster if they want to. |
cruise.control.metrics.reporter.metrics.reporting.interval.ms | Long | N | 60,000 | The interval of collecting and sending the metrics of interest. |
Besides the above configurations, CruiseControlMetricsReporter accepts all the configurations of the vanilla KafkaProducer when they are given with the prefix "cruise.control.metrics.reporter.".
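
The prefix convention can be illustrated with a short Python sketch that picks out the prefixed keys from a broker configuration and strips the prefix to obtain producer settings. This only demonstrates the naming scheme: the function name and the sample values are hypothetical, and how the reporter treats its own prefixed keys (such as the reporting interval) is not covered here.

```python
PREFIX = "cruise.control.metrics.reporter."


def producer_overrides(broker_config):
    """Collect keys that start with PREFIX and return them with the prefix removed."""
    return {key[len(PREFIX):]: value
            for key, value in broker_config.items()
            if key.startswith(PREFIX)}


# Hypothetical broker-side configuration containing reporter settings.
broker_config = {
    "cruise.control.metrics.reporter.bootstrap.servers": "localhost:9092",
    "cruise.control.metrics.reporter.linger.ms": "500",
    "cruise.control.metrics.topic": "__CruiseControlMetrics",
    "log.dirs": "/tmp/kafka-logs",
}

overrides = producer_overrides(broker_config)
print(overrides)
```

Note that cruise.control.metrics.topic is not picked up, because it does not carry the full "cruise.control.metrics.reporter." prefix.
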