Kafka

# run zookeeper and kafka
$ docker run -d -p 2181:2181 --name zookeeper jplock/zookeeper
$ docker run -d --name kafka --link zookeeper:zookeeper ches/kafka

# get zookeeper and kafka ip
$ ZK_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' zookeeper)
$ KAFKA_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' kafka)

# create topic
$ docker run --rm ches/kafka kafka-topics.sh --create --topic test --replication-factor 1 --partitions 1 --zookeeper $ZK_IP:2181
# produce data
$ docker run --rm --interactive ches/kafka kafka-console-producer.sh --topic test --broker-list $KAFKA_IP:9092
# consume data
$ docker run --rm ches/kafka kafka-console-consumer.sh --topic test --from-beginning --zookeeper $ZK_IP:2181
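The commands above talk to the broker through the console scripts bundled in the image. As a quick sanity check from application code instead, here is a minimal Java sketch using the `kafka-clients` AdminClient; the broker address is an assumption for the example, so substitute the value of `$KAFKA_IP` printed above:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;

public class ListTopicsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed address: replace with the value of $KAFKA_IP from above.
        props.put("bootstrap.servers", "172.17.0.3:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Print every topic name the broker currently knows about.
            admin.listTopics().names().get().forEach(System.out::println);
        }
    }
}
```

If the `test` topic was created above, it should appear in the output.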

What is Kafka good for?

Kafka is used for two broad classes of applications:

  1. Building real-time streaming data pipelines that reliably get data between systems or applications
  2. Building real-time streaming applications that transform or react to the streams of data

APIs

Kafka has four core APIs:

  • The Producer API allows an application to publish a stream of records to one or more Kafka topics; minimal sketches of the client APIs follow this list.
  • The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
  • The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
  • The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
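As a concrete illustration of the first two APIs, here is a minimal Java sketch that publishes one record and then reads the topic from the beginning, mirroring the console producer/consumer commands above. The topic name `test`, the broker address, and the group id `sketch-group` are assumptions for the example:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerConsumerSketch {
    public static void main(String[] args) {
        // Producer API: publish one record to the "test" topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("test", "key", "hello kafka"));
        }

        // Consumer API: subscribe to the same topic and poll for records.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "sketch-group"); // assumed consumer group name
        consumerProps.put("auto.offset.reset", "earliest"); // like --from-beginning for a new group
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("test"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```

Closing the producer in the try-with-resources block flushes any buffered records before the consumer starts reading.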
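The Streams API can be sketched the same way: the pipeline below consumes `test`, uppercases each value, and writes the result to an output topic. Both the output topic name `test-upper` and the application id are made up for the example:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-sketch"); // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume from "test", transform each value, produce to "test-upper".
        KStream<String, String> input = builder.stream("test");
        input.mapValues(v -> v.toUpperCase()).to("test-upper");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // Shut the topology down cleanly on process exit.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```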

Topics

  • A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
  • The Kafka cluster retains all published records—whether or not they have been consumed—using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size, so storing data for a long time is not a problem.
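Retention like the two-day example above can be set per topic at creation time. Here is a minimal Java sketch using the AdminClient, under the same assumed broker address and topic name:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithRetentionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // One partition, replication factor 1: enough for a single-broker dev setup.
            NewTopic topic = new NewTopic("test", 1, (short) 1);
            // Keep records for two days, expressed in milliseconds.
            topic.configs(Collections.singletonMap("retention.ms",
                    String.valueOf(2L * 24 * 60 * 60 * 1000)));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```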

Consumers