# Apache Kafka
| As a… | I want… | So that… | Pain With Kafka/CDC | Benefit With GitHub Actions + Batch |
|---|---|---|---|---|
| Data engineer | to replace always-on Kafka clusters with scheduled batch jobs | I reduce infrastructure cost and eliminate 24/7 compute | Kafka brokers run continuously even when idle | GitHub Actions only runs when triggered, costing near-zero when idle |
| Platform engineer | to remove CDC connectors and schema registries | I reduce maintenance burden and operational noise | CDC connectors break on schema drift and require constant babysitting | Batch jobs read data directly on schedule with no streaming dependencies |
| Developer | to simplify data movement between services | I avoid learning Kafka internals | Kafka requires topics, partitions, consumer groups, offsets | Batch jobs are simple scripts triggered on cron |
| SRE | to eliminate high-severity incidents caused by streaming lag | I reduce pager fatigue | Kafka lag spikes cause alerts and require tuning | Batch jobs have predictable runtime and no lag concept |
| Product owner | to reduce cloud spend | I can reallocate budget to features | Kafka clusters, Connect, and monitoring tools are expensive | GitHub Actions minutes are cheap and predictable |
| Analytics engineer | to run hourly or daily ETL without streaming infra | I avoid overengineering for low-frequency workloads | CDC is overkill for once-per-hour data | Batch jobs match natural cadence of analytics workloads |
| Security engineer | to reduce attack surface | I simplify compliance and audits | Kafka requires brokers, Zookeeper/Kraft, ACLs, network rules | GitHub Actions has minimal infra and built-in security controls |
| Engineering manager | to reduce onboarding time | new hires can contribute faster | Kafka requires specialized knowledge | Batch jobs use familiar tools (Python, SQL, shell) |
| Architect | to remove unnecessary distributed systems | I keep the system maintainable long-term | Kafka introduces operational complexity not justified by workload | Batch jobs are easy to reason about and evolve |
| Finance stakeholder | to eliminate unpredictable streaming costs | I get stable, forecastable billing | Kafka cost scales with throughput and retention | GitHub Actions cost scales linearly with runs |
- https://github.com/scholzj/kafka-test-apps/blob/main/kafka-producer.yaml
- https://www.stardog.com/labs/blog/stream-reasoning-with-stardog/
- https://towardsdatascience.com/kafka-python-explained-in-10-lines-of-code-800e3e07dad1
- https://github.com/confluentinc/librdkafka/tree/master/examples
- https://medium.com/@ali.mrd318/simplifying-kafka-testing-in-python-a-mockafka-py-tutorial-3a0dbbfe9866
- https://docs.confluent.io/platform/current/schema-registry/fundamentals/data-contracts.html
- https://www.confluent.io/blog/error-handling-patterns-in-kafka/
- https://docs.confluent.io/platform/current/schema-registry/connect.html
- https://developer.confluent.io/courses/schema-registry/evolve-schemas-hands-on/
- https://developer.confluent.io/learn-more/kafka-on-the-go/schemas/
- https://developer.confluent.io/courses/schema-registry/key-concepts/
- https://developer.confluent.io/courses/schema-registry/schema-subjects/
- https://docs.confluent.io/platform/current/schema-registry/index.html
- https://docs.confluent.io/platform/current/schema-registry/fundamentals/index.html
- https://docs.confluent.io/platform/current/schema-registry/fundamentals/schema-evolution.html
- https://docs.confluent.io/platform/current/schema-registry/develop/api.html
- https://docs.confluent.io/platform/current/schema-registry/installation/migrate.html
- https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html
- https://docs.confluent.io/operator/current/co-manage-schemas.html
- https://docs.confluent.io/operator/2.2/co-manage-schemas.html
- https://www.confluent.io/blog/best-practices-for-confluent-schema-registry/
- https://docs.confluent.io/platform/current/schema-registry/develop/using.html
- https://www.confluent.io/blog/how-schema-registry-clients-work/
- https://developer.confluent.io/patterns/event/schema-on-read/
- https://www.confluent.io/blog/schema-registry-for-beginners/
- https://www.confluent.io/blog/using-apache-kafka-command-line-tools-confluent-cloud/
- https://greenplum.docs.pivotal.io/streaming-server/1-3-6/kafka/load-from-kafka-example.html
- https://play.vidyard.com/e869cfd0-76d8-4859-a90f-2471c52a7e22
- https://www.slideshare.net/slideshow/stream-data-deduplication-powered-by-kafka-streams-philipp-schirmer-bakdata/249203406
- https://www.slideshare.net/slideshow/embed_code/key/ayg8N2YEG0jw4A
- https://docs.confluent.io/cloud/current/flink/reference/functions/datetime-functions.html
- https://docs.confluent.io/cloud/current/flink/reference/timezone.html
- https://cwiki.apache.org/confluence/display/Flink/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage#FLIP188:IntroduceBuiltinDynamicTableStorage-Retention
- https://www.alibabacloud.com/blog/introduction-to-unified-batch-and-stream-processing-of-apache-flink_601407
```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.15.0</version>
</dependency>
<dependency>
  <!-- As of Flink 1.15 the streaming artifact no longer carries a Scala suffix -->
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java</artifactId>
  <version>1.15.0</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>2.13.0</version>
</dependency>
<dependency>
  <!-- jackson-databind provides the ObjectMapper and JsonNode used in the code below -->
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.13.0</version>
</dependency>
```
```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import com.fasterxml.jackson.core.JsonPointer;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonPathFinder {
    public static void main(String[] args) throws Exception {
        // Set up the streaming execution environment
        var env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Create a sample JSON input stream
        var jsonInputStream = env.fromElements(
                """
                {"foo": "bar"}
                """,
                """
                {"baz": "qux"}
                """
        );

        // For each record, test whether the JSON Pointer path "/foo" exists.
        // The explicit .returns(...) hint is required because type erasure
        // prevents Flink from inferring the Tuple2 output type of a lambda.
        DataStream<Tuple2<String, Boolean>> results = jsonInputStream
                .flatMap((FlatMapFunction<String, Tuple2<String, Boolean>>) (value, out) -> {
                    var mapper = new ObjectMapper();
                    var rootNode = mapper.readTree(value);
                    var pointer = JsonPointer.compile("/foo");
                    var pathExists = !rootNode.at(pointer).isMissingNode();
                    out.collect(new Tuple2<>(value, pathExists));
                })
                .returns(Types.TUPLE(Types.STRING, Types.BOOLEAN));

        // Print the results and execute the Flink job
        results.print();
        env.execute("Json Path Finder");
    }
}
```
- Flux.fromIterable() - Similar to Kafka's KafkaConsumer.poll(), which retrieves records from a Kafka topic.
- Flux.subscribe() - Similar to Kafka's KafkaConsumer.subscribe(), which subscribes the consumer to one or more topics (though in Reactor it is subscribing that actually starts the flow of data).
- Flux.map() - Similar to Kafka Streams' KStream.map(), which transforms each record in a stream.
- Flux.flatMap() - Similar to Kafka Streams' KStream.flatMap(), which transforms each record into zero or more records.
- Flux.delayElements() - Has no direct Kafka equivalent; the closest analogy is producer-side batching via linger.ms, which delays sends by a configured interval.
- Flux.fromIterable() - Similar to Flink's StreamExecutionEnvironment.fromCollection(), which creates a DataStream from a collection.
- Flux.subscribe() - Similar to attaching a sink with DataStream.addSink() and calling env.execute(), which starts consumption of the stream.
- Flux.map() - Similar to Flink's DataStream.map(), which applies a function to each element in the stream.
- Flux.flatMap() - Similar to Flink's DataStream.flatMap(), which transforms each element into zero or more elements.
- Flux.delayElements() - Loosely comparable to time-based windowing in Flink (DataStream.window() with a time-based assigner; the older timeWindow() shortcut has been removed), in that both shift elements along the time axis. A short Reactor sketch after these lists illustrates the Flux side.
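A minimal Reactor sketch of the operators named above (a hedged illustration: it assumes only the io.projectreactor reactor-core dependency, and the sample inputs and sleep duration are arbitrary):

```java
import java.time.Duration;
import java.util.List;
import reactor.core.publisher.Flux;

public class FluxOperatorsDemo {
    public static void main(String[] args) throws InterruptedException {
        Flux.fromIterable(List.of("{\"foo\": \"bar\"}", "{\"baz\": \"qux\"}")) // cf. poll() / fromCollection()
            .map(String::trim)                                       // cf. KStream.map() / DataStream.map()
            .flatMap(s -> Flux.just(s, String.valueOf(s.length()))) // cf. KStream.flatMap() / DataStream.flatMap()
            .delayElements(Duration.ofMillis(100))                   // shifts emissions along the time axis
            .subscribe(System.out::println);                         // subscribing starts the flow

        // delayElements moves work onto a parallel scheduler, so keep the
        // main thread alive long enough for the demo to print.
        Thread.sleep(1000);
    }
}
```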
- https://www.confluent.io/blog/how-to-share-kafka-connectors-on-confluent-hub/
- https://docs.confluent.io/kafka-connectors/github/current/configuration_options.html
- https://docs.confluent.io/kafka-connectors/aws-lambda/current/lambda_sink_connector_config.html
- https://medium.com/geekculture/heroku-integration-capabilities-the-mini-guide-b8ce745faad1
- https://www.confluent.io/hub/castorm/kafka-connect-http
- https://docs.confluent.io/kafka-connect-aws-cloudwatch-logs/current/overview.html
- https://docs.confluent.io/kafka-connect-sftp/current/source-connector/csv_source_connector.html
- https://rmoff.net/2021/01/11/running-a-self-managed-kafka-connect-worker-for-confluent-cloud/
- https://developer.salesforce.com/blogs/2016/05/streaming-salesforce-events-heroku-kafka
- https://dzone.com/articles/kafka-for-xml-message-integration-and-processing
- https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/replication.html
- http://www.liferaysavvy.com/2021/07/liferay-tomcat-access-logs-to-kafka.html
- https://www.oreilly.com/library/view/mastering-kafka-streams/9781492062486/ch01.html
- https://www.confluent.io/kafka-summit-sf18/kafka-as-an-eventing-system-to-replatform-a-monolith-into-microservices/
- https://towardsdatascience.com/getting-started-with-apache-kafka-in-python-604b3250aa05
- https://blog.bosch-si.com/developer/eclipse-hono-supporting-apache-kafka-for-messaging/
- https://github.com/eclipse/hono/issues/8
- https://www.confluent.io/de-de/blog/enabling-exactly-once-kafka-streams/
- https://dev.to/heroku/what-is-a-commit-log-and-why-should-you-care-pib
- https://preparingforcodinginterview.wordpress.com/2019/10/04/kafka-3-why-is-kafka-so-fast/
- https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
- https://www.oreilly.com/library/view/streaming-architecture/9781491953914/ch04.html
- https://docs.datastax.com/en/kafka/doc/kafka/kafkaHowMessages.html
- https://kafka.apache.org/cve-list
- https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-tools-DumpLogSegments.html
- https://logging.apache.org/log4j/2.x/log4j-users-guide.pdf
- http://events17.linuxfoundation.org/sites/events/files/slides/developing.realtime.data_.pipelines.with_.apache.kafka_.pdf
- https://www.moengage.com/blog/kafka-at-moengage/
- https://www.confluent.io/es-es/blog/kafka-without-zookeeper-a-sneak-peek/
- https://www.confluent.io/blog/apache-flink-apache-kafka-streams-comparison-guideline-users/
- https://stackoverflow.com/questions/60625612/how-does-one-use-kafka-with-openid-connect
- https://developer.ibm.com/tutorials/kafka-authn-authz/
One of Kafka's core features is partitioning data by a partition key: records that share a key are written to the same partition, where their order is preserved, while records in different partitions can be processed in parallel.
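A short sketch of keyed producing (assumptions: the org.apache.kafka:kafka-clients library, a local broker at localhost:9092, and the hypothetical topic "orders" and key "customer-42"):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerDemo {
    public static void main(String[] args) {
        var props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (var producer = new KafkaProducer<String, String>(props)) {
            // Records sharing the key "customer-42" hash to the same partition,
            // so their relative order is preserved; records with other keys
            // can be handled in parallel on other partitions.
            producer.send(new ProducerRecord<>("orders", "customer-42", "order-created"));
            producer.send(new ProducerRecord<>("orders", "customer-42", "order-paid"));
        }
    }
}
```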
A Kafka cluster consists of brokers that coordinate the writing (and reading) of data to durable storage. With Kafka, every message is stored, and communicating via durable storage decouples the send and receive operations from each other.
The key benefits of Kafka are its scalability, its ordering guarantees, its wide-scale adoption, and the wealth of commercial service offerings around it.
All message types are brokered, which means messages can still be delivered even if the recipient was temporarily disconnected.
Because the communication is also decoupled in time, direct feedback from the recipient to the sender of a message is no longer possible.
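A sketch of that time decoupling from the consumer side (same assumptions as above, plus a hypothetical consumer group): because the broker persists messages and tracks the group's committed offset, a consumer that reconnects after an outage resumes where it left off rather than losing messages.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DecoupledConsumerDemo {
    public static void main(String[] args) {
        var props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "orders-reader");           // hypothetical consumer group
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (var consumer = new KafkaConsumer<String, String>(props)) {
            consumer.subscribe(List.of("orders"));
            // The poll returns whatever accumulated on the broker while this
            // consumer was offline, starting from the group's committed offset.
            var records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
        }
    }
}
```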