Apache Flink vs Apache Spark - sambos/Architectures GitHub Wiki

Apache Flink vs Apache Spark

Apache Spark has been (and still is) a leader in batch and micro-batch (near-real-time) processing workloads. For streaming it provides the DStream abstraction, which represents a stream as a sequence of micro-batches of RDD data. That is not a true streaming representation of the data: in Spark, streaming is a special kind of processing on top of batch.
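The micro-batch model can be pictured with a small plain-Python sketch. It is not Spark's API: `discretize`, `process_batch` and the 1-second interval are hypothetical names and values chosen for illustration. The sketch chops a timestamped event stream into fixed-interval batches, roughly the way a DStream produces one RDD per batch interval. A record cannot be processed until its interval closes, so it may wait up to one full batch interval.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)


def discretize(events: Iterable[Event], batch_interval: float) -> List[List[str]]:
    """Group events into consecutive micro-batches of `batch_interval` seconds.

    Hypothetical helper for illustration; it is not part of Spark's API.
    """
    buckets: Dict[int, List[str]] = defaultdict(list)
    last = -1
    for ts, payload in events:
        idx = int(ts // batch_interval)  # which interval the event falls into
        buckets[idx].append(payload)
        last = max(last, idx)
    # One batch per interval, including empty ones (like one RDD per interval).
    return [buckets.get(i, []) for i in range(last + 1)]


def process_batch(batch: List[str]) -> int:
    # Stand-in for the job run on each micro-batch (here: a record count).
    return len(batch)


if __name__ == "__main__":
    stream = [(0.1, "a"), (0.7, "b"), (1.2, "c"), (2.9, "d"), (3.0, "e")]
    batches = discretize(stream, batch_interval=1.0)
    print(batches)                              # [['a', 'b'], ['c'], ['d'], ['e']]
    print([process_batch(b) for b in batches])  # [2, 1, 1, 1]
```

In this run, record "a" arrives at t=0.1 but is only handled once the first interval closes at t=1.0. Record "d" arrives at t=2.9 and waits just 0.1 s. That waiting time is the latency cost of micro-batching.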

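RDDs are fault-tolerant: after a failure, lost partitions can be reconstructed by recomputing them from their lineage (the recorded chain of transformations that produced them).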
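Lineage-based recovery works because each dataset remembers which parent it came from and which transformation was applied. A lost partition can then be recomputed instead of being restored from a copy. The toy Python sketch below models only that idea. `ToyRDD` and its methods are hypothetical and are not Spark's RDD API. The sketch also caches every computed partition, which is a simplification: real Spark keeps partitions only when they are persisted.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical, simplified model of an RDD that remembers its lineage."""

    def __init__(self, source: Optional[List[List[int]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None) -> None:
        self.source = source  # root only: partitions held in "stable storage"
        self.parent = parent  # lineage: the dataset this one is derived from
        self.fn = fn          # lineage: per-partition transformation
        self.num_partitions = len(source) if source is not None else parent.num_partitions
        self._cache: List[Optional[List[int]]] = [None] * self.num_partitions
        self.computations = 0  # how many partitions this dataset has (re)computed

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # Record the transformation; nothing is computed yet (lazy).
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, p: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if p(x)])

    def partition(self, i: int) -> List[int]:
        if self._cache[i] is None:
            if self.source is not None:
                data = list(self.source[i])               # re-read the input
            else:
                data = self.fn(self.parent.partition(i))  # replay the lineage
            self._cache[i] = data
            self.computations += 1
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a failed executor that held partition i.
        self._cache[i] = None

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


if __name__ == "__main__":
    base = ToyRDD(source=[[1, 2, 3], [4, 5, 6]])
    even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(even_squares.collect())  # [4, 16, 36]
    even_squares.lose_partition(1)
    print(even_squares.collect())  # [4, 16, 36], partition 1 rebuilt from lineage
```

After partition 1 is lost, only that partition is recomputed, by replaying `filter` over the parent's still-available partition. The rest of the dataset is left untouched.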

Apache Flink implements true stream processing from the ground up. For Flink, batch is a special kind of processing on top of streaming (a bounded stream), and it does not use micro-batching.
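The record-at-a-time model can be pictured with a minimal plain-Python sketch. This is not Flink's DataStream API: `running_count` and `batch_counts` are hypothetical names. Each record updates keyed operator state and produces output at once, with no batch interval to wait for. Running the same operator over a finite input is one way to picture "batch as a bounded stream".

```python
import itertools
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def running_count(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Record-at-a-time keyed count (hypothetical operator, not Flink's API).

    Emits (key, updated_count) as soon as each record arrives, so it works the
    same on an unbounded stream and on a finite (bounded) input.
    """
    state: Counter = Counter()  # keyed operator state
    for key in records:
        state[key] += 1
        yield key, state[key]   # output immediately; no batch interval to wait for


def batch_counts(records: Iterable[str]) -> Dict[str, int]:
    # "Batch" as a bounded stream: run the same operator, keep the final value per key.
    result: Dict[str, int] = {}
    for key, count in running_count(records):
        result[key] = count
    return result


if __name__ == "__main__":
    # Unbounded input: take only the first four outputs of an endless stream.
    endless = itertools.cycle(["x", "y"])
    print(list(itertools.islice(running_count(endless), 4)))
    # [('x', 1), ('y', 1), ('x', 2), ('y', 2)]

    # Bounded input processed by the very same operator.
    print(batch_counts(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```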

- Ideal for true streaming applications (complex stream processing).
- Has its own custom memory management, which reduces JVM garbage-collection overhead (see Flink GC management using Bits & Bytes).
- Lower latency (records are processed as they arrive instead of waiting for a micro-batch interval) and higher throughput.
- Windowing: a more powerful set of window operations than Spark's (e.g., tumbling, sliding and session windows over event time or processing time).
- Exactly-once processing guarantees (a configuration switch can downgrade the guarantee to at-least-once).
- Provides fault tolerance by taking consistent snapshots of the distributed data stream and operator state.
  - Snapshots are checkpoints that Flink can fall back to in case of failure.
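Window types can be illustrated with a plain-Python sketch of tumbling and sliding event-time window assignment followed by a per-window sum. This is not Flink's windowing API. The function names, millisecond timestamps and sum aggregation are assumptions for illustration. The sketch leaves out session windows and the watermark-based handling of out-of-order events that Flink also provides.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Event = Tuple[int, int]   # (event timestamp in ms, value)
Window = Tuple[int, int]  # half-open interval [start, end) in ms


def tumbling(ts: int, size: int) -> List[Window]:
    # Fixed-size, non-overlapping windows: each event lands in exactly one.
    start = ts - ts % size
    return [(start, start + size)]


def sliding(ts: int, size: int, slide: int) -> List[Window]:
    # Windows of length `size` begin every `slide` ms, so they overlap and an
    # event belongs to every window whose range covers its timestamp.
    windows = []
    start = ts - ts % slide          # latest window start not after ts
    while start > ts - size:         # window [start, start + size) still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def window_sums(events: Iterable[Event],
                assign: Callable[[int], List[Window]]) -> Dict[Window, int]:
    # Assign each event to its window(s) by event time, then sum per window.
    sums: Dict[Window, int] = defaultdict(int)
    for ts, value in events:
        for w in assign(ts):
            sums[w] += value
    return dict(sorted(sums.items()))


if __name__ == "__main__":
    events = [(1_000, 1), (4_000, 2), (6_000, 3), (11_000, 4)]
    print(window_sums(events, lambda ts: tumbling(ts, 5_000)))
    # {(0, 5000): 3, (5000, 10000): 3, (10000, 15000): 4}
    print(window_sums(events, lambda ts: sliding(ts, 10_000, 5_000)))
    # {(-5000, 5000): 3, (0, 10000): 6, (5000, 15000): 7, (10000, 20000): 4}
```

With a 10 s window sliding every 5 s, each event is counted in two overlapping windows. The earliest events therefore also fall into a window that starts before time 0.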
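The snapshot-and-replay idea behind Flink's fault tolerance can be sketched in plain Python. This is not Flink's checkpointing API: `SumJob`, `Snapshot` and `run_with_failure` are hypothetical names. The job sums a replayable log and periodically records the source offset together with its operator state. On failure it rewinds the source to the last snapshot. Restoring the state as well gives an exactly-once result. Rewinding without rolling back the state counts some records twice, which is the kind of duplication at-least-once processing permits. Real Flink takes snapshots using checkpoint barriers that flow with the stream, and the sketch does not model barriers or their alignment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Snapshot:
    offset: int  # source position to resume reading from
    total: int   # operator state at exactly that position


@dataclass
class SumJob:
    """Hypothetical job that sums a replayable log and snapshots periodically."""
    log: List[int]
    checkpoint_every: int
    offset: int = 0
    total: int = 0
    last_snapshot: Snapshot = field(default_factory=lambda: Snapshot(0, 0))

    def run(self, fail_at: Optional[int] = None) -> None:
        while self.offset < len(self.log):
            if self.offset == fail_at:
                raise RuntimeError("simulated task failure")
            self.total += self.log[self.offset]
            self.offset += 1
            if self.offset % self.checkpoint_every == 0:
                # Consistent snapshot: source offset and state recorded together.
                self.last_snapshot = Snapshot(self.offset, self.total)

    def recover(self, restore_state: bool) -> None:
        # Always rewind the source; roll the state back only if asked to.
        self.offset = self.last_snapshot.offset
        if restore_state:
            self.total = self.last_snapshot.total


def run_with_failure(log: List[int], checkpoint_every: int,
                     fail_at: int, restore_state: bool) -> int:
    job = SumJob(log, checkpoint_every)
    try:
        job.run(fail_at=fail_at)
    except RuntimeError:
        job.recover(restore_state)
        job.run()  # replay from the snapshot offset, no further failures
    return job.total


if __name__ == "__main__":
    log = [1, 2, 3, 4, 5, 6, 7]  # true sum is 28
    print(run_with_failure(log, 3, fail_at=5, restore_state=True))   # 28
    print(run_with_failure(log, 3, fail_at=5, restore_state=False))  # 37
```

In this run, records 4 and 5 are processed after the last snapshot and are replayed after the failure. With the state restored they are counted once (28). Without it they are counted twice (37 = 28 + 4 + 5).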