Apache Flink vs Apache Spark - sambos/Architectures GitHub Wiki
Apache Flink vs Apache Spark
Apache Spark has been (and still is) a leader in batch and micro-batch (near-real-time) processing workloads. Spark Streaming provides the DStream abstraction, which represents a stream as a sequence of micro-batches of RDDs. This is not a true record-at-a-time representation of streaming data: in Spark, streaming is a special kind of processing built on top of batch.
- RDDs are fault-tolerant: lost partitions can be recomputed from their lineage, so state can be reconstructed after a failure
Apache Flink implements true stream processing from the ground up. For Flink, batch is a special case of streaming (a bounded stream), and it does not use micro-batching.
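To make the micro-batch model concrete, here is a toy sketch in plain Python. It is not the Spark API. It chops an event stream into small finite batches and runs an ordinary batch job on each one. Assumptions: the helper names `micro_batches` and `run_micro_batch_stream` are hypothetical, and batches are cut by element count, whereas Spark Streaming cuts them by a time interval (the batch duration).

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def micro_batches(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Chop a (possibly unbounded) event stream into small finite batches.

    Simplification: Spark Streaming cuts batches by a time interval
    (the batch duration); this hypothetical helper cuts them by count.
    """
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the last, partially filled batch
        yield batch


def run_micro_batch_stream(events: Iterable[T], batch_size: int,
                           batch_job: Callable[[List[T]], R]) -> List[R]:
    """Apply an ordinary batch job to each micro-batch in arrival order.

    The input is finite here so results can be collected into a list.
    A result for an event only appears once its whole batch is complete,
    so latency is bounded below by the batch size (or interval).
    """
    return [batch_job(batch) for batch in micro_batches(events, batch_size)]


print(run_micro_batch_stream(range(1, 8), 3, sum))  # prints [6, 15, 7]
```

The batches here are `[1, 2, 3]`, `[4, 5, 6]` and `[7]`. Event `1` cannot be reflected in any output until events `2` and `3` have also arrived. That waiting is the latency cost of treating a stream as a series of small batch jobs.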
- Ideal for real streaming applications (complex stream processing)
- Has its own custom memory management (see Flink's "Bits & Bytes" write-up on managing memory to reduce GC overhead)
- Lower latency (records are processed one at a time rather than per micro-batch) and higher throughput
- Windowing: a more powerful set of window operations compared to Spark
- Exactly-once processing guarantees (a configuration switch can downgrade them to at-least-once)
- Provides fault tolerance by taking consistent snapshots of the distributed data streams and operator state
- Snapshots are checkpoints that Flink can fall back to in case of failure
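The windowing, exactly-once and snapshot points above can be illustrated with a toy, single-operator model in plain Python. It is not the Flink API. The model makes four assumptions. The source is a replayable log read by offset, standing in for something like a Kafka topic. One operator counts events per key in tumbling windows. A snapshot stores the source offset together with a copy of the operator state. A failure is simulated once at a chosen offset. All names (`run_job`, `window_start`, `crash_at`) are hypothetical. Real Flink coordinates snapshots across parallel operators by injecting checkpoint barriers into the stream. This sketch has only one operator, so it skips that part.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str]       # (event timestamp, key): hypothetical event shape
WindowKey = Tuple[int, str]   # (window start, key)


def window_start(ts: int, size: int) -> int:
    """Start of the tumbling (fixed-size, non-overlapping) window containing ts."""
    return ts - ts % size


def run_job(log: List[Event], window_size: int, checkpoint_every: int,
            crash_at: Optional[int] = None) -> Dict[WindowKey, int]:
    """Count events per key and tumbling window, one event at a time.

    Hypothetical model: `log` is a replayable source, `counts` is the
    operator state, and a snapshot pairs a source offset with a copy of
    that state. On a simulated failure at offset `crash_at`, the job
    restores the last snapshot and replays from its offset, so every
    event updates the state exactly once.
    """
    counts: Dict[WindowKey, int] = {}
    snapshot: Tuple[int, Dict[WindowKey, int]] = (0, {})  # initial checkpoint
    offset = 0
    crashed = False

    while offset < len(log):
        if offset == crash_at and not crashed:
            crashed = True  # fail only once
            # Failure: drop progress made since the last snapshot, restore
            # the state and rewind the source to the snapshot's offset.
            offset, saved = snapshot
            counts = dict(saved)
            continue

        ts, key = log[offset]
        wk = (window_start(ts, window_size), key)
        counts[wk] = counts.get(wk, 0) + 1
        offset += 1

        if offset % checkpoint_every == 0:
            # Offset and state are saved together; that pairing is what
            # makes the snapshot consistent.
            snapshot = (offset, dict(counts))

    return counts


log = [(0, "a"), (3, "b"), (4, "a"), (10, "a"), (12, "b"), (19, "a"), (21, "b")]
print(run_job(log, 10, 3) == run_job(log, 10, 3, crash_at=5))  # prints True
```

In the crashing run, the snapshot taken at offset 3 is restored and events 3 and 4 are replayed. The counts match the failure-free run. If only the source offset were rewound without restoring the saved state, those two events would be counted twice, which corresponds to at-least-once rather than exactly-once behaviour.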