Apache Spark

About

  • Distributed general-purpose computing framework
  • Addresses the limitations of Hadoop MapReduce: Spark reads data into memory, performs the necessary operations there, and writes the results back. This makes iterative workloads fast, whereas in MapReduce each iteration requires a disk read and a disk write.
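
The difference in disk traffic can be illustrated with a toy model. The sketch below is plain Python, not Spark or Hadoop code: `CountingDisk`, `step`, `disk_per_iteration`, `in_memory`, and `compare` are hypothetical names introduced only for this illustration, and the "job" (halving every value) is an arbitrary stand-in for an iterative algorithm. It only counts how often each style touches disk.

```python
import json
import os
import tempfile


class CountingDisk:
    """A tiny file-backed store that counts reads and writes (illustrative only)."""

    def __init__(self, directory):
        self.directory = directory
        self.reads = 0
        self.writes = 0

    def write(self, name, values):
        with open(os.path.join(self.directory, name), "w") as f:
            json.dump(values, f)
        self.writes += 1

    def read(self, name):
        with open(os.path.join(self.directory, name)) as f:
            values = json.load(f)
        self.reads += 1
        return values


def step(values):
    """One iteration of a hypothetical iterative job: halve every value."""
    return [v / 2 for v in values]


def disk_per_iteration(disk, iterations):
    """MapReduce-style: each iteration reads its input from disk and writes its output back."""
    name = "input.json"
    values = None
    for i in range(iterations):
        values = step(disk.read(name))
        name = f"iter_{i}.json"
        disk.write(name, values)
    return values


def in_memory(disk, iterations):
    """Spark-style: read once, iterate in memory, write the final result once."""
    values = disk.read("input.json")
    for _ in range(iterations):
        values = step(values)
    disk.write("result.json", values)
    return values


def compare(iterations=5):
    """Run both styles on the same input and return (result, reads, writes) per style."""
    data = [float(x) for x in range(8)]
    stats = {}
    for label, job in (("disk_per_iteration", disk_per_iteration),
                       ("in_memory", in_memory)):
        with tempfile.TemporaryDirectory() as tmp:
            disk = CountingDisk(tmp)
            disk.write("input.json", data)
            disk.reads = disk.writes = 0  # do not count seeding the input
            result = job(disk, iterations)
            stats[label] = (result, disk.reads, disk.writes)
    return stats


if __name__ == "__main__":
    for label, (_, reads, writes) in compare().items():
        print(f"{label}: {reads} reads, {writes} writes")
```

In this model the disk-per-iteration style performs one read and one write per iteration, while the in-memory style performs one read and one write in total, regardless of the iteration count. Real Spark jobs can still spill to disk when data does not fit in memory, so the model is a simplification.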

Architecture

ML on Spark

  • Machine learning frameworks on Spark: Apache Spark’s MLlib, H2O.ai’s Sparkling Water, ...
  • Deep learning frameworks on Spark: CERN’s Distributed Keras, Intel’s BigDL, Yahoo’s TensorFlowOnSpark, ...

See also