ICP_02 : Spark Programming - acikgozmehmet/BigDataAnalyticsAndApplications GitHub Wiki

Objectives:

We will focus on installation and getting familiar with Big Data Analytics and Applications programming concepts.

Spark

Spark is an open-source cluster computing environment similar to Hadoop, developed at the University of California, Berkeley. Its main use cases include:
- Machine Learning
- Spark Streaming
- Faster Batch
Spark enables in-memory distributed datasets that optimize iterative workloads in addition to interactive queries.
Spark is complementary to Hadoop and can run side by side over the Hadoop file system.
Spark supports building large-scale, low-latency data analytics applications.
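
To make the point about in-memory datasets and iterative workloads concrete, here is a minimal single-process sketch in plain Python. It is a toy model only and does not use the Spark API: `LazyDataset`, `run_iterations`, and `load` are hypothetical names introduced for illustration. It assumes that loading the source data is the expensive step, and it counts how often that step runs when the same dataset is reused across several iterations, with and without keeping it in memory.

```python
# Toy, single-process model of a lazily evaluated dataset with optional caching.
# LazyDataset and its methods are hypothetical names, NOT part of the Spark API.

class LazyDataset:
    def __init__(self, compute):
        self._compute = compute    # zero-argument function that produces the records
        self._cached = None        # in-memory copy, filled on first use after cache()
        self._use_cache = False

    def map(self, fn):
        # Build a new lazy dataset; nothing is computed until collect() is called.
        return LazyDataset(lambda: [fn(x) for x in self.collect()])

    def cache(self):
        # Request that the records be kept in memory after the first computation.
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        # Without caching, every call recomputes the whole chain from the source.
        return self._compute()


def run_iterations(use_cache, iterations=5):
    loads = 0

    def load():
        # Stands in for an expensive read from disk or a distributed file system.
        nonlocal loads
        loads += 1
        return list(range(1, 6))

    data = LazyDataset(load).map(lambda x: x * x)
    if use_cache:
        data.cache()

    total = 0
    for _ in range(iterations):
        total += sum(data.collect())  # each pass reuses the same dataset
    return total, loads


if __name__ == "__main__":
    print(run_iterations(use_cache=False))  # (275, 5): source loaded on every pass
    print(run_iterations(use_cache=True))   # (275, 1): source loaded once
```

Both runs produce the same result, but the cached run reads the source once instead of once per iteration. This is the saving that in-memory datasets are meant to give iterative workloads, shown here without any distribution across a cluster.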

References: