ICP_02 : Spark Programming - acikgozmehmet/BigDataAnalyticsAndApplications GitHub Wiki
Objectives:
We will focus on installation and on getting familiar with the programming concepts of Big Data Analytics and Applications.
Spark
- Spark is an open-source cluster computing environment similar to Hadoop, developed at the University of California, Berkeley.
- Machine Learning
- Spark Streaming
- Faster Batch
- Spark enables in-memory distributed datasets that optimize iterative workloads in addition to interactive queries.
- Spark is complementary to Hadoop and can run alongside it on top of the Hadoop Distributed File System (HDFS).
- Spark supports building large-scale, low-latency data analytics applications.
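To make the point about in-memory datasets and iterative workloads concrete, here is a minimal PySpark sketch. It is not taken from the class material: the function names `point_gradient`, `fit_slope`, and `demo`, the sample points, and the learning rate are all hypothetical. The idea is that the dataset is cached once with `cache()` and then reused on every pass of a small gradient-descent loop instead of being recomputed.

```python
def point_gradient(w, point):
    """Gradient of the squared error (w*x - y)**2 with respect to w for one (x, y) pair."""
    x, y = point
    return 2.0 * (w * x - y) * x


def fit_slope(spark, points, steps=20, lr=0.05):
    """Fit y = w*x by gradient descent, reusing one cached RDD on every step."""
    # cache() asks Spark to keep the partitions in memory after the first action,
    # so later iterations read them from memory instead of rebuilding them.
    data = spark.sparkContext.parallelize(points).cache()
    n = data.count()  # first action: materializes and caches the RDD
    w = 0.0
    for _ in range(steps):
        # Each step is a full pass over the same cached data.
        grad = data.map(lambda p: point_gradient(w, p)).sum() / n
        w -= lr * grad
    return w


def demo():
    """Hypothetical driver; assumes `pip install pyspark` and a Java runtime."""
    from pyspark.sql import SparkSession  # imported here so the helpers load without Spark

    spark = (SparkSession.builder
             .master("local[*]")
             .appName("CacheSketch")
             .getOrCreate())
    try:
        points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up data on the line y = 2x
        print(fit_slope(spark, points))
    finally:
        spark.stop()
```

Calling `demo()` in a notebook cell runs the loop locally; for the made-up points above the fitted slope should move toward 2.0. Without `cache()` the result would be the same, but each iteration would rebuild the RDD from its source.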
In Class Programming
1. Integrate Spark with Colab (or the IDE that you are using).
2. Create a well-commented Spark program that produces the correct results and writes them to an output file.
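The two tasks above can be sketched as follows. The page does not say which program was written, so word count is assumed here as a typical first exercise; the names `tokenize` and `run_word_count`, the app name, and the paths in the usage note are hypothetical. In Colab, PySpark would usually be installed first with `!pip install pyspark` in a notebook cell (assuming the runtime already provides Java).

```python
import re


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def run_word_count(input_path, output_dir):
    """Count the words in a text file and write 'word<TAB>count' lines to output_dir."""
    from pyspark.sql import SparkSession  # imported here so tokenize() works without Spark

    # A local session that uses all available cores; no cluster is assumed.
    spark = (SparkSession.builder
             .master("local[*]")
             .appName("ICP2WordCount")
             .getOrCreate())
    try:
        lines = spark.sparkContext.textFile(input_path)
        counts = (lines.flatMap(tokenize)                    # one record per word
                       .map(lambda word: (word, 1))          # pair each word with a count of 1
                       .reduceByKey(lambda a, b: a + b)      # add up the counts per word
                       .sortBy(lambda kv: kv[1], ascending=False))  # most frequent first
        # saveAsTextFile writes a directory of part files and fails if output_dir already exists.
        counts.map(lambda kv: f"{kv[0]}\t{kv[1]}").saveAsTextFile(output_dir)
    finally:
        spark.stop()
```

A call such as `run_word_count("input.txt", "output")` would read a hypothetical `input.txt` and leave the results in `output/part-*` files.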
Results
Recording
Please click on the link to see the recording
References: