Apache Spark™ Learning Resources - datacouch-io/spark-java GitHub Wiki
Welcome to the Apache Spark™ Learning Resources repository! This project is a collection of materials designed to help you understand and work with Apache Spark™ efficiently. Below is an index of the contents available:
Table of Contents
- Get started with Apache Spark™ and understand its key features.
- Resilient Distributed Dataset (RDD)
  - Dive deep into the RDD, a fundamental data structure in Spark.
- Spark Context and Spark Session
  - Understand the Spark Context and Spark Session, the entry points of a Spark application.
- Explore the operations you can perform on RDDs, namely transformations and actions.
- Exploring Data Using RDD Operations
  - Learn the essentials of RDDs, including their creation and transformation.
- Delve into Pair RDD operations, which are essential for processing key-value pair data.
- In this lab, we look at several transformations, examine their optimizations, and visualise the execution with the DAG.
- Spark SQL
- Advanced DataFrame

These resources aim to provide you with a comprehensive understanding of Apache Spark™, from its basics to more advanced topics. Feel free to explore each section as you embark on your journey to master Spark. Happy learning!