Home - ignacio-alorre/Spark GitHub Wiki

1- How Spark Works

2- Spark APIs

(Built from existing Spark-related information in other wiki pages.)

  • Datasets (take parts from the DataFrames, Datasets and Spark SQL pages)

  • RDDs vs DataFrames vs Datasets

3- Working with Key/Value Data (TODO: this topic is still unfinished; complete the pending parts and add images where required)

  • The Goldilocks Example
  • Actions on Key/Value Pairs
  • What's so Dangerous About the groupByKey Function
  • Choosing an Aggregation Operation
  • Multiple RDD Operations
  • Partitioners and Key/Value Data
  • Dictionary of Ordered RDD Operations
  • Secondary Sort and repartitionAndSortWithinPartitions
  • Straggler Detection and Unbalanced Data

4- Effective Transformations

5- Joins

6- Interview Questions

  • Block 1
  • Block 2

7- Templates