Page Index - ignacio-alorre/Spark GitHub Wiki
43 pages in this GitHub Wiki:
- Home
- 1- How Spark Works
- 2- Spark APIs
- 3- Working with Key/Value Data (TODO: complete the pending part and add images where required; this topic is still unfinished)
- 4- Effective Transformations
- 5- Joins
- 6- Interview Questions
- 7- Templates
- Architecture and Features
- Cache Vs Persist
- Config Parameters
- Dataframe
- DataFrame API
- Dataframe Schema
- Datasets
- Interview Questions
- Interview Questions 3
- Introduction to DataFrames
- Iterator to Iterator Transformations with mapPartitions
- Joins
- Minimizing Object Creation
- Narrow Vs Wide Transformations
- Optimize Spark SQL Joins
- Parallelism and Partitions
- RDD shuffling
- RDD vs DataFrame vs Datasets
- RDDs
- Rename Column on DataFrame
- Reusing RDDs
- Set Operations
- Shared Variables
- Shuffling What it is and why it's important (Coursera)
- Spark Interview Questions II
- Spark Job Scheduling
- Spark Session
- Spark SQL Interview Questions
- Spark SQL random things
- Spark Transformations [TODO: Narrow vs Wide]
- The Anatomy of a Spark Job
- Things which should be fit somewhere
- What Type of RDD Does Your Transformation Return?
- Window Functions
- Working With Key Value Data