Home - satyamsingh1004/spark GitHub Wiki
What Happens when you run the Spark-Submit Command
- First, a Driver program is launched --> the Driver creates a SparkContext/SparkSession --> the SparkContext helps build the DAG --> once an action is encountered, a Job is created and submitted to the DAG scheduler --> the DAG scheduler divides the DAG into stages and tasks, and the tasks are submitted to the Task Scheduler --> the Task Scheduler launches the tasks on the different worker nodes via the Cluster Manager --> one task is launched for each partition of the data --> once the tasks are completed, the results are sent back to the Driver program
- Driver Program -> Spark Session & DAG -> for each action a Job is created and submitted to the DAG scheduler -> the DAG scheduler creates stages and tasks (one per partition) -> the Task Scheduler launches the tasks on worker nodes via the Cluster Manager -> results are sent back to the Driver
- Driver Program -> Spark Session & DAG -> (DAG Scheduler) Job (for every action), Stages (at wide transformations), Tasks (one per partition) -> (Task Scheduler & Cluster Manager) worker nodes -> results are sent back to the Driver
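The flow above can be sketched in plain Python (this is a toy model of the scheduling logic, not real Spark code; all function and field names here are illustrative). It shows the two key rules: the DAG scheduler cuts the lineage into a new stage at every wide (shuffle) transformation, and each stage produces one task per partition.

```python
# Toy model (NOT real Spark) of: action -> job -> stages -> tasks.
# A transformation is a dict; "wide" marks a shuffle boundary.

def split_into_stages(transformations):
    """DAG scheduler step: cut the lineage at every wide transformation."""
    stages, current = [], []
    for t in transformations:
        current.append(t)
        if t["wide"]:            # wide transformation => stage boundary
            stages.append(current)
            current = []
    if current:
        stages.append(current)
    return stages

def submit_job(transformations, num_partitions):
    """Driver step: an action creates a Job; each stage yields one task per partition.
    (In real Spark the Task Scheduler would hand these tasks to executors
    via the Cluster Manager.)"""
    stages = split_into_stages(transformations)
    tasks = [(stage_id, part)
             for stage_id in range(len(stages))
             for part in range(num_partitions)]
    return stages, tasks

# Example lineage: map -> filter -> reduceByKey (shuffle) -> map
lineage = [
    {"name": "map",         "wide": False},
    {"name": "filter",      "wide": False},
    {"name": "reduceByKey", "wide": True},
    {"name": "map",         "wide": False},
]
stages, tasks = submit_job(lineage, num_partitions=3)
print(len(stages))  # 2 stages (cut at reduceByKey)
print(len(tasks))   # 2 stages x 3 partitions = 6 tasks
```

The shuffle caused by `reduceByKey` is what forces the second stage: every task of the next stage needs data from every partition of the previous one, so the stages cannot be pipelined together.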