Thursday, May 11, 2017

Spark Execution Flow


Below are the three stages of the Spark execution model:

1. Create a DAG of RDDs to represent the computation - RDD lineage creation.
2. Create a logical execution plan for the DAG - split it into "stages" based on the need to reorganize data (a shuffle). For the example job sketched below:
   Stage 1: HadoopRDD -> map()
   Stage 2: groupBy() -> mapValues() -> collect()
3. Schedule and execute individual tasks:
   • Split each stage into tasks.
   • A task is data + computation.
   • All tasks within a stage are executed before the next stage starts.
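The two-stage split above can be reproduced with a small job. The sketch below is illustrative rather than taken from the talk verbatim: it assumes a local master and a placeholder input path hdfs:/names (any text file will do), and uses groupByKey(), the pair-RDD form of the groupBy() step above.

import org.apache.spark.{SparkConf, SparkContext}

object ExecutionFlowDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ExecutionFlowDemo").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Stage 1: HadoopRDD -> map(); narrow dependencies are pipelined into one stage
    val counts = sc.textFile("hdfs:/names")          // hdfs:/names is a placeholder path
      .map(name => (name.charAt(0), name))           // key each name by its first letter
      .groupByKey()                                  // shuffle: the DAG is cut here, Stage 2 begins
      .mapValues(names => names.toSet.size)          // Stage 2: runs on the shuffled data

    // Print the RDD lineage; the indented "+-" branch in the output marks the
    // shuffle boundary where the scheduler splits Stage 1 from Stage 2
    println(counts.toDebugString)

    counts.collect().foreach(println)                // collect() triggers the actual execution
    sc.stop()
  }
}

Each stage is then broken into one task per partition, so the task count shown in the Spark UI for this job follows directly from how the input is partitioned.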

For more information, refer to https://spark-summit.org/2014/wp-content/uploads/2014/07/A-Deeper-Understanding-of-Spark-Internals-Aaron-Davidson.pdf ("A Deeper Understanding of Spark Internals" by Aaron Davidson, Spark Summit 2014).
