Thursday, May 11, 2017

Spark Execution Flow


Below are the three stages of the Spark execution model:

1. Create a DAG of RDDs to represent the computation - RDD lineage creation.
2. Create a logical execution plan for the DAG - split it into "stages" based on the need to reorganize data (a shuffle). For the example job sketched below:
   Stage 1: HadoopRDD -> map()
   Stage 2: groupBy() -> mapValues() -> collect()
3. Schedule and execute individual tasks:
   • Split each stage into tasks.
   • A task is data + computation.
   • All tasks within a stage are executed before the next stage starts.
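The two-stage split above can be reproduced with a small job. The sketch below is illustrative rather than taken from the talk verbatim: it assumes a local master and a placeholder input path hdfs:/names (any text file will do), and uses groupByKey(), the pair-RDD form of the groupBy() step above.

import org.apache.spark.{SparkConf, SparkContext}

object ExecutionFlowDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ExecutionFlowDemo").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Stage 1: HadoopRDD -> map(); narrow dependencies are pipelined into one stage
    val counts = sc.textFile("hdfs:/names")          // hdfs:/names is a placeholder path
      .map(name => (name.charAt(0), name))           // key each name by its first letter
      .groupByKey()                                  // shuffle: the DAG is cut here, Stage 2 begins
      .mapValues(names => names.toSet.size)          // Stage 2: runs on the shuffled data

    // Print the RDD lineage; the indented "+-" branch in the output marks the
    // shuffle boundary where the scheduler splits Stage 1 from Stage 2
    println(counts.toDebugString)

    counts.collect().foreach(println)                // collect() triggers the actual execution
    sc.stop()
  }
}

Each stage is then broken into one task per partition, so the task count shown in the Spark UI for this job follows directly from how the input is partitioned.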

For more information, refer to https://spark-summit.org/2014/wp-content/uploads/2014/07/A-Deeper-Understanding-of-Spark-Internals-Aaron-Davidson.pdf ("A Deeper Understanding of Spark Internals" by Aaron Davidson, Spark Summit 2014).
