In distributed mode, Spark uses a master/slave architecture with one central coordinator and many distributed workers. The central coordinator is called the driver. The driver communicates with a potentially large number of distributed workers called executors. The driver runs in its own Java process, and each executor is a separate Java process. A driver and its executors are together termed a Spark application.
The Driver
- Converting a user program into tasks
- Scheduling tasks on executors
Executors
- Run the tasks that make up the application and return results to the driver
- Provide in-memory storage for RDDs that are cached by user programs
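The division of labor above can be sketched with a toy model (not real Spark: real executors are separate JVM processes launched by a cluster manager, while this sketch just uses threads in one process). The "driver" converts a job into per-partition tasks, schedules them on a pool of "executors", and collects the results:

```python
# Toy sketch of the driver/executor split (illustrative only, not Spark).
from concurrent.futures import ThreadPoolExecutor

def run_task(partition):
    # An "executor" runs one task: compute a partial result for one partition.
    return sum(x * x for x in partition)

def driver(data, num_tasks=4):
    # The "driver" converts the user program into tasks (one per partition)
    # and schedules them on the executor pool, then combines the results.
    size = max(1, len(data) // num_tasks)
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        return sum(pool.map(run_task, partitions))

if __name__ == "__main__":
    print(driver(list(range(10))))  # sum of squares 0..9 = 285
```

The function and variable names here are invented for the sketch; they do not correspond to Spark APIs.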
Cluster Manager
Spark depends on a cluster manager to launch executors and, in certain cases, to launch the driver. The cluster manager is a pluggable component in Spark. This allows Spark to run on top of different external managers, such as YARN and Mesos, as well as its built-in Standalone cluster manager.
Spark can run both drivers and executors on the YARN worker nodes.
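The pluggable cluster manager surfaces as the `--master` flag of spark-submit. The host names, ports, and application file below are placeholders, shown only to illustrate the different master URL forms:

```shell
# Standalone cluster manager (placeholder host)
spark-submit --master spark://host:7077 my_app.py

# YARN (resource manager location comes from the Hadoop configuration)
spark-submit --master yarn my_app.py

# Mesos (placeholder host)
spark-submit --master mesos://host:5050 my_app.py

# Local mode with 4 worker threads (useful for testing)
spark-submit --master "local[4]" my_app.py
```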
The procedure of running a Spark application
- The user submits an application using spark-submit.
- spark-submit launches the driver program and invokes the main() method specified by the user.
- The driver program contacts the cluster manager to ask for resources to launch executors.
- The cluster manager launches executors on behalf of the driver program.
- The driver process runs through the user application. Based on the RDD actions and transformations in the program, the driver sends work to executors in the form of tasks.
- Tasks are run on executor processes to compute and save results.
- If the driver’s main() method exits or it calls SparkContext.stop(), it will terminate the executors and release resources from the cluster manager.
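The first step of the procedure might look like the following spark-submit invocation. The application file name and resource sizes are placeholder values for illustration; the flags themselves are standard spark-submit options:

```shell
# Submit an application to YARN in cluster mode, requesting
# 4 executors, each with 2 GB of memory and 2 cores.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 2g \
  --executor-cores 2 \
  my_app.py
```

In cluster deploy mode the driver itself is launched on the cluster by the cluster manager; in client mode it runs in the process that invoked spark-submit.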
References
- Learning Spark