In distributed mode, Spark uses a master/slave architecture with one central coordinator and many distributed workers. The central coordinator is called the driver. The driver communicates with a potentially large number of distributed workers called executors. The driver runs in its own Java process, and each executor is a separate Java process. A driver and its executors are together termed a Spark application.
The Driver
- Converting a user program into tasks
- Scheduling tasks on executors
Executors
- Run the tasks that make up the application and return results to the driver
- Provide in-memory storage for RDDs that are cached by user programs
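The division of labor above can be sketched with a toy model (not real Spark: real executors are separate JVM processes launched by a cluster manager, while this sketch just uses threads in one process). The "driver" converts a job into per-partition tasks, schedules them on a pool of "executors", and collects the results:

```python
# Toy sketch of the driver/executor split (illustrative only, not Spark).
from concurrent.futures import ThreadPoolExecutor

def run_task(partition):
    # An "executor" runs one task: compute a partial result for one partition.
    return sum(x * x for x in partition)

def driver(data, num_tasks=4):
    # The "driver" converts the user program into tasks (one per partition)
    # and schedules them on the executor pool, then combines the results.
    size = max(1, len(data) // num_tasks)
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        return sum(pool.map(run_task, partitions))

if __name__ == "__main__":
    print(driver(list(range(10))))  # sum of squares 0..9 = 285
```

The function and variable names here are invented for the sketch; they do not correspond to Spark APIs.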
Cluster Manager
Spark depends on a cluster manager to launch executors and, in certain cases, to launch the driver. The cluster manager is a pluggable component in Spark. This allows Spark to run on top of different external managers, such as YARN and Mesos, as well as its built-in Standalone cluster manager.
Spark can run both drivers and executors on the YARN worker nodes.
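The pluggable cluster manager surfaces as the `--master` flag of spark-submit. The host names, ports, and application file below are placeholders, shown only to illustrate the different master URL forms:

```shell
# Standalone cluster manager (placeholder host)
spark-submit --master spark://host:7077 my_app.py

# YARN (resource manager location comes from the Hadoop configuration)
spark-submit --master yarn my_app.py

# Mesos (placeholder host)
spark-submit --master mesos://host:5050 my_app.py

# Local mode with 4 worker threads (useful for testing)
spark-submit --master "local[4]" my_app.py
```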
The procedure of running a Spark application
- The user submits an application using spark-submit.
- spark-submit launches the driver program and invokes the main() method specified by the user.
- The driver program contacts the cluster manager to ask for resources to launch executors.
- The cluster manager launches executors on behalf of the driver program.
- The driver process runs through the user application. Based on the RDD actions and transformations in the program, the driver sends work to executors in the form of tasks.
- Tasks are run on executor processes to compute and save results.
- If the driver’s main() method exits or it calls SparkContext.stop(), it will terminate the executors and release resources from the cluster manager.
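The first step of the procedure might look like the following spark-submit invocation. The application file name and resource sizes are placeholder values for illustration; the flags themselves are standard spark-submit options:

```shell
# Submit an application to YARN in cluster mode, requesting
# 4 executors, each with 2 GB of memory and 2 cores.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 2g \
  --executor-cores 2 \
  my_app.py
```

In cluster deploy mode the driver itself is launched on the cluster by the cluster manager; in client mode it runs in the process that invoked spark-submit.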
References
- Learning Spark