Cluster Mode Overview
This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved.
Read through the application submission guide to learn about launching applications on a cluster.
1 Components
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext
object in your main program (called the driver program).
Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark’s own standalone cluster manager, Mesos, YARN or Kubernetes), which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.
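The driver-side setup described above can be sketched in Scala. This is a minimal illustration, not code from this document: the master URL, application name, and the sample job are placeholder assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder values: app name and master URL are examples only.
// The master could also be "yarn", "k8s://...", or a Mesos URL.
val conf = new SparkConf()
  .setAppName("ExampleApp")
  .setMaster("spark://host:7077") // Spark's standalone cluster manager

// Creating the SparkContext in the driver connects to the cluster manager,
// which allocates executors; the application code is then shipped to them.
val sc = new SparkContext(conf)

// Operations on distributed data are sent to the executors as tasks.
sc.parallelize(1 to 100).map(_ * 2).count()

sc.stop()
```

In practice the master URL is usually supplied via `spark-submit` rather than hard-coded, so the same application can run under any of the supported cluster managers.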
There are several useful things to note about this architecture:
- Each application gets its own executor processes