Application
A Spark application consists of one driver program plus n executors.
User program built on Spark.
Consists of a driver program and executors on the cluster.
Driver program
The driver program runs the main() method of the Spark application and creates the SparkContext.
The process running the main() function of the application and creating the SparkContext.
Cluster manager
The cluster manager is responsible for acquiring and allocating resources on the cluster; it is selected with the --master option:
An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)
spark-submit --master <local[2] | spark://hadoop000:7077 | yarn>
Deploy mode
The deploy mode specifies where the driver program runs; there are two options: cluster and client.
In cluster mode the driver runs inside the cluster; in client mode it runs on the submitting machine.
Distinguishes where the driver process runs.
In "cluster" mode, the framework launches the driver inside of the cluster.
In "client" mode, the submitter launches the driver outside of the cluster.
Worker node
A worker node is analogous to YARN's NodeManager; in standalone mode you list the slave nodes in the slaves configuration file.
Any node that can run application code in the cluster
Executor
An executor is a process that executes tasks, similar to a YARN Container; it requests CPU and memory resources for running its tasks.
A process launched for an application on a worker node, which runs tasks and keeps data in memory or disk storage across them.
Each application has its own executors.
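As a hedged example, the CPU and memory an executor requests can be set at submit time; the class name, jar name, and resource numbers below are placeholders (--num-executors applies when running on YARN):

```shell
# Example: each executor requests 2 cores and 2g of memory,
# and YARN launches 4 executors for this application.
spark-submit \
  --master yarn \
  --executor-cores 2 \
  --executor-memory 2g \
  --num-executors 4 \
  --class com.example.MyApp \
  myapp.jar
```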
Task
A task is a unit of work executed by an executor process.
A unit of work that will be sent to one executor
Job
Each action corresponds to one job.
A parallel computation consisting of multiple tasks that
gets spawned in response to a Spark action (e.g. save, collect);
you'll see this term used in the driver's logs.
Stage
A stage's boundary typically starts where data is read (from the source or from a previous shuffle) and ends at a shuffle.
Each job gets divided into smaller sets of tasks called stages
that depend on each other
(similar to the map and reduce stages in MapReduce);
you'll see this term used in the driver's logs.
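To see jobs and stages in the driver's logs, one way is to run the SparkPi example bundled with Spark in local mode (the examples jar path varies by Spark version, hence the glob):

```shell
# Runs SparkPi locally with 2 threads; the driver log should print
# lines mentioning the job and its stages (e.g. "Job 0 finished",
# "ResultStage 0 (reduce at SparkPi.scala:...)").
spark-submit \
  --master "local[2]" \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/examples/jars/spark-examples_*.jar \
  100
```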