Spark Core Overview: Terminology


Application

A Spark application consists of 1 driver program + n executors.

User program built on Spark. 
Consists of a driver program and executors on the cluster.

Driver program

The driver program is the process that runs the main method of the Spark application and creates the SparkContext.

The process running the main() function of the application 
and creating the SparkContext

Cluster manager

The cluster manager is chosen mainly by specifying the --master option when the application is submitted.

An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)	
spark-submit --master <local[2] | spark://hadoop000:7077 | yarn>
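
To make the `--master` values above concrete, here is a minimal sketch that maps a master string to the kind of cluster manager it selects. The helper name `classify_master` is hypothetical (it is not part of Spark), and it only covers the forms mentioned in this section: `local`/`local[N]`, `spark://...` (standalone), `mesos://...` and `yarn`.

```python
import re


def classify_master(master: str) -> str:
    """Return the kind of cluster manager a --master value selects.

    Hypothetical helper for illustration only; Spark does its own parsing.
    """
    # local, local[2], local[*]: no cluster manager, everything runs locally
    if re.fullmatch(r"local(\[(\d+|\*)\])?", master):
        return "local"
    # spark://host:port points at a standalone master
    if master.startswith("spark://"):
        return "standalone"
    if master.startswith("mesos://"):
        return "mesos"
    if master == "yarn":
        return "yarn"
    raise ValueError(f"unrecognized master: {master!r}")


for m in ["local[2]", "spark://hadoop000:7077", "yarn"]:
    print(m, "->", classify_master(m))
```

Note that `local[2]` does not involve an external cluster manager at all; it is listed here only because it is passed through the same option.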

Deploy mode

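The deploy mode specifies where the driver program runs. There are two modes: cluster and client.
In cluster mode, the driver is launched inside the cluster.
In client mode, the driver is launched outside the cluster, by the submitter.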

Distinguishes where the driver process runs. 
	In "cluster" mode, the framework launches the driver inside of the cluster. 
	In "client" mode, the submitter launches the driver outside of the cluster.	
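
As a small sketch of how the deploy mode is chosen at submit time, the function below assembles a `spark-submit` argument list. `--master` and `--deploy-mode` are real `spark-submit` options; the function name `build_submit_command` and the application file `my_app.py` are assumptions made up for this example.

```python
def build_submit_command(master: str, deploy_mode: str, app: str) -> list:
    """Build a spark-submit argument list (hypothetical helper, illustration only)."""
    # spark-submit accepts exactly these two deploy modes
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"deploy mode must be 'client' or 'cluster', got {deploy_mode!r}")
    return ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, app]


# my_app.py is a placeholder application file
cmd = build_submit_command("yarn", "cluster", "my_app.py")
print(" ".join(cmd))
```

The list form can be handed to `subprocess.run` as is, which avoids shell quoting issues.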

Worker node

A worker node is comparable to YARN's NodeManager. In standalone mode, you need to edit the slaves configuration file to specify the slave nodes.

Any node that can run application code in the cluster

Executor

An executor is a process that executes tasks, similar to a YARN Container. It requests CPU and memory resources and uses them to run its tasks.

A process launched for an application on a worker node
that runs tasks and
keeps data in memory or disk storage across them.
Each application has its own executors.	
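
To illustrate how executor resources relate to running tasks, here is a rough sketch that estimates how many tasks an application can run at the same time. It assumes each task occupies `task_cpus` cores of an executor (mirroring the `spark.task.cpus` setting, default 1). The class `ExecutorConfig` and its fields are hypothetical names for this example, and the memory figure ignores any overhead the cluster manager adds per executor.

```python
from dataclasses import dataclass


@dataclass
class ExecutorConfig:
    """Hypothetical container for executor sizing, for illustration only."""
    num_executors: int
    executor_cores: int
    executor_memory_gb: int
    task_cpus: int = 1  # assumption: mirrors spark.task.cpus (default 1)

    def task_slots(self) -> int:
        # Each executor runs as many concurrent tasks as it has core slots
        return self.num_executors * (self.executor_cores // self.task_cpus)

    def total_memory_gb(self) -> int:
        # Sum of executor heap sizes only; per-executor overhead is not included
        return self.num_executors * self.executor_memory_gb


cfg = ExecutorConfig(num_executors=4, executor_cores=2, executor_memory_gb=4)
print(cfg.task_slots())       # 8
print(cfg.total_memory_gb())  # 16
```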

Task

A task is a unit of work to be executed by an executor process.

A unit of work that will be sent to one executor	

Job

Each action corresponds to one job.

A parallel computation consisting of multiple tasks that 
gets spawned in response to a Spark action (e.g. save, collect); 
you'll see this term used in the driver's logs.
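
The "one action, one job" rule can be sketched with a toy counter over a sequence of RDD method names. This is a simplification and not how Spark is implemented: `count_jobs` is a hypothetical helper, and the `ACTIONS` set lists only a few well-known actions (some real operations can launch more than one job).

```python
# A few common RDD actions; transformations such as map/filter are not listed
ACTIONS = {"collect", "count", "reduce", "foreach", "saveAsTextFile"}


def count_jobs(calls):
    """Count how many jobs a sequence of calls would trigger in this toy model."""
    # Transformations only describe the computation; each action spawns a job
    return sum(1 for call in calls if call in ACTIONS)


program = ["textFile", "map", "filter", "count", "map", "collect"]
print(count_jobs(program))  # 2: one job for count, one for collect
```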

Stage

A stage typically starts where data is read from some source and ends at a shuffle.

Each job gets divided into smaller sets of tasks called stages 
that depend on each other
(similar to the map and reduce stages in MapReduce); 
you'll see this term used in the driver's logs.	
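
As a rough sketch of how a job is cut into stages at shuffles, the toy function below splits a linear chain of operations whenever it meets a shuffle-inducing one. It is a deliberate simplification: `split_into_stages` is a hypothetical helper, the `WIDE` set is a small assumed sample of operations that normally shuffle, and the shuffle operation is simply placed at the start of the following stage.

```python
# Assumed sample of operations that normally require a shuffle
WIDE = {"reduceByKey", "groupByKey", "sortByKey", "repartition"}


def split_into_stages(ops):
    """Split a linear chain of operation names into stages at shuffle boundaries."""
    stages = [[]]
    for op in ops:
        # A shuffle closes the current stage; the wide op opens the next one
        if op in WIDE and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


word_count = ["textFile", "flatMap", "map", "reduceByKey", "map"]
for i, stage in enumerate(split_into_stages(word_count)):
    print(f"stage {i}: {stage}")
```

In this model the number of stages is the number of shuffles plus one, which matches the familiar two-stage shape of a word count.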