Application
A Spark application consists of one driver program plus n executors.
User program built on Spark.
Consists of a driver program and executors on the cluster.
Driver program
The driver program runs the main() method of the Spark application and creates the SparkContext.
The process running the main() function of the application and creating the SparkContext.
Cluster manager
The cluster manager is responsible for acquiring and allocating resources on the cluster; it is selected with the --master option:
An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN)
spark-submit --master <local[2] | spark://hadoop000:7077 | yarn>
Deploy mode
The deploy mode specifies where the driver program runs; there are two options: cluster and client.
In cluster mode the driver runs inside the cluster; in client mode it runs on the submitting machine.
Distinguishes where the driver process runs.
In "cluster" mode, the framework launches the driver inside of the cluster.
In "client" mode, the submitter launches the driver outside of the cluster.
Worker node
A worker node is analogous to YARN's NodeManager; in standalone mode you list the slave nodes in the slaves configuration file.
Any node that can run application code in the cluster
Executor
An executor is a process that executes tasks, similar to a YARN Container; it requests CPU and memory resources for running its tasks.
A process launched for an application on a worker node, which runs tasks and keeps data in memory or disk storage across them.
Each application has its own executors.
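As a hedged example, the CPU and memory an executor requests can be set at submit time; the class name, jar name, and resource numbers below are placeholders (--num-executors applies when running on YARN):

```shell
# Example: each executor requests 2 cores and 2g of memory,
# and YARN launches 4 executors for this application.
spark-submit \
  --master yarn \
  --executor-cores 2 \
  --executor-memory 2g \
  --num-executors 4 \
  --class com.example.MyApp \
  myapp.jar
```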
Task
A task is a unit of work executed by an executor process.
A unit of work that will be sent to one executor
Job
Each action corresponds to one job.
A parallel computation consisting of multiple tasks that
gets spawned in response to a Spark action (e.g. save, collect);
you'll see this term used in the driver's logs.
Stage
A stage's boundary typically starts where data is read (from the source or from a previous shuffle) and ends at a shuffle.
Each job gets divided into smaller sets of tasks called stages
that depend on each other
(similar to the map and reduce stages in MapReduce);
you'll see this term used in the driver's logs.
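To see jobs and stages in the driver's logs, one way is to run the SparkPi example bundled with Spark in local mode (the examples jar path varies by Spark version, hence the glob):

```shell
# Runs SparkPi locally with 2 threads; the driver log should print
# lines mentioning the job and its stages (e.g. "Job 0 finished",
# "ResultStage 0 (reduce at SparkPi.scala:...)").
spark-submit \
  --master "local[2]" \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/examples/jars/spark-examples_*.jar \
  100
```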