[Spark Summary] Spark Core Overview

Gru杨

Article link: https://blog.csdn.net/weixin_43517453/article/details/89892717

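Spark Core Overview

Spark Core Terminology
How Spark Runs

Spark Core implements Spark's basic functionality, including modules for task scheduling, memory management, fault recovery, and interaction with storage systems. Spark Core also contains the API definition for resilient distributed datasets (RDDs).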

Spark Core Terminology

Spark Application

User program built on Spark. Consists of a driver program and executors on the cluster.

Application = a driver + some executors on the cluster

Driver Program

The process running the main() function of the application and creating the SparkContext.

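The driver program does two main things:

run main()
create the SparkContext

The Spark driver is the process that executes the main method of your program. It runs the code you write to create the SparkContext, create RDDs, and apply transformations and actions to those RDDs. When you start the Spark shell, a driver program is started for you in the background, and the shell preloads a SparkContext object named sc. When the driver program terminates, the Spark application ends. The driver is mainly responsible for:

turning the user program into jobs;
scheduling tasks across executors;
tracking the status of the executors;
showing the status of running queries in the web UI.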

Cluster Manager

An external service for acquiring resources on the cluster.

The application requests its resources from the cluster manager, which can be Spark standalone, Mesos, YARN, or Kubernetes (K8s).

Worker Node

Comparable to a NodeManager in YARN.
A worker node can host many executors, which are similar to YARN containers.

Executor

A process launched for an application on a worker node,
that runs tasks and keeps data in memory or disk storage across them.
Each application has its own executors.

A single executor can run multiple tasks.

Task

A unit of work that will be sent to one executor

The smallest unit of work; it runs on an executor.
Each partition is processed by one task, so a stage over N partitions runs N tasks.
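
The relationship between partitions, tasks, and executors can be sketched with a toy model in plain Python. This is not Spark code: `split_into_partitions`, `run_stage`, and `executor_slots` are hypothetical names invented for illustration, and a thread pool merely stands in for the task slots that executors provide.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split data into num_partitions contiguous, roughly equal chunks."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions get one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def run_stage(partitions, task_fn, executor_slots):
    """Run one task per partition; the pool stands in for executor slots."""
    with ThreadPoolExecutor(max_workers=executor_slots) as pool:
        # pool.map returns results in partition order.
        return list(pool.map(task_fn, partitions))


# Four partitions means four tasks, even though only two can run at a time.
partitions = split_into_partitions(list(range(10)), num_partitions=4)
partial_sums = run_stage(partitions, sum, executor_slots=2)
total = sum(partial_sums)
```

With two slots and four partitions, the four tasks run in two waves, which is roughly what happens when a stage has more partitions than the executors have free cores.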

Job

In Spark, running an action triggers a job. A job is made up of multiple tasks, which are grouped into stages.
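
To make "an action triggers a job" concrete, here is a minimal sketch in plain Python. `LazyDataset` is a hypothetical toy class, not Spark's RDD API: it only imitates the idea that transformations such as map and filter are recorded lazily, while actions such as collect and count actually run the computation.

```python
class LazyDataset:
    """Toy stand-in for an RDD: transformations are recorded, actions run them."""

    jobs_run = 0  # how many "jobs" have been triggered by actions so far

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new dataset and computes nothing.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _compute(self):
        # Every action ends up here, so every action counts as one job.
        LazyDataset.jobs_run += 1
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def collect(self):
        # Action: materialize all elements.
        return self._compute()

    def count(self):
        # Action: materialize and count the elements.
        return len(self._compute())


evens_squared = LazyDataset(range(6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
jobs_before_action = LazyDataset.jobs_run  # still 0: nothing has run yet
result = evens_squared.collect()           # first action, first job
n = evens_squared.count()                  # second action, second job
```

Note that the second action recomputes everything from the source, which mirrors what happens to an uncached RDD in Spark.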

Deploy Mode

Distinguishes where the driver process runs. 
In "cluster" mode, the framework launches the driver inside of the cluster.
In "client" mode, the submitter launches the driver outside of the cluster.
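
In practice the deploy mode is usually chosen when the application is submitted. The sketch below builds a `spark-submit` argument list using its `--master` and `--deploy-mode` options; the helper name `spark_submit_command` and the application file `my_app.py` are assumptions made for illustration, and the function only constructs the command without running it.

```python
def spark_submit_command(app_path, master, deploy_mode="client"):
    """Build a spark-submit argument list for the given deploy mode."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", master,            # which cluster manager to contact
        "--deploy-mode", deploy_mode,  # where the driver process runs
        app_path,
    ]


# Driver runs inside the cluster (hypothetical application file).
cluster_cmd = spark_submit_command("my_app.py", master="yarn", deploy_mode="cluster")
# Driver runs on the submitting machine, outside the cluster.
client_cmd = spark_submit_command("my_app.py", master="yarn")
```

The resulting list could be passed to `subprocess.run` on a machine where Spark is installed.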

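How Spark Runs

Spark's runtime architecture consists of:

the cluster resource manager (Cluster Manager)
the per-application control node (Driver)
the worker nodes that run job tasks (Worker Node)
the executor processes on each worker node that carry out the actual tasks (Executor)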
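
How these four components relate can be sketched as a small toy model in plain Python. None of the classes below are Spark APIs; they are hypothetical stand-ins, and the round-robin placement of executors is an assumption made only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Executor:
    app_id: str  # each executor belongs to exactly one application
    worker: str  # name of the worker node hosting it


@dataclass
class WorkerNode:
    name: str
    executors: list = field(default_factory=list)


class ClusterManager:
    """Hands out executors on worker nodes (round-robin is an assumption)."""

    def __init__(self, workers):
        self.workers = workers

    def allocate(self, app_id, num_executors):
        granted = []
        for i in range(num_executors):
            worker = self.workers[i % len(self.workers)]
            executor = Executor(app_id, worker.name)
            worker.executors.append(executor)
            granted.append(executor)
        return granted


class Driver:
    """Per-application control node: asks the cluster manager for executors."""

    def __init__(self, app_id, cluster_manager, num_executors):
        self.app_id = app_id
        self.executors = cluster_manager.allocate(app_id, num_executors)


workers = [WorkerNode("worker-1"), WorkerNode("worker-2")]
driver = Driver("app-1", ClusterManager(workers), num_executors=3)
executors_per_worker = {w.name: len(w.executors) for w in workers}
```

Here one worker node ends up hosting two executors of the same application, matching the point that a worker node can hold several executors.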
