Cluster Mode Overview
This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved.
Read through the application submission guide to learn about launching applications on a cluster.
1 Components
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext
object in your main program (called the driver program).
Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark’s own standalone cluster manager, Mesos, YARN or Kubernetes), which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.
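The driver-side setup described above can be sketched in Scala. This is a minimal illustration, not code from this document: the master URL, application name, and the sample job are placeholder assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder values: app name and master URL are examples only.
// The master could also be "yarn", "k8s://...", or a Mesos URL.
val conf = new SparkConf()
  .setAppName("ExampleApp")
  .setMaster("spark://host:7077") // Spark's standalone cluster manager

// Creating the SparkContext in the driver connects to the cluster manager,
// which allocates executors; the application code is then shipped to them.
val sc = new SparkContext(conf)

// Operations on distributed data are sent to the executors as tasks.
sc.parallelize(1 to 100).map(_ * 2).count()

sc.stop()
```

In practice the master URL is usually supplied via `spark-submit` rather than hard-coded, so the same application can run under any of the supported cluster managers.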
There are several useful things to note about this architecture:
- Each application gets its own executor processes