Spark Cluster Mode Overview

Basics

Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program).
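
A minimal sketch of such a driver program is shown below (the object name and the job itself are placeholders); the SparkContext created in main is the coordinator described above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal driver program: the SparkContext built here coordinates the
// executors that the cluster manager will allocate for this application.
object SimpleDriver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("simple-driver")
    val sc   = new SparkContext(conf)

    // A trivial job; Spark splits it into tasks that run on the executors.
    val total = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
    println(s"total = $total")

    sc.stop()
  }
}
```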

Flow

Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark’s own standalone cluster manager, Mesos or YARN), which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.
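
Which of these cluster managers the SparkContext connects to is chosen by the master URL. The sketch below assumes a standalone master at spark://master-host:7077 (the host, port, and resource sizes are placeholders); yarn or mesos://... would select the other managers, and local[*] skips the cluster entirely. In practice the master and resource settings are usually supplied through spark-submit rather than hard-coded like this.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// The master URL selects the cluster manager:
//   spark://host:7077  -> Spark's standalone cluster manager
//   yarn               -> YARN (cluster location taken from the Hadoop config)
//   mesos://host:5050  -> Mesos
//   local[*]           -> no cluster manager; everything runs in one JVM
val conf = new SparkConf()
  .setAppName("cluster-overview-demo")
  .setMaster("spark://master-host:7077")   // placeholder standalone master
  .set("spark.executor.memory", "2g")      // memory requested per executor
  .set("spark.executor.cores", "2")        // cores requested per executor

val sc = new SparkContext(conf)
// Jobs submitted through `sc` are now broken into tasks and shipped,
// together with the application code, to the acquired executors.
```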

Architecture


  • Each application gets its own executor processes, which stay up for
    the duration of the whole application and run tasks in multiple
    threads. This has the benefit of isolating applications from each
    other, on both the scheduling side (each driver schedules its own
    tasks) and executor side (tasks from different applications run in
    different JVMs). However, it also means that data cannot be shared
    across different Spark applications (instances of SparkContext)
    without writing it to an external storage system (see the first
    sketch after this list).
  • Spark is agnostic to the underlying cluster manager. As long as it
    can acquire executor processes, and these communicate with each
    other, it is relatively easy to run it even on a cluster manager that
    also supports other applications (e.g. Mesos/YARN).
  • The driver program must listen for and accept incoming connections
    from its executors throughout its lifetime (e.g., see
    spark.driver.port in the network config section). As such, the driver
    program must be network addressable from the worker nodes (see the
    second sketch after this list).
  • Because the driver schedules tasks on the cluster, it should be run
    close to the worker nodes, preferably on the same local area network.
    If you’d like to send requests to the cluster remotely, it’s better
    to open an RPC to the driver and have it submit operations from
    nearby than to run a driver far away from the worker nodes.
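
To illustrate the first point above, two independent applications can only hand data to each other through external storage. A minimal sketch, assuming an HDFS path (the path and application names are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Application A ("producer"): writes its results to external storage.
val scA = new SparkContext(new SparkConf().setAppName("producer"))
scA.parallelize(1 to 100).map(i => s"record-$i")
   .saveAsTextFile("hdfs:///tmp/shared-output")   // placeholder path
scA.stop()

// Application B ("consumer"): a separate SparkContext, typically a separate
// process started later, reads the same data back from the external store.
val scB = new SparkContext(new SparkConf().setAppName("consumer"))
println(scB.textFile("hdfs:///tmp/shared-output").count())
scB.stop()
```

And for the driver-addressability point, the host and port the driver advertises to its executors can be pinned with the standard spark.driver.host and spark.driver.port properties (the values below are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("addressable-driver")
  .set("spark.driver.host", "10.0.0.5")   // address the executors connect back to
  .set("spark.driver.port", "35000")      // fixed port instead of a random one
```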