Spark Job Scheduling

Overview

Spark has several facilities for scheduling resources between computations. First, recall that, as described in the cluster mode overview, each Spark application (instance of SparkContext) runs an independent set of executor processes. The cluster managers that Spark runs on provide facilities for scheduling across applications. Second, within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads. This is common if your application is serving requests over the network; for example, the Shark server works this way. Spark includes a fair scheduler to schedule resources within each SparkContext.

Scheduling Across Applications

When running on a cluster, each Spark application gets an independent set of executor JVMs that only run tasks and store data for that application. If multiple users need to share your cluster, there are different options to manage allocation, depending on the cluster manager.

The simplest option, available on all cluster managers, is static partitioning of resources. With this approach, each application is given a maximum amount of resources it can use, and holds onto them for its whole duration. This is the approach used in Spark’s standalone and YARN modes, as well as the coarse-grained Mesos mode. Resource allocation can be configured as follows, based on the cluster type:

  • Standalone mode: By default, applications submitted to the standalone mode cluster will run in FIFO (first-in-first-out) order, and each application will try to use all available nodes. You can limit the number of nodes an application uses by setting the spark.cores.max configuration property in it, or change the default for applications that don’t set this setting through spark.deploy.defaultCores. Finally, in addition to controlling cores, each application’s spark.executor.memory setting controls its memory use (see the sketch after this list).
  • Mesos: To use static partitioning on Mesos, set the spark.mesos.coarse configuration property to true, and optionally set spark.cores.max to limit each application’s resource share as in the standalone mode. You should also set spark.executor.memory to control the executor memory.
  • YARN: The --num-executors option to the Spark YARN client controls how many executors it will allocate on the cluster, while --executor-memory and --executor-cores control the resources per executor.
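For instance, here is a minimal sketch of static allocation for the standalone (or coarse-grained Mesos) case; the master URL, application name, core cap, and executor memory are all placeholder values to adapt to your own cluster:

// Placeholder values throughout: adjust the master URL, core cap, and memory to your cluster.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")        // standalone master (hypothetical host)
  .setAppName("StaticAllocationExample")
  .set("spark.cores.max", "8")             // limit the total cores this application may hold
  .set("spark.executor.memory", "2g")      // memory reserved per executor
val sc = new SparkContext(conf)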

A second option available on Mesos is dynamic sharing of CPU cores. In this mode, each Spark application still has a fixed and independent memory allocation (set by spark.executor.memory), but when the application is not running tasks on a machine, other applications may run tasks on those cores. This mode is useful when you expect large numbers of not overly active applications, such as shell sessions from separate users. However, it comes with a risk of less predictable latency, because it may take a while for an application to gain back cores on one node when it has work to do. To use this mode, simply use a mesos:// URL without setting spark.mesos.coarse to true.
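For example (a sketch only; the Mesos master address below is a placeholder, and spark.mesos.coarse is simply left unset so the default fine-grained mode is used):

// Fine-grained Mesos: dynamic CPU sharing, fixed per-application memory.
val conf = new SparkConf()
  .setMaster("mesos://mesos-master:5050")  // hypothetical Mesos master URL
  .setAppName("FineGrainedMesosExample")
  .set("spark.executor.memory", "2g")      // memory is still a fixed, independent allocation
val sc = new SparkContext(conf)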

Note that none of the modes currently provide memory sharing across applications. If you would like to share data this way, we recommend running a single server application that can serve multiple requests by querying the same RDDs. For example, the Shark JDBC server works this way for SQL queries. In future releases, in-memory storage systems such as Tachyon will provide another approach to share RDDs.

Scheduling Within an Application

Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark’s scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users).
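As an illustrative sketch (the data, thread structure, and actions below are made up; only the pattern of submitting actions from separate threads is the point):

// Assuming sc is your SparkContext variable.
// Each thread submits its own action, so the two jobs can run concurrently.
val rddA = sc.parallelize(1 to 1000000)
val rddB = sc.parallelize(1 to 1000000)

val t1 = new Thread(new Runnable {
  def run(): Unit = println("sum of A: " + rddA.map(_ * 2).reduce(_ + _))              // job 1
})
val t2 = new Thread(new Runnable {
  def run(): Unit = println("even count in B: " + rddB.filter(_ % 2 == 0).count())     // job 2
})
t1.start(); t2.start()
t1.join(); t2.join()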

By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is divided into “stages” (e.g. map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, etc. If the jobs at the head of the queue don’t need to use the whole cluster, later jobs can start to run right away, but if the jobs at the head of the queue are large, then later jobs may be delayed significantly.

Starting in Spark 0.8, it is also possible to configure fair sharing between jobs. Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.

To enable the fair scheduler, simply set the spark.scheduler.mode property to FAIR when configuring a SparkContext:

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

Fair Scheduler Pools

The fair scheduler also supports grouping jobs into pools, and setting different scheduling options (e.g. weight) for each pool. This can be useful to create a “high-priority” pool for more important jobs, for example, or to group the jobs of each user together and give users equal shares regardless of how many concurrent jobs they have instead of giving jobs equal shares. This approach is modeled after the Hadoop Fair Scheduler.

Without any intervention, newly submitted jobs go into a default pool, but jobs’ pools can be set by adding the spark.scheduler.pool “local property” to the SparkContext in the thread that’s submitting them. This is done as follows:

// Assuming sc is your SparkContext variable
sc.setLocalProperty("spark.scheduler.pool", "pool1")

After setting this local property, all jobs submitted within this thread (by calls in this thread to RDD.save, count, collect, etc) will use this pool name. The setting is per-thread to make it easy to have a thread run multiple jobs on behalf of the same user. If you’d like to clear the pool that a thread is associated with, simply call:

sc.setLocalProperty("spark.scheduler.pool", null)
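
Putting the two calls together, one per-user pattern might look like the sketch below; the pool names, the helper function, and the work done in each thread are all hypothetical:

// Assuming sc is your SparkContext variable.
// Each user's thread tags its jobs with that user's pool, then clears the property when done.
def runAs(pool: String)(work: => Unit): Thread = {
  val t = new Thread(new Runnable {
    def run(): Unit = {
      sc.setLocalProperty("spark.scheduler.pool", pool)
      try work
      finally sc.setLocalProperty("spark.scheduler.pool", null)
    }
  })
  t.start()
  t
}

val threads = Seq(
  runAs("alice") { sc.parallelize(1 to 100000).count() },
  runAs("bob")   { sc.parallelize(1 to 100000).reduce(_ + _) }
)
threads.foreach(_.join())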

Default Behavior of Pools

By default, each pool gets an equal share of the cluster (also equal in share to each job in the default pool), but inside each pool, jobs run in FIFO order. For example, if you create one pool per user, this means that each user will get an equal share of the cluster, and that each user’s queries will run in order instead of later queries taking resources from that user’s earlier ones.

Configuring Pool Properties

Specific pools’ properties can also be modified through a configuration file. Each pool supports three properties:

  • schedulingMode: This can be FIFO or FAIR, to control whether jobs within the pool queue up behind each other (the default) or share the pool’s resources fairly.
  • weight: This controls the pool’s share of the cluster relative to other pools. By default, all pools have a weight of 1. If you give a specific pool a weight of 2, for example, it will get 2x more resources as other active pools. Setting a high weight such as 1000 also makes it possible to implement priority between pools—in essence, the weight-1000 pool will always get to launch tasks first whenever it has jobs active.
  • minShare: Apart from an overall weight, each pool can be given a minimum shares (as a number of CPU cores) that the administrator would like it to have. The fair scheduler always attempts to meet all active pools’ minimum shares before redistributing extra resources according to the weights. The minShare property can therefore be another way to ensure that a pool can always get up to a certain number of resources (e.g. 10 cores) quickly without giving it a high priority for the rest of the cluster. By default, each pool’s minShare is 0.

The pool properties can be set by creating an XML file, similar to conf/fairscheduler.xml.template, and setting a spark.scheduler.allocation.file property in your SparkConf.

conf.set("spark.scheduler.allocation.file", "/path/to/file")

The format of the XML file is simply a <pool> element for each pool, with different elements within it for the various settings. For example:

<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>

A full example is also available in conf/fairscheduler.xml.template. Note that any pools not configured in the XML file will simply get default values for all settings (scheduling mode FIFO, weight 1, and minShare 0).

### This article is based on the Spark “Job Scheduling” documentation.
