Data Locality Levels

马蹄急66

Article link: https://blog.csdn.net/matiji66/article/details/51622067

Property Name	Default	Meaning

`spark.locality.wait`	3s	How long to wait to launch a data-local task before giving up and launching it on a less-local node. The same wait will be used to step through multiple locality levels (process-local, node-local, rack-local and then any). It is also possible to customize the waiting time for each level by setting `spark.locality.wait.node`, etc. You should increase this setting if your tasks are long and see poor locality, but the default usually works well.
`spark.locality.wait.node`	spark.locality.wait	Customize the locality wait for node locality. For example, you can set this to 0 to skip node locality and search immediately for rack locality (if your cluster has rack information).
`spark.locality.wait.process`	spark.locality.wait	Customize the locality wait for process locality. This affects tasks that attempt to access cached data in a particular executor process.
`spark.locality.wait.rack`	spark.locality.wait	Customize the locality wait for rack locality.
`spark.scheduler.maxRegisteredResourcesWaitingTime`	30s	Maximum amount of time to wait for resources to register before scheduling begins.
`spark.scheduler.minRegisteredResourcesRatio`	0.8 for YARN mode; 0.0 for standalone mode and Mesos coarse-grained mode	The minimum ratio of registered resources to total expected resources to wait for before scheduling begins. Resources are executors in YARN mode and CPU cores in standalone mode and Mesos coarse-grained mode (for Mesos coarse-grained mode, the `spark.cores.max` value is the total expected resources). Specified as a double between 0.0 and 1.0. Regardless of whether the minimum ratio of resources has been reached, the maximum amount of time to wait before scheduling begins is controlled by `spark.scheduler.maxRegisteredResourcesWaitingTime`.
`spark.scheduler.mode`	FIFO	The scheduling mode between jobs submitted to the same SparkContext. Can be set to `FAIR` to use fair sharing instead of queueing jobs one after another. Useful for multi-user services.
`spark.scheduler.revive.interval`	1s	The interval length for the scheduler to revive the worker resource offers to run tasks.
`spark.speculation`	false	If set to "true", performs speculative execution of tasks. This means if one or more tasks are running slowly in a stage, they will be re-launched.
`spark.speculation.interval`	100ms	How often Spark will check for tasks to speculate.
`spark.speculation.multiplier`	1.5	How many times slower a task is than the median to be considered for speculation.
`spark.speculation.quantile`	0.75	Fraction of tasks which must be complete before speculation is enabled for a particular stage.
`spark.task.cpus`	1	Number of cores to allocate for each task.
`spark.task.maxFailures`	4	Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
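
To make the three speculation settings in the table concrete, here is a minimal sketch of how `spark.speculation.quantile` and `spark.speculation.multiplier` could combine into a yes/no decision for one running task. This is a simplified model written for illustration, not Spark's scheduler code: the class `SpeculationCheck`, its method and its parameters are hypothetical names, and the real scheduler applies further conditions.

```java
import java.util.Arrays;

// Simplified model (assumption), not Spark's implementation.
final class SpeculationCheck {
    /**
     * @param finishedMs  durations of the tasks that already completed in the stage
     * @param totalTasks  total number of tasks in the stage
     * @param runningMs   how long the candidate task has been running
     * @param quantile    value of spark.speculation.quantile (e.g. 0.75)
     * @param multiplier  value of spark.speculation.multiplier (e.g. 1.5)
     */
    static boolean isSpeculatable(long[] finishedMs, int totalTasks, long runningMs,
                                  double quantile, double multiplier) {
        // Speculation is only considered once enough tasks of the stage are done.
        if (finishedMs.length == 0 || finishedMs.length < quantile * totalTasks) {
            return false;
        }
        long[] sorted = finishedMs.clone();
        Arrays.sort(sorted);
        long median = sorted[sorted.length / 2];
        // A task is a candidate when it is `multiplier` times slower than the median.
        return runningMs > multiplier * median;
    }

    public static void main(String[] args) {
        long[] finished = {100, 110, 120};
        // 3 of 4 tasks finished, median 110 ms, threshold 165 ms
        System.out.println(isSpeculatable(finished, 4, 200, 0.75, 1.5)); // true
        System.out.println(isSpeculatable(finished, 4, 150, 0.75, 1.5)); // false
    }
}
```

In this model, the check itself would run every `spark.speculation.interval`.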

At run time, the locality level of each task is printed in the log:

16:49:56 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 8, localhost, PROCESS_LOCAL, 841218 bytes)

16:49:56 INFO Executor: Running task 0.0 in stage 12.0 (TID 8)
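
When checking how often tasks get good locality, it can help to pull the level out of these log lines automatically. The sketch below is a hypothetical helper (the class `LocalityLogParser` is not part of Spark) and it assumes the "Starting task" line format shown above; other Spark versions may print the line differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: extracts the locality level from a
// TaskSetManager "Starting task" log line in the format shown above.
final class LocalityLogParser {
    private static final Pattern LEVEL =
        Pattern.compile("\\b(PROCESS_LOCAL|NODE_LOCAL|NO_PREF|RACK_LOCAL|ANY)\\b");

    static Optional<String> localityOf(String logLine) {
        // Only task-launch lines carry a locality level.
        if (!logLine.contains("TaskSetManager: Starting task")) {
            return Optional.empty();
        }
        Matcher m = LEVEL.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "16:49:56 INFO TaskSetManager: Starting task 0.0 in stage 12.0 "
            + "(TID 8, localhost, PROCESS_LOCAL, 841218 bytes)";
        System.out.println(localityOf(line).orElse("unknown")); // PROCESS_LOCAL
    }
}
```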

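# What locality levels are there

PROCESS_LOCAL: process-local. The code and the data are in the same process, that is, the same executor: the task that computes the data runs in the executor, and the data sits in that executor's BlockManager. Best performance.
NODE_LOCAL: node-local. The code and the data are on the same node. For example, the data is an HDFS block stored on the node and the task runs in an executor on that node; or the data and the task are in different executors on the same node. The data has to be transferred between processes.
NO_PREF: the task has no locality preference; the data can be fetched equally well from anywhere.
RACK_LOCAL: rack-local. The data and the task are on two different nodes in the same rack; the data has to be transferred over the network between nodes.
ANY: the data and the task may be anywhere in the cluster, not even in the same rack. Worst performance.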
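
The scheduler starts at the most local level and relaxes to the next one each time that level's wait expires without a task being launched. The sketch below models only that stepping rule, under stated assumptions: `LocalityFallback` and `allowedLevel` are hypothetical names, NO_PREF is left out, and the real scheduler does more bookkeeping than this.

```java
// Simplified model (assumption), not Spark's scheduler code: given the time that
// has passed without launching a task, and the wait configured for each level,
// return the least-local level the scheduler would now accept.
final class LocalityFallback {
    enum Level { PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY }

    static Level allowedLevel(long elapsedMs, long processWaitMs,
                              long nodeWaitMs, long rackWaitMs) {
        long[] waits = { processWaitMs, nodeWaitMs, rackWaitMs }; // ANY has no wait
        Level[] levels = Level.values();
        int i = 0;
        long remaining = elapsedMs;
        // Step down one level each time the current level's wait is used up.
        while (i < waits.length && remaining >= waits[i]) {
            remaining -= waits[i];
            i++;
        }
        return levels[i];
    }

    public static void main(String[] args) {
        // All waits at the 3s default
        System.out.println(allowedLevel(0, 3000, 3000, 3000));    // PROCESS_LOCAL
        System.out.println(allowedLevel(3000, 3000, 3000, 3000)); // NODE_LOCAL
        System.out.println(allowedLevel(6000, 3000, 3000, 3000)); // RACK_LOCAL
        System.out.println(allowedLevel(9000, 3000, 3000, 3000)); // ANY
        // Node wait set to 0: node locality is skipped once the process wait expires
        System.out.println(allowedLevel(3000, 3000, 0, 3000));    // RACK_LOCAL
    }
}
```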

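# Locality wait time

spark.locality.wait defaults to 3s; it can be raised, for example to 6s or 10s.

How to set it:
By default, the three wait times below are the same as the one above, that is, 3s each:
spark.locality.wait.process
spark.locality.wait.node
spark.locality.wait.rack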

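import org.apache.spark.SparkConf;
// A value without a unit suffix is read as milliseconds, so write "10s", not "10"
SparkConf conf = new SparkConf()
    .set("spark.locality.wait", "10s");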
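
The per-level properties can be set the same way. A minimal sketch, assuming Spark is on the classpath; the class name `LocalityWaitConf` and the chosen values are only illustrative, not recommendations:

```java
import org.apache.spark.SparkConf;

// Illustrative only: class name and values are assumptions, not tuned settings.
final class LocalityWaitConf {
    static SparkConf build() {
        return new SparkConf()
            // Base wait, used by any level that is not overridden below
            .set("spark.locality.wait", "6s")
            // Wait longer for cached data in the same executor
            .set("spark.locality.wait.process", "10s")
            .set("spark.locality.wait.node", "6s")
            // Give up on rack locality sooner
            .set("spark.locality.wait.rack", "3s");
    }
}
```

Every value carries an explicit unit suffix so that it is not interpreted as milliseconds.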
