Spark on YARN: Execution Model and Resource Allocation

weixin_30920091

Original link: http://www.cnblogs.com/double-kill/p/8547616.html

http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-configurations-resourcemanager-nodemanager/

http://blog.csdn.net/xiangxizhishi/article/details/79102860

https://www.cnblogs.com/wonglu/p/8459694.html

http://blog.csdn.net/u010657789/article/details/52623107

http://blog.csdn.net/zhongyuan_1990/article/details/79064191

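This post is reposted from other people's blogs (links above); thanks to the authors for their hard work.

Its purpose is to explain the basic mechanics and parameters of Spark on YARN.

All about Spark on YARN, Part 1

How much of the cluster's resources does a Spark application use when it runs on YARN?

Leaving dynamic resource allocation aside, the resources of a Spark application fall into two parts: the driver and the executors. The client and cluster deploy modes are described in turn below.

Client mode:

The Spark driver starts on the local machine (the one that submits the application), while the YARN Application Master (AM) starts on some node in the cluster. Because the driver is a locally launched JVM, its resources must be set at launch time. The AM is used only to request resources from YARN.

Driver resources (the driver is a local JVM that does not run in a container, so its CPU usage cannot be isolated):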

--driver-memory (or spark.driver.memory)

AM resources:

spark.yarn.am.cores

spark.yarn.am.memory

spark.yarn.am.memoryOverhead

Executor resources:

spark.executor.cores

spark.executor.memory

spark.yarn.executor.memoryOverhead

spark.executor.instances

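Therefore, the resources used by one Spark application are as follows (the --driver-memory term is consumed on the local machine, not in a YARN container):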

cores = spark.yarn.am.cores + spark.executor.cores * spark.executor.instances

memory = spark.yarn.am.memory + spark.yarn.am.memoryOverhead + (spark.executor.memory + spark.yarn.executor.memoryOverhead) * spark.executor.instances + --driver-memory
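The two client-mode formulas above can be checked with a few lines of Python. This is only a sketch: the class and function names are made up for illustration, memory is expressed in MiB, and the numbers in the example are hypothetical values, not defaults or recommendations.

```python
from dataclasses import dataclass


@dataclass
class ClientModeConf:
    """Hypothetical holder for the client-mode settings listed in the text (memory in MiB)."""
    am_cores: int              # spark.yarn.am.cores
    am_memory_mb: int          # spark.yarn.am.memory
    am_overhead_mb: int        # spark.yarn.am.memoryOverhead
    executor_cores: int        # spark.executor.cores
    executor_memory_mb: int    # spark.executor.memory
    executor_overhead_mb: int  # spark.yarn.executor.memoryOverhead
    executor_instances: int    # spark.executor.instances
    driver_memory_mb: int      # --driver-memory (local JVM, outside YARN)


def client_mode_totals(c: ClientModeConf) -> tuple[int, int]:
    """Return (cores, memory_mb) exactly as in the client-mode formulas."""
    cores = c.am_cores + c.executor_cores * c.executor_instances
    per_executor_mb = c.executor_memory_mb + c.executor_overhead_mb
    memory_mb = (
        c.am_memory_mb
        + c.am_overhead_mb
        + per_executor_mb * c.executor_instances
        + c.driver_memory_mb
    )
    return cores, memory_mb


# Hypothetical example values, chosen only to exercise the formulas.
example = ClientModeConf(
    am_cores=1, am_memory_mb=512, am_overhead_mb=384,
    executor_cores=2, executor_memory_mb=4096, executor_overhead_mb=410,
    executor_instances=10, driver_memory_mb=2048,
)
print(client_mode_totals(example))  # (21, 48004)
```

Of the 48004 MiB in this example, 2048 MiB belongs to the local driver JVM; the remainder is what the AM and executor containers request from YARN.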

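Cluster mode:

The Spark driver and the YARN Application Master run in the same JVM, so the driver's resource parameters also control the resources of the YARN AM.

Setting spark.yarn.submit.waitAppCompletion to false makes the Spark client (which runs in a local JVM) exit as soon as the application has been submitted, so its resource usage is not considered below.

Driver (AM) resources: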

spark.driver.cores

spark.driver.memory

spark.yarn.driver.memoryOverhead

Executor resources:

spark.executor.cores

spark.executor.memory

spark.yarn.executor.memoryOverhead

spark.executor.instances

Therefore, the resources used by one Spark application are:

cores = spark.driver.cores + spark.executor.cores * spark.executor.instances

memory = spark.driver.memory + spark.yarn.driver.memoryOverhead + (spark.executor.memory + spark.yarn.executor.memoryOverhead) * spark.executor.instances
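The cluster-mode formulas can be evaluated the same way. The sketch below uses a hypothetical function name, takes memory in MiB, and uses made-up example numbers; it only restates the arithmetic given in the text.

```python
def cluster_mode_totals(
    driver_cores: int,          # spark.driver.cores
    driver_memory_mb: int,      # spark.driver.memory
    driver_overhead_mb: int,    # spark.yarn.driver.memoryOverhead
    executor_cores: int,        # spark.executor.cores
    executor_memory_mb: int,    # spark.executor.memory
    executor_overhead_mb: int,  # spark.yarn.executor.memoryOverhead
    executor_instances: int,    # spark.executor.instances
) -> tuple[int, int]:
    """Return (cores, memory_mb) exactly as in the cluster-mode formulas."""
    cores = driver_cores + executor_cores * executor_instances
    memory_mb = (
        driver_memory_mb
        + driver_overhead_mb
        + (executor_memory_mb + executor_overhead_mb) * executor_instances
    )
    return cores, memory_mb


# Hypothetical example values, chosen only to exercise the formulas.
print(cluster_mode_totals(1, 2048, 384, 2, 4096, 410, 10))  # (21, 47492)
```

Unlike client mode, every term here corresponds to a YARN container, because the driver runs inside the AM.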

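In summary:

In client mode, the AM and the executors run in YARN containers. In cluster mode, the AM (which shares a JVM with the Spark driver) and the executors run in YARN containers, so they all benefit from the container resource isolation mechanism.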

This is the key point.
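As an illustration of how the cluster-mode parameters discussed above might be passed on the command line, the following sketch assembles a spark-submit argument list. The helper name, the jar path and the main class are hypothetical, and the values are placeholders. The property names follow the text; Spark 2.3 and later use spark.driver.memoryOverhead and spark.executor.memoryOverhead instead of the spark.yarn.* overhead names.

```python
def build_cluster_submit_args(app_jar: str, main_class: str, conf: dict[str, str]) -> list[str]:
    """Build a spark-submit argv for YARN cluster mode (hypothetical helper)."""
    args = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--class", main_class,
    ]
    # Each Spark property is passed as --conf key=value; sorting keeps the output stable.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_jar)
    return args


# Placeholder values; the jar path and class name are assumptions for this example.
submit_args = build_cluster_submit_args(
    "my-app.jar",
    "com.example.MyApp",
    {
        "spark.driver.cores": "1",
        "spark.driver.memory": "2g",
        "spark.yarn.driver.memoryOverhead": "384",
        "spark.executor.cores": "2",
        "spark.executor.memory": "4g",
        "spark.yarn.executor.memoryOverhead": "410",
        "spark.executor.instances": "10",
        # Let the local client exit right after submission, as described in the text.
        "spark.yarn.submit.waitAppCompletion": "false",
    },
)
print(" ".join(submit_args))
```

The list form can be handed to subprocess.run directly, which avoids shell quoting issues.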