Part 6: Summary of Apache Spark Cost Tuning Strategy and FAQ


Expedia Group Technology — Software

If you haven’t read the entire series of these Apache Spark cost tuning articles, the changes recommended in this summary may not make sense. To understand these steps, I encourage you to read Part 1, which provides the philosophy behind the strategy, and Part 2, which shows you how to determine the estimated cost of your Spark job. The full series is linked below, after this summary of the steps you should take:

  1. Switch executor core count to the ideal core count for your node, as described in Part 3 (see the configuration sketch after this list).

  2. If the executor core count changed, adjust the executor count using the method described in Part 4.

  3. Change executor memory to the efficient memory size for your node, as described in Part 3.

  4. If executor memory issues appear while running with the new config, add the tweaks that resolve memory issues, as described in Part 5.

  5. If the job runs at 100% CPU utilization and 100% memory utilization, consider running it on a node with more memory per CPU, as described in Part 4, to improve run time.

  6. If the run time slows down after tuning and you are willing to sacrifice some cost savings for a run time improvement, follow the method described in Part 4 to try improving run time.
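
To make the end state of these steps concrete, here is a minimal sketch of what a tuned session might look like. The job name and the specific values are illustrative assumptions only; the right numbers for your node type and workload come from the methods in Parts 3 through 5, and the same properties can instead be passed as --conf flags to spark-submit.

```python
from pyspark.sql import SparkSession

# Illustrative values only: derive the real numbers for your node type and
# workload using the methods described in Parts 3-5 of this series.
spark = (
    SparkSession.builder
    .appName("cost-tuned-job")                       # hypothetical job name
    .config("spark.executor.cores", "5")             # step 1: ideal core count per executor (Part 3)
    .config("spark.executor.instances", "6")         # step 2: executor count adjusted to match (Part 4)
    .config("spark.executor.memory", "18g")          # step 3: efficient memory size for the node (Part 3)
    .config("spark.executor.memoryOverhead", "2g")   # step 4: example tweak if memory issues appear (Part 5)
    .getOrCreate()
)
```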


Q: What executor config do you recommend for a cluster with 32 core, 256GB nodes?

A: Leaving the usual 1 core for YARN and system processing would leave 31 cores, and because 31 is a prime number it cannot be divided evenly into executors, so I actually recommend leaving 2 cores for YARN and system processing on this node. That leaves 30 cores available for processing, which means a 5 core executor with 34GB of memory will work for this node as well.
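
As a sketch, the config described in this answer might look like the following for a single 32 core / 256GB node; the executor count of six follows from 30 usable cores divided by 5 core executors, and the job name is hypothetical.

```python
from pyspark.sql import SparkSession

# Sketch of the answer above for a 32 core / 256GB node: 2 cores reserved for
# YARN and system processing, 30 usable cores, so six 5 core executors per node.
# spark.executor.instances assumes a single node; multiply by your node count.
spark = (
    SparkSession.builder
    .appName("faq-32-core-node")             # hypothetical job name
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "34g")  # per the answer above
    .config("spark.executor.instances", "6")
    .getOrCreate()
)
```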

Q: What executor config do you recommend for clusters with nodes that have 8 or fewer cores?


A: I only recommend using nodes with 8 or fewer cores if your Spark jobs run on a single node. If your jobs span two 8 core nodes (or four 4 core nodes), they would be better served running on a 16 core node, for several reasons (a short worked comparison follows the list).

  1. The only config that utilizes all available CPUs on an 8 core node is a 7 core executor. The consensus of the Spark community is that 5 core executors are the most performant, and I have confirmed this with my own testing, so using 7 core executors will slow down performance.
  2. Two 8 core nodes will only have 14 CPUs available for use, as opposed to the 15 CPUs available on a single 16 core node.
  3. Two 8 core nodes cost the same as one 16 core node (in the same instance family). This means that with 16 core nodes you get better-performing executors plus one additional CPU, for the same price as two 8 core nodes.
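
Here is a small worked comparison behind those numbers, assuming 1 core per node is reserved for YARN and system processing and executors use the recommended 5 cores each.

```python
# Rough arithmetic only: 1 core per node reserved for YARN/system (assumption),
# executors sized at the recommended 5 cores each.
def node_layout(nodes, cores_per_node, reserved=1, executor_cores=5):
    usable_per_node = cores_per_node - reserved
    executors = nodes * (usable_per_node // executor_cores)
    usable_total = nodes * usable_per_node
    return executors, usable_total

print(node_layout(nodes=2, cores_per_node=8))    # (2 executors, 14 usable CPUs)
print(node_layout(nodes=1, cores_per_node=16))   # (3 executors, 15 usable CPUs)
```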

I will update this FAQ in the future as needed.

Series contents

Learn more about technology at Expedia Group


Translated from: https://medium.com/expedia-group-tech/part-6-summary-of-apache-spark-cost-tuning-strategy-8d06148c5da6
