I recently came across a good blog post on Spark memory tuning and wanted to share it:
https://idk.dev/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/
The post lays out several approaches to Spark memory tuning. They are summarized from Amazon EMR experience, but in my view they generalize well, and they match situations you actually run into day to day. Rather than translating the whole post, I'll summarize the main points here together with my own experience; see the original for the full details.
Basic Spark tuning parameters
The following parameters are the most fundamental job-tuning knobs. Only once these are set to reasonable values does any further optimization make sense.
Property Name | Default | Meaning
---|---|---
spark.executor.memory | 1g | Amount of memory to use per executor process, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g).
spark.driver.memory | 1g | Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g). Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, set it through the --driver-memory command line option or in your default properties file.
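In practice these two parameters are usually set at submit time rather than hardcoded in the application. A minimal sketch of a submit command (the main class, jar name, and memory sizes below are placeholders, not values from the original post):

```shell
# Hypothetical spark-submit invocation; adjust class, jar, and sizes
# to your application and cluster.
# --driver-memory sets spark.driver.memory and --executor-memory sets
# spark.executor.memory (both default to 1g).
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 8g \
  my-app.jar
```

Passing --driver-memory on the command line also sidesteps the client-mode caveat above, since the value is applied before the driver JVM starts.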