Spark Submit Commands and Parameter Tuning

Latest recommended article published on 2024-06-07 16:50:51

从一点一滴做起

Categories: Linux, Spark

Article link: https://blog.csdn.net/qq_39313597/article/details/89947187

Parameter meanings and reference values:

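1. num-executors: the number of executors (each executor is a separate JVM process, not a thread). Typically set to 50-100. Always set it explicitly; otherwise only a few executors are started by default, the cluster's resources are underused, and the job runs slowly.
2. executor-memory: memory per executor. Reference value: 4g-8g. num-executors × executor-memory must not exceed the queue's maximum memory, and the total requested is best kept within 1/3 to 1/2 of that maximum.
3. executor-cores: CPU cores per executor. More cores let each executor run more tasks in parallel. Reference value: 2-4. num-executors × executor-cores is best kept within 1/3 to 1/2 of the queue's total CPU cores.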
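The sizing rules above reduce to simple arithmetic. The following is a minimal sketch, assuming a hypothetical queue with 1000 GB of memory and 400 cores; those limits and the function names are made up for illustration and do not come from the original post.

```shell
#!/bin/sh
# Hypothetical queue limits, for illustration only.
QUEUE_MEMORY_GB=1000
QUEUE_CORES=400

# Total executor heap requested: num-executors * executor-memory (GB).
total_memory_gb() {
    echo $(( $1 * $2 ))
}

# Total cores requested: num-executors * executor-cores.
total_cores() {
    echo $(( $1 * $2 ))
}

# Succeeds when the request is at most half of the given limit,
# the upper end of the 1/3 to 1/2 guideline.
within_half() {
    [ $(( $1 * 2 )) -le "$2" ]
}

mem=$(total_memory_gb 50 8)     # 50 executors with 8 GB each
cores=$(total_cores 50 4)       # 50 executors with 4 cores each

if within_half "$mem" "$QUEUE_MEMORY_GB" && within_half "$cores" "$QUEUE_CORES"; then
    echo "request fits: ${mem} GB, ${cores} cores"
else
    echo "request too large: ${mem} GB, ${cores} cores"
fi
```

Replace the two limits with the real figures for your queue before relying on the result. The check uses the looser 1/2 bound; a stricter 1/3 check would multiply by 3 instead of 2.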

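1. spark-submit: the command that submits a Spark application.
2. --queue spark: submit to the YARN queue named spark.
3. --master yarn: run the application on a YARN cluster.
4. --deploy-mode client: choose client or cluster mode. In client mode the driver runs on the machine where spark-submit is invoked, so use it when submitting from a node that belongs to or sits close to the cluster; in cluster mode the driver runs on a node inside the cluster, so use it when submitting from elsewhere.
5. --executor-memory=4G: memory per executor. Reference value: 4g-8g. num-executors × executor-memory must not exceed the queue's maximum memory, and the total requested is best kept within 1/3 to 1/2 of that maximum.
6. --conf spark.dynamicAllocation.enabled=true: whether to enable dynamic resource allocation.
7. --executor-cores 2: CPU cores per executor. More cores let each executor run more tasks in parallel. Reference value: 2-4. num-executors × executor-cores is best kept within 1/3 to 1/2 of the queue's total CPU cores.
8. --conf spark.dynamicAllocation.minExecutors=4: minimum number of executors.
9. --conf spark.dynamicAllocation.maxExecutors=10: maximum number of executors.
10. --conf spark.dynamicAllocation.initialExecutors=4: initial number of executors when dynamic allocation is enabled.
11. --conf spark.executor.memoryOverhead=2g: off-heap memory per executor. With large data sets this setting is a frequent source of trouble: the job crashes repeatedly and cannot finish. When that happens, raise it to at least 1G (1024M), or even 2G or 4G.
12. --conf spark.speculation=true: speculative execution. Do not use it when consuming from Kafka; whether it is appropriate depends on the scenario.
13. --conf spark.shuffle.service.enabled=true: enable the external shuffle service, which serves shuffle files independently of the executors. This helps shuffle performance and lets dynamic allocation release idle executors without losing their shuffle output.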
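To see why spark.executor.memoryOverhead matters on YARN: each executor container is sized as the executor heap plus the overhead, and a container that grows beyond that size is killed. When the overhead is not set, Spark's documented default is 10% of the executor memory with a floor of 384 MB (this may differ between versions and deployment types). A rough sketch of that arithmetic, with hypothetical function names:

```shell
#!/bin/sh
# Default overhead in MB: 10% of executor memory, but never below 384 MB.
default_overhead_mb() {
    overhead=$(( $1 / 10 ))
    if [ "$overhead" -lt 384 ]; then
        overhead=384
    fi
    echo "$overhead"
}

# Container size requested from YARN: heap plus overhead, both in MB.
container_mb() {
    echo $(( $1 + $2 ))
}

heap_mb=4096                                  # --executor-memory=4G
auto=$(default_overhead_mb "$heap_mb")        # overhead if left unset
echo "default overhead:  ${auto} MB"
echo "default container: $(container_mb "$heap_mb" "$auto") MB"
echo "with 2g overhead:  $(container_mb "$heap_mb" 2048) MB"
```

For a 4G executor the default leaves only about 400 MB of headroom outside the heap, which is why raising the overhead to 1G or more is a common fix for executors that keep getting killed.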

With dynamic allocation:

spark-submit \
  --queue spark \
  --master yarn \
  --deploy-mode client \
  --executor-memory=4G \
  --executor-cores 2 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=4 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --conf spark.dynamicAllocation.initialExecutors=4 \
  --conf spark.executor.memoryOverhead=2g \
  --conf spark.speculation=true \
  --conf spark.shuffle.service.enabled=true \
  --class com.practice \
  Spark.jar
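With dynamic allocation the job's footprint is not fixed; it floats between the minExecutors and maxExecutors bounds. A small sketch of the range implied by the command above, counting heap plus memoryOverhead per executor (the helper name is hypothetical, and the driver and YARN application master are not included):

```shell
#!/bin/sh
# Values taken from the command above.
EXECUTOR_MEMORY_GB=4
MEMORY_OVERHEAD_GB=2
EXECUTOR_CORES=2
MIN_EXECUTORS=4
MAX_EXECUTORS=10

# Prints "<memory GB> <cores>" for a given number of executors.
footprint() {
    echo "$(( $1 * (EXECUTOR_MEMORY_GB + MEMORY_OVERHEAD_GB) )) $(( $1 * EXECUTOR_CORES ))"
}

echo "min (GB cores): $(footprint "$MIN_EXECUTORS")"
echo "max (GB cores): $(footprint "$MAX_EXECUTORS")"
```

The maximum is the figure to compare against the queue limits, since the job can grow to it whenever enough tasks are pending.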

With static allocation:

spark-submit \
  --master yarn \
  --deploy-mode client \
  --queue spark \
  --executor-memory 10G \
  --num-executors 20 \
  --executor-cores 2 \
  --driver-memory 8g \
  --conf spark.driver.maxResultSize=6G \
  --conf spark.network.timeout=300s \
  --conf spark.executor.heartbeatInterval=30s \
  --conf spark.task.maxFailures=4 \
  --conf spark.speculation=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.executor.memoryOverhead=8g \
  --class com.practice \
  Spark.jar

spark-submit \
  --master yarn \
  --deploy-mode client \
  --queue spark \
  --executor-memory 14G \
  --num-executors 20 \
  --executor-cores 4 \
  --driver-memory 8g \
  --conf spark.driver.maxResultSize=8G \
  --conf spark.network.timeout=300s \
  --conf spark.executor.heartbeatInterval=30s \
  --conf spark.task.maxFailures=4 \
  --conf spark.speculation=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.executor.memoryOverhead=10g \
  --class com.practice \
  Spark.jar

Notes:

With cluster mode you can close the terminal after submitting; the job keeps running in the background. Use it in production.

With client mode you cannot close the terminal, because the driver runs in the submitting process. Use it for debugging.
