spark_submit note

On executor-memory / num-executors / executor-cores

my submit script
- spark-submit task on YARN

[xxx@xxx xxx]$ cat tmpsh.sh 
 /xxx/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
 --class xxx.ChildrenLockUserSize \
 --master yarn \
 --deploy-mode cluster \
 --driver-memory 19g \
 --executor-memory 17g \
 --num-executors 55 \
 --executor-cores 4 \
 --conf spark.driver.maxResultSize=10g \
/home/xxx-1.0-SNAPSHOT.jar 
- spark-submit in local mode
[xxx@xxx]$ cat tmplocalsh.sh 
 /xxx/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
 --class xxx.perGeneBank.TvTagClassify \
 --conf spark.driver.maxResultSize=25g \
 /xxx/xx-1.0-SNAPSHOT.jar
--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap "key=value" in quotes (as shown).
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any
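
For reference, these options plug into the general invocation shape from the Spark docs:

 ./bin/spark-submit \
   --class <main-class> \
   --master <master-url> \
   --deploy-mode <deploy-mode> \
   --conf <key>=<value> \
   ... # other options
   <application-jar> \
   [application-arguments]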

Running spark-submit with no --master flag has the same effect as spark-submit --master local.
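
A minimal sketch making the master explicit, reusing the jar and class placeholders from the local script above. Note that local runs a single worker thread, local[N] uses N threads, and local[*] uses all available cores:

 /xxx/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
  --class xxx.perGeneBank.TvTagClassify \
  --master "local[*]" \
  --conf spark.driver.maxResultSize=25g \
  /xxx/xx-1.0-SNAPSHOT.jar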

Example submit script from the official docs:

Run on a YARN cluster (--deploy-mode can be client for client mode):
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

Total physical cores = number of physical CPUs * cores per physical CPU
Total logical CPUs = number of physical CPUs * cores per physical CPU * hyper-threads per core

Count physical CPUs (sockets):
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l

Count cores per physical CPU:
cat /proc/cpuinfo | grep "cpu cores" | uniq

Count logical CPUs:
cat /proc/cpuinfo | grep "processor" | wc -l
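
A small helper sketch that derives all of the above in one pass (assumes the usual /proc/cpuinfo layout):

 #!/bin/bash
 # Count sockets, cores per socket, and logical CPUs from /proc/cpuinfo.
 sockets=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
 cores=$(grep "cpu cores" /proc/cpuinfo | sort -u | awk '{print $4}')
 logical=$(grep -c "^processor" /proc/cpuinfo)
 echo "physical CPUs:        $sockets"
 echo "cores per CPU:        $cores"
 echo "total physical cores: $((sockets * cores))"
 echo "total logical CPUs:   $logical"  # physical cores * hyper-threads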

- Verification

    VCores: the number of logical CPUs on a single node

cat /proc/cpuinfo | grep "processor"| wc -l
32

   Memory on a single node:

free -g
              total        used        free      shared  buff/cache   available
Mem:            188         106          10           0          72          80
Swap:             3           1           2

VCores Total: 196 = 28 * 7, i.e. 7 nodes at 28 VCores each (the VCores YARN exposes per node can be configured lower than the 32 logical CPUs the OS reports).

--executor-cores 4
A common rule of thumb keeps this at 5 or below; set to 4 here, each executor is allocated four virtual cores (logical CPUs).

Executors (YARN containers) per node: 28 / 4 = 7

Memory per executor: after deducting what the OS and other node services use, roughly 175 GB of the 188 GB remain, so
175 / 7 = 25 GB per container

Each container must also hold the off-heap overhead:

spark.yarn.executor.memoryOverhead = max(executorMemory * 0.1, 384m)

so the heap to request with --executor-memory is 25 / 1.1 ≈ 22 GB.

Across the 7 nodes, --num-executors is therefore 7 * 7 = 49.
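
Putting the arithmetic together, a sketch of the sizing (the 28 VCores, 175 GB, and 7 nodes are this cluster's numbers from above, not general defaults):

 #!/bin/bash
 # Per-node resources YARN can hand out (cluster-specific assumptions).
 vcores_per_node=28
 usable_mem_gb=175
 nodes=7
 executor_cores=4

 executors_per_node=$((vcores_per_node / executor_cores))   # 28 / 4 = 7
 container_gb=$((usable_mem_gb / executors_per_node))       # 175 / 7 = 25
 # Reserve ~10% of each container for spark.yarn.executor.memoryOverhead.
 executor_memory_gb=$(awk -v m="$container_gb" 'BEGIN { printf "%d", m / 1.1 }')  # 22
 num_executors=$((nodes * executors_per_node))              # 7 * 7 = 49

 echo "--executor-cores  $executor_cores"
 echo "--executor-memory ${executor_memory_gb}g"
 echo "--num-executors   $num_executors"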

VCores Used: 49 (as shown in the task view). This does not match the expected 4 vcores per executor and still needs verification; one likely cause is that YARN's DefaultResourceCalculator schedules on memory only and reports one vcore per container.
