On executor-memory / num-executors / executor-cores
My submit scripts
- spark-submit on YARN
[xxx@xxx xxx]$ cat tmpsh.sh
/xxx/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
--class xxx.ChildrenLockUserSize \
--master yarn \
--deploy-mode cluster \
--driver-memory 19g \
--executor-memory 17g \
--num-executors 55 \
--executor-cores 4 \
--conf spark.driver.maxResultSize=10g \
/home/xxx-1.0-SNAPSHOT.jar
- spark-submit in local mode
[xxx@xxx]$ cat tmplocalsh.sh
/xxx/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
--class xxx.perGeneBank.TvTagClassify \
--conf spark.driver.maxResultSize=25g \
/xxx/xx-1.0-SNAPSHOT.jar
From the official docs
http://spark.apache.org/docs/latest/submitting-applications.html
Both sbt and Maven have assembly plugins.
Some of the commonly used options are:
--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap "key=value" in quotes (as shown).
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any
Running spark-submit with no --master flag has the same effect as spark-submit --master local.
Submit script from the official docs:
Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \ # can be client for client mode
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
Total cores = number of physical CPUs × cores per physical CPU
Total logical CPUs = number of physical CPUs × cores per physical CPU × hyper-threads per core
Count physical CPUs:
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
Count cores per physical CPU:
cat /proc/cpuinfo | grep "cpu cores" | uniq
Count logical CPUs:
cat /proc/cpuinfo | grep "processor" | wc -l
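The three checks above can be combined into one short script. This is a sketch for Linux only; it reads /proc/cpuinfo, and field names such as "physical id" and "cpu cores" may be absent on some architectures or VMs:

```shell
#!/bin/sh
# Count physical CPU packages, cores per package, and logical CPUs (Linux).
physical=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
cores=$(grep "cpu cores" /proc/cpuinfo | sort -u | head -1 | awk -F: '{print $2}' | tr -d ' ')
logical=$(grep -c "^processor" /proc/cpuinfo)
echo "physical CPUs : $physical"
echo "cores per CPU : $cores"
echo "logical CPUs  : $logical"
# With hyper-threading: logical = physical * cores * threads-per-core
```

On the node measured above, the last line would print 32 logical CPUs.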
Verification
VCores: the number of logical CPUs on a single node
cat /proc/cpuinfo | grep "processor"| wc -l
32
Memory on a single node:
free -g
              total   used   free   shared   buff/cache   available
Mem:            188    106     10        0           72          80
Swap:             3      1      2
VCores Total: 196 = 28 × 7 (each node exposes 28 vcores to YARN, out of its 32 logical CPUs, across 7 nodes)
--executor-cores 4
The usual recommendation is at most 5 cores per executor; set to 4 here, each executor is allotted four vcores (logical CPUs).
Executors (containers) per node: 28 / 4 = 7
Memory per executor, after deducting memory used by the OS and other node services:
175 GB / 7 = 25 GB
spark.yarn.executor.memoryOverhead
max(executorMemory * 0.1, 384m)
25 / 1.1 ≈ 22, so --executor-memory should be about 22g
With 7 executors on each of the 7 nodes, num-executors = 7 × 7 = 49.
VCores Used: 49 (from the task view). This does not quite match expectations; still to be verified.
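The sizing walk-through above can be sketched as a script. The input values are this cluster's figures as stated earlier (7 nodes, 28 vcores and roughly 175 GB usable per node); they are hard-coded assumptions, not read from YARN:

```shell
#!/bin/sh
# Back-of-the-envelope executor sizing for the cluster described above.
nodes=7
vcores_per_node=28      # vcores each node exposes to YARN
usable_mem_gb=175       # per-node memory left after OS/services
executor_cores=4        # <= 5 is the usual rule of thumb

executors_per_node=$(( vcores_per_node / executor_cores ))   # 28 / 4 = 7
container_mem_gb=$(( usable_mem_gb / executors_per_node ))   # 175 / 7 = 25
# Reserve ~10% for spark.yarn.executor.memoryOverhead: 25 / 1.1 ~= 22
executor_mem_gb=$(( container_mem_gb * 10 / 11 ))
num_executors=$(( nodes * executors_per_node ))              # 7 * 7 = 49

echo "--executor-cores  $executor_cores"
echo "--executor-memory ${executor_mem_gb}g"
echo "--num-executors   $num_executors"
```

This reproduces the numbers derived above: 7 executors per node, about 22 GB of heap per executor, and 49 executors in total.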