On executor-memory / num-executors / executor-cores
My submit scripts
- spark-submit on YARN
[xxx@xxx xxx]$ cat tmpsh.sh
/xxx/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
--class xxx.ChildrenLockUserSize \
--master yarn \
--deploy-mode cluster \
--driver-memory 19g \
--executor-memory 17g \
--num-executors 55 \
--executor-cores 4 \
--conf spark.driver.maxResultSize=10g \
/home/xxx-1.0-SNAPSHOT.jar
- spark-submit in local mode
[xxx@xxx]$ cat tmplocalsh.sh
/xxx/spark-2.0.2-bin-hadoop2.6/bin/spark-submit \
--class xxx.perGeneBank.TvTagClassify \
--conf spark.driver.maxResultSize=25g \
/xxx/xx-1.0-SNAPSHOT.jar
From the official docs
http://spark.apache.org/docs/latest/submitting-applications.html
Both sbt and Maven have assembly plugins.
Some of the commonly used options are:
--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap "key=value" in quotes (as shown).
application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any
Running spark-submit with no --master flag has the same effect as spark-submit --master local.
Submit script from the official docs:
Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \ # can be client for client mode
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
Total cores = number of physical CPUs × cores per physical CPU
Total logical CPUs = number of physical CPUs × cores per physical CPU × hyper-threads per core
Count physical CPUs:
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
Count cores per physical CPU:
cat /proc/cpuinfo | grep "cpu cores" | uniq
Count logical CPUs:
cat /proc/cpuinfo | grep "processor" | wc -l
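The three checks above can be combined into one short script. This is a sketch for Linux only; it reads /proc/cpuinfo, and field names such as "physical id" and "cpu cores" may be absent on some architectures or VMs:

```shell
#!/bin/sh
# Count physical CPU packages, cores per package, and logical CPUs (Linux).
physical=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
cores=$(grep "cpu cores" /proc/cpuinfo | sort -u | head -1 | awk -F: '{print $2}' | tr -d ' ')
logical=$(grep -c "^processor" /proc/cpuinfo)
echo "physical CPUs : $physical"
echo "cores per CPU : $cores"
echo "logical CPUs  : $logical"
# With hyper-threading: logical = physical * cores * threads-per-core
```

On the node measured above, the last line would print 32 logical CPUs.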
Verification
VCores: the number of logical CPUs on a single node
cat /proc/cpuinfo | grep "processor"| wc -l
32
Memory on a single node:
free -g
              total   used   free   shared   buff/cache   available
Mem:            188    106     10        0           72          80
Swap:             3      1      2
VCores Total: 196 = 28 × 7 (each node exposes 28 vcores to YARN, out of its 32 logical CPUs, across 7 nodes)
--executor-cores 4
The usual recommendation is at most 5 cores per executor; set to 4 here, each executor is allotted four vcores (logical CPUs).
Executors (containers) per node: 28 / 4 = 7
Memory per executor, after deducting memory used by the OS and other node services:
175 GB / 7 = 25 GB
spark.yarn.executor.memoryOverhead
max(executorMemory * 0.1, 384m)
25 / 1.1 ≈ 22, so --executor-memory should be about 22g
With 7 executors on each of the 7 nodes, num-executors = 7 × 7 = 49.
VCores Used: 49 (from the task view). This does not quite match expectations; still to be verified.
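The sizing walk-through above can be sketched as a script. The input values are this cluster's figures as stated earlier (7 nodes, 28 vcores and roughly 175 GB usable per node); they are hard-coded assumptions, not read from YARN:

```shell
#!/bin/sh
# Back-of-the-envelope executor sizing for the cluster described above.
nodes=7
vcores_per_node=28      # vcores each node exposes to YARN
usable_mem_gb=175       # per-node memory left after OS/services
executor_cores=4        # <= 5 is the usual rule of thumb

executors_per_node=$(( vcores_per_node / executor_cores ))   # 28 / 4 = 7
container_mem_gb=$(( usable_mem_gb / executors_per_node ))   # 175 / 7 = 25
# Reserve ~10% for spark.yarn.executor.memoryOverhead: 25 / 1.1 ~= 22
executor_mem_gb=$(( container_mem_gb * 10 / 11 ))
num_executors=$(( nodes * executors_per_node ))              # 7 * 7 = 49

echo "--executor-cores  $executor_cores"
echo "--executor-memory ${executor_mem_gb}g"
echo "--num-executors   $num_executors"
```

This reproduces the numbers derived above: 7 executors per node, about 22 GB of heap per executor, and 49 executors in total.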