Spark spark-submit parameters

Author: 生命不息丶折腾不止

Parameter reference

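Parameter	Format	Description
--master	MASTER_URL	spark://host:port, mesos://host:port, yarn, k8s://https://host:port, or local (default: local[*]).
--deploy-mode	DEPLOY_MODE	Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (default: client).
--class	CLASS_NAME	The application's main class (for Java / Scala applications).
--name	NAME	The name of the application.
--jars	JARS	Comma-separated list of jars to include on the driver and executor classpaths.
--packages		Comma-separated list of Maven coordinates ("groupId:artifactId:version") of jars to include on the driver and executor classpaths.
--exclude-packages		Comma-separated list of "groupId:artifactId" entries to exclude while resolving the dependencies given in --packages, to avoid dependency conflicts.
--repositories		Comma-separated list of additional remote repositories to search for the Maven coordinates given with --packages.
--py-files	PY_FILES	Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python applications.
--files	FILES	Comma-separated list of files to be placed in the working directory of each executor.
--conf	PROP=VALUE	Arbitrary Spark configuration property.
--properties-file	FILE	Path to a file from which to load extra properties. If not specified, conf/spark-defaults.conf is used.
--driver-memory	MEM	Memory for the driver (e.g. 1000M, 2G) (default: 1024M).
--driver-java-options		Extra Java options to pass to the driver.
--driver-library-path		Extra library path entries to pass to the driver.
--driver-class-path		Extra class path entries to pass to the driver. Jars added with --jars are automatically included in the classpath.
--executor-memory	MEM	Memory per executor (e.g. 1000M, 2G) (default: 1G).
--proxy-user	NAME	User to impersonate when submitting the application.
--help, -h		Show this help message and exit.
--verbose, -v		Print additional debug output.
--version		Print the version of the current Spark.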
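Several options above take a single comma-separated argument: --jars, --files, --py-files and --packages. By contrast, --conf is repeated once per property.

The following is a minimal POSIX sh sketch. It assumes a hypothetical helper named join_by_comma, plus made-up jar paths, Maven coordinates and main class (com.example.Main). It builds the lists and only prints the resulting command instead of submitting it.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas, the list format
# expected by --jars, --files, --py-files and --packages.
join_by_comma() {
  # "$*" joins the arguments with the first character of IFS; the
  # subshell keeps the IFS change from leaking into the caller.
  (IFS=','; printf '%s\n' "$*")
}

# Hypothetical inputs, for illustration only.
JARS=$(join_by_comma /opt/libs/a.jar /opt/libs/b.jar)
PACKAGES=$(join_by_comma org.example:foo:1.0.0 org.example:bar:2.1.0)

# Dry run: echo prints the command line instead of submitting it.
echo ./bin/spark-submit \
  --class com.example.Main \
  --master yarn \
  --jars "$JARS" \
  --packages "$PACKAGES" \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.executor.memoryOverhead=1g \
  /path/to/app.jar
```

Dropping the leading echo would submit the command for real. Each additional property needs its own --conf PROP=VALUE pair.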

Spark standalone with cluster deploy mode only:

--driver-cores	NUM	Cores for the driver (default: 1).

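Spark standalone or Mesos with cluster deploy mode only:

--supervise		If given, restarts the driver on failure.
--kill	SUBMISSION_ID	If given, kills the specified driver.
--status	SUBMISSION_ID	If given, requests the status of the specified driver.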

Spark standalone and Mesos only:

--total-executor-cores	NUM	Total cores for all executors.

Spark standalone and YARN only:

--executor-cores	NUM	Number of cores per executor (default: 1 in YARN mode, or all available cores on the worker in standalone mode).
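In standalone mode, --total-executor-cores caps the cores an application receives. When --executor-cores is also set, the quotient of the two bounds how many executors can start.

A minimal sketch of that arithmetic follows; max_standalone_executors is a hypothetical name. The real count can be lower when the workers run out of free cores or memory.

```shell
#!/bin/sh
# Rough upper bound on executors in standalone mode:
#   floor(total-executor-cores / executor-cores)
max_standalone_executors() {
  total_cores=$1      # value passed to --total-executor-cores
  cores_per_exec=$2   # value passed to --executor-cores
  if [ "$cores_per_exec" -le 0 ]; then
    echo "executor-cores must be positive" >&2
    return 1
  fi
  # Shell arithmetic on integers truncates, which gives the floor here.
  echo $((total_cores / cores_per_exec))
}

max_standalone_executors 100 4
```

Example 2 below uses --total-executor-cores 100. Adding --executor-cores 4 would allow at most 25 executors.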

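YARN only:

--driver-cores	NUM	Number of cores used by the driver, only in cluster mode (default: 1).
--queue	QUEUE_NAME	The YARN queue to submit to (default: "default").
--num-executors	NUM	Number of executors to launch (default: 2).
--archives	ARCHIVES	Comma-separated list of archives to be extracted into the working directory of each executor.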
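On YARN, --executor-memory sets only the executor heap. Each executor container also requests overhead memory (spark.executor.memoryOverhead), which by default is 10% of the executor memory with a 384 MiB minimum.

Below is a rough sizing sketch in MiB. It assumes those defaults and ignores PySpark and off-heap memory settings. The name executor_container_mib is hypothetical, and YARN may round each request up to its allocation increment.

```shell
#!/bin/sh
# Approximate memory YARN is asked for per executor container, in MiB:
#   executor memory + max(384, executor memory / 10)
# Assumes spark.executor.memoryOverhead is unset and the default 0.10 factor.
executor_container_mib() {
  mem_mib=$1
  overhead=$((mem_mib / 10))
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo $((mem_mib + overhead))
}

num_executors=2   # the --num-executors default listed above
mem_mib=2048      # --executor-memory 2g, as in example 5 below
per_container=$(executor_container_mib "$mem_mib")
echo "per executor: ${per_container} MiB, total: $((num_executors * per_container)) MiB"
```

In cluster mode the driver runs inside the YARN ApplicationMaster container and needs its own allocation, which this sketch leaves out.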

Examples from the official documentation

# 1. Run locally on 8 cores

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

---------------------------------------------------------

# 2. Spark standalone cluster in client deploy mode

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

---------------------------------------------------------

# 3. Spark standalone cluster in cluster deploy mode with supervise (restarts the driver if it fails)

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

---------------------------------------------------------

# 4. YARN client mode (use --deploy-mode cluster to run the driver inside the cluster instead)

export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000
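In a shell, a trailing backslash continues a line only when it is the last character on that line. A comment therefore cannot follow it on the same line.

When each option deserves its own comment, one approach is to build the argument list with POSIX `set --`, as in this sketch of example 4. The function name build_yarn_client_cmd is hypothetical, and the function prints the command rather than running it.

```shell
#!/bin/sh
# Build the YARN client-mode command one argument at a time so every
# option can carry its own comment, then print it (dry run).
build_yarn_client_cmd() {
  set --                                              # start from an empty list
  set -- "$@" ./bin/spark-submit
  set -- "$@" --class org.apache.spark.examples.SparkPi
  set -- "$@" --master yarn                           # cluster manager
  set -- "$@" --deploy-mode client                    # driver runs on this machine
  set -- "$@" --executor-memory 20G                   # memory per executor
  set -- "$@" --num-executors 50                      # YARN-only option
  set -- "$@" /path/to/examples.jar 1000              # application jar and its argument
  # Replacing this printf with "$@" would execute the command instead.
  printf '%s\n' "$*"
}

build_yarn_client_cmd
```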

---------------------------------------------------------

# 5. YARN cluster mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10
---------------------------------------------------------

# 6. Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000
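A Python application's dependencies are shipped with --py-files, which accepts .zip, .egg, and .py entries. Below is a hypothetical pre-flight check (check_py_files) sketched in POSIX sh, using made-up dependency names deps.zip and utils.py. It splits the list on commas with parameter expansion rather than IFS word splitting, so entries containing glob characters are not expanded.

```shell
#!/bin/sh
# Hypothetical check: every entry of a comma-separated --py-files list must
# end in .zip, .egg or .py. Reports bad entries on stderr; returns 1 if any.
check_py_files() {
  rest=$1
  rc=0
  while [ -n "$rest" ]; do
    entry=${rest%%,*}              # text before the first comma
    if [ "$entry" = "$rest" ]; then
      rest=""                      # last entry: no comma left
    else
      rest=${rest#*,}              # drop this entry and its comma
    fi
    case "$entry" in
      *.zip|*.egg|*.py) ;;
      *) echo "unsupported --py-files entry: '$entry'" >&2; rc=1 ;;
    esac
  done
  return "$rc"
}

PY_FILES=deps.zip,utils.py   # hypothetical dependencies
if check_py_files "$PY_FILES"; then
  # Dry run of example 6 with the dependencies attached.
  echo ./bin/spark-submit \
    --master spark://207.184.161.138:7077 \
    --py-files "$PY_FILES" \
    examples/src/main/python/pi.py \
    1000
fi
```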