黑猴子的家: Spark Application Submission (spark-submit)

1. bin/spark-submit

Once your application is packaged, you can launch it with the bin/spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and it supports the different cluster managers and deploy modes.

[victor@node1 spark]$ bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

Some of the commonly used options:
--class: the entry point of your application (e.g. org.apache.spark.examples.SparkPi)
--master: the master URL of the cluster (e.g. spark://node1:7077)
--deploy-mode: whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: an arbitrary Spark configuration property in key=value format; if the value contains spaces, wrap "key=value" in quotes. Properties set here override the defaults loaded from conf/spark-defaults.conf
application-jar: the path to your bundled jar including all dependencies. The URL must be globally visible inside your cluster, for instance an hdfs:// path on shared storage, or a file:// path that is present at the same location on every node
application-arguments: arguments passed to the main() method of your main class
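
For example, filling in the template with these options might look like the following sketch (the class, master URL, and jar are reused from the examples later in this article; the --conf value is only an illustration):

[victor@node1 spark]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://node1:7077 \
--deploy-mode client \
--conf "spark.executor.memory=1g" \
examples/jars/spark-examples_2.11-2.1.1.jar \
100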
Master URL formats

local                Run Spark locally with a single worker thread (i.e. no parallelism at all)
local[K]             Run Spark locally with K worker threads (ideally, set K to the number of cores on your machine)
local[*]             Run Spark locally with as many worker threads as logical cores on your machine
spark://HOST:PORT    Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, 7077 by default
mesos://HOST:PORT    Connect to the given Mesos cluster. The port must be whichever one you have configured, 5050 by default. For a Mesos cluster using ZooKeeper, use mesos://zk://... instead
yarn-client          Connect to a YARN cluster in client mode. The cluster location is found based on the HADOOP_CONF_DIR variable
yarn-cluster         Connect to a YARN cluster in cluster mode. The cluster location is found based on the HADOOP_CONF_DIR variable
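
For a quick smoke test on a single machine before touching any cluster, the local forms are often the most convenient. A minimal sketch, reusing the SparkPi example jar that ships with Spark (the thread count of 4 is an arbitrary illustrative choice):

[victor@node1 spark]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[4] \
examples/jars/spark-examples_2.11-2.1.1.jar \
100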

To see the full list of spark-submit options, run the script with no arguments:

[victor@node1 spark]$ bin/spark-submit
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --packages                  Comma-separated list of maven coordinates of jars to include
                              on the driver and executor classpaths. Will search the local
                              maven repo, then maven central and any additional remote
                              repositories given by --repositories. The format for the
                              coordinates should be groupId:artifactId:version.
  --exclude-packages          Comma-separated list of groupId:artifactId, to exclude while
                              resolving the dependencies provided in --packages to avoid
                              dependency conflicts.
  --repositories              Comma-separated list of additional remote repositories to
                              search for the maven coordinates given with --packages.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor. File paths of these files
                              in executors can be accessed via SparkFiles.get(fileName).
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 1024M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --proxy-user NAME           User to impersonate when submitting the application.
                              This argument does not work with --principal / --keytab.
  --help, -h                  Show this help message and exit.
  --verbose, -v               Print additional debug output.
  --version,                  Print the version of current Spark.

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).

 Spark standalone or Mesos with cluster deploy mode only:
  --supervise                 If given, restarts the driver on failure.
  --kill SUBMISSION_ID        If given, kills the driver specified.
  --status SUBMISSION_ID      If given, requests the status of the driver specified.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 Spark standalone and YARN only:
  --executor-cores NUM        Number of cores per executor. (Default: 1 in YARN mode,
                              or all available cores on the worker in standalone mode)

 YARN-only:
  --driver-cores NUM          Number of cores used by the driver, only in cluster mode
                              (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
                              If dynamic allocation is enabled, the initial number of
                              executors will be at least NUM.
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.
  --principal PRINCIPAL       Principal to be used to login to KDC, while running on
                              secure HDFS.
  --keytab KEYTAB             The full path to the file that contains the keytab for the
                              principal specified above. This keytab will be copied to
                              the node running the Application Master via the Secure
                              Distributed Cache, for renewing the login tickets and the
                              delegation tokens periodically.
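
As noted for --properties-file above, spark-submit falls back to conf/spark-defaults.conf when no properties file is given. A minimal sketch of what such a file could contain (these particular keys and values are assumptions for illustration, not recommendations from this article):

[victor@node1 spark]$ cat conf/spark-defaults.conf
spark.master                     spark://node1:7077
spark.executor.memory            1g
spark.serializer                 org.apache.spark.serializer.KryoSerializer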
2. Running your first Spark job on YARN
[victor@node1 spark]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
examples/jars/spark-examples_2.11-2.1.1.jar \
100
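
The example above runs in client mode, so the driver stays in your spark-submit process on node1. To have YARN host the driver inside the cluster instead, the same job could be submitted in cluster mode; a sketch with only --deploy-mode changed:

[victor@node1 spark]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
examples/jars/spark-examples_2.11-2.1.1.jar \
100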
3. Running your first Spark job on a standalone cluster
[victor@node1 spark]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop102:7077 \
--executor-memory 1G \
--total-executor-cores 2 \
examples/jars/spark-examples_2.11-2.1.1.jar \
100
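
If a job is instead submitted to the standalone master with --deploy-mode cluster, spark-submit can also query or stop the driver afterwards, using the --status and --kill forms shown in the usage text above. A sketch (driver-20180101000000-0000 is a purely hypothetical submission ID):

[victor@node1 spark]$ bin/spark-submit --status driver-20180101000000-0000 --master spark://hadoop102:7077
[victor@node1 spark]$ bin/spark-submit --kill driver-20180101000000-0000 --master spark://hadoop102:7077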