Launching Spark Applications with spark-submit

Author: 泪痕残

Original link: https://blog.csdn.net/u012893747/article/details/76912309

    The bin/spark-submit script sets up a classpath that contains Spark and its dependencies, and it works with every cluster manager and deploy mode that Spark supports.

    ./bin/spark-submit \
    --class <main-class> \
    --master <master-url> \
    --deploy-mode <deploy-mode> \
    --conf <key>=<value> \
    ... # other options
    <application-jar> \
    [application-arguments]
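Line continuations are easy to get wrong in a template like this: a missing trailing backslash, or a comment placed after one, silently ends the command early. A minimal sketch, assuming bash: the hypothetical helper submit_spark_app collects the arguments in an array instead, and SPARK_HOME is assumed to point at the Spark installation (the sketch falls back to the current directory). Setting DRY_RUN prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch only: submit_spark_app is a hypothetical helper, and SPARK_HOME is
# assumed to point at the Spark installation (falls back to the current dir).
# Building the command in an array means no trailing backslashes are needed,
# and comments can sit next to each element without breaking the command.
submit_spark_app() {
  local main_class=$1 master=$2 deploy_mode=$3 app_jar=$4
  shift 4                            # whatever is left are application-arguments
  local -a cmd
  cmd=(
    "${SPARK_HOME:-.}/bin/spark-submit"
    --class "$main_class"            # entry point of the application
    --master "$master"               # master URL, e.g. local[8] or spark://host:7077
    --deploy-mode "$deploy_mode"     # client or cluster
    "$app_jar"                       # must be visible from every node in the cluster
    "$@"                             # passed to the main method of the main class
  )
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "${cmd[*]}"        # dry run: print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 submit_spark_app org.apache.spark.examples.SparkPi 'local[8]' client /path/to/examples.jar 100` prints the full command line without submitting anything; quoting 'local[8]' keeps the shell from treating the brackets as a glob pattern.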

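    Some commonly used options:
        --class: the entry point of your application (e.g. org.apache.spark.examples.SparkPi)
        --master: the master URL of the cluster (e.g. spark://23.195.26.187:7077)
        --deploy-mode: whether to deploy your driver on a worker node (cluster) or locally as an external client (client). The default is client.
        --conf: an arbitrary Spark configuration property in key=value format. Wrap "key=value" in quotes if the value contains spaces.
        application-jar: path to a jar bundling your application and all of its dependencies. The URL must be globally visible inside the cluster, for example an hdfs:// path or a file:// path that exists on all nodes.
        application-arguments: arguments passed to the main method of your main class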
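--conf may be repeated, and a value that contains spaces must reach spark-submit as a single argument. A small sketch, assuming bash; build_conf_args and CONF_ARGS are hypothetical names, while the two property names in the demo are standard Spark settings used only as illustration.

```bash
#!/usr/bin/env bash
# Sketch: build_conf_args and CONF_ARGS are hypothetical names.
# Each key=value pair becomes "--conf" followed by the pair as ONE argument,
# so values containing spaces survive when expanded as "${CONF_ARGS[@]}".
build_conf_args() {
  CONF_ARGS=()
  local pair
  for pair in "$@"; do
    if [[ $pair != *=* ]]; then        # reject anything without a '='
      printf 'not a key=value pair: %s\n' "$pair" >&2
      return 1
    fi
    CONF_ARGS+=(--conf "$pair")
  done
}

build_conf_args spark.executor.memory=4g \
  "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
printf '[%s]\n' "${CONF_ARGS[@]}"
# [--conf]
# [spark.executor.memory=4g]
# [--conf]
# [spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps]
```

The array is then spliced into the command, e.g. `./bin/spark-submit "${CONF_ARGS[@]}" --class <main-class> <application-jar>`.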


    Some example spark-submit invocations:

        # Run application locally on 8 cores
        ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master local[8] \
        /path/to/examples.jar \
        100

        # Run on a Spark Standalone cluster in client deploy mode
        ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master spark://207.184.161.138:7077 \
        --executor-memory 20G \
        --total-executor-cores 100 \
        /path/to/examples.jar \
        1000

        # Run on a Spark Standalone cluster in cluster deploy mode with supervise
        ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master spark://207.184.161.138:7077 \
        --deploy-mode cluster \
        --supervise \
        --executor-memory 20G \
        --total-executor-cores 100 \
        /path/to/examples.jar \
        1000

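        # Run on a YARN cluster (use --deploy-mode client for client mode;
        # the older --master yarn-cluster / yarn-client forms are deprecated since Spark 2.0)
        export HADOOP_CONF_DIR=XXX
        ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master yarn \
        --deploy-mode cluster \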
        --executor-memory 20G \
        --num-executors 50 \
        /path/to/examples.jar \
        1000

        # Run a Python application on a Spark Standalone cluster
        ./bin/spark-submit \
        --master spark://207.184.161.138:7077 \
        examples/src/main/python/pi.py \
        1000
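The examples above differ mainly in the --master URL and the --deploy-mode. A hedged sketch, assuming bash, of a hypothetical helper master_flags that prints those flags one per line for three illustrative targets (local, standalone, yarn). For YARN it refuses to continue unless HADOOP_CONF_DIR or YARN_CONF_DIR is set, because spark-submit reads the cluster's client-side Hadoop configuration from that directory.

```bash
#!/usr/bin/env bash
# Sketch: master_flags and the target names local/standalone/yarn are hypothetical.
# Prints the --master/--deploy-mode flags, one per line, for the chosen target.
#   master_flags local [threads]                      -> local[threads], default local[*]
#   master_flags standalone spark://host:port [mode]  -> Spark Standalone master
#   master_flags yarn [mode]                          -> YARN (needs Hadoop client config)
# mode is client or cluster and defaults to client, matching spark-submit's default.
master_flags() {
  case ${1:-} in
    local)
      # local[*] uses as many worker threads as logical cores
      printf '%s\n' --master "local[${2:-*}]"
      ;;
    standalone)
      if [[ -z ${2:-} ]]; then
        printf 'standalone needs a spark://host:port URL\n' >&2
        return 1
      fi
      printf '%s\n' --master "$2" --deploy-mode "${3:-client}"
      ;;
    yarn)
      if [[ -z ${HADOOP_CONF_DIR:-} && -z ${YARN_CONF_DIR:-} ]]; then
        printf 'set HADOOP_CONF_DIR or YARN_CONF_DIR first\n' >&2
        return 1
      fi
      printf '%s\n' --master yarn --deploy-mode "${2:-client}"
      ;;
    *)
      printf 'unknown target: %s\n' "${1:-}" >&2
      return 1
      ;;
  esac
}
```

With bash 4 or later the output can be read into an array, e.g. `mapfile -t flags < <(master_flags yarn cluster)` followed by `./bin/spark-submit "${flags[@]}" --class org.apache.spark.examples.SparkPi /path/to/examples.jar 1000`. Printing one flag per line keeps each flag a separate argument without relying on word splitting.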