Spark Deployment Modes

Spark's Four Deployment Modes Explained

License: CC 4.0 BY-SA


1. Local Mode

Local mode runs Spark code on the local machine without using any other nodes. It is generally used for teaching, debugging, and demonstrations.

1.1 Submitting an Application

bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[2] \
./examples/jars/spark-examples_2.12-3.0.0.jar \
10

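--class specifies the main class of the program to run;
--master local[2] specifies where the application runs: local means local mode, and the number in brackets is the number of worker threads (virtual CPU cores) to use. If --master is omitted, spark-submit defaults to local[*], one thread per logical core;
The application's Web UI listens on port 4040 by default.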
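Port 4040 is only the starting point. If it is already taken (for example by a second application on the same machine), Spark tries the next port, and keeps incrementing up to spark.port.maxRetries times (default 16). The minimal sketch below, assuming those defaults, prints the range of ports a UI may end up on; the function name spark_ui_port_range is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the range of ports Spark may try for the Web UI.
# Spark tries the configured port first, then base+1, base+2, ... until
# spark.port.maxRetries retries have been used (assumed default: 16).
spark_ui_port_range() {
  local base_port="${1:-4040}"   # spark.ui.port (default 4040)
  local max_retries="${2:-16}"   # spark.port.maxRetries (default 16)
  echo "${base_port}-$(( base_port + max_retries ))"
}

spark_ui_port_range          # 4040-4056
spark_ui_port_range 8095     # 8095-8111, e.g. with --conf spark.ui.port=8095
```

The actual port is printed in the driver log when the UI starts, so checking the log is more reliable than guessing within this range.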

2. Standalone Mode

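Standalone is Spark's built-in cluster mode: Spark's own nodes provide the compute resources, so no external resource-management framework is needed. It uses the classic master/worker architecture (one Master, multiple Workers). The Master's Web UI listens on port 8080 by default.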

2.1 Submitting an Application

bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://linux1:7077 \
./examples/jars/spark-examples_2.12-3.0.0.jar \
10
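The only difference from the Local command is the --master value. The sketch below, a hypothetical helper named classify_master, maps the master URL forms used in this article to the mode they select; it deliberately ignores other forms Spark accepts (such as k8s:// URLs).

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a --master value into a deployment mode.
# Covers only the URL forms used in this article.
classify_master() {
  local n
  case "$1" in
    local)       echo "local (1 thread)" ;;
    'local[*]')  echo "local (all logical cores)" ;;
    local\[*\])
      n=${1#local?}   # drop the leading "local["
      n=${n%?}        # drop the trailing "]"
      echo "local ($n threads)"
      ;;
    spark://*)   echo "standalone (master at ${1#spark://})" ;;
    yarn)        echo "yarn" ;;
    *)           echo "unknown master: $1" >&2; return 1 ;;
  esac
}

classify_master 'local[2]'            # local (2 threads)
classify_master spark://linux1:7077   # standalone (master at linux1:7077)
```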

2.2 Parameters

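Parameter	Description	Example values
--class	Class containing the main function of the Spark program
--master	Mode (environment) the Spark program runs in	local[*], spark://linux1:7077, yarn
--total-executor-cores	Total number of CPU cores used by all executors
--executor-cores	Number of CPU cores used by each executor
--executor-memory	Memory available to each executor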
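In Standalone mode, --total-executor-cores caps the cores the whole application may take and --executor-cores sets the size of each executor, so together they bound how many executors can be launched. The arithmetic can be sketched as below, assuming the Workers have enough free cores and memory; the real count can be lower, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upper bound on executors in Standalone mode.
# Assumes Workers have enough free cores and memory for every executor.
max_standalone_executors() {
  local total_cores="$1"      # --total-executor-cores
  local cores_per_exec="$2"   # --executor-cores
  if (( cores_per_exec <= 0 )); then
    echo "executor cores must be positive" >&2
    return 1
  fi
  echo $(( total_cores / cores_per_exec ))   # integer division rounds down
}

max_standalone_executors 6 2   # 3
max_standalone_executors 7 2   # 3 (the leftover core stays unused)
```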

3. YARN Mode

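YARN serves as the resource scheduler, so the HDFS and YARN clusters must be running before a Spark job is submitted, and spark-submit finds the cluster through the Hadoop configuration directory pointed to by HADOOP_CONF_DIR or YARN_CONF_DIR. Depending on where the Driver runs, YARN mode is further divided into cluster mode and client mode.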

3.1 Cluster Mode

3.1.1 Characteristics

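The Driver runs inside the ApplicationMaster on the YARN cluster;
The client can disconnect after submitting the job;
YARN handles resource management and job monitoring end to end;
If the Driver fails, YARN can restart it by retrying the ApplicationMaster (up to spark.yarn.maxAppAttempts);
Network requirements on the client side are low, because the Driver talks to the executors from inside the cluster.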

3.1.2 Submitting an Application

bin/spark-submit \
  --class com.example.SparkApp \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2g \
  --num-executors 4 \
  /path/to/your-app.jar

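3.1.3 Parameters

Parameter	Description
--class	Class containing the main function of the Spark program
--master	Mode (environment) the Spark program runs in
--deploy-mode	Where the Driver runs: cluster or client
--driver-cores	Number of CPU cores available to the Driver
--driver-memory	Memory available to the Driver
--num-executors	Number of executors to launch
--executor-cores	Number of CPU cores available to each executor
--executor-memory	Memory available to each executor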
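Note that --executor-memory is not the full size of the YARN container: Spark adds a memory overhead on top of it. The sketch below estimates the request, assuming spark.executor.memoryOverhead is unset (so the default of 10% of executor memory with a 384 MiB floor applies) and no off-heap or PySpark memory is configured. YARN may further round each request up to a multiple of yarn.scheduler.minimum-allocation-mb, which this sketch ignores. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimated container size (MiB) for one executor,
# assuming the default overhead max(384 MiB, 10% of executor memory).
executor_container_mb() {
  local executor_mb="$1"                       # --executor-memory, in MiB
  local overhead_mb=$(( executor_mb / 10 ))    # 10%, rounded down
  if (( overhead_mb < 384 )); then
    overhead_mb=384                            # minimum overhead
  fi
  echo $(( executor_mb + overhead_mb ))
}

# Hypothetical helper: total executor memory requested from YARN (MiB).
total_executor_memory_mb() {
  local num_executors="$1" executor_mb="$2"
  echo $(( num_executors * $(executor_container_mb "$executor_mb") ))
}

executor_container_mb 2048        # --executor-memory 2g -> 2432
total_executor_memory_mb 4 2048   # --num-executors 4    -> 9728
```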

3.1.4 Workflow

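The client submits the job to the YARN ResourceManager;
The ResourceManager allocates a container and starts the ApplicationMaster in it;
The ApplicationMaster (which hosts the Spark Driver) requests executor resources from the ResourceManager;
Executors start on NodeManagers and connect to the Driver;
While the job runs, the client can disconnect.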
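Because the client may disconnect in cluster mode, the job is usually tracked afterwards by its YARN application ID. A minimal sketch, assuming the spark-submit output was saved to a file: the hypothetical helper below extracts the first application ID it finds, which can then be passed to the standard yarn application -status and yarn logs commands (the latter needs YARN log aggregation enabled). The sample log line in the test is illustrative, not copied from a real run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read spark-submit output on stdin and print the
# first YARN application ID (application_<timestamp>_<sequence>).
extract_yarn_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Usage sketch (not executed here):
#   bin/spark-submit --master yarn --deploy-mode cluster \
#     --conf spark.yarn.submit.waitAppCompletion=false ... 2>&1 | tee submit.log
#   app_id=$(extract_yarn_app_id < submit.log)
#   yarn application -status "$app_id"
#   yarn logs -applicationId "$app_id"
```

Setting spark.yarn.submit.waitAppCompletion=false makes spark-submit return right after submission in cluster mode instead of waiting for the application to finish.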

3.2 Client Mode

3.2.1 Characteristics

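The Driver runs in the client process on the machine that submitted the job; the ApplicationMaster only requests executor resources from YARN;
The client must stay connected until the job finishes, because closing it terminates the Driver and therefore the application;
Driver output appears directly in the client console, which makes this mode suitable for interactive use and debugging (spark-shell always runs this way);
YARN cannot restart a failed Driver, since the Driver runs outside the cluster;
Network requirements are higher, because the Driver exchanges tasks and results with the executors over the link between the client and the cluster.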

3.2.2 Submitting an Application

bin/spark-submit \
--master yarn \
--deploy-mode client \
--driver-cores 2 \
--driver-memory 4g \
--num-executors 4 \
--executor-cores 3 \
--executor-memory 4g \
--conf spark.sql.shuffle.partitions=24 \
--conf spark.default.parallelism=24 \
--conf spark.ui.port=8095 \
--jars /data/job/spark/lib/test-withoutdep.jar \
--class com.example.SparkApp \
/data/job/spark/test.jar
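The value 24 used for both partition settings in the command above equals 4 executors × 3 cores × 2, which is consistent with the Spark tuning guide's rule of thumb of 2 to 3 tasks per CPU core. The sketch below derives the setting from the executor layout under that assumption; the function name suggest_partitions is hypothetical, and the right value for a real job also depends on data volume.

```shell
#!/usr/bin/env bash
# Hypothetical helper: partitions = executors x cores per executor x tasks
# per core, assuming the "2-3 tasks per CPU core" rule of thumb.
suggest_partitions() {
  local num_executors="$1" executor_cores="$2" tasks_per_core="${3:-2}"
  echo $(( num_executors * executor_cores * tasks_per_core ))
}

p=$(suggest_partitions 4 3)   # --num-executors 4, --executor-cores 3 -> 24
echo "--conf spark.sql.shuffle.partitions=$p --conf spark.default.parallelism=$p"
```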

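3.2.3 Parameters

Parameter	Description
--class	Class containing the main function of the Spark program
--master	Mode (environment) the Spark program runs in
--deploy-mode	Where the Driver runs: cluster or client
--driver-cores	Number of CPU cores available to the Driver
--driver-memory	Memory available to the Driver
--num-executors	Number of executors to launch
--executor-cores	Number of CPU cores available to each executor
--executor-memory	Memory available to each executor
spark.sql.shuffle.partitions	Number of partitions used by Spark SQL shuffle operations (joins, aggregations)
spark.default.parallelism	Default parallelism (partition count) for RDD operations when none is given explicitly
spark.ui.port	Port of the application's Spark Web UI
--jars	Comma-separated list of additional dependency JARs to put on the Driver and executor classpaths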
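Since --jars takes a single comma-separated list, listing many dependency JARs by hand is error-prone. A minimal sketch, assuming all dependencies sit in one directory and no path contains a comma: the hypothetical helper below joins every *.jar in that directory into the format --jars expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join every *.jar in a directory into a
# comma-separated list suitable for --jars.
jars_from_dir() {
  local dir="$1" list="" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue     # no match: the glob stays literal, skip it
    list="${list:+$list,}$jar"    # add a comma only after the first entry
  done
  echo "$list"
}

# Usage sketch (not executed here):
#   bin/spark-submit ... --jars "$(jars_from_dir /data/job/spark/lib)" ...
```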