Running Spark on YARN

lucklilili

Last modified: 2022-04-12 17:54:28

First published: 2022-04-12 17:53:02

Article link: https://blog.csdn.net/lucklilili/article/details/124128806

Launching Spark on YARN

Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager. The configuration contained in this directory will be distributed to the YARN cluster so that all containers used by the application use the same configuration. If the configuration references Java system properties or environment variables not managed by YARN, they should also be set in the Spark application’s configuration (driver, executors, and the AM when running in client mode).
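
As a quick sanity check before submitting, the requirement above can be sketched as a small shell function. This is only an illustration: the function name `find_hadoop_conf` is hypothetical, and testing for `yarn-site.xml` is an assumption about what a usable client-side configuration directory contains.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first of HADOOP_CONF_DIR / YARN_CONF_DIR
# that is set and contains yarn-site.xml; return non-zero if neither does.
# Checking for yarn-site.xml is an assumption, not an official Spark rule.
find_hadoop_conf() {
  local dir
  for dir in "${HADOOP_CONF_DIR:-}" "${YARN_CONF_DIR:-}"; do
    # Skip unset or empty variables and directories without the YARN client config.
    if [ -n "$dir" ] && [ -f "$dir/yarn-site.xml" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  echo "Neither HADOOP_CONF_DIR nor YARN_CONF_DIR points to a directory with yarn-site.xml" >&2
  return 1
}
```

Calling `find_hadoop_conf` prints the directory that would be used, or fails with a message on stderr if neither variable points to a suitable directory.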

There are two deploy modes that can be used to launch Spark applications on YARN. In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

Unlike other cluster managers supported by Spark in which the master’s address is specified in the --master parameter, in YARN mode the ResourceManager’s address is picked up from the Hadoop configuration. Thus, the --master parameter is yarn.

To launch a Spark application on YARN:

1. Create spark-env.sh

Create the spark-env.sh file from spark-env.sh.template (in Spark's conf directory):

mv spark-env.sh.template spark-env.sh

2. Edit spark-env.sh

Add JAVA_HOME and YARN_CONF_DIR:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home/
export YARN_CONF_DIR=/Library/hadoop-2.7.3/etc/hadoop/
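
If this setup is scripted, the exports above can be added so that re-running the script does not duplicate lines. The following is a minimal sketch under stated assumptions: the function name `set_env_line` is hypothetical, and it only appends a missing export rather than editing an existing assignment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "export NAME=VALUE" to a file unless an
# export line for NAME is already present.
set_env_line() {
  local file="$1" name="$2" value="$3"
  if grep -q "^export ${name}=" "$file" 2>/dev/null; then
    return 0  # already configured; leave the existing value untouched
  fi
  printf 'export %s=%s\n' "$name" "$value" >> "$file"
}
```

For example, `set_env_line conf/spark-env.sh YARN_CONF_DIR /Library/hadoop-2.7.3/etc/hadoop/` adds the second export shown above the first time it runs and does nothing on later runs.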

3. Run the SparkPi example

bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--driver-memory 2g \
--driver-cores 1 \
--executor-memory 2g \
--executor-cores 1 \
examples/jars/spark-examples*.jar \
10
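
The command above runs in client mode. A cluster-mode submission differs only in the value passed to --deploy-mode. The sketch below assembles the command line for either mode and prints it instead of running it, so it can be inspected first. The function name `yarn_submit_cmd` is hypothetical; the class, memory settings, and jar path are taken from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a SparkPi submit command for the
# given YARN deploy mode, "client" or "cluster".
yarn_submit_cmd() {
  local mode="${1:-client}"
  case "$mode" in
    client|cluster) ;;
    *) echo "deploy mode must be client or cluster, got: $mode" >&2; return 1 ;;
  esac
  # The jar glob is quoted so it is printed literally, not expanded here.
  printf '%s ' bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode "$mode" \
    --driver-memory 2g \
    --executor-memory 2g \
    --executor-cores 1 \
    'examples/jars/spark-examples*.jar' \
    10
  printf '\n'
}

yarn_submit_cmd cluster
```

In cluster mode the driver runs inside the YARN application master, so the SparkPi result appears in the YARN container logs rather than in the submitting terminal.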

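Parameter	Description	Example values
--class	Class that contains the application's main method	org.apache.spark.examples.SparkPi
--master	Cluster manager (master URL) the application runs on	local[*], spark://linux1:7077, yarn
--deploy-mode	Where the driver runs: in the submitting process (client) or inside the cluster (cluster)	client, cluster
--driver-memory	Memory for the driver	2g
--driver-cores	Number of cores for the driver (applies only in cluster mode)	1
--executor-memory	Memory per executor	2g
--executor-cores	Number of cores per executor	1
examples/jars/spark-examples*.jar	Path to the jar that contains the --class class	A local path, or a remote location such as HDFS or OSS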