For setting up the Hadoop environment, see the Hadoop 3.2.2 cluster setup guide.
Environment
CentOS 7, jdk1.8.0_311, scala-2.12.15, zookeeper-3.6.3, hadoop3.2.2, spark-3.2.1-bin-hadoop3.2
Spark configuration
- Edit ${SPARK_HOME}/conf/spark-defaults.conf and add the following:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled true
spark.eventLog.dir hdfs://vmcluster/spark-history
spark.eventLog.compress true
spark.yarn.historyServer.address node-3:18080
spark.history.ui.port 18080
spark.history.fs.logDirectory hdfs://vmcluster/spark-history
spark.history.retainedApplications 10
spark.history.fs.update.interval 5s
Note: rename spark-defaults.conf.template to spark-defaults.conf.
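The history server only lists applications whose event logs land in the directory it reads, so spark.eventLog.dir and spark.history.fs.logDirectory should point to the same location, and that directory has to exist in HDFS before the first job is submitted. The following is a minimal sketch of a sanity check, assuming a POSIX shell and whitespace-separated entries as in the listing above. The helper names conf_value and check_history_dirs are hypothetical and not part of Spark.

```shell
# Print the value of a whitespace-separated key in a spark-defaults.conf style file.
# If the key appears more than once, the last occurrence wins.
conf_value() {
  awk -v k="$1" '$1 == k { v = $2 } END { if (v != "") print v }' "$2"
}

# Succeed only when both settings are present and identical.
check_history_dirs() {
  chd_event=$(conf_value spark.eventLog.dir "$1")
  chd_history=$(conf_value spark.history.fs.logDirectory "$1")
  if [ -n "$chd_event" ] && [ "$chd_event" = "$chd_history" ]; then
    echo "OK: $chd_event"
  else
    echo "MISMATCH: spark.eventLog.dir='$chd_event' spark.history.fs.logDirectory='$chd_history'" >&2
    return 1
  fi
}

# Typical use on a cluster node (not run here):
#   check_history_dirs "${SPARK_HOME}/conf/spark-defaults.conf"
#   hdfs dfs -mkdir -p hdfs://vmcluster/spark-history
```

The hdfs dfs -mkdir -p line creates the event log directory named in the configuration; adjust the URI if your nameservice differs.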
- Edit ${SPARK_HOME}/conf/spark-env.sh and add the following:
export JAVA_HOME=/home/bigdata/env/jdk1.8.0_311
export SCALA_HOME=/home/bigdata/env/scala-2.12.15
export SPARK_HOME=/home/bigdata/env/spark-3.2.1-bin-hadoop3.2
export SPARK_CONF=${SPARK_HOME}/conf
export HADOOP_HOME=/home/bigdata/env/hadoop-3.2.2
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
Note: rename spark-env.sh.template to spark-env.sh.
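Both configuration files start out as templates shipped in the conf directory. The following is a minimal sketch that copies a template instead of renaming it, so the original stays available for reference, and that leaves an existing file untouched. The helper name conf_from_template is hypothetical.

```shell
# Create <dir>/<name> from <dir>/<name>.template unless <dir>/<name> already exists.
conf_from_template() {
  cft_dir=$1
  cft_name=$2
  if [ -e "$cft_dir/$cft_name" ]; then
    echo "exists: $cft_dir/$cft_name"
  elif [ -f "$cft_dir/$cft_name.template" ]; then
    cp "$cft_dir/$cft_name.template" "$cft_dir/$cft_name" &&
      echo "created: $cft_dir/$cft_name"
  else
    echo "missing template: $cft_dir/$cft_name.template" >&2
    return 1
  fi
}

# Typical use (not run here):
#   conf_from_template "${SPARK_HOME}/conf" spark-defaults.conf
#   conf_from_template "${SPARK_HOME}/conf" spark-env.sh
```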
Start the history server (the script is in ${SPARK_HOME}/sbin)
start-history-server.sh
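To confirm that the history server actually came up, one option is to query Spark's monitoring REST API, which the history server serves under /api/v1. The following is a minimal sketch, assuming curl is installed and using the host and port from spark-defaults.conf above (node-3 and 18080). The helper name history_server_up is hypothetical.

```shell
# Return success if the history server's REST API answers on host:port.
# -s silences progress output, -f makes curl fail on HTTP error codes.
history_server_up() {
  curl -sf -o /dev/null "http://$1:$2/api/v1/applications"
}

# Typical use (not run here):
#   history_server_up node-3 18080 && echo "history server is up"
```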
Testing
Submit the SparkPi example bundled with Spark as a test, using the following command:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--num-executors 1 \
--executor-memory 512m \
--executor-cores 1 \
--queue bigdata \
${SPARK_HOME}/examples/jars/spark-examples*.jar \
100
Note: the SPARK_HOME system environment variable must be set for this command to work.
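The memory flags in the command are not the sizes YARN is asked for: Spark adds an off-heap overhead to each container request. The following is a minimal sketch of that arithmetic, assuming Spark 3.2's defaults (overhead of 10% of the heap with a 384 MB floor) and no explicit spark.driver.memoryOverhead or spark.executor.memoryOverhead setting. The helper name container_mb is hypothetical.

```shell
# Container request in MB = heap + max(384, heap / 10), using integer arithmetic.
container_mb() {
  cm_heap=$1
  cm_overhead=$((cm_heap / 10))
  if [ "$cm_overhead" -lt 384 ]; then
    cm_overhead=384
  fi
  echo $((cm_heap + cm_overhead))
}

container_mb 1024   # --driver-memory 1g      -> prints 1408
container_mb 512    # --executor-memory 512m  -> prints 896
```

The 1408 MB figure matches the "Will allocate AM container, with 1408 MB memory including 384 MB overhead" line in the submission log, because in cluster mode the driver runs inside the ApplicationMaster container. YARN may still round each request up to its scheduler's allocation increment.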
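Because the job is submitted in cluster mode, the driver runs inside a YARN container and the result is not printed to the console. The console log output is as follows: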
2022-03-16 10:43:41,387 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-03-16 10:43:41,784 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
2022-03-16 10:43:42,334 INFO conf.Configuration: resource-types.xml not found
2022-03-16 10:43:42,335 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-03-16 10:43:42,357 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2022-03-16 10:43:42,358 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
2022-03-16 10:43:42,358 INFO yarn.Client: Setting up container launch context for our AM
2022-03-16 10:43:42,359 INFO yarn.Client: Setting up the launch environment for our AM container
2022-03-16 10:43:42,367 INFO yarn.Client: Preparing resources for our AM container
2022-03-16 10:43:42,487 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2022-03-16 10:43:43,802 INFO yarn.Client: Uploading resource file:/tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82/__spark_libs__7226558732161014901.zip -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/__spark_libs__7226558732161014901.zip
2022-03-16 10:43:56,526 INFO yarn.Client: Uploading resource file:/home/bigdata/env/spark-3.2.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.2.1.jar -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/spark-examples_2.12-3.2.1.jar
2022-03-16 10:43:57,009 INFO yarn.Client: Uploading resource file:/tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82/__spark_conf__3589752284083344005.zip -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/__spark_conf__.zip
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing view acls to: bigdata
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing modify acls to: bigdata
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing view acls groups to:
2022-03-16 10:43:57,204 INFO spark.SecurityManager: Changing modify acls groups to:
2022-03-16 10:43:57,204 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bigdata); groups with view permissions: Set(); users with modify permissions: Set(bigdata); groups with modify permissions: Set()
2022-03-16 10:43:57,254 INFO yarn.Client: Submitting application application_1647396476966_0002 to ResourceManager
2022-03-16 10:43:57,515 INFO impl.YarnClientImpl: Submitted application application_1647396476966_0002
2022-03-16 10:43:58,520 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:43:58,522 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.bigdata
start time: 1647398637277
final status: UNDEFINED
tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
user: bigdata
2022-03-16 10:43:59,527 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:00,537 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:01,548 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:02,555 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:03,557 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:04,562 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:05,564 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:06,574 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:07,588 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:08,595 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:09,605 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:09,605 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: server1
ApplicationMaster RPC port: 44451
queue: root.bigdata
start time: 1647398637277
final status: UNDEFINED
tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
user: bigdata
2022-03-16 10:44:10,617 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:11,630 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:12,643 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:13,653 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:14,658 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:15,667 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:16,709 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:17,722 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:18,727 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:19,730 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:20,737 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:21,749 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:22,752 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:23,760 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:24,782 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:25,791 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:26,793 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:27,803 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:28,809 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:29,822 INFO yarn.Client: Application report for application_1647396476966_0002 (state: FINISHED)
2022-03-16 10:44:29,823 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: server1
ApplicationMaster RPC port: 44451
queue: root.bigdata
start time: 1647398637277
final status: SUCCEEDED
tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
user: bigdata
2022-03-16 10:44:29,843 INFO util.ShutdownHookManager: Shutdown hook called
2022-03-16 10:44:29,844 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82
2022-03-16 10:44:29,848 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-35dc976c-c371-4888-acc8-25e3a44d60a5
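Since the computed value of Pi ends up in the driver's stdout inside the YARN container logs, it can be pulled back with yarn logs once the application ID is known. The following is a minimal sketch, assuming the spark-submit output was saved to a file (here the hypothetical submit.log) and that YARN log aggregation is enabled. The helper names app_id and final_status are hypothetical.

```shell
# Print the first YARN application ID found on stdin.
app_id() {
  grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Print the last "final status" value found on stdin.
final_status() {
  awk '/final status:/ { s = $NF } END { if (s != "") print s }'
}

# Typical use (not run here); spark-submit logs to stderr, hence 2>&1:
#   spark-submit ... 2>&1 | tee submit.log
#   final_status < submit.log
#   yarn logs -applicationId "$(app_id < submit.log)" | grep 'Pi is roughly'
```

For the run shown above, final_status would report SUCCEEDED and the application ID would be application_1647396476966_0002.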
YARN web UI
Navigating from the YARN web UI to the Spark web UI
This part is fairly straightforward, so I won't go into further detail.