- Configure the environment variables in 【/etc/profile】
#SPARK_HOME
export SPARK_HOME=/home/bduser/opt/module/spark-2.1.3
export PATH=$PATH:$SPARK_HOME/bin
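- To apply the change to the current shell and verify it, a quick check like the following works (the path matches the SPARK_HOME set above):
source /etc/profile
echo $SPARK_HOME        # should print /home/bduser/opt/module/spark-2.1.3
spark-submit --version  # confirms $SPARK_HOME/bin is on PATH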
- Building the Spark on YARN deployment requires editing the 【spark-env.sh】 file:
export SPARK_MASTER_IP=my121
export JAVA_HOME=/opt/module/jdk1.8.0_172
export HADOOP_HOME=/opt/module/hadoop-2.7.6
export SCALA_HOME=/opt/module/scala-2.11.8
export HADOOP_CONF_DIR=/opt/module/hadoop-2.7.6/etc/hadoop
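- spark-env.sh should be identical on every node. A minimal sketch for pushing it to the other workers (assumes passwordless SSH and the same install path on my122/my123):
scp $SPARK_HOME/conf/spark-env.sh my122:$SPARK_HOME/conf/
scp $SPARK_HOME/conf/spark-env.sh my123:$SPARK_HOME/conf/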
- Edit the slaves configuration file and add the addresses (IP or hostname) of all worker nodes:
my121
my122
my123
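- Before submitting the test job below, HDFS and YARN have to be running. A typical startup sequence, with jps as the sanity check (run from the master node; script paths assume the HADOOP_HOME above):
$HADOOP_HOME/sbin/start-dfs.sh   # NameNode / DataNodes
$HADOOP_HOME/sbin/start-yarn.sh  # ResourceManager / NodeManagers
jps                              # each node should list its expected daemons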
- If all node processes started successfully, the on-YARN cluster can be tested by running the SparkPi example program. The command is shown below and can be executed on any node:
./bin/spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.1.3.jar
Output containing "Pi is roughly 3.144155720778604" means the job succeeded 【the result is an estimate and varies slightly between runs】
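As a variant not covered above, the same example can be submitted in cluster deploy mode with explicit resources; the flag values here are illustrative, not tuned, and the trailing 100 is SparkPi's optional slice count:
./bin/spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --num-executors 2 --executor-memory 1g --driver-memory 1g examples/jars/spark-examples_2.11-2.1.3.jar 100
Note that in cluster mode the "Pi is roughly" line appears in the driver container's YARN log rather than on the local console.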
5. Open the web UI
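Assuming default ports, the YARN ResourceManager web UI is at http://my121:8088, where the submitted application and its state are listed. The same information is available from the command line:
yarn application -list   # shows submitted/running YARN applications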
The following error may appear:
18/11/27 16:22:05 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2323)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:876)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:868)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:744)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Solution:
Find the Hadoop configuration file yarn-site.xml and add the following:
<!-- Whether to run a thread that checks the amount of virtual memory each task is using and kills any task that exceeds its allocation; the default is true. -->
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
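Every NodeManager reads yarn-site.xml, so the edit has to be synced to all nodes and YARN restarted before it takes effect; a minimal sketch using the hostnames and paths above:
scp $HADOOP_CONF_DIR/yarn-site.xml my122:$HADOOP_CONF_DIR/
scp $HADOOP_CONF_DIR/yarn-site.xml my123:$HADOOP_CONF_DIR/
$HADOOP_HOME/sbin/stop-yarn.sh    # restart YARN so the NodeManagers
$HADOOP_HOME/sbin/start-yarn.sh   # pick up the new setting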
While making these edits I must have hit something in vi: yellow highlighting (leftover search highlighting) kept showing up.
Fix:
:noh
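:noh only clears the highlight until the next search; to disable search highlighting permanently (optional):
echo 'set nohlsearch' >> ~/.vimrc   # turn off vim search highlighting for good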