Spark scheduling on YARN
- YARN mode does not need the Master and Worker processes used by Standalone mode, so stop them first
./stop-master.sh
./stop-slaves.sh
- Start the YARN cluster
start-yarn.sh
- Create a test file and upload it to the HDFS root
vi wordcount.txt
hdfs dfs -put wordcount.txt /wordcount.txt
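As a rough sketch, the test file can also be created without an editor. The file contents below are purely illustrative, and the upload is skipped when no hdfs client is on the PATH. The target /wordcount.txt is the path that the later textFile call reads.

```shell
# Write a small sample file without opening an editor (contents are illustrative).
cat > wordcount.txt <<'EOF'
hello spark
hello yarn
hello hadoop spark
EOF

# Upload to the HDFS root only when the hdfs client is available on this machine.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -put -f wordcount.txt /wordcount.txt   # -f overwrites an existing copy
  hdfs dfs -ls /wordcount.txt
fi
```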
- Run spark-shell
Running spark-shell in YARN mode requires the --master yarn option
./spark-shell --master yarn
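Spark locates the YARN ResourceManager through the Hadoop client configuration, so HADOOP_CONF_DIR (or YARN_CONF_DIR) must point at the directory that holds yarn-site.xml. The sketch below spells this out together with a few standard launch options. The /opt/hadoop fallback path and the resource values are assumptions for illustration, not settings taken from this cluster.

```shell
# Assumed location of the Hadoop client configs; adjust to your installation.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME:-/opt/hadoop}/etc/hadoop}"

# Standard spark-shell options; the values here are illustrative only.
SPARK_SHELL_ARGS="--master yarn --deploy-mode client --executor-memory 1g --num-executors 2"

# Launch only when run from Spark's bin directory; otherwise just print the command.
if [ -x ./spark-shell ]; then
  ./spark-shell $SPARK_SHELL_ARGS
else
  echo "would run: ./spark-shell $SPARK_SHELL_ARGS"
fi
```

spark-shell is interactive, so it only supports client deploy mode; the flag is written out here just to make that explicit.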
Launching spark-shell this way initially failed with:
ERROR cluster.YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
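The message suggests checking the YARN application logs. A minimal sketch of how that might look is shown here. The application ID is a placeholder, and yarn logs only returns output when log aggregation is enabled on the cluster. If containers were killed by the NodeManager memory check, the diagnostics usually mention running beyond physical or virtual memory limits.

```shell
# Placeholder ID; replace it with the one shown by the list command below.
APP_ID="application_0000000000000_0001"

if command -v yarn >/dev/null 2>&1; then
  # List applications in every state, including failed and killed ones.
  yarn application -list -appStates ALL
  # Pull the aggregated logs and look for memory-limit kills.
  yarn logs -applicationId "$APP_ID" | grep -i "memory limits" || true
fi
```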
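The fix is to add the following to yarn-site.xml. These two properties turn off the NodeManager's physical and virtual memory checks, which otherwise kill containers that exceed their limits: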
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
Then copy the updated file to every node.
Restart YARN and retry.
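The distribute-and-restart step could be scripted roughly as follows. The hostnames node2 and node3, the /opt/hadoop fallback, and the run helper are hypothetical names introduced for this sketch. By default it only prints the commands, and setting DRY_RUN=0 executes them.

```shell
# Hypothetical worker hostnames and config directory; adjust to your cluster.
NODES="node2 node3"
CONF_DIR="${HADOOP_HOME:-/opt/hadoop}/etc/hadoop"

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Copy the edited yarn-site.xml to the same directory on every node.
for node in $NODES; do
  run scp "$CONF_DIR/yarn-site.xml" "$node:$CONF_DIR/"
done

# Restart YARN so the NodeManagers pick up the new settings.
run stop-yarn.sh
run start-yarn.sh
```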
After the restart, the retry fails with a different error:
Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
at scala.tools.nsc.interpreter.jline.JLineConsoleReader.initCompletion(JLineReader.scala:139)
at scala.tools.nsc.interpreter.jline.InteractiveReader.postInit(JLineReader.scala:54)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$1.apply(SparkILoop.scala:190)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$1.apply(SparkILoop.scala:188)
at scala.tools.nsc.interpreter.SplashReader.postInit(InteractiveReader.scala:130)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$org$apache$spark$r
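This is probably caused by a Spark version mismatch: the installed Spark build must match the Hadoop version. Switching to Spark 2.3.3 resolved the problem here.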
- Run the word count in spark-shell
sc.textFile("/wordcount.txt").flatMap(_.split(" ")).map(word=>(word,1)).reduceByKey(_+_).map(entry=>(entry._2,entry._1)).sortByKey(false,1).map(entry=>(entry._2, entry._1)).saveAsTextFile("/spark/output")
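For a quick sanity check, the same count can be reproduced locally with standard Unix tools and compared against the Spark output. This is only a sketch: the sample input is made up, and ties are broken alphabetically here, whereas Spark does not guarantee an order among words with equal counts.

```shell
# Illustrative input; in practice use the same wordcount.txt that was uploaded to HDFS.
printf 'hello spark\nhello yarn\nhello hadoop spark\n' > sample_wordcount.txt

# tr splits on spaces (flatMap), sort | uniq -c counts each word (reduceByKey),
# the second sort orders by count descending (sortByKey(false)),
# and awk prints "(word,count)", the form in which saveAsTextFile writes tuples.
result=$(tr -s ' ' '\n' < sample_wordcount.txt | sort | uniq -c | sort -k1,1nr -k2 | awk '{print "(" $2 "," $1 ")"}')
echo "$result"
```

With the sample input above, the first line printed is (hello,3).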
View the output in HDFS: