[Solved] Submitting a Python job to a Spark cluster fails, while submitting to the local machine works: requirement failed: Can only call getServletHandlers on a runni

Running a Python program from PyCharm and submitting it to a Spark cluster kept failing with: requirement failed: Can only call getServletHandlers on a running MetricsSystem

Submitting from my machine to a local Spark instance works fine.

The code snippet is below; it uses the example from the official Spark website:

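from pyspark import SparkContext

logFile = "/home/ubutnu/spark_2_2_1/README.md"
sc = SparkContext("local", "Simple App")  # local mode: works
# Cluster mode: enabling this line instead produces the error below
# sc = SparkContext(master="spark://192.168.3.207:7077", appName="Simple_App")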
Error:
2018-04-17 17:11:09 ERROR StandaloneSchedulerBackend:70 - Application has been killed. Reason: All masters are unresponsive! Giving up.
2018-04-17 17:11:09 WARN  StandaloneSchedulerBackend:66 - Application ID is not initialized yet.
2018-04-17 17:11:09 WARN  StandaloneAppClient$ClientEndpoint:66 - Drop UnregisterApplication(null) because has not yet connected to master
2018-04-17 17:11:10 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem

at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:515)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)

……

Checking the log on the cluster's master node shows that the request was received but an error was raised:

18/04/17 17:18:57 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: org.apache.spark.rpc.RpcEndpointRef; local class incompatible: stream classdesc serialVersionUID = -1329125091869941550, local class serialVersionUID = 1835832137613908542
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1876)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1745)

        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1876)

……

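Locating the problem:
1) Put the Python file on the master machine and submit it there, in both local and cluster mode. Both run normally, which shows that neither the program nor the cluster is at fault: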
spark-submit --master local  ~/test.py
spark-submit --master spark://192.168.3.207:7077  ~/test.py
2) Put the Python file on my local machine and submit it with spark-submit from the Spark bin directory. The same error is reported:
cd C:\spark-2.3.0-bin-hadoop2.7\bin
spark-submit --master spark://192.168.3.207:7077 ..\examples\src\main\python\pi.py

The error message is the same as above.

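Combined with the master-side error above, the request is received but the message cannot be processed. The serialVersionUID of the serialized object differs between the two sides, which suggests that the JDK or Spark versions do not match. java -version reports 1.8 on both machines, so the JDK is ruled out. As for Spark, the cluster runs 2.2.1 while my local machine has 2.3.0, which is newer than the cluster.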
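The version comparison described above can be sketched as a small check to run before submitting. This is only an illustration: the helper names parse_spark_version and same_spark_version are made up for this post, and the version strings are entered by hand. The client version could be read from pyspark.__version__ and the cluster version from the master web UI. The sketch assumes an exact version match is required, which is stricter than may be necessary but matches the fix that worked here.

```python
import re


def parse_spark_version(text):
    """Return (major, minor, patch) for the first x.y.z found in text, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def same_spark_version(client, cluster):
    """True only if both strings contain the same x.y.z version (hypothetical helper)."""
    client_version = parse_spark_version(client)
    cluster_version = parse_spark_version(cluster)
    return client_version is not None and client_version == cluster_version


if __name__ == "__main__":
    # Versions taken from this post: local machine 2.3.0, cluster 2.2.1.
    print(same_spark_version("2.3.0", "2.2.1"))
    # After downgrading the local machine to 2.2.1.
    print(same_spark_version("2.2.1", "2.2.1"))
```

With the versions from this post, the first call reports a mismatch and the second reports a match.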

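Solution: use the same Spark version on both sides. I switched my local machine to 2.2.1 as well, updated the environment variables, and restarted the computer. After that, jobs submit normally.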
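After changing the environment variables, it may be worth confirming which Spark installation will actually be picked up. The sketch below is a rough check, not an official API: spark_home_version is a hypothetical helper, and it assumes that SPARK_HOME is the variable in use and that the installation directory keeps the default download name (for example spark-2.2.1-bin-hadoop2.7). If the folder was renamed, it simply returns None.

```python
import os
import re


def spark_home_version(env=os.environ):
    """Guess the Spark version from the SPARK_HOME directory name (hypothetical helper)."""
    home = env.get("SPARK_HOME", "")
    # Assumes the default folder naming, e.g. spark-2.2.1-bin-hadoop2.7.
    match = re.search(r"spark-(\d+\.\d+\.\d+)", home)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Example path modeled on the one in this post, with the downgraded version.
    print(spark_home_version({"SPARK_HOME": r"C:\spark-2.2.1-bin-hadoop2.7"}))
    # The value from the real environment of the current process.
    print(spark_home_version())
```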

However, submitting the job then ran into a new problem: WARN TaskSchedulerImpl: Initial job has not accepted any resources. That one is solved in a separate blog post.

