Quick troubleshooting for a Spark cluster: Workers are running, but do not show up in the Master UI
1. Make sure ZooKeeper is healthy on every ensemble node
echo stat | nc 10.20.2.51 2181
echo stat | nc 10.20.2.52 2181
echo stat | nc 10.20.2.53 2181
echo stat | nc 10.20.2.54 2181
echo stat | nc 10.20.2.55 2181
happy:scala-2.12 happy$ echo 'stat' | nc 10.20.2.53 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/10.20.2.32:41572[1]
/0:0:0:0:0:0:0:1:50906[1]
/10.20.2.35:39202[1]
/10.20.2.63:46642[1]
/10.20.2.3:59494[1]
/192.168.2.33:59146[0]
/10.20.2.12:50032[1]
Latency min/avg/max: 0/0/27
Received: 3549183
Sent: 3549249
Connections: 7
Outstanding: 0
Zxid: 0x300000c00
Mode: leader
Node count: 294
Proposal sizes last/min/max: 3143/32/5548
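The single-node check above can be extended to the whole ensemble: a healthy ensemble reports exactly one leader, and the rest followers. A minimal sketch (the `zk_mode` helper and the loop are illustrative; it assumes the five ensemble IPs from step 1 and that `nc` is installed):

```shell
# zk_mode reads `stat` output on stdin and prints the node's role
# (leader / follower / standalone).
zk_mode() { grep '^Mode:' | awk '{print $2}'; }

# Against the live ensemble (requires network access to port 2181):
#   for host in 10.20.2.51 10.20.2.52 10.20.2.53 10.20.2.54 10.20.2.55; do
#     printf '%s: %s\n' "$host" "$(echo stat | nc "$host" 2181 | zk_mode)"
#   done
```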
2. Stop the Spark cluster:
$SPARK_HOME/sbin/stop-all.sh
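stop-all.sh can leave stragglers behind, so before cleaning up it is worth confirming that no Master or Worker JVMs survive on the slaves. A hedged sketch (the `spark_daemons` helper is illustrative; it assumes passwordless ssh as root and `jps` on the remote PATH):

```shell
# spark_daemons filters jps output down to Spark standalone daemons.
spark_daemons() { grep -E 'Master|Worker' || echo 'no spark daemons'; }

# On each slave listed in conf/slaves:
#   grep '^spark-node' /var/server/spark/conf/slaves | while read -r host; do
#     printf '== %s ==\n' "$host"
#     ssh "root@$host" jps | spark_daemons
#   done
```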
3. Clean up the old logs:
cat /var/server/spark/conf/slaves | grep '^spark-node' | xargs -i -t ssh root@{} "rm -rf /var/server/spark/logs/*"
cat /var/server/spark/conf/slaves | grep '^spark-node' | xargs -i -t ssh root@{} "chown -R spark:spark /var/server/spark/"
Delete the stale /spark znode on the ZooKeeper leader.
On the leader, go to the ZooKeeper bin directory and start the CLI:
./zkCli.sh
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[aliases.json, autoscaling, autoscaling.json, clusterstate.json, collections, configs, hadoop-ha, hbase, hive_zookeeper_namespace, kafka, live_nodes, nifi, overseer, overseer_elect, security.json, solr, spark, zookeeper]
[zk: localhost:2181(CONNECTED) 1] deleteall /spark
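Instead of logging in and finding the leader by hand, the role check from step 1 can drive the deletion non-interactively. A sketch under assumptions: ZooKeeper 3.5+ (where `deleteall` replaced the older `rmr`), and the `is_leader` helper is illustrative:

```shell
# is_leader succeeds when the `stat` output on stdin reports a leader.
is_leader() { grep -q '^Mode: leader'; }

# Find the leader among the ensemble and delete the stale znode there:
#   for host in 10.20.2.51 10.20.2.52 10.20.2.53 10.20.2.54 10.20.2.55; do
#     if echo stat | nc "$host" 2181 | is_leader; then
#       ./zkCli.sh -server "$host:2181" deleteall /spark
#     fi
#   done
```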
4. Start the Spark cluster:
$SPARK_HOME/sbin/start-all.sh
Run the wordcount test job to verify the cluster is really back to normal:
[spark@spark-node1 test]$ ./wordcount-spark-test.sh
Running Spark using the REST application submission protocol.
18/11/20 00:59:25 INFO RestSubmissionClient: Submitting a request to launch an application in spark://10.20.2.31:6066,10.20.2.32:6066,10.20.2.33:6066,10.20.2.34:6066,10.20.2.35:6066.
18/11/20 00:59:25 INFO RestSubmissionClient: Submission successfully created as driver-20181120005925-0002. Polling submission state...
18/11/20 00:59:25 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20181120005925-0002 in spark://10.20.2.31:6066,10.20.2.32:6066,10.20.2.33:6066,10.20.2.34:6066,10.20.2.35:6066.
18/11/20 00:59:25 INFO RestSubmissionClient: State of driver driver-20181120005925-0002 is now SUBMITTED.
18/11/20 00:59:25 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20181120005925-0002",
  "serverSparkVersion" : "2.3.1",
  "submissionId" : "driver-20181120005925-0002",
  "success" : true
}
18/11/20 00:59:25 INFO ShutdownHookManager: Shutdown hook called
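Beyond the submission log, the original symptom (Workers missing from the Master UI) can be checked directly: the standalone master's web UI usually serves the same data as JSON under /json. A hedged sketch (the `alive_workers` helper is illustrative, and the response layout is assumed from Spark 2.x standalone):

```shell
# alive_workers counts workers whose state is ALIVE in the master's
# /json response read from stdin.
alive_workers() {
  python3 -c 'import json, sys
d = json.load(sys.stdin)
print(sum(1 for w in d.get("workers", []) if w.get("state") == "ALIVE"))'
}

# Against the live master (whichever node currently holds the role):
#   curl -s http://10.20.2.31:8080/json/ | alive_workers
```

If the count matches the number of slaves in conf/slaves, the Workers have re-registered and the UI should show them again.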