Notes on the Spark error: Service 'sparkWorker' failed after 16 retries (starting from 0)

Problem description

Below is the path I took to find the problem; if you are pressed for time, just skip to the very end of the article.

When starting Spark, only the Master came up successfully; the Workers on the other two machines did not.
The startup log on the master host showed nothing wrong, but the Spark startup logs on slave1 and slave2 contained the following:

Spark Command: /usr/apps/jdk1.8.0/bin/java -cp /usr/apps/spark-2.1.1/conf/:/usr/apps/spark-2.1.1/jars/*:/usr/apps/hadoop-2.7.7/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master:7077
========================================
21/10/19 00:17:43 INFO worker.Worker: Started daemon with process name: 13713@slave1
21/10/19 00:17:43 INFO util.SignalUtils: Registered signal handler for TERM
21/10/19 00:17:43 INFO util.SignalUtils: Registered signal handler for HUP
21/10/19 00:17:43 INFO util.SignalUtils: Registered signal handler for INT
21/10/19 00:17:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/10/19 00:17:44 INFO spark.SecurityManager: Changing view acls to: root
21/10/19 00:17:44 INFO spark.SecurityManager: Changing modify acls to: root
21/10/19 00:17:44 INFO spark.SecurityManager: Changing view acls groups to:
21/10/19 00:17:44 INFO spark.SecurityManager: Changing modify acls groups to:
21/10/19 00:17:44 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
Exception in thread "main" java.net.BindException: 无法指定被请求的地址: Service 'sparkWorker' failed after 16 retries (starting from 0)! Consider explicitly setting the appropriate port for the service 'sparkWorker' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:438)
        at sun.nio.ch.Net.bind(Net.java:430)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:225)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:748)
The key line is the java.net.BindException. 无法指定被请求的地址 is the JVM's localized form of "Cannot assign requested address": the Worker tried to bind to a local address that does not exist on that machine.

How the cluster was started: on the master host, run ./start-all.sh under sbin in the Spark directory. The log output was as follows:
[root@master spark-2.1.1]# sbin/start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /usr/apps/spark-2.1.1/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/apps/spark-2.1.1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/apps/spark-2.1.1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out

Searching for fixes online

Almost all of them suggest setting export SPARK_LOCAL_IP= in spark-env.sh.
But! The export SPARK_MASTER_IP=master setting I had been using worked fine before, and the cluster started normally with it, so I concluded this was not the problem.
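
For reference, the fix those posts suggest looks roughly like the following sketch of spark-env.sh (the IP is a placeholder that would be set to each node's own address):

# /usr/apps/spark-2.1.1/conf/spark-env.sh (excerpt)
export SPARK_MASTER_IP=master        # the setting I already had, which worked before
# Commonly suggested fix: pin the address each daemon binds to.
# 192.168.x.x is a placeholder for the node's own IP.
export SPARK_LOCAL_IP=192.168.x.x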

Troubleshooting approach

  1. Check spark-env.sh for typos made in haste, e.g. slave1 misspelled as salve1.
  2. Check /etc/hosts for mistakes.
  3. Check the slaves file under Spark's conf directory for spelling errors (a quick sketch covering all three checks follows this list).
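
A minimal shell sketch for working through these checks, assuming the installation path used in this article (/usr/apps/spark-2.1.1); run the first two commands on every node:

hostname                                    # what each machine thinks it is called
cat /etc/hosts                              # each hostname should map to the right IP
cat /usr/apps/spark-2.1.1/conf/slaves       # should list exactly slave1 and slave2
grep -E 'SPARK_MASTER_IP|SPARK_LOCAL_IP' /usr/apps/spark-2.1.1/conf/spark-env.sh
ping -c 1 slave1                            # from master: do the names resolve and respond?
ping -c 1 slave2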

Solution

It turned out that when I first set the hostnames with hostnamectl set-hostname <hostname>, I had typed slave1 as slave2 and slave2 as slave1. In hindsight, the start-all.sh output above already showed the symptom: slave2 was logging to a file named ...-slave1.out and slave1 to ...-slave2.out.

Takeaway

Spark startup really does depend on the hostname. Check whether the hostnames are wrong and whether each IP maps to the correct hostname.
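
A minimal sketch of the actual fix, assuming the hostnames used in this article (the IPs in the /etc/hosts excerpt are placeholders):

hostnamectl set-hostname slave1     # run on the machine that should be slave1
hostnamectl set-hostname slave2     # run on the machine that should be slave2
hostnamectl status                  # verify the static hostname on each node

# /etc/hosts on every node should then map the corrected names, e.g.:
# 192.168.x.101  master
# 192.168.x.102  slave1
# 192.168.x.103  slave2

# Restart the cluster from the master host:
/usr/apps/spark-2.1.1/sbin/stop-all.sh
/usr/apps/spark-2.1.1/sbin/start-all.sh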
