Problem Description
Below is the process I followed to track down the problem; if you are in a hurry, skip straight to the end of the article.
When starting Spark, only the Master came up; the two Workers failed to start.
The master host's startup log showed nothing wrong, but the Spark startup logs on slave1 and slave2 contained the following:
Spark Command: /usr/apps/jdk1.8.0/bin/java -cp /usr/apps/spark-2.1.1/conf/:/usr/apps/spark-2.1.1/jars/*:/usr/apps/hadoop-2.7.7/etc/hadoop/ -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://master:7077
========================================
21/10/19 00:17:43 INFO worker.Worker: Started daemon with process name: 13713@slave1
21/10/19 00:17:43 INFO util.SignalUtils: Registered signal handler for TERM
21/10/19 00:17:43 INFO util.SignalUtils: Registered signal handler for HUP
21/10/19 00:17:43 INFO util.SignalUtils: Registered signal handler for INT
21/10/19 00:17:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/10/19 00:17:44 INFO spark.SecurityManager: Changing view acls to: root
21/10/19 00:17:44 INFO spark.SecurityManager: Changing modify acls to: root
21/10/19 00:17:44 INFO spark.SecurityManager: Changing view acls groups to:
21/10/19 00:17:44 INFO spark.SecurityManager: Changing modify acls groups to:
21/10/19 00:17:44 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/10/19 00:17:45 WARN util.Utils: Service 'sparkWorker' could not bind on port 0. Attempting port 1.
... (the warning above repeats 15 more times, once per retry) ...
Exception in thread "main" java.net.BindException: 无法指定被请求的地址 (Cannot assign requested address): Service 'sparkWorker' failed after 16 retries (starting from 0)! Consider explicitly setting the appropriate port for the service 'sparkWorker' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:438)
at sun.nio.ch.Net.bind(Net.java:430)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:225)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:748)
- Startup method: on the master host, run ./start-all.sh from the sbin directory under the Spark installation; the output was as follows:
[root@master spark-2.1.1]# sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/apps/spark-2.1.1/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/apps/spark-2.1.1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/apps/spark-2.1.1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
Searching online for similar cases
Almost every fix I found involves setting export SPARK_LOCAL_IP= in spark-env.sh.
But! My existing export SPARK_MASTER_IP=master had never caused a problem; the cluster used to start normally with it.
So I concluded this was not the issue.
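For completeness, the fix those posts suggest looks roughly like the sketch below (the IP is a placeholder for each node's own address, not my actual config):

```shell
# spark-env.sh on each worker node (sketch; replace the IP with
# the address that actually belongs to that machine)
export SPARK_LOCAL_IP=192.168.1.11
export SPARK_MASTER_IP=master
```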
Troubleshooting Approach
- Check spark-env.sh for typos made in haste, e.g. slave1 misspelled as salve1.
- Check the /etc/hosts file for mistakes.
- Check the slaves file under Spark's conf directory for spelling errors.
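The /etc/hosts and slaves checks in the list above can be sketched as a small script. All IPs and names below are sample values, not my actual cluster's; in practice you would read the real /etc/hosts and slaves files instead of the inline strings.

```shell
# Sample stand-ins for /etc/hosts and Spark's conf/slaves file
hosts='192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2'
slaves='slave1
slave2'

# For each worker listed in slaves, count its entries in hosts:
# exactly one entry means the mapping exists and is unambiguous.
for w in $slaves; do
  n=$(printf '%s\n' "$hosts" | awk -v h="$w" '$2 == h' | wc -l)
  if [ "$n" -eq 1 ]; then
    echo "$w: ok"
  else
    echo "$w: missing or duplicated in /etc/hosts"
  fi
done
```

A misspelling such as salve1 in the slaves file would show up here immediately, because no hosts line would match it.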
Solution
I found that when I originally set the hostnames with:
hostnamectl set-hostname <hostname>
I had typed slave1 on the machine that should be slave2, and slave2 on the one that should be slave1. (In hindsight, the start-all.sh output above already hinted at this: the worker started via slave2 logged to a file named ...slave1.out, and vice versa.)
The takeaway: Spark startup does depend on the hostname. Check that each machine's hostname is correct and that IPs and hostnames correspond.
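Concretely, the fix amounts to re-running hostnamectl on each machine with the correct name and then confirming that the name agrees with /etc/hosts (a sketch; run as root, and swap in the name each machine is supposed to have):

```shell
# On the machine that should be slave1 (do the same on slave2):
hostnamectl set-hostname slave1

# Verify: the reported hostname and its /etc/hosts mapping must agree,
# and the resolved IP must be this machine's own address.
hostname              # expect: slave1
getent hosts slave1   # expect: this machine's IP followed by slave1
```

No reboot is needed for hostnamectl, but restart the Spark daemons afterwards so the workers bind with the corrected name.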