Spark cluster port exhaustion - BindException: Address already in use: Service 'org.apache.spark.network.netty.NettyBlockTransferService'

INFO java.net.BindException: Address already in use: Service 'org.apache.spark.network.netty.NettyBlockTransferService' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'org.apache.spark.network.netty.NettyBlockTransferService' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.

1. The exception

Submitting Spark jobs used to work fine, but recently they keep failing with: BindException: Address already in use.

Exception shown in the Spark UI:

 
HTTP ERROR 500

Problem accessing /proxy/application_1588486936385_2884/. Reason:

    Address already in use
Caused by:
java.net.BindException: Address already in use
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.Socket.bind(Socket.java:644)
    at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:120)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:200)
    at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:387)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)

Exception from the YARN application logs:

 
20/12/21 12:55:18 WARN Utils: Service 'org.apache.spark.network.netty.NettyBlockTransferService' could not bind on a random free port. You may check whether configuring an appropriate binding address.
20/12/21 12:55:18 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Unable to create executor due to Address already in use: Service 'org.apache.spark.network.netty.NettyBlockTransferService' failed after 100 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'org.apache.spark.network.netty.NettyBlockTransferService' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
java.net.BindException: Address already in use: Service 'org.apache.spark.network.netty.NettyBlockTransferService' failed after 100 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'org.apache.spark.network.netty.NettyBlockTransferService' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1283)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:989)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:364)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:745)
End of LogType:stderr
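The log's own suggestion, pinning the bind address (and, optionally, raising the port retry count), can be expressed as spark-submit flags. This is only a hedged sketch with placeholder values (the address, class name and jar are assumptions, not taken from this article), and note that it would not have helped here, since the real problem turns out to be genuine port exhaustion:

```shell
# Hedged sketch, not this article's fix: pin the driver bind address and
# raise the port retry count, per the error message's advice.
# All values below are placeholders.
spark-submit \
  --conf spark.driver.bindAddress=192.168.1.827 \
  --conf spark.port.maxRetries=100 \
  --class com.example.Main \
  app.jar
```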

2. Analysis

If the service still cannot find a free random port after that many retries, the ports must essentially all be occupied.

Check the socket totals with ss -s:

 
Total: 105626 (kernel 109563)
TCP:   105277 (estab 196, closed 79, orphaned 0, synrecv 0, timewait 77/0), ports 0

Transport Total   IP      IPv6
*         109563  -       -
RAW       1       0       1
UDP       15      8       7
TCP       105198  104809  389
INET      105214  104817  397
FRAG      0       0       0

Over a hundred thousand TCP sockets are in use; the local port range is essentially exhausted.

Running ss shows a huge number of connections stuck in CLOSE-WAIT:

 
ESTAB      0 0 [::ffff:192.168.1.827]:44693 [::ffff:192.168.1.860]:37528
CLOSE-WAIT 1 0 [::ffff:192.168.1.827]:58473 [::ffff:192.168.1.827]:50010
CLOSE-WAIT 1 0 [::ffff:192.168.1.827]:55800 [::ffff:192.168.1.827]:50010
CLOSE-WAIT 1 0 [::ffff:192.168.1.827]:37749 [::ffff:192.168.1.860]:50010
CLOSE-WAIT 1 0 [::ffff:192.168.1.827]:54642 [::ffff:192.168.1.827]:50010
CLOSE-WAIT 1 0 [::ffff:192.168.1.827]:39578 [::ffff:192.168.1.827]:50010
CLOSE-WAIT 1 0 ...
...
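The eyeballing above can be condensed into a small helper that groups `ss -tan` output by state, so the dominant state jumps out immediately; a sketch (the function name is mine):

```shell
# Summarise socket states from "ss -tan" output read on stdin.
# A large CLOSE-WAIT count means the remote side closed the connection
# but the local process never called close().
summarise_states() {
  awk 'NR > 1 { c[$1]++ } END { for (s in c) print c[s], s }' | sort -rn
}

# On a live host: ss -tan | summarise_states
```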

Pick one of these ports at random and dig in.

Check the port's state:

 
[root@master ~]# netstat -anp | grep 54889
tcp   1 0 192.168.1.827:54889 192.168.1.803:50010 CLOSE_WAIT 19870/java
tcp6  1 0 192.168.1.827:54889 192.168.1.827:50010 CLOSE_WAIT 44212/java

Check the owning process:

 
[root@master ~]# ps -ef | grep 19870
root 17678 45042 0 16:34 pts/0 00:00:00 grep --color=auto 19870
root 19870 1 0 May04 ? 1-10:31:14 /opt/hadoop/jdk1.8.0_77/bin/java -Xmx16384m -Djava.library.path=/opt/hadoop/hadoop-2.7.7/lib -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop/hadoop-2.7.7 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dproc_hiveserver2 -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/opt/hadoop/apache-hive-3.0.0-bin/conf/parquet-logging.properties -Djline.terminal=jline.UnsupportedTerminal -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /opt/hadoop/apache-hive-3.0.0-bin/lib/hive-service-3.0.0.jar org.apache.hive.service.server.HiveServer2

List the Java processes:

 
[root@master ~]# jps
28966 HQuorumPeer
11113 SecondaryNameNode
28457 HRegionServer
10858 DataNode
15722 NodeManager
11403 ResourceManager
10707 NameNode
44212 ApplicationMaster
2839 Jps
19672 RunJar
28217 HMaster
19870 RunJar
42814 RunJar

That RunJar process (pid 19870) is HiveServer2.

Now look at the peer side, port 50010:

 
[root@slave3 ~]# netstat -anp | grep 50010
tcp  0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 11430/java
tcp  0 0 192.168.1.859:50010 192.168.1.860:34366 ESTABLISHED 11430/java
tcp  0 0 192.168.1.859:38024 192.168.1.803:50010 ESTABLISHED 11430/java
tcp  0 0 192.168.1.859:50010 192.168.1.859:47796 ESTABLISHED 11430/java
tcp  0 0 192.168.1.859:38022 192.168.1.803:50010 ESTABLISHED 11430/java
tcp6 0 0 192.168.1.859:47796 192.168.1.859:50010 ESTABLISHED 29418/java
[root@slave3 ~]# ps -ef | grep 11430
root 11430 1 0 May03 ? 2-03:40:55 /opt/hadoop/jdk1.8.0_77/bin/java -Dproc_datanode -Xmx16384m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop/hadoop-2.7.7 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/hadoop/hadoop-2.7.7/logs -Dhadoop.log.file=hadoop-root-datanode-slave3.log -Dhadoop.home.dir=/opt/hadoop/hadoop-2.7.7 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop/hadoop-2.7.7/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
root 11571 1179 0 16:46 pts/0 00:00:00 grep --color=auto 11430

 
[root@slave3 ~]# jps
15488 HQuorumPeer
11665 Jps
5749 NodeManager
11430 DataNode
29418 HRegionServer

So pid 11430 is a DataNode (50010 is the DataNode data-transfer port).

Conclusion: HiveServer2 holds a large number of connections to the DataNode that are never closed.

3. Fixing it

First, kill the HiveServer2 process. Afterwards ss -s shows the occupied port count dropping sharply, and netstat runs much faster. (With that many sockets netstat becomes very slow; ss does not.)

 
[root@master ~]# ss -s
Total: 983 (kernel 14276)
TCP:   640 (estab 199, closed 84, orphaned 0, synrecv 0, timewait 82/0), ports 0

Transport Total   IP     IPv6
*         14276   -      -
RAW       1       0      1
UDP       15      8      7
TCP       556     155    401
INET      572     163    409
FRAG      0       0      0
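The kill step can be scripted by pulling HiveServer2's PID out of `ps -ef` output; a hedged sketch (the helper name is mine, and the PID should be double-checked before killing anything):

```shell
# Extract the PID (column 2 of "ps -ef") of any process whose command line
# contains the HiveServer2 main class; input is read on stdin. The guard
# excludes the awk/grep process itself if it shows up in the listing.
hiveserver2_pids() {
  awk '/org\.apache\.hive\.service\.server\.HiveServer2/ && $0 !~ /awk/ { print $2 }'
}

# On a live host: ps -ef | hiveserver2_pids | xargs -r kill
```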

Hypothesis 1: I had connected to HiveServer2 from a client, and when a query ran slowly I simply closed the client, leaving HiveServer2's connections to the DataNode open. But I only run a handful of queries a year, which could not produce this many leaked connections. Very unlikely.

Hypothesis 2: some application code implicitly connects to HiveServer2 and never closes the connection. Also unlikely: after I killed HiveServer2, my applications kept running normally.
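Until the real leaker is identified, a cheap safeguard is to watch the CLOSE-WAIT count and alert when it climbs; a monitoring sketch (the function name and threshold are assumptions):

```shell
# Count CLOSE-WAIT sockets in "ss -tan" output read on stdin.
close_wait_count() {
  awk '$1 == "CLOSE-WAIT" { n++ } END { print n + 0 }'
}

# Live usage, e.g. from cron (the 10000 threshold is a placeholder):
#   [ "$(ss -tan | close_wait_count)" -gt 10000 ] && echo "CLOSE-WAIT leak detected"
```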
