java.io.IOException: Couldn't set up IO streams: java.lang.IllegalArgumentException: KrbException


Symptom

After running for a while, the DataNode first goes stale and is eventually marked dead.

Errors

The DataNode log shows:

2021-12-18 07:36:09,868 ERROR datanode.DataNode (DataXceiver.java:writeBlock(869)) - DataNode{data=FSDataset{dirpath='[/data01/hadoop/hdfs/data, /data02/hadoop/hdfs/data, /data03/hadoop/hdfs/data, /data04/hadoop/hdfs/data, /data05/hadoop/hdfs/data, /data06/hadoop/hdfs/data, /data07/hadoop/hdfs/data, /data08/hadoop/hdfs/data, /data09/hadoop/hdfs/data, /data10/hadoop/hdfs/data, /data11/hadoop/hdfs/data, /data12/hadoop/hdfs/data]'}, localName='pass-bigdata-hadoop-007.chinatelecom.cn:1019', datanodeUuid='822c7ed7-91f1-40e5-a451-de64565933fb', xmitsInProgress=0}:Exception transfering block BP-1110108019-10.218.12.10-1632992796526:blk_1483761889_411995504 to mirror 10.218.12.45:1019
java.io.EOFException: Unexpected EOF while trying to read response from server
        at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:549)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:842)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
        at java.lang.Thread.run(Thread.java:748)
2021-12-18 07:36:09,868 INFO  datanode.DataNode (DataXceiver.java:writeBlock(928)) - opWriteBlock BP-1110108019-10.218.12.10-1632992796526:blk_1483761889_411995504 received exception java.io.EOFException: Unexpected EOF while trying to read response from server
2021-12-18 07:36:09,868 INFO  datanode.DataNode (DataXceiver.java:run(323)) - java.io.EOFException: Unexpected EOF while trying to read response from server
2021-12-18 07:36:09,876 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483761929_411995545 src: /10.218.12.16:50294 dest: /10.218.12.16:1019
2021-12-18 07:36:09,881 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483761953_411995569 src: /10.218.12.16:50298 dest: /10.218.12.16:1019
...
2021-12-18 07:36:10,008 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483762447_411996081 src: /10.218.12.16:50364 dest: /10.218.12.16:1019
2021-12-18 07:36:10,021 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483762493_411996128 src: /10.218.12.16:50368 dest: /10.218.12.16:1019
2021-12-18 07:36:10,027 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483762529_411996164 src: /10.218.12.16:50372 dest: /10.218.12.16:1019
2021-12-18 07:36:10,028 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483762539_411996174 src: /10.218.12.16:50376 dest: /10.218.12.16:1019
2021-12-18 07:36:10,032 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483760722_411994334 src: /10.218.12.34:49062 dest: /10.218.12.16:1019
2021-12-18 07:36:10,047 INFO  datanode.DataNode (BlockReceiver.java:receiveBlock(1010)) - Exception for BP-1110108019-10.218.12.10-1632992796526:blk_1483758419_411992031
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:212)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:211)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:528)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:971)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:897)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:173)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:107)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
        at java.lang.Thread.run(Thread.java:748)
2021-12-18 07:36:10,048 INFO  datanode.DataNode (BlockReceiver.java:run(1470)) - PacketResponder: BP-1110108019-10.218.12.10-1632992796526:blk_1483758419_411992031, type=LAST_IN_PIPELINE: Thread is interrupted.
2021-12-18 07:36:10,048 INFO  datanode.DataNode (BlockReceiver.java:run(1506)) - PacketResponder: BP-1110108019-10.218.12.10-1632992796526:blk_1483758419_411992031, type=LAST_IN_PIPELINE terminating
2021-12-18 07:36:10,048 INFO  datanode.DataNode (DataXceiver.java:writeBlock(928)) - opWriteBlock BP-1110108019-10.218.12.10-1632992796526:blk_1483758419_411992031 received exception java.io.IOException: Premature EOF from inputStream
2021-12-18 07:36:10,048 INFO  datanode.DataNode (DataXceiver.java:run(323)) - java.io.IOException: Premature EOF from inputStream
...
2021-12-18 07:36:10,048 INFO  datanode.DataNode (DataXceiver.java:run(323)) - java.io.IOException: Premature EOF from inputStream
2021-12-18 07:36:10,050 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483762633_411996270 src: /10.218.12.16:50386 dest: /10.218.12.16:1019
2021-12-18 07:36:10,052 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483762642_411996279 src: /10.218.12.16:50390 dest: /10.218.12.16:1019
2021-12-18 07:36:10,053 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483762645_411996282 src: /10.218.12.16:50392 dest: /10.218.12.16:1019
2021-12-18 07:36:10,054 INFO  datanode.DataNode (BlockReceiver.java:run(1454)) - PacketResponder: BP-1110108019-10.218.12.10-1632992796526:blk_1483762633_411996270, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=2:[10.218.12.43:1019, 10.218.12.45:1019]
java.io.EOFException: Unexpected EOF while trying to read response from server
        at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:549)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1384)
        at java.lang.Thread.run(Thread.java:748)
2021-12-18 07:36:10,055 INFO  datanode.DataNode (DataXceiver.java:writeBlock(744)) - Receiving BP-1110108019-10.218.12.10-1632992796526:blk_1483758419_411992031 src: /10.218.12.13:34994 dest: /10.218.12.16:1019
2021-12-18 07:36:10,055 INFO  impl.FsDatasetImpl (FsDatasetImpl.java:recoverRbw(1440)) - Recover RBW replica BP-1110108019-10.218.12.10-1632992796526:blk_1483758419_411992031

WARN  datanode.DataNode (BPServiceActor.java:offerService(728)) - IOException in offerService
java.io.IOException: DestHost:destPort pass-bigdata-hadoop-001.chinatelecom.cn:8020 , LocalHost:localPort pass-bigdata-hadoop-009.chinatelecom.cn/10.218.12.18:0. Failed on local exception: java.io.IOException: Couldn't set up IO streams: java.lang.IllegalArgumentException: KrbException: Cannot locate default realm
        at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1501)
        at org.apache.hadoop.ipc.Client.call(Client.java:1443)
        at org.apache.hadoop.ipc.Client.call(Client.java:1353)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy17.sendHeartbeat(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:166)
        at javax.security.auth.kerberos.KerberosPrincipal.<init>(KerberosPrincipal.java:154)
        at org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:305)
        at org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:234)
        at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:160)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:410)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:800)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:796)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:796)
        ... 12 more
2021-12-19 18:09:19,162 WARN  datanode.DataNode (BPServiceActor.java:offerService(728)) - IOException in offerService
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:514)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1501)
        at org.apache.hadoop.ipc.Client.call(Client.java:1443)
        at org.apache.hadoop.ipc.Client.call(Client.java:1353)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy17.blockReceivedAndDeleted(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:265)
        at org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:212)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:687)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:842)
        at java.lang.Thread.run(Thread.java:748)


Root cause analysis

The cluster runs Hive and YARN scheduled jobs on top of Hadoop. When the number of threads created by Hadoop RPC reaches the node's ulimit -u value (run ulimit -u on the node to see it), the JVM surfaces the failure as an OutOfMemoryError ("unable to create new native thread"). The main driver here is that every table operation performs its own Kerberos authentication, so far too many Kerberos credential files end up open at once.
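
A quick way to see how close the DataNode is to the nproc ceiling (a minimal check; the pid file path matches the one used later in this post):

# per-user process/thread limit for the hdfs user
su - hdfs -c 'ulimit -u'
# threads currently used by the DataNode process
grep Threads /proc/$(cat /var/run/hadoop/hdfs/hadoop-hdfs-root-datanode.pid)/status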

Fix

1. Raise the limits (this relieves the symptom but does not address the root cause; a verification check follows this list):
hdfs_user_nofile_limit: 65536 -> 655360 (this parameter governs "too many open files")
hdfs_user_nproc_limit: 65536 -> 655360 (this parameter is the main cause here)

2. A scheduled job should open a single Kerberos credential rather than having every thread in the job open its own; better still, each tenant should authenticate with Kerberos once and reuse the credential, instead of re-authenticating every time a job is dispatched (see the sketch after this list).
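
To confirm the raised values from item 1 actually apply to the hdfs user (a minimal check; on Ambari-managed HDP clusters these settings are typically rendered into a file under /etc/security/limits.d/, which is an assumption about this cluster's layout):

# limits as seen by a fresh PAM login of the hdfs user
su - hdfs -c 'ulimit -n -u'
# the rendered limits file (path is an assumption for Ambari/HDP setups)
cat /etc/security/limits.d/hdfs.conf

As a sketch of the authenticate-once idea from item 2 (principal, keytab path, and cache location are illustrative, not taken from this cluster): authenticate once per tenant from a keytab and point every job at the same ticket cache, instead of running a fresh kinit per thread or per job:

# authenticate once per tenant (names are examples)
export KRB5CCNAME=/tmp/krb5cc_tenant_etl
kinit -kt /etc/security/keytabs/etl.keytab etl@EXAMPLE.COM
# jobs launched from this environment reuse the same ticket cache
hdfs dfs -ls /user/etl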

Follow-up

Adjusting the parameters above still did not solve the problem.

Check the open-file limit of the running HDFS DataNode process on each node:

for i in {004..060}; do echo "pass-bigdata-hadoop-$i"; ssh "pass-bigdata-hadoop-$i" 'grep "Max open files" /proc/$(cat /var/run/hadoop/hdfs/hadoop-hdfs-root-datanode.pid)/limits'; done

The output shows that Max open files is 4096; this value is in fact kept consistent with ulimit -n.

Fix

Restart ambari-agent, then restart the DataNode process:

ambari-agent restart


Cause

We had previously run destructive tests without stopping the services. After the machines rebooted, ambari-agent started before the limit configuration had been picked up, so the HDP DataNode it spawned inherited the constrained limits and ran with a broken open-file limit; a manual restart made the problem go away.
Restarting ambari-agent every time is no real solution, though, which raised the question of why a reboot causes this in the first place.
A careful read of the comments in /etc/security/limits.conf explains it: the file does not apply to system services.
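
This can be confirmed directly: ambari-agent runs as a systemd service, so the Hadoop daemons it spawns inherit systemd's limits rather than those from limits.conf (a quick check, assuming the unit is named ambari-agent):

# limits systemd applies to the ambari-agent service
systemctl show ambari-agent -p LimitNOFILE -p LimitNPROC
# compare with what the running DataNode actually got
grep -E "Max (open files|processes)" /proc/$(cat /var/run/hadoop/hdfs/hadoop-hdfs-root-datanode.pid)/limits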

Final fix

Reference: https://blog.csdn.net/vic_qxz/article/details/80890988
On CentOS 7, systemd replaced the old SysV init, and the scope of /etc/security/limits.conf shrank accordingly: its settings only apply to the resource limits of users who log in through PAM, and have no effect on systemd services. Limits for login users can therefore still be set through /etc/security/limits.conf and the files under /etc/security/limits.d.

For systemd services, resource limits must instead be set in systemd's global configuration: /etc/systemd/system.conf and /etc/systemd/user.conf, plus any .conf files in the corresponding drop-in directories /etc/systemd/system.conf.d/*.conf and /etc/systemd/user.conf.d/*.conf. system.conf applies to the system instance, user.conf to user instances.

vim /etc/systemd/system.conf
DefaultLimitNOFILE=100000
DefaultLimitNPROC=65535
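
For the new defaults to take effect, systemd has to re-read its configuration and the services must then be restarted (a sketch of the remaining steps):

# re-execute the systemd manager so it re-reads /etc/systemd/system.conf
systemctl daemon-reexec
# restart the agent so newly spawned Hadoop daemons inherit the new limits
ambari-agent restart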
