spark-hdfs problem analysis

- Background: a Spark job reads data from HDFS, parses it, and writes the records that match the requirements back to a specific HDFS directory. The job had been running in development for quite a while, then today it suddenly failed with the following error:
java.io.IOException: Premature EOF reading from org.apache.hadoop.net.SocketInputStream@2bf2ef2c
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:260)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:170)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:135)
    at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:642)
The job ran in the background via nohup, and its log only showed the bare IOException, so I kept assuming it was a memory problem; after several reruns it still threw the same exception. Only later did the real cause turn up in the Spark web UI:
INFO hdfs.DFSClient: Could not obtain BP-897981742-***-14073765905:blk_1083989588_10255609 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1890.2449319554266 msec.
So an HDFS data block had been lost: the NameNode web UI (port 50070) showed that blk_1083989588 had only a single replica left, which is what made the Spark job fail.
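Before touching anything, it helps to know exactly which files own the damaged blocks. A minimal sketch of querying the NameNode for them through the Hadoop `FileSystem` API; the path `/data/input` is a placeholder, not from the original job:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ListCorruptFiles {
  def main(args: Array[String]): Unit = {
    // Picks up fs.defaultFS from the Hadoop config on the classpath,
    // so run this with HADOOP_CONF_DIR pointing at the right cluster.
    val fs = FileSystem.get(new Configuration())

    // Ask the NameNode for files with missing/corrupt blocks under
    // the given directory ("/data/input" is a placeholder path).
    val corrupt = fs.listCorruptFileBlocks(new Path("/data/input"))
    while (corrupt.hasNext) {
      println(s"corrupt block in: ${corrupt.next()}")
    }
  }
}
```

The same list is available from the command line with `hdfs fsck /data/input -list-corruptfileblocks`.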
- For now the problem can be worked around in two ways (sketched below): change the replication factor of all the data to 1, or simply discard the data in the damaged blocks.
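Both workarounds map to a couple of `FileSystem` calls. A hedged sketch, with `/data/input/part-00000` standing in for whichever file actually owns blk_1083989588; pick one option, not both:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object FixOrDrop {
  def main(args: Array[String]): Unit = {
    val fs   = FileSystem.get(new Configuration())
    val file = new Path("/data/input/part-00000") // placeholder path

    // Option 1: set the target replication factor to 1, so HDFS no
    // longer reports the file as under-replicated. Note this does NOT
    // recover a block that has no live replica at all.
    fs.setReplication(file, 1.toShort)

    // Option 2: give up on the damaged file and delete it
    // (non-recursive delete of a single file).
    fs.delete(file, false)
  }
}
```

From the shell, the equivalents are `hdfs dfs -setrep 1 /data/input/part-00000` and `hdfs fsck /data/input -delete`; the latter removes every corrupt file under the path, so review the fsck report first.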