HDFS read failure: Incorrect value for packet payload size

1. Symptom

In Hadoop 2.7.2, reading a file from the command line with the Hadoop shell produces the following error for large files; small files are unaffected.

hadoop fs -cat /tmp/hue_database_dump4.json
16/09/29 15:13:37 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
        at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:37 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
        at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:37 INFO hdfs.DFSClient: Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK], DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] Dead nodes:  DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry...
16/09/29 15:13:37 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1086.6056359410977 msec.
16/09/29 15:13:38 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
        at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:38 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
        at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:38 INFO hdfs.DFSClient: Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK], DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] Dead nodes:  DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry...
16/09/29 15:13:38 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 6199.040804985275 msec.
16/09/29 15:13:45 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
        at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:45 INFO hdfs.DFSClient: Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK], DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] Dead nodes:  DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry...
16/09/29 15:13:45 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 13991.541436655532 msec.
16/09/29 15:13:59 WARN hdfs.DFSClient: DFS Read
java.io.IOException: Incorrect value for packet payload size: 57616164
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
        at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
cat: Incorrect value for packet payload size: 57616164

2. Analysis

Reading the exceptions above carefully, two pieces of information stand out:

1. Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK], DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] Dead nodes:  DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry...

2. Incorrect value for packet payload size: 57616164

Taking them one at a time, start with the first. Searching for "No live nodes contain block" leads to DFSInputStream.getBestNodeDNAddrPair():

  /**
   * Get the best node from which to stream the data.
   * @param block LocatedBlock, containing nodes in priority order.
   * @param ignoredNodes Do not choose nodes in this array (may be null)
   * @return The DNAddrPair of the best node.
   * @throws IOException
   */
  private DNAddrPair getBestNodeDNAddrPair(LocatedBlock block,
      Collection<DatanodeInfo> ignoredNodes) throws IOException {
    DatanodeInfo[] nodes = block.getLocations();
    StorageType[] storageTypes = block.getStorageTypes();
    DatanodeInfo chosenNode = null;
    StorageType storageType = null;
    if (nodes != null) {
      for (int i = 0; i < nodes.length; i++) {
        if (!deadNodes.containsKey(nodes[i])
            && (ignoredNodes == null || !ignoredNodes.contains(nodes[i]))) {
          chosenNode = nodes[i];
          // Storage types are ordered to correspond with nodes, so use the same
          // index to get storage type.
          if (storageTypes != null && i < storageTypes.length) {
            storageType = storageTypes[i];
          }
          break;
        }
      }
    }
    if (chosenNode == null) {
      throw new IOException("No live nodes contain block " + block.getBlock() +
          " after checking nodes = " + Arrays.toString(nodes) +
          ", ignoredNodes = " + ignoredNodes);
    }
    final String dnAddr =
        chosenNode.getXferAddr(dfsClient.getConf().connectToDnViaHostname);
    if (DFSClient.LOG.isDebugEnabled()) {
      DFSClient.LOG.debug("Connecting to datanode " + dnAddr);
    }
    InetSocketAddress targetAddr = NetUtils.createSocketAddr(dnAddr);
    return new DNAddrPair(chosenNode, targetAddr, storageType);
  }
From the nodes = [DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK], DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]] and ignoredNodes = null printed with the exception, we can be fairly sure that no DataNode was excluded or ignored for any other reason. Continuing through the log, the key piece of information is:

Dead nodes:  DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]
Both DataNodes had been marked dead, yet the NameNode Web UI showed both of them as healthy. (The "Dead nodes" here are the DFSInputStream's own client-side bookkeeping: the list of DataNodes this particular read has already failed against, not the NameNode's view of the cluster.)

That was puzzling, so on to the second exception, Incorrect value for packet payload size: 57616164. It is thrown from PacketReceiver's doRead() method:

    if (totalLen < 0 || totalLen > MAX_PACKET_SIZE) {
      throw new IOException("Incorrect value for packet payload size: " +
                            payloadLen);
    }
In other words, while receiving a packet the receiver decided that the packet's total size exceeded the MAX_PACKET_SIZE threshold, which is 16 MB:

  /**
   * The max size of any single packet. This prevents OOMEs when
   * invalid data is sent.
   */
  private static final int MAX_PACKET_SIZE = 16 * 1024 * 1024;
The 57,616,164 in the exception means this packet was about 54 MB. That fits the symptom: small files produce small packets that stay under the threshold, while a large file produces packets big enough to exceed it. But the file being read is only about 50 MB, so why is the packet more than 4 MB bigger than that? The packet layout provides the answer:

    // Each packet looks like:
    //   PLEN    HLEN      HEADER     CHECKSUMS  DATA
    //   32-bit  16-bit   <protobuf>  <variable length>
    //
    // PLEN:      Payload length
    //            = length(PLEN) + length(CHECKSUMS) + length(DATA)
    //            This length includes its own encoded length in
    //            the sum for historical reasons.
    //
    // HLEN:      Header length
    //            = length(HEADER)
    //
    // HEADER:    the actual packet header fields, encoded in protobuf
    // CHECKSUMS: the crcs for the data chunk. May be missing if
    //            checksums were not requested
    // DATA       the actual block data
    Preconditions.checkState(curHeader == null || !curHeader.isLastPacketInBlock());

    curPacketBuf.clear();
    curPacketBuf.limit(PacketHeader.PKT_LENGTHS_LEN);
    doReadFully(ch, in, curPacketBuf);
    curPacketBuf.flip();
    int payloadLen = curPacketBuf.getInt();
As the comments show, each packet carries a payload length PLEN = length(PLEN) + length(CHECKSUMS) + length(DATA), a header length HLEN = length(HEADER), and then the HEADER, the CHECKSUMS and the DATA themselves. Everything except DATA is overhead added to the packet, which is why a packet ends up larger than the raw file data it carries.
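As a minimal, self-contained sketch of that length handling (illustrative only: the class and method names are made up, and totalLen is taken to be simply PLEN + HLEN, as in the fragment above):

import java.io.IOException;
import java.nio.ByteBuffer;

public class PacketPrefixCheck {
  // Same 16 MB cap as PacketReceiver.MAX_PACKET_SIZE above.
  private static final int MAX_PACKET_SIZE = 16 * 1024 * 1024;

  // "prefix" is assumed to hold the first 6 bytes of a packet:
  // a 4-byte PLEN followed by a 2-byte HLEN, big-endian.
  static void checkPacketLengths(ByteBuffer prefix) throws IOException {
    int payloadLen = prefix.getInt();   // PLEN = 4 + len(CHECKSUMS) + len(DATA)
    int headerLen = prefix.getShort();  // HLEN = len(HEADER)
    int totalLen = payloadLen + headerLen;
    if (totalLen < 0 || totalLen > MAX_PACKET_SIZE) {
      // The branch hit in the log above: payloadLen was 57616164 (about 54 MB).
      throw new IOException("Incorrect value for packet payload size: " + payloadLen);
    }
  }

  public static void main(String[] args) throws IOException {
    // Reproduce the failing value from the log: PLEN = 57616164, HLEN arbitrary.
    ByteBuffer prefix = ByteBuffer.allocate(6).putInt(57616164).putShort((short) 25);
    prefix.flip();
    checkPacketLengths(prefix);   // throws the same IOException as in the log
  }
}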

So why is the packet this large in the first place? Look at how packets are sent, in BlockSender's sendPacket() method:

  /**
   * Sends a packet with up to maxChunks chunks of data.
   * 
   * @param pkt buffer used for writing packet data
   * @param maxChunks maximum number of chunks to send
   * @param out stream to send data to
   * @param transferTo use transferTo to send data
   * @param throttler used for throttling data transfer bandwidth
   */
  private int sendPacket(ByteBuffer pkt, int maxChunks, OutputStream out,
      boolean transferTo, DataTransferThrottler throttler) throws IOException {
The ByteBuffer pkt holds the data of the packet about to be sent, so its size determines how big an outgoing packet can be. How is that size chosen? Continuing in doSendBlock():

      ByteBuffer pktBuf = ByteBuffer.allocate(pktBufSize);

      while (endOffset > offset && !Thread.currentThread().isInterrupted()) {
        manageOsCache();
        long len = sendPacket(pktBuf, maxChunksPerPacket, streamForSendChunks,
            transferTo, throttler);
        offset += len;
        totalRead += len + (numberOfChunks(len) * checksumSize);
        seqno++;
      }
A buffer pktBuf of pktBufSize bytes is allocated first, and sendPacket() then uses it for every packet. So all that remains is to see how pktBufSize is determined:

      if (transferTo) {
        FileChannel fileChannel = ((FileInputStream)blockIn).getChannel();
        blockInPosition = fileChannel.position();
        streamForSendChunks = baseStream;
        maxChunksPerPacket = numberOfChunks(TRANSFERTO_BUFFER_SIZE);
        
        // Smaller packet size to only hold checksum when doing transferTo
        pktBufSize += checksumSize * maxChunksPerPacket;
      } else {
        maxChunksPerPacket = Math.max(1,
            numberOfChunks(HdfsConstants.IO_FILE_BUFFER_SIZE));
        // Packet size includes both checksum and data
        pktBufSize += (chunkSize + checksumSize) * maxChunksPerPacket;
      }
pktBufSize is driven by maxChunksPerPacket, which is determined as follows:

maxChunksPerPacket = Math.max(1,
            numberOfChunks(HdfsConstants.IO_FILE_BUFFER_SIZE));
In other words, it depends on HdfsConstants.IO_FILE_BUFFER_SIZE, i.e. the io.file.buffer.size parameter, whose default is 4096 bytes (4 KB):

  public static final String  IO_FILE_BUFFER_SIZE_KEY =
    "io.file.buffer.size";
  /** Default value for IO_FILE_BUFFER_SIZE_KEY */
  public static final int     IO_FILE_BUFFER_SIZE_DEFAULT = 4096;
Checking the actual configuration of this cluster, however, showed that io.file.buffer.size had been set to a full 125 MB, which is exactly why the packets grew so large.
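A rough back-of-the-envelope calculation shows the effect of that setting (a sketch, assuming the default 512-byte checksum chunks and 4-byte CRC checksums, and ignoring the small packet-header overhead):

public class PktBufSizeEstimate {
  public static void main(String[] args) {
    int chunkSize = 512;     // dfs.bytes-per-checksum default (assumed here)
    int checksumSize = 4;    // one CRC32/CRC32C checksum (assumed here)

    long[] ioFileBufferSizes = { 4 * 1024L, 125L * 1024 * 1024 };
    for (long ioFileBufferSize : ioFileBufferSizes) {
      // Mirrors numberOfChunks(): how many chunks fit into io.file.buffer.size.
      long maxChunksPerPacket = Math.max(1, (ioFileBufferSize + chunkSize - 1) / chunkSize);
      // Mirrors the non-transferTo branch above: data plus checksums per packet.
      long pktBufSize = (chunkSize + checksumSize) * maxChunksPerPacket;
      System.out.printf("io.file.buffer.size = %9d B -> maxChunksPerPacket = %6d, pktBufSize = %9d B (%.1f MB)%n",
          ioFileBufferSize, maxChunksPerPacket, pktBufSize, pktBufSize / (1024.0 * 1024.0));
    }
  }
}

With the 4 KB default the packet buffer stays around 4 KB, but with 125 MB it grows to roughly 126 MB, far beyond the 16 MB MAX_PACKET_SIZE, so every packet of a large file is rejected by the reader.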

After correcting the parameter and restarting the affected nodes, the file could be read normally again.
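To confirm which value actually takes effect after such a change, a minimal client-side check could look like this (a sketch; it assumes the updated core-site.xml is on the classpath of the JVM that runs it):

import org.apache.hadoop.conf.Configuration;

public class PrintIoFileBufferSize {
  public static void main(String[] args) {
    // new Configuration() loads core-default.xml and core-site.xml from the classpath.
    Configuration conf = new Configuration();
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);
    System.out.println("effective io.file.buffer.size = " + bufferSize + " bytes");
  }
}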

3. Answer

The io.file.buffer.size parameter was configured far too large, so the packets sent by the DataNode exceeded the threshold enforced on the receiving side (MAX_PACKET_SIZE, 16 MB).
