1. Symptom
On Hadoop 2.7.2, reading a file's contents from the hadoop shell fails with the error below for large files; small files read fine.
hadoop fs -cat /tmp/hue_database_dump4.json
16/09/29 15:13:37 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:37 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:37 INFO hdfs.DFSClient: Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK], DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry...
16/09/29 15:13:37 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1086.6056359410977 msec.
16/09/29 15:13:38 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:38 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:38 INFO hdfs.DFSClient: Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK], DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]
Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK]
DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry...
16/09/29 15:13:38 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 6199.040804985275 msec.
16/09/29 15:13:45 WARN hdfs.DFSClient: Exception while reading from BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 of /tmp/hue_database_dump4.json from DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]
java.io.IOException: Incorrect value for packet payload size: 57616164
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
16/09/29 15:13:45 INFO hdfs.DFSClient: Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK], DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry...
16/09/29 15:13:45 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 13991.541436655532 msec.
16/09/29 15:13:59 WARN hdfs.DFSClient: DFS Read
java.io.IOException: Incorrect value for packet payload size: 57616164
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:159)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:775)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:831)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:891)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
cat: Incorrect value for packet payload size: 57616164
2. Analysis
Reading the exceptions above carefully, two pieces of information stand out:
1. Could not obtain BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 from any node: java.io.IOException: No live nodes contain block BP-1776288592-10.7.12.154-1468904160674:blk_1073998236_257465 after checking nodes = [DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK], DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK] Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]. Will get new block locations from namenode and retry...
2. Incorrect value for packet payload size: 57616164
Taking them one at a time, start with the first. Searching for "No live nodes contain block" locates the relevant code in DFSInputStream:
  /**
   * Get the best node from which to stream the data.
   * @param block LocatedBlock, containing nodes in priority order.
   * @param ignoredNodes Do not choose nodes in this array (may be null)
   * @return The DNAddrPair of the best node.
   * @throws IOException
   */
  private DNAddrPair getBestNodeDNAddrPair(LocatedBlock block,
      Collection<DatanodeInfo> ignoredNodes) throws IOException {
    DatanodeInfo[] nodes = block.getLocations();
    StorageType[] storageTypes = block.getStorageTypes();
    DatanodeInfo chosenNode = null;
    StorageType storageType = null;
    if (nodes != null) {
      for (int i = 0; i < nodes.length; i++) {
        if (!deadNodes.containsKey(nodes[i])
            && (ignoredNodes == null || !ignoredNodes.contains(nodes[i]))) {
          chosenNode = nodes[i];
          // Storage types are ordered to correspond with nodes, so use the same
          // index to get storage type.
          if (storageTypes != null && i < storageTypes.length) {
            storageType = storageTypes[i];
          }
          break;
        }
      }
    }
    if (chosenNode == null) {
      throw new IOException("No live nodes contain block " + block.getBlock() +
          " after checking nodes = " + Arrays.toString(nodes) +
          ", ignoredNodes = " + ignoredNodes);
    }
    final String dnAddr =
        chosenNode.getXferAddr(dfsClient.getConf().connectToDnViaHostname);
    if (DFSClient.LOG.isDebugEnabled()) {
      DFSClient.LOG.debug("Connecting to datanode " + dnAddr);
    }
    InetSocketAddress targetAddr = NetUtils.createSocketAddr(dnAddr);
    return new DNAddrPair(chosenNode, targetAddr, storageType);
  }
The exception, however, reports nodes = [DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK], DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]] and ignoredNodes = null, so neither DataNode was excluded through the ignoredNodes list. Reading further down the log yields the key line:
Dead nodes: DatanodeInfoWithStorage[10.7.12.155:50010,DS-87568e9c-b339-4b0e-a09f-292118bcb752,DISK] DatanodeInfoWithStorage[10.7.12.156:50010,DS-80fb296c-1085-40ce-9dcf-e3a08327aa0d,DISK]
Both nodes have been marked as Dead, yet the NameNode web UI shows both DataNodes as perfectly healthy.
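For context, the "Dead nodes" listed here most likely reflect the read stream's own bookkeeping rather than the NameNode's view of the cluster: every replica whose read throws an IOException is added to a per-stream dead-node map, and once all replicas are in that map the "No live nodes contain block" exception is raised. A minimal, self-contained sketch of that behavior (illustrative only, not the actual Hadoop code):

import java.io.IOException;
import java.util.*;

public class DeadNodeSketch {
    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.7.12.155:50010", "10.7.12.156:50010");
        Set<String> deadNodes = new HashSet<>();   // per-read-stream state kept by the client

        for (String dn : replicas) {
            try {
                // Every replica fails with the same oversized-packet error...
                throw new IOException("Incorrect value for packet payload size: 57616164");
            } catch (IOException e) {
                deadNodes.add(dn);                 // ...and is marked "dead" locally
            }
        }
        if (deadNodes.containsAll(replicas)) {
            System.out.println("No live nodes contain current block; Dead nodes: " + deadNodes);
        }
    }
}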
That seems odd at first, and it points back at the read failures themselves. So turn to the second exception output, Incorrect value for packet payload size: 57616164, which is thrown from PacketReceiver's doRead() method:
    if (totalLen < 0 || totalLen > MAX_PACKET_SIZE) {
      throw new IOException("Incorrect value for packet payload size: " +
          payloadLen);
    }
In other words, while receiving a packet the receiver found that the packet's total size exceeded the threshold MAX_PACKET_SIZE, which is 16 MB:
  /**
   * The max size of any single packet. This prevents OOMEs when
   * invalid data is sent.
   */
  private static final int MAX_PACKET_SIZE = 16 * 1024 * 1024;
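A quick sanity check confirms that the payload size from the log blows well past that cap. This is a trivial sketch of the comparison only; the real check in doRead() also adds the header length to the payload length before comparing:

public class PacketSizeCheck {
    private static final int MAX_PACKET_SIZE = 16 * 1024 * 1024;   // 16,777,216 bytes

    public static void main(String[] args) {
        int payloadLen = 57616164;                                  // value from the exception message
        System.out.printf("payload = %.1f MB, cap = %.1f MB, rejected = %b%n",
                payloadLen / 1024.0 / 1024.0,                       // ~54.9 MB
                MAX_PACKET_SIZE / 1024.0 / 1024.0,                  // 16.0 MB
                payloadLen > MAX_PACKET_SIZE);
    }
}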
According to the 57616164 reported in the exception, this packet is roughly 55 MB. That fits the symptom: small files produce small packets, while large files produce packets big enough to exceed the threshold. But the file being read is only about 50 MB, so why is the packet several megabytes larger than the file itself? Look next at the packet layout:
    // Each packet looks like:
    //   PLEN    HLEN      HEADER     CHECKSUMS  DATA
    //   32-bit  16-bit   <protobuf>  <variable length>
    //
    // PLEN:      Payload length
    //            = length(PLEN) + length(CHECKSUMS) + length(DATA)
    //            This length includes its own encoded length in
    //            the sum for historical reasons.
    //
    // HLEN:      Header length
    //            = length(HEADER)
    //
    // HEADER:    the actual packet header fields, encoded in protobuf
    // CHECKSUMS: the crcs for the data chunk. May be missing if
    //            checksums were not requested
    // DATA       the actual block data
    Preconditions.checkState(curHeader == null || !curHeader.isLastPacketInBlock());

    curPacketBuf.clear();
    curPacketBuf.limit(PacketHeader.PKT_LENGTHS_LEN);
    doReadFully(ch, in, curPacketBuf);
    curPacketBuf.flip();

    int payloadLen = curPacketBuf.getInt();
As the comment shows, each packet carries a payload length (PLEN = length(PLEN) + length(CHECKSUMS) + length(DATA)), a header length (HLEN = length(HEADER)), and then the HEADER, CHECKSUMS, and DATA themselves. Everything except DATA is overhead added on top of the file's bytes, which explains why a packet is larger than the data it carries.
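To make that layout concrete, here is a minimal, self-contained sketch of reading the fixed-length prefix (a 32-bit PLEN followed by a 16-bit HLEN). The PLEN value is taken from the error message; the HLEN value is made up purely for illustration:

import java.nio.ByteBuffer;

public class PacketPrefixSketch {
    public static void main(String[] args) {
        // Build the 6-byte prefix exactly as described above: 32-bit PLEN, then 16-bit HLEN.
        ByteBuffer prefix = ByteBuffer.allocate(6);
        prefix.putInt(57616164);        // PLEN: CHECKSUMS + DATA (+ its own 4 bytes)
        prefix.putShort((short) 25);    // HLEN: hypothetical protobuf header length
        prefix.flip();

        int payloadLen = prefix.getInt();
        int headerLen = prefix.getShort();
        System.out.println("PLEN = " + payloadLen + ", HLEN = " + headerLen);
    }
}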
Why would a packet get this large in the first place? Look at the sending side, in BlockSender's packet-sending method sendPacket():
  /**
   * Sends a packet with up to maxChunks chunks of data.
   *
   * @param pkt buffer used for writing packet data
   * @param maxChunks maximum number of chunks to send
   * @param out stream to send data to
   * @param transferTo use transferTo to send data
   * @param throttler used for throttling data transfer bandwidth
   */
  private int sendPacket(ByteBuffer pkt, int maxChunks, OutputStream out,
      boolean transferTo, DataTransferThrottler throttler) throws IOException {
The ByteBuffer pkt is the buffer that holds the packet data to be sent, so its size determines how large each packet can be. How is that size chosen? Continuing into doSendBlock():
    ByteBuffer pktBuf = ByteBuffer.allocate(pktBufSize);

    while (endOffset > offset && !Thread.currentThread().isInterrupted()) {
      manageOsCache();
      long len = sendPacket(pktBuf, maxChunksPerPacket, streamForSendChunks,
          transferTo, throttler);
      offset += len;
      totalRead += len + (numberOfChunks(len) * checksumSize);
      seqno++;
    }
The buffer pktBuf is allocated with pktBufSize bytes and then repeatedly handed to sendPacket(). So the remaining question is how pktBufSize is determined:
    if (transferTo) {
      FileChannel fileChannel = ((FileInputStream)blockIn).getChannel();
      blockInPosition = fileChannel.position();
      streamForSendChunks = baseStream;
      maxChunksPerPacket = numberOfChunks(TRANSFERTO_BUFFER_SIZE);

      // Smaller packet size to only hold checksum when doing transferTo
      pktBufSize += checksumSize * maxChunksPerPacket;
    } else {
      maxChunksPerPacket = Math.max(1,
          numberOfChunks(HdfsConstants.IO_FILE_BUFFER_SIZE));

      // Packet size includes both checksum and data
      pktBufSize += (chunkSize + checksumSize) * maxChunksPerPacket;
    }
pktBufSize is driven by maxChunksPerPacket, which (in the non-transferTo branch quoted above) is computed as:
      maxChunksPerPacket = Math.max(1,
          numberOfChunks(HdfsConstants.IO_FILE_BUFFER_SIZE));
That is, it depends on IO_FILE_BUFFER_SIZE, i.e. the io.file.buffer.size setting, whose default is 4096 bytes (4 KB):
  public static final String IO_FILE_BUFFER_SIZE_KEY =
      "io.file.buffer.size";
  /** Default value for IO_FILE_BUFFER_SIZE_KEY */
  public static final int IO_FILE_BUFFER_SIZE_DEFAULT = 4096;
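To see how strongly io.file.buffer.size drives the packet buffer, here is a rough, self-contained estimate following the formula above. It assumes the usual HDFS defaults of 512 bytes per checksum chunk (dfs.bytes-per-checksum) and a 4-byte CRC per chunk, and it ignores the fixed header portion of pktBufSize:

public class PacketBufSizeEstimate {
    static long estimatePktBufSize(long ioFileBufferSize) {
        long chunkSize = 512;       // dfs.bytes-per-checksum default (assumption)
        long checksumSize = 4;      // CRC32/CRC32C checksum length (assumption)
        // numberOfChunks(): ceiling division of the buffer size into checksum chunks
        long maxChunksPerPacket = Math.max(1, (ioFileBufferSize + chunkSize - 1) / chunkSize);
        // Packet buffer holds data plus a checksum for every chunk (header ignored here)
        return (chunkSize + checksumSize) * maxChunksPerPacket;
    }

    public static void main(String[] args) {
        System.out.println(estimatePktBufSize(4096));         // ~4 KB with the default setting
        System.out.println(estimatePktBufSize(125L << 20));   // ~126 MB with a 125 MB setting
    }
}

With the 4 KB default, every packet stays far below the receiver's 16 MB cap; with a setting in the hundred-megabyte range, a single packet can easily exceed it.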
Checking the value actually configured on the cluster, however, io.file.buffer.size had been set to a full 125 MB. That is why the packets grew so large.
After correcting the parameter and restarting the nodes, reads worked normally again.
3. Conclusion
The io.file.buffer.size parameter was configured far too large, so the packets sent by the DataNode exceeded the maximum packet size enforced on the receiving side.
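As a closing aid, a small client-side check of the effective value can catch this misconfiguration early. This is only a sketch, assuming the Hadoop client libraries and the cluster's core-site.xml are on the classpath; the actual fix is simply to lower io.file.buffer.size in core-site.xml (for example back to the 4096-byte default) and restart the affected daemons:

import org.apache.hadoop.conf.Configuration;

public class CheckIoFileBufferSize {
    public static void main(String[] args) {
        Configuration conf = new Configuration();              // picks up core-site.xml from the classpath
        int bufSize = conf.getInt("io.file.buffer.size", 4096);
        int maxPacketSize = 16 * 1024 * 1024;                  // PacketReceiver's hard-coded cap
        System.out.println("io.file.buffer.size = " + bufSize
                + (bufSize > maxPacketSize ? "  (larger than the 16 MB packet cap!)" : ""));
    }
}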