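Earlier we covered the read function of two classes that implement the BlockReader interface: BlockReaderLocal and BlockReaderLocalLegacy. Both perform local reads, meaning the client and the datanode run on the same server. RemoteBlockReader2, by contrast, reads over either a UNIX domain socket or a TCP socket; which one is used depends on the configuration. We now turn to the read function in RemoteBlockReader2. The code is as follows: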
@Override
public int read(ByteBuffer buf) throws IOException {
  if (curDataSlice == null || curDataSlice.remaining() == 0 && bytesNeededToFinish > 0) {
    // Read the next packet and store its data portion in curDataSlice
    readNextPacket();
  }
  // If curDataSlice still has no data, there is no more data left to read
  if (curDataSlice.remaining() == 0) {
    // we're at EOF now
    return -1;
  }
  // Take the smaller of the bytes remaining (limit - position) in curDataSlice and in buf
  int nRead = Math.min(curDataSlice.remaining(), buf.remaining());
  // Create a new ByteBuffer object that shares the same data as curDataSlice
  ByteBuffer writeSlice = curDataSlice.duplicate();
  writeSlice.limit(writeSlice.position() + nRead);
  // Copy the data from curDataSlice into buf
  buf.put(writeSlice);
  curDataSlice.position(writeSlice.position());
  return nRead;
}
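The duplicate/limit/put sequence in read can be tried in isolation. The following is a minimal sketch, not the Hadoop code: the class name SliceCopyDemo and the method copyAvailable are hypothetical names introduced here for illustration. It shows why the temporary view is needed: ByteBuffer.put(ByteBuffer) throws BufferOverflowException when the source holds more bytes than the destination has room for, so the view is capped to nRead bytes first, and the source position is advanced by hand afterwards.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SliceCopyDemo {
    /**
     * Copies as many bytes as fit from src into dst.
     * Returns the number of bytes copied, or -1 when src is already drained.
     * (Hypothetical helper that mirrors the tail of read().)
     */
    static int copyAvailable(ByteBuffer src, ByteBuffer dst) {
        if (src.remaining() == 0) {
            return -1;
        }
        int n = Math.min(src.remaining(), dst.remaining());
        // duplicate() shares the bytes but has its own position and limit
        ByteBuffer view = src.duplicate();
        // Cap the view so that put() never sees more than n bytes
        view.limit(view.position() + n);
        // put() advances dst and view, but leaves src untouched
        dst.put(view);
        // Bring src up to date with what was consumed through the view
        src.position(view.position());
        return n;
    }

    public static void main(String[] args) {
        ByteBuffer packetData =
            ByteBuffer.wrap("0123456789".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer userBuf = ByteBuffer.allocate(4);
        int n;
        while ((n = copyAvailable(packetData, userBuf)) != -1) {
            userBuf.flip();
            System.out.println(n + " bytes: " + StandardCharsets.US_ASCII.decode(userBuf));
            userBuf.clear();
        }
    }
}
```

A caller with a small buffer simply calls the method repeatedly, which is the same contract read exposes: each call returns at most what the caller's buffer can hold, and -1 signals the end.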
The code of the readNextPacket function is as follows:
private void readNextPacket() throws IOException {
  //Read packet headers.
  // Use packetReceiver to read a new packet from the IO stream
  packetReceiver.receiveNextPacket(in);
  // Fetch the packet header into curHeader and the packet data into curDataSlice
  PacketHeader curHeader = packetReceiver.getHeader();
  curDataSlice = packetReceiver.getDataSlice();
  assert curDataSlice.capacity() == curHeader.getDataLen();
  if (LOG.isTraceEnabled()) {
    LOG.trace("DFSClient readNextPacket got header " + curHeader);
  }
  // Sanity check the lengths in the packet header
  if (!curHeader.sanityCheck(lastSeqNo)) {
    throw new IOException("BlockReader: error in packet header " +
        curHeader);
  }
  // Check that the data matches its checksums
  if (curHeader.getDataLen() > 0) {
    int chunks = 1 + (curHeader.getDataLen() - 1) / bytesPerChecksum;
    int checksumsLen = chunks * checksumSize;
    assert packetReceiver.getChecksumSlice().capacity() == checksumsLen :
        "checksum slice capacity=" + packetReceiver.getChecksumSlice().capacity() +
        " checksumsLen=" + checksumsLen;
    lastSeqNo = curHeader.getSeqno();
    if (verifyChecksum && curDataSlice.remaining() > 0) {
      // N.B.: the checksum error offset reported here is actually
      // relative to the start of the block, not the start of the file.
      // This is slightly misleading, but preserves the behavior from
      // the older BlockReader.
      checksum.verifyChunkedSums(curDataSlice,
          packetReceiver.getChecksumSlice(),
          filename, curHeader.getOffsetInBlock());
    }
    bytesNeededToFinish -= curHeader.getDataLen();
  }
  // First packet will include some data prior to the first byte
  // the user requested. Skip it.
  if (curHeader.getOffsetInBlock() < startOffset) {
    int newPos = (int) (startOffset - curHeader.getOffsetInBlock());
    curDataSlice.position(newPos);
  }
  // If we've now satisfied the whole client read, read one last packet
  // header, which should be empty: the last packet of a block is an
  // empty marker packet.
  if (bytesNeededToFinish <= 0) {
    readTrailingEmptyPacket();
    if (verifyChecksum) {
      sendReadResult(Status.CHECKSUM_OK);
    } else {
      sendReadResult(Status.SUCCESS);
    }
  }
}
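Two pieces of arithmetic in readNextPacket are easy to misread, so here is a small worked sketch. The class PacketMath and its method names are hypothetical, and the values 512 (bytes per checksum chunk) and 4 (bytes per checksum) are assumed only for the example; the real values come from the checksum configuration. The expression 1 + (dataLen - 1) / bytesPerChecksum is a ceiling division, valid for dataLen > 0, which is why the code guards it with getDataLen() > 0.

```java
class PacketMath {
    /** Number of checksum chunks needed to cover dataLen bytes; requires dataLen > 0. */
    static int chunkCount(int dataLen, int bytesPerChecksum) {
        return 1 + (dataLen - 1) / bytesPerChecksum;
    }

    /** Expected size of the checksum slice that accompanies dataLen bytes of data. */
    static int checksumsLen(int dataLen, int bytesPerChecksum, int checksumSize) {
        return chunkCount(dataLen, bytesPerChecksum) * checksumSize;
    }

    /**
     * Bytes to skip at the front of a packet whose data starts before the
     * offset the caller asked for; 0 when the packet starts at or after it.
     */
    static int skipBytes(long offsetInBlock, long startOffset) {
        return offsetInBlock < startOffset ? (int) (startOffset - offsetInBlock) : 0;
    }

    public static void main(String[] args) {
        int bytesPerChecksum = 512; // assumed chunk size for this example
        int checksumSize = 4;       // assumed checksum width for this example

        // 512 bytes fit in one chunk, 513 bytes need a second one
        System.out.println(chunkCount(512, bytesPerChecksum));
        System.out.println(chunkCount(513, bytesPerChecksum));
        System.out.println(checksumsLen(65536, bytesPerChecksum, checksumSize));

        // A packet starting at block offset 1024 when the caller asked for 1500
        System.out.println(skipBytes(1024, 1500));
    }
}
```

The skip applies only to the first packet, as the comment in the source notes: once the position has been moved past the unwanted prefix, later packets start at or after startOffset and the condition is false.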
Next, let's look at the receiveNextPacket function. The code is as follows:
/**
 * Reads all of the data for the next packet into the appropriate buffers.
 *
 * The data slice and checksum slice members will be set to point to the
 * user data and corresponding checksums. The header will be parsed and
 * set.
 */
public void receiveNextPacket(ReadableByteChannel in) throws IOException {
  doRead(in, null);
}
Going one level further down, we reach the doRead function (note that its second argument is null here). The code is as follows:
private void doRead(ReadableByteChannel ch, InputStream in)
    throws IOException {
  // Each packet looks like:
  //   PLEN    HLEN      HEADER     CHECKSUMS  DATA
  //   32-bit  16-bit   <protobuf>  <variable length>
  //
  // PLEN:      Payload length
  //            = length(PLEN) + length(CHECKSUMS) + length(DATA)
  //            This length includes its own encoded length in
  //            the sum for historical reasons.
  //
  // HLEN:      Header length
  //            = length(HEADER)
  //
  // HEADER:    the actual packet header fields, encoded in protobuf
  // CHECKSUMS: the crcs for the data chunk. May be missing if
  //            checksums were not requested
  // DATA       the actual block data
  Preconditions.checkState(curHeader == null || !curHeader.isLastPacketInBlock());
  curPacketBuf.clear();
  curPacketBuf.limit(PacketHeader.PKT_LENGTHS_LEN);
  // Read the two length prefixes (PLEN and HLEN)
  doReadFully(ch, in, curPacketBuf);
  curPacketBuf.flip();
  int payloadLen = curPacketBuf.getInt();
  if (payloadLen < Ints.BYTES) {
    // The "payload length" includes its own length. Therefore it
    // should never be less than 4 bytes
    throw new IOException("Invalid payload length " +
        payloadLen);
  }
  int dataPlusChecksumLen = payloadLen - Ints.BYTES;
  int headerLen = curPacketBuf.getShort();
  if (headerLen < 0) {
    throw new IOException("Invalid header length " + headerLen);
  }
  if (LOG.isTraceEnabled()) {
    LOG.trace("readNextPacket: dataPlusChecksumLen = " + dataPlusChecksumLen +
        " headerLen = " + headerLen);
  }
  // Sanity check the buffer size so we don't allocate too much memory
  // and OOME.
  int totalLen = payloadLen + headerLen;
  if (totalLen < 0 || totalLen > MAX_PACKET_SIZE) {
    throw new IOException("Incorrect value for packet payload size: " +
        payloadLen);
  }
  // Make sure we have space for the whole packet, and
  // read it.
  reallocPacketBuf(PacketHeader.PKT_LENGTHS_LEN +
      dataPlusChecksumLen + headerLen);
  curPacketBuf.clear();
  curPacketBuf.position(PacketHeader.PKT_LENGTHS_LEN);
  curPacketBuf.limit(PacketHeader.PKT_LENGTHS_LEN +
      dataPlusChecksumLen + headerLen);
  doReadFully(ch, in, curPacketBuf);
  curPacketBuf.flip();
  curPacketBuf.position(PacketHeader.PKT_LENGTHS_LEN);
  // Extract the header from the front of the buffer (after the length prefixes)
  byte[] headerBuf = new byte[headerLen];
  curPacketBuf.get(headerBuf);
  if (curHeader == null) {
    curHeader = new PacketHeader();
  }
  curHeader.setFieldsFromData(dataPlusChecksumLen, headerBuf);
  // Compute the sub-slices of the packet
  int checksumLen = dataPlusChecksumLen - curHeader.getDataLen();
  if (checksumLen < 0) {
    throw new IOException("Invalid packet: data length in packet header " +
        "exceeds data length received. dataPlusChecksumLen=" +
        dataPlusChecksumLen + " header: " + curHeader);
  }
  reslicePacket(headerLen, checksumLen, curHeader.getDataLen());
}
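This function reads the corresponding data off the wire. The read can go over either a domain socket or a TCP socket; readers who want the details should consult the relevant source code, as they are not covered here.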
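To make the PLEN / HLEN / HEADER / CHECKSUMS / DATA layout concrete, the sketch below frames a packet by hand and then reads it back in the same two steps doRead uses: first the six bytes of length prefixes, then the rest. It is an illustration under stated assumptions rather than the Hadoop implementation. PacketFrameDemo, Frame, encode, decode and readFully are hypothetical names; the header is treated as opaque bytes instead of protobuf, so dataLen is passed in as a parameter where the real code takes it from the decoded header; the MAX_PACKET_SIZE value is an arbitrary cap chosen for the example; and the channel is assumed to be blocking.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

class PacketFrameDemo {
    static final int LENGTHS_LEN = Integer.BYTES + Short.BYTES; // PLEN (4) + HLEN (2)
    static final int MAX_PACKET_SIZE = 16 * 1024 * 1024;        // illustrative cap only

    /** The three regions of one packet, exposed as independent views. */
    static final class Frame {
        final ByteBuffer header;
        final ByteBuffer checksums;
        final ByteBuffer data;

        Frame(ByteBuffer header, ByteBuffer checksums, ByteBuffer data) {
            this.header = header;
            this.checksums = checksums;
            this.data = data;
        }
    }

    /** Reads until buf is full; hitting end of stream first is an error. */
    static void readFully(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf) < 0) {
                throw new EOFException("Premature EOF, " + buf.remaining() + " bytes missing");
            }
        }
    }

    /** Builds PLEN HLEN HEADER CHECKSUMS DATA; PLEN counts its own 4 bytes. */
    static byte[] encode(byte[] header, byte[] checksums, byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(
            LENGTHS_LEN + header.length + checksums.length + data.length);
        buf.putInt(Integer.BYTES + checksums.length + data.length);
        buf.putShort((short) header.length);
        buf.put(header).put(checksums).put(data);
        return buf.array();
    }

    /** Reads one packet from ch and splits it into header, checksums and data. */
    static Frame decode(ReadableByteChannel ch, int dataLen) throws IOException {
        // Step 1: the fixed-size length prefixes
        ByteBuffer lengths = ByteBuffer.allocate(LENGTHS_LEN);
        readFully(ch, lengths);
        lengths.flip();
        int payloadLen = lengths.getInt();
        if (payloadLen < Integer.BYTES) {
            throw new IOException("Invalid payload length " + payloadLen);
        }
        int dataPlusChecksumLen = payloadLen - Integer.BYTES;
        int headerLen = lengths.getShort();
        if (headerLen < 0) {
            throw new IOException("Invalid header length " + headerLen);
        }
        long totalLen = (long) payloadLen + headerLen;
        if (totalLen > MAX_PACKET_SIZE) {
            throw new IOException("Packet too large: " + totalLen);
        }
        int checksumLen = dataPlusChecksumLen - dataLen;
        if (checksumLen < 0) {
            throw new IOException("Data length exceeds bytes received");
        }
        // Step 2: the variable-size remainder, then carve it into views
        ByteBuffer body = ByteBuffer.allocate(headerLen + dataPlusChecksumLen);
        readFully(ch, body);
        body.flip();
        return new Frame(
            slice(body, 0, headerLen),
            slice(body, headerLen, checksumLen),
            slice(body, headerLen + checksumLen, dataLen));
    }

    /** Returns a view of buf covering [start, start + len) without copying. */
    private static ByteBuffer slice(ByteBuffer buf, int start, int len) {
        ByteBuffer view = buf.duplicate();
        view.position(start);
        view.limit(start + len);
        return view.slice();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = encode(new byte[] {7, 7}, new byte[] {1, 2, 3, 4}, new byte[] {10, 20, 30});
        Frame f = decode(Channels.newChannel(new ByteArrayInputStream(wire)), 3);
        System.out.println("header=" + f.header.remaining()
            + " checksums=" + f.checksums.remaining()
            + " data=" + f.data.remaining());
    }
}
```

Because the checksum length is derived as dataPlusChecksumLen minus dataLen, a packet sent without checksums simply yields an empty checksum view, which matches the "may be missing" note in the layout comment.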