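DataXceiverServer accepts client connections and starts a DataXceiver thread for each client.
In run(), the DataXceiver calls readBlock() to read local data and send it to the client. readBlock() creates a BlockSender and calls its sendBlock() to do the actual sending.
The steps are as follows: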
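I. The DataNode creates a BlockSender object for the Block, which performs some initialization
(1) Check whether the block's meta file exists, open the checksum input stream checksumIn on it, and read the meta file header.
(2) startOffset: the position at which the client wants to start reading; endOffset: initialized to the block length blockLength; length: the number of bytes the client wants to read. In the formulas below, 512 stands for bytesPerChecksum and 4 for checksumSize.
offset = (startOffset - (startOffset % 512)); // the position at which reading from the file actually starts, chosen for checksum verification: when startOffset falls inside a 512-byte chunk, offset is moved back to the start of that chunk
endOffset: if startOffset + length is not a multiple of 512, endOffset is set to the next multiple of 512 after startOffset + length; if that would pass the end of the block, endOffset stays at the end of the block.
(3) Because the client verifies the data, the DataNode's response must include the checksums. The meta file holds the checksums for the whole Block, so the first offset/512*4 bytes of checksum data, which belong to chunks that are not being read, are skipped.
(4) Create the input stream blockIn for the block file and seek to offset. This completes the construction of the BlockSender. The source is as follows:
BlockSender constructor: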
- BlockSender(Block block, long startOffset, long length, boolean corruptChecksumOk, boolean chunkOffsetOK, boolean verifyChecksum, DataNode datanode, String clientTraceFmt)
- throws IOException {
- try {
- this.block = block;
- this.chunkOffsetOK = chunkOffsetOK;
- this.corruptChecksumOk = corruptChecksumOk;
- this.verifyChecksum = verifyChecksum;
- this.blockLength = datanode.data.getLength(block);
- this.transferToAllowed = datanode.transferToAllowed;
- this.clientTraceFmt = clientTraceFmt;
- // Corrupt checksums are not tolerated, or the block has a checksum (meta) file
- if ( !corruptChecksumOk || datanode.data.metaFileExists(block) ) {
- // Open the input stream for the block's checksum data
- checksumIn = new DataInputStream(new BufferedInputStream(datanode.data.getMetaDataInputStream(block),BUFFER_SIZE));
- // Read the meta file header to obtain the checksum object that produced the block's checksum file
- BlockMetadataHeader header = BlockMetadataHeader.readHeader(checksumIn);
- short version = header.getVersion();
- if (version != FSDataset.METADATA_VERSION) {
- LOG.warn("Wrong version (" + version + ") for metadata file for " + block + " ignoring ...");
- }
- checksum = header.getChecksum();
- } else {
- LOG.warn("Could not find metadata file for " + block);
- // Fall back to a default (null) checksum
- checksum = DataChecksum.newDataChecksum(DataChecksum.CHECKSUM_NULL,16 * 1024);
- }
- // Adjust the checksum object when bytesPerChecksum is oversized (over 10 MB and larger than the block)
- bytesPerChecksum = checksum.getBytesPerChecksum();
- if (bytesPerChecksum > 10*1024*1024 && bytesPerChecksum > blockLength){
- checksum = DataChecksum.newDataChecksum(checksum.getChecksumType(), Math.max((int)blockLength, 10*1024*1024));
- bytesPerChecksum = checksum.getBytesPerChecksum();
- }
- checksumSize = checksum.getChecksumSize();
- if (length < 0) {
- length = blockLength;
- }
- endOffset = blockLength;
- if (startOffset < 0 || startOffset > endOffset || (length + startOffset) > endOffset) {
- String msg = " Offset " + startOffset + " and length " + length + " don't match block " + block + " ( blockLen " + endOffset + " )";
- LOG.warn(datanode.dnRegistration + ":sendBlock() : " + msg);
- throw new IOException(msg);
- }
- // Align the start and end positions of the read to checksum chunk boundaries
- offset = (startOffset - (startOffset % bytesPerChecksum));
- if (length >= 0) {
- // Make sure endOffset points to end of a checksumed chunk.
- long tmpLen = startOffset + length;
- if (tmpLen % bytesPerChecksum != 0) {
- tmpLen += (bytesPerChecksum - tmpLen % bytesPerChecksum);
- }
- if (tmpLen < endOffset) {
- endOffset = tmpLen;
- }
- }
- // Skip ahead in the checksum stream to the checksum of the first chunk to be read
- if (offset > 0) {
- long checksumSkip = (offset / bytesPerChecksum) * checksumSize;
- // note blockInStream is seeked when created below
- if (checksumSkip > 0) {
- // Should we use seek() for checksum file as well?
- IOUtils.skipFully(checksumIn, checksumSkip);
- }
- }
- seqno = 0;
- // Open the block file positioned at the start of the data to be read
- blockIn = datanode.data.getBlockInputStream(block, offset); // seek to offset
- } catch (IOException ioe) {
- IOUtils.closeStream(this);
- IOUtils.closeStream(blockIn);
- throw ioe;
- }
- }
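The offset arithmetic in the constructor can be tried out in isolation. The following is only a minimal sketch of that alignment logic, not Hadoop code: the class ChunkAlignment and its method names are made up for illustration, and the values 512 and 4 for bytesPerChecksum and checksumSize are assumptions taken from the example figures used in the text.

```java
// Hypothetical helper, not part of Hadoop: mirrors the constructor's offset arithmetic.
final class ChunkAlignment {

    /** Rounds startOffset down to the start of the checksum chunk that contains it. */
    static long alignedStart(long startOffset, int bytesPerChecksum) {
        return startOffset - (startOffset % bytesPerChecksum);
    }

    /** Rounds startOffset + length up to a chunk boundary, capped at the block length. */
    static long alignedEnd(long startOffset, long length, int bytesPerChecksum, long blockLength) {
        long end = startOffset + length;
        long remainder = end % bytesPerChecksum;
        if (remainder != 0) {
            end += bytesPerChecksum - remainder;
        }
        return Math.min(end, blockLength);
    }

    /** Number of checksum bytes that precede the checksum of the chunk at alignedOffset. */
    static long checksumSkip(long alignedOffset, int bytesPerChecksum, int checksumSize) {
        return (alignedOffset / bytesPerChecksum) * checksumSize;
    }

    public static void main(String[] args) {
        int bytesPerChecksum = 512; // assumed chunk size
        int checksumSize = 4;       // assumed checksum width in bytes
        long blockLength = 10_000;  // made-up block length
        long startOffset = 1000;
        long length = 2000;

        long offset = alignedStart(startOffset, bytesPerChecksum);
        long endOffset = alignedEnd(startOffset, length, bytesPerChecksum, blockLength);
        long skip = checksumSkip(offset, bytesPerChecksum, checksumSize);
        System.out.println("offset=" + offset + " endOffset=" + endOffset + " checksumSkip=" + skip);
    }
}
```

With these made-up inputs the read covers the chunks from byte 512 up to byte 3072, and one 4-byte checksum is skipped in the meta stream.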
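II. Send the data with blockSender.sendBlock()
If the whole block file was read (the blockReadFully flag set at the end of sendBlock()), a status code is read from the client's stream. If the client reports that verification succeeded, the block scanner is told that this block is good and does not need to be verified again.
(1) Send the checksum header, which includes the checksum type and related information.
(2) maxChunksPerPacket: the buffer size (64K in the text's example) rounded up to a multiple of 512, then divided by 512.
pktSize = HEADER + (512 + 4)*maxChunksPerPacket when the data is copied through the buffer; a pktBuf of pktSize bytes is then allocated. When transferTo is used, only the checksums are buffered, so pktSize = HEADER + 4*maxChunksPerPacket.
(3) Call sendChunks() in a loop until endOffset is reached.
(4) Send an int 0 to mark the end of the read.
BlockSender sends data to the receiver in packets. One packet may contain several checksum chunks. The sender does not require the receiver to acknowledge packets, and it does not accept such acknowledgements itself. The format of a packet is:
packetLen: length of the packet;
offset: position in the Block where the packet's data starts;
seqno: sequence number of the packet;
endFlag: whether this is the last packet (0/1);
len: length of the data in the packet;
checksum: the checksum of each data chunk in the packet (all checksums precede the data);
datachunk: the checksummed data chunks;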
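To make the header layout concrete, here is a small sketch that writes the five header fields into a ByteBuffer in the order listed above, with the same field widths that sendChunks() uses (int, long, long, byte, int). The class PacketHeaderSketch and its method are hypothetical names for illustration only; this is not the Hadoop implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the packet header layout described above.
final class PacketHeaderSketch {

    // packetLen (4) + offset (8) + seqno (8) + endFlag (1) + len (4)
    static final int HEADER_BYTES = 4 + 8 + 8 + 1 + 4;

    /** Encodes one packet header; the returned buffer is ready for reading. */
    static ByteBuffer writeHeader(long offset, long seqno, boolean lastPacket,
                                  int len, int bytesPerChecksum, int checksumSize) {
        int numChunks = (len + bytesPerChecksum - 1) / bytesPerChecksum;
        // Data bytes + checksum bytes + the 4-byte length field itself.
        int packetLen = len + numChunks * checksumSize + 4;

        ByteBuffer pkt = ByteBuffer.allocate(HEADER_BYTES);
        pkt.putInt(packetLen);
        pkt.putLong(offset);
        pkt.putLong(seqno);
        pkt.put((byte) (lastPacket ? 1 : 0));
        pkt.putInt(len);
        pkt.flip();
        return pkt;
    }

    public static void main(String[] args) {
        ByteBuffer header = writeHeader(512, 7, true, 1024, 512, 4);
        System.out.println("packetLen=" + header.getInt()
                + " offset=" + header.getLong()
                + " seqno=" + header.getLong()
                + " endFlag=" + header.get()
                + " len=" + header.getInt());
    }
}
```

A receiver would read the fields back in the same order, then read numChunks checksums followed by len bytes of data.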
- // Send data to the receiver
- long sendBlock(DataOutputStream out, OutputStream baseStream, BlockTransferThrottler throttler) throws IOException {
- if( out == null ) {
- throw new IOException( "out stream is null" );
- }
- this.throttler = throttler;
- long initialOffset = offset;
- long totalRead = 0;
- OutputStream streamForSendChunks = out;
- try {
- try {
- checksum.writeHeader(out); // send the checksum header
- if ( chunkOffsetOK ) {
- out.writeLong( offset );
- }
- out.flush();
- } catch (IOException e) { //socket error
- throw ioeToSocketException(e);
- }
- int maxChunksPerPacket;
- int pktSize = DataNode.PKT_HEADER_LEN + SIZE_OF_INTEGER;
- if (transferToAllowed && !verifyChecksum && baseStream instanceof SocketOutputStream && blockIn instanceof FileInputStream) {
- FileChannel fileChannel = ((FileInputStream)blockIn).getChannel();
- // blockInPosition also indicates sendChunks() uses transferTo.
- blockInPosition = fileChannel.position();
- streamForSendChunks = baseStream;
- // Maximum number of checksum chunks per packet
- maxChunksPerPacket = (Math.max(BUFFER_SIZE, MIN_BUFFER_WITH_TRANSFERTO) + bytesPerChecksum - 1)/bytesPerChecksum;
- // Packet buffer size (with transferTo, only the header and checksums are buffered)
- pktSize += checksumSize * maxChunksPerPacket;
- } else {
- // Maximum number of checksum chunks per packet
- maxChunksPerPacket = Math.max(1,(BUFFER_SIZE + bytesPerChecksum - 1)/bytesPerChecksum);
- // Packet buffer size (header, checksums and data)
- pktSize += (bytesPerChecksum + checksumSize) * maxChunksPerPacket;
- }
- ByteBuffer pktBuf = ByteBuffer.allocate(pktSize);
- // Send the data one packet at a time
- while (endOffset > offset) {
- long len = sendChunks(pktBuf, maxChunksPerPacket, streamForSendChunks);
- offset += len;
- totalRead += len + ((len + bytesPerChecksum - 1)/bytesPerChecksum* checksumSize);
- seqno++;
- }
- try {
- out.writeInt(0); // mark the end of the data
- out.flush();
- } catch (IOException e) { //socket error
- throw ioeToSocketException(e);
- }
- } finally {
- if (clientTraceFmt != null) {
- ClientTraceLog.info(String.format(clientTraceFmt, totalRead));
- }
- close();
- }
- blockReadFully = (initialOffset == 0 && offset >= blockLength);
- return totalRead;
- }
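The two packet sizing branches in sendBlock() can be compared with a few lines of arithmetic. This is a sketch under stated assumptions rather than Hadoop code: the class PacketSizing is hypothetical, the header length of 25 bytes is derived from the packet fields listed earlier (4 + 8 + 8 + 1 + 4), the 4096-byte BUFFER_SIZE is a made-up value, and the 64 KB minimum for the transferTo path is taken from the "64K" figure in the text.

```java
// Hypothetical helper, not part of Hadoop: reproduces the packet sizing arithmetic.
final class PacketSizing {

    /** Buffered path: at least one chunk, enough chunks to cover bufferSize. */
    static int maxChunksBuffered(int bufferSize, int bytesPerChecksum) {
        return Math.max(1, (bufferSize + bytesPerChecksum - 1) / bytesPerChecksum);
    }

    /** transferTo path: enough chunks to cover the larger of the two buffer sizes. */
    static int maxChunksTransferTo(int bufferSize, int minTransferToBuffer, int bytesPerChecksum) {
        return (Math.max(bufferSize, minTransferToBuffer) + bytesPerChecksum - 1) / bytesPerChecksum;
    }

    /** Buffered path: header, checksums and data all go through the packet buffer. */
    static int pktSizeBuffered(int headerLen, int bytesPerChecksum, int checksumSize, int maxChunks) {
        return headerLen + (bytesPerChecksum + checksumSize) * maxChunks;
    }

    /** transferTo path: only header and checksums are buffered; data goes file to socket. */
    static int pktSizeTransferTo(int headerLen, int checksumSize, int maxChunks) {
        return headerLen + checksumSize * maxChunks;
    }

    public static void main(String[] args) {
        int headerLen = 25;                 // assumed: sum of the header field widths
        int bufferSize = 4096;              // assumed BUFFER_SIZE
        int minTransferToBuffer = 64 * 1024; // assumed from the "64K" in the text
        int bytesPerChecksum = 512;
        int checksumSize = 4;

        int buffered = maxChunksBuffered(bufferSize, bytesPerChecksum);
        int zeroCopy = maxChunksTransferTo(bufferSize, minTransferToBuffer, bytesPerChecksum);
        System.out.println("buffered: chunks=" + buffered + " pktSize="
                + pktSizeBuffered(headerLen, bytesPerChecksum, checksumSize, buffered));
        System.out.println("transferTo: chunks=" + zeroCopy + " pktSize="
                + pktSizeTransferTo(headerLen, checksumSize, zeroCopy));
    }
}
```

The comparison shows why the transferTo path can afford many more chunks per packet: its buffer holds only 4 bytes per chunk instead of 516.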
III. sendChunks():
(1) int len = Math.min((int) (endOffset - offset), bytesPerChecksum*maxChunks); // the number of data bytes to send in this packet
(2) int numChunks = (len + bytesPerChecksum - 1)/bytesPerChecksum;
int packetLen = len + numChunks*checksumSize + 4;
int checksumOff = pkt.position();
int checksumLen = numChunks * checksumSize;
byte[] buf = pkt.array();
checksumIn.readFully(buf, checksumOff, checksumLen); // read the checksum data into buf
sockOut.transferToFully(((FileInputStream)blockIn).getChannel(), blockInPosition, len); // send the data using "zero copy"
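The transferToFully() call above belongs to Hadoop's SocketOutputStream. The underlying JDK primitive is FileChannel.transferTo(), and the sketch below shows only that call pattern: loop until the requested range has been handed over, since a single call may transfer fewer bytes than asked. The class TransferToSketch is hypothetical, and the in-memory sink is an assumption that keeps the example runnable without a socket; a real zero-copy transfer depends on the operating system and on the target being a socket channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of the FileChannel.transferTo() call pattern.
final class TransferToSketch {

    /** Transfers exactly count bytes starting at position, looping on partial transfers. */
    static void transferFully(FileChannel src, long position, long count,
                              WritableByteChannel dst) throws IOException {
        while (count > 0) {
            long n = src.transferTo(position, count, dst);
            if (n <= 0) {
                throw new IOException("No bytes transferred at position " + position);
            }
            position += n;
            count -= n;
        }
    }

    /** Writes 100 bytes (0..99) to a temp file and transfers bytes 10..14 to memory. */
    static byte[] demo() {
        try {
            byte[] content = new byte[100];
            for (int i = 0; i < content.length; i++) {
                content[i] = (byte) i;
            }
            Path file = Files.createTempFile("block", ".dat");
            try {
                Files.write(file, content);
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ);
                     WritableByteChannel dst = Channels.newChannel(sink)) {
                    transferFully(src, 10, 5, dst);
                }
                return sink.toByteArray();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```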
- /* Send one packet */
- private int sendChunks(ByteBuffer pkt, int maxChunks, OutputStream out) throws IOException {
- // Number of data bytes in this packet
- int len = Math.min((int) (endOffset - offset), bytesPerChecksum*maxChunks);
- if (len == 0) {
- return 0;
- }
- // Number of checksum chunks this packet should contain
- int numChunks = (len + bytesPerChecksum - 1)/bytesPerChecksum;
- int packetLen = len + numChunks*checksumSize + 4;
- pkt.clear();
- // Write the packet header into the buffer
- pkt.putInt(packetLen);
- pkt.putLong(offset);
- pkt.putLong(seqno);
- pkt.put((byte)((offset + len >= endOffset) ? 1 : 0));
- pkt.putInt(len);
- int checksumOff = pkt.position();
- int checksumLen = numChunks * checksumSize;
- byte[] buf = pkt.array();
- // Read the checksums for this data into the buffer
- if (checksumSize > 0 && checksumIn != null) {
- try {
- checksumIn.readFully(buf, checksumOff, checksumLen);
- } catch (IOException e) {
- LOG.warn(" Could not read or failed to veirfy checksum for data" + " at offset " + offset + " for block " + block + " got : " + StringUtils.stringifyException(e));
- IOUtils.closeStream(checksumIn);
- checksumIn = null;
- if (corruptChecksumOk) {
- if (checksumOff < checksumLen) {
- // Just fill the array with zeros.
- Arrays.fill(buf, checksumOff, checksumLen, (byte) 0);
- }
- } else {
- throw e;
- }
- }
- }
- int dataOff = checksumOff + checksumLen;
- if (blockInPosition < 0) {
- // Read the data into the buffer
- IOUtils.readFully(blockIn, buf, dataOff, len);
- // Verify the checksums of the data being sent
- if (verifyChecksum) {
- int dOff = dataOff;
- int cOff = checksumOff;
- int dLeft = len;
- for (int i=0; i<numChunks; i++) {
- checksum.reset();
- int dLen = Math.min(dLeft, bytesPerChecksum);
- checksum.update(buf, dOff, dLen);
- if (!checksum.compare(buf, cOff)) {
- throw new ChecksumException("Checksum failed at " + (offset + len - dLeft), len);
- }
- dLeft -= dLen;
- dOff += dLen;
- cOff += checksumSize;
- }
- }
- //writing is done below (mainly to handle IOException)
- }
- try {
- if (blockInPosition >= 0) {
- //use transferTo(). Checks on out and blockIn are already done.
- SocketOutputStream sockOut = (SocketOutputStream)out;
- // Send the buffered packet header and checksums
- sockOut.write(buf, 0, dataOff);
- // no need to flush. since we know out is not a buffered stream.
- sockOut.transferToFully(((FileInputStream)blockIn).getChannel(), blockInPosition, len);
- blockInPosition += len;
- } else {
- // Send the buffered packet (header, checksums and data)
- out.write(buf, 0, dataOff + len);
- }
- } catch (IOException e) {
- /* exception while writing to the client (well, with transferTo(),
- * it could also be while reading from the local file).
- */
- throw ioeToSocketException(e);
- }
- if (throttler != null) { // throttle the send rate
- throttler.throttle(packetLen);
- }
- return len;
- }
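The verification loop in sendChunks() walks the buffer with two cursors, one over the checksums and one over the data. The sketch below reproduces that walk on a buffer laid out the same way (all checksums first, then the data). It is an illustration only: the class ChunkVerifySketch is hypothetical, and java.util.zip.CRC32 with 4-byte big-endian checksums stands in for Hadoop's DataChecksum, which is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical illustration of per-chunk verification; CRC32 stands in for DataChecksum.
final class ChunkVerifySketch {

    static final int CHECKSUM_SIZE = 4; // assumed checksum width in bytes

    /** Builds a packet body: one CRC32 per chunk, followed by the data itself. */
    static byte[] buildPacketBody(byte[] data, int bytesPerChecksum) {
        int numChunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        ByteBuffer buf = ByteBuffer.allocate(numChunks * CHECKSUM_SIZE + data.length);
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            int n = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, n);
            buf.putInt((int) crc.getValue());
        }
        buf.put(data);
        return buf.array();
    }

    /** Returns the index of the first chunk whose checksum does not match, or -1 if all match. */
    static int firstBadChunk(byte[] buf, int dataLen, int bytesPerChecksum) {
        int numChunks = (dataLen + bytesPerChecksum - 1) / bytesPerChecksum;
        ByteBuffer view = ByteBuffer.wrap(buf);
        int cOff = 0;                          // cursor over the checksums
        int dOff = numChunks * CHECKSUM_SIZE;  // cursor over the data
        int dLeft = dataLen;
        CRC32 crc = new CRC32();
        for (int i = 0; i < numChunks; i++) {
            int dLen = Math.min(dLeft, bytesPerChecksum);
            crc.reset();
            crc.update(buf, dOff, dLen);
            if (view.getInt(cOff) != (int) crc.getValue()) {
                return i;
            }
            dLeft -= dLen;
            dOff += dLen;
            cOff += CHECKSUM_SIZE;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1200];          // 3 chunks: 512 + 512 + 176 bytes
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        byte[] body = buildPacketBody(data, 512);
        System.out.println("clean body, first bad chunk: " + firstBadChunk(body, data.length, 512));
        body[3 * CHECKSUM_SIZE + 600] ^= 0x01; // corrupt one data byte inside chunk 1
        System.out.println("corrupted body, first bad chunk: " + firstBadChunk(body, data.length, 512));
    }
}
```

Note that the last chunk may be shorter than bytesPerChecksum, which is why the loop takes the minimum of the remaining bytes and the chunk size.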
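Note: The same block can be read while it is still being written, and the data file and the checksum file are not written at the same moment, so checksum verification may fail during a read. In that case the sender checks whether the block has changed; if it has, it rereads the data file, recomputes the checksum, updates the checksum information in buf, and sends that to the client, so the client still receives consistent data.
The client verifies the data and, if it is correct, sends OP_STATUS_CHECKSUM_OK to the DataNode. The read has then succeeded.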