HDFS Source Code: The Write Data Flow

Hadoop version: 2.7.0

Preface

This post analyzes the HDFS write data flow, covering the client, the output-stream wrapper objects, chunks, packets, the pipeline, the dataQueue and the ackQueue. There is a lot of ground to cover, so we will work through it step by step. I use the copyFromLocalFile() method as the entry point and trace the whole write path through the source code.

Key Concepts

  • DFSOutputStream overview
    Data written by the client application is buffered internally by this stream. The data is broken up into packets, each typically 64 KB in size; each packet consists of several chunks, and each chunk is typically 512 bytes and carries its own checksum. When the client fills up the current packet, the packet is appended to the dataQueue. The DataStreamer thread takes packets off the dataQueue, sends each one to the first DataNode in the pipeline, and moves it from the dataQueue to the ackQueue. The ResponseProcessor receives the acks sent back by the DataNodes; once a successful ack for a packet has arrived from every DataNode, the packet is removed from the ackQueue, and only then does it count as written. On error, every packet still in the ackQueue is moved back to the dataQueue (the packets in the ackQueue are exactly the ones not yet confirmed as written); a new pipeline is set up with the bad DataNode removed, and the packets in the dataQueue are resent through it. A minimal sketch of these mechanics appears right after this list.
    /****************************************************************
     * DFSOutputStream creates files from a stream of bytes.
     *
     * The client application writes data that is cached internally by
     * this stream. Data is broken up into packets, each packet is
     * typically 64K in size. A packet comprises of chunks. Each chunk
     * is typically 512 bytes and has an associated checksum with it.
     *
     * When a client application fills up the currentPacket, it is
     * enqueued into dataQueue.  The DataStreamer thread picks up
     * packets from the dataQueue, sends it to the first datanode in
     * the pipeline and moves it from the dataQueue to the ackQueue.
     * The ResponseProcessor receives acks from the datanodes. When an
     * successful ack for a packet is received from all datanodes, the
     * ResponseProcessor removes the corresponding packet from the
     * ackQueue.
     *
     * In case of error, all outstanding packets are moved from
     * ackQueue. A new pipeline is setup by eliminating the bad
     * datanode from the original pipeline. The DataStreamer now
     * starts sending packets from the dataQueue.
    ****************************************************************/
    
  • DFSPacket & chunk
    [Figure: DFSPacket layout, 512-byte chunks each followed by its checksum, packed into one packet]
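To make the chunk / packet / queue relationship concrete, here is a minimal, self-contained sketch of the same mechanics. It is not the Hadoop code: class names such as WritePathSketch and DemoPacket are made up, and the 512-byte chunk size, CRC32 checksum and 64 KB packet budget are simply the typical values quoted above.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.zip.CRC32;

/**
 * Toy model of how DFSOutputStream buffers data: bytes are split into
 * 512-byte chunks, each chunk gets a checksum, chunks are packed into
 * roughly 64 KB packets, and full packets are pushed onto a dataQueue.
 * A streamer thread moves each packet to the ackQueue after sending it
 * and drops it once every DataNode has acknowledged it.
 */
public class WritePathSketch {
  static final int CHUNK_SIZE = 512;              // bytes of user data per chunk
  static final int PACKET_DATA_SIZE = 64 * 1024;  // rough packet payload budget

  /** Stand-in for DFSPacket: a batch of chunks plus their checksums. */
  static class DemoPacket {
    final List<byte[]> chunks = new ArrayList<>();
    final List<Long> checksums = new ArrayList<>();
    int dataLen = 0;
    boolean isFull() { return dataLen + CHUNK_SIZE > PACKET_DATA_SIZE; }
  }

  // dataQueue: packets waiting to be sent; ackQueue: sent but not yet acked.
  final LinkedBlockingDeque<DemoPacket> dataQueue = new LinkedBlockingDeque<>();
  final LinkedBlockingDeque<DemoPacket> ackQueue = new LinkedBlockingDeque<>();
  DemoPacket current = new DemoPacket();

  /** Called for every 512-byte slice of user data (cf. writeChunk()). */
  void writeChunk(byte[] chunk) {
    CRC32 crc = new CRC32();
    crc.update(chunk);
    current.chunks.add(chunk);
    current.checksums.add(crc.getValue());
    current.dataLen += chunk.length;
    if (current.isFull()) {            // packet full: hand it to the streamer
      dataQueue.addLast(current);
      current = new DemoPacket();
    }
  }

  /** What a DataStreamer-like thread does for one packet. */
  void sendOnePacket() throws InterruptedException {
    DemoPacket p = dataQueue.takeFirst();  // blocks until a packet is ready
    // ... write p to the first DataNode of the pipeline here ...
    ackQueue.addLast(p);                   // keep it until every DataNode acks
  }

  /** What a ResponseProcessor-like thread does on a fully successful ack. */
  void onAllDatanodesAcked() {
    ackQueue.pollFirst();                  // the packet is now durable
  }

  /** On failure: unacked packets go back to the front of the dataQueue. */
  void onPipelineError() {
    while (!ackQueue.isEmpty()) {
      dataQueue.addFirst(ackQueue.pollLast());
    }
    // then rebuild the pipeline without the bad DataNode and resend
  }
}

Keeping the unacked packets in a separate ackQueue is what lets the client replay exactly the unconfirmed data when a DataNode in the pipeline fails.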

I. Overall Write Flow

1. Establishing the pipeline
[Figure: establishing the write pipeline]
1.1 Client entry-point code (the paths in the snippet below are placeholders)

FileSystem fileSystem = FileSystem.get(new Configuration());
// copyFromLocalFile() needs a source and a destination; these paths are illustrative
fileSystem.copyFromLocalFile(new Path("/local/data.txt"), new Path("/user/demo/data.txt"));

1.2 The copy itself is done recursively. Before any data is written, the destination is checked against the NameNode, and only then does the actual stream I/O happen. The streams involved are built with the decorator pattern; the main output-stream object handed to the caller is FSDataOutputStream.

/** Copy files between FileSystems. */
  public static boolean copy(FileSystem srcFS, FileStatus srcStatus,
                             FileSystem dstFS, Path dst,
                             boolean deleteSource,
                             boolean overwrite,
                             Configuration conf) throws IOException {
    Path src = srcStatus.getPath();
    // check whether the destination already exists
    dst = checkDest(src.getName(), dstFS, dst, overwrite);
    // directory: create it recursively through the NameNode
    if (srcStatus.isDirectory()) {
      // dependency check: cannot copy a directory onto itself or into one of its subdirectories
      checkDependencies(srcFS, src, dstFS, dst);
      // create the directory
      // TODO: the NameNode EditLog mechanics are covered in a later post
      if (!dstFS.mkdirs(dst)) {
        return false;
      }
      FileStatus contents[] = srcFS.listStatus(src);
      // recursively copy every entry under the directory
      for (int i = 0; i < contents.length; i++) {
        copy(srcFS, contents[i], dstFS,
             new Path(dst, contents[i].getPath().getName()),
             deleteSource, overwrite, conf);
      }
    } else {
      // plain file: open the source and write it to the destination stream
      InputStream in = null;
      OutputStream out = null;
      try {
        in = srcFS.open(src);
        // dstFS.create() returns the wrapped DFSOutputStream
        out = dstFS.create(dst, overwrite);
        IOUtils.copyBytes(in, out, conf, true);
      } catch (IOException e) {
        IOUtils.closeStream(out);
        IOUtils.closeStream(in);
        throw e;
      }
    }
    if (deleteSource) {
      return srcFS.delete(src, true);
    } else {
      return true;
    }
  }
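For reference, the recursive helper above can also be driven directly. A minimal sketch, assuming a local source file and an HDFS destination (both paths are made up); as far as I can tell from the 2.7.0 code, copyFromLocalFile() ultimately lands in this same FileUtil.copy():

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class FileUtilCopyDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem local = FileSystem.getLocal(conf);
    FileSystem hdfs = FileSystem.get(conf);
    // Drives the recursive copy() shown above.
    boolean ok = FileUtil.copy(local, new Path("/local/data.txt"),      // illustrative source
                               hdfs, new Path("/user/demo/data.txt"),   // illustrative destination
                               false /* deleteSource */, conf);
    System.out.println("copy succeeded: " + ok);
  }
}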

1.3 Creating the FSDataOutputStream object, which is done by DistributedFileSystem.

  @Override
  public FSDataOutputStream create(final Path f, final FsPermission permission,
    final EnumSet<CreateFlag> cflags, final int bufferSize,
    final short replication, final long blockSize, final Progressable progress,
    final ChecksumOpt checksumOpt) throws IOException {
    statistics.incrementWriteOps(1);
    Path absF = fixRelativePart(f);
    return new FileSystemLinkResolver<FSDataOutputStream>() {
      @Override
      public FSDataOutputStream doCall(final Path p)
          throws IOException, UnresolvedLinkException {
        final DFSOutputStream dfsos = dfs.create(getPathName(p), permission,
                cflags, replication, blockSize, progress, bufferSize,
                checksumOpt);
        return dfs.createWrappedOutputStream(dfsos, statistics);
      }
      @Override
      public FSDataOutputStream next(final FileSystem fs, final Path p)
          throws IOException {
        return fs.create(p, permission, cflags, bufferSize,
            replication, blockSize, progress, checksumOpt);
      }
    }.resolve(this, absF);
  }
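The doCall() branch above builds a DFSOutputStream and then hands it to dfs.createWrappedOutputStream(), so what the caller gets back is a decorator. A minimal sketch of what that layering looks like from the application side (the path is made up, and the class names in the comments reflect my reading of the 2.7.0 code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WrapperDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // On HDFS, create() returns an HdfsDataOutputStream (a subclass of
    // FSDataOutputStream) that decorates the DFSOutputStream doing the
    // real work: chunking, packet building and pipeline writes.
    FSDataOutputStream out = fs.create(new Path("/tmp/demo.txt"));  // illustrative path
    System.out.println(out.getClass().getSimpleName());                    // expected: HdfsDataOutputStream
    System.out.println(out.getWrappedStream().getClass().getSimpleName()); // expected: DFSOutputStream
    out.close();
    fs.close();
  }
}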

1.4 Establishing the pipeline
① Client side: DFSOutputStream#createBlockOutputStream()

          assert null == s : "Previous socket unclosed";
          assert null == blockReplyStream : "Previous blockReplyStream unclosed";
          // open the socket to the first DataNode of the pipeline
          s = createSocketForPipeline(nodes[0], nodes.length, dfsClient);
          long writeTimeout = dfsClient.getDatanodeWriteTimeout(nodes.length);

          OutputStream unbufOut = NetUtils.getOutputStream(s, writeTimeout);
          InputStream unbufIn = NetUtils.getInputStream(s);
          IOStreamPair saslStreams = dfsClient.saslClient.socketSend(s,
            unbufOut, unbufIn, dfsClient, accessToken, nodes[0]);
          unbufOut = saslStreams.out;
          unbufIn = saslStreams.in;
          out = new DataOutputStream(new BufferedOutputStream(unbufOut,
              HdfsConstants.SMALL_BUFFER_SIZE));
          blockReplyStream = new DataInputStream(unbufIn);

          //
          // Xmit header info to datanode
          //

          BlockConstructionStage bcs = recoveryFlag? stage.getRecoveryStage(): stage;

          // We cannot change the block length in 'block' as it counts the number
          // of bytes ack'ed.
          ExtendedBlock blockCopy = new ExtendedBlock(block);
          blockCopy.setNumBytes(blockSize);

          boolean[] targetPinnings = getPinnings(nodes, true);
          // send the write-block request describing the block to be written
          new Sender(out).writeBlock(blockCopy, nodeStorageTypes[0], accessToken,
              dfsClient.clientName, nodes, nodeStorageTypes, null, bcs,
              nodes.length, block.getNumBytes(), bytesSent, newGS,
              checksum4WriteBlock, cachingStrategy.get(), isLazyPersistFile,
            (targetPinnings == null ? false : targetPinnings[0]), targetPinnings);

② DataNode side: DataXceiver#run()

  1. readOp(): reads the op header from the stream, a 2-byte short carrying the DataTransferProtocol version followed by the 1-byte op code
  2. processOp(): dispatches on the op code
  3. writeBlock(): handles the WRITE_BLOCK op and sets up a BlockReceiver for the incoming block
  4. Sender#writeBlock(): forwards the same request to the next DataNode, which is how the pipeline extends downstream (a simplified sketch of this read-and-dispatch loop follows)
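To visualize what readOp()/processOp() do on the DataNode side, here is a simplified, self-contained sketch of the same read-version-then-dispatch pattern. It is not the Hadoop code: the constant values and the handler method below are placeholders.

import java.io.DataInputStream;
import java.io.IOException;

/** Toy version of the DataXceiver read-and-dispatch loop. */
public class OpDispatchSketch {
  // Placeholder values; the real ones live in DataTransferProtocol / Op.
  static final short DATA_TRANSFER_VERSION = 28;
  static final byte OP_WRITE_BLOCK = 80;

  static void readAndProcessOp(DataInputStream in) throws IOException {
    // readOp(): a 2-byte protocol version followed by a 1-byte op code
    short version = in.readShort();
    if (version != DATA_TRANSFER_VERSION) {
      throw new IOException("Version mismatch: " + version);
    }
    byte op = in.readByte();

    // processOp(): dispatch on the op code
    switch (op) {
      case OP_WRITE_BLOCK:
        // parse the request body, set up a BlockReceiver, and forward the
        // request to the next DataNode (new Sender(mirrorOut).writeBlock(...))
        handleWriteBlock(in);
        break;
      default:
        throw new IOException("Unknown op " + op);
    }
  }

  static void handleWriteBlock(DataInputStream in) {
    // placeholder for the real writeBlock() handling
  }
}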

2. The write data flow
[Figure: the write data flow through the DataNode pipeline]
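Since this section only has the diagram, here is a short sketch of the client-side calls that feed this machinery, with the mapping to the concepts above in the comments. The path and data are made up, and the mapping reflects my reading of the 2.7.0 code rather than anything specific to this example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteFlowDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/write-flow-demo.bin")); // illustrative path

    byte[] buffer = new byte[4096];
    for (int i = 0; i < 1024; i++) {        // write 4 MB in 4 KB slices
      // Each write() is checksummed in 512-byte chunks by the stream,
      // the chunks are packed into roughly 64 KB packets, and full packets
      // are appended to the dataQueue for the DataStreamer to push down the pipeline.
      out.write(buffer);
    }

    // close() flushes the last (possibly partial) packet, waits for the
    // ackQueue to drain, and asks the NameNode to complete the file.
    out.close();
    fs.close();
  }
}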
