Hadoop Core Source Code Analysis Series (Part 2)

数据与智能




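Author | 吴邪 has four years of experience in big data and currently works at an internet company in Guangzhou, where he is responsible for in-house development of the big data infrastructure platform and for research on offline and real-time computing.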

Editor | auroral-L

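In the previous article, "Hadoop Core Source Code Analysis Series (Part 1)", we introduced the RPC model that underlies Hadoop's communication and explained how Hadoop services talk to each other over RPC. I also shared the scenario-driven method for reading source code that I have picked up at work, in the hope that it helps readers who find source code hard to read or do not know where to start, and we walked through the core flow of NameNode initialization. Read that article if you are interested; for a deeper understanding you will need to step through the code details yourself.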

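This article covers DataNode initialization, registration, and the heartbeat mechanism. DataNode is somewhat more complex than NameNode and harder to analyze. A good habit when reading source code is to start with the class comment of each core class, so that you understand the role and responsibilities of the class before diving in. To keep the article to a reasonable length, we extract only the core code. Because an article cannot jump into a method with a click the way the IDEA development tool can, the core methods referenced along the way are pulled out and shown separately, one level at a time; interested readers can go back and dig deeper on their own.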

DataNode Analysis

 /**********************************************************
 * DataNode is a class (and program) that stores a set of
 * blocks for a DFS deployment.  A single deployment can
 * have one or many DataNodes.  Each DataNode communicates
 * regularly with a single NameNode.  It also communicates
 * with client code and other DataNodes from time to time.
 * 
 * 
 * DataNodes store a series of named blocks.  The DataNode
 * allows client code to read these blocks, or to write new
 * block data.  The DataNode may also, in response to instructions
 * from its NameNode, delete blocks or copy blocks to/from other
 * DataNodes.
 * 
 *
 * The DataNode maintains just one critical table:
 *   block-> stream of bytes (of BLOCK_SIZE or less)
 * This info is stored on a local disk.  The DataNode
 * reports the table's contents to the NameNode upon startup
 * and every so often afterwards.
 *
 * DataNodes spend their lives in an endless loop of asking
 * the NameNode for something to do.  A NameNode cannot connect
 * to a DataNode directly; a NameNode simply returns values from
 * functions invoked by a DataNode.
 *
 *
 * DataNodes maintain an open server socket so that client code 
 * or other DataNodes can read/write data.  The host/port for
 * this server is reported to the NameNode, which then sends that
 * information to clients or other DataNodes that might be interested.
 *
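 * Summary:
 * 1) A cluster can contain many DataNodes; they are the nodes that store the data.
 * 2) After startup, a DataNode communicates with the NameNode periodically (heartbeats, block reports) and carries out the commands the NameNode sends back, such as deleting or copying blocks.
 * 3) The NameNode never operates on a DataNode directly; it issues commands through the return values of heartbeats.
 * 4) After startup, a DataNode keeps a server socket open so that clients and other DataNodes can connect to it to read and write data.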
 *
 **********************************************************/

The DataNode main thread: main(...)

··· 
public static void main(String args[]) {
  // Print the usage message and exit if a help argument was passed
  // USAGE: java DataNode [-regular | -rollback]
  if (DFSUtil.parseHelpArgument(args, DataNode.USAGE, System.out, true)) {
    System.exit(0);
  }
  // Core logic
  secureMain(args, null);
}

As you can see, DataNode's main(...) method is only a few lines long, so the target is clear: the core method is secureMain(...).

public static void secureMain(String args[], SecureResources resources) {
  int errorCode = 0;
  try {
    StringUtils.startupShutdownMessage(DataNode.class, args, LOG);
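    // Create and start the DataNode; this should look familiar, NameNode has a similar method
    DataNode datanode = createDataNode(args, null, resources);
    // createDataNode(...) returns once the DataNode has been initialized and started
    if (datanode != null) {
     // Block the main thread until the DataNode shuts down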
      datanode.join();
    } else {
      errorCode = 1;
    }
  } catch (Throwable e) {
    LOG.fatal("Exception in secureMain", e);
    terminate(1, e);
  } finally {
    // We need to terminate the process here because either shutdown was called
    // or some disk related conditions like volumes tolerated or volumes required
    // condition was not met. Also, In secure mode, control will go to Jsvc
    // and Datanode process hangs if it does not exit.
    LOG.warn("Exiting Datanode");
    terminate(errorCode);
  }
}

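The code above makes it clear that initialization is complete once createDataNode(...) returns. The main thread then blocks in datanode.join() and stays there until the DataNode shuts down, which keeps the process alive while the background threads do the work. This blocking design is worth borrowing in our own development.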
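The blocking idea can be shown outside Hadoop with a minimal sketch. Everything here is hypothetical (the class `JoinSketch`, its methods, and the tick counter are illustration only, not DataNode code): a daemon worker thread does the work, and the caller blocks in `join()` until the worker exits.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a long-running service; not Hadoop code.
class JoinSketch {
    private final AtomicBoolean shouldRun = new AtomicBoolean(true);
    private final Thread worker;
    private int ticks = 0; // written by the worker, read after join()

    JoinSketch(int maxTicks) {
        worker = new Thread(() -> {
            // Work until asked to stop or until the tick budget is used up.
            while (shouldRun.get() && ticks < maxTicks) {
                ticks++;
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "worker");
        // A daemon thread alone would not keep the JVM alive.
        worker.setDaemon(true);
    }

    void start() { worker.start(); }

    void shutdown() { shouldRun.set(false); }

    // Blocks the calling thread until the worker thread has exited.
    void join() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts a service, waits for it to finish, and reports how many ticks it ran.
    static int runToCompletion(int maxTicks) {
        JoinSketch service = new JoinSketch(maxTicks);
        service.start();
        service.join();
        return service.ticks;
    }

    public static void main(String[] args) {
        System.out.println("worker finished after " + runToCompletion(3) + " ticks");
    }
}
```

Because the worker is a daemon thread, the process would exit immediately if the main thread returned; blocking in `join()` is what keeps it running in this sketch.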

/** Instantiate & Start a single datanode daemon and wait for it to finish.
 *  If this thread is specifically interrupted, it will stop waiting.
 */
@VisibleForTesting
@InterfaceAudience.Private
public static DataNode createDataNode(String args[], Configuration conf,
    SecureResources resources) throws IOException {
  // Instantiate the DataNode and return the DataNode object
  DataNode dn = instantiateDataNode(args, conf, resources);
  // Once the DataNode is instantiated, run it as a daemon
  if (dn != null) {
    // Start the DataNode's background threads
    dn.runDatanodeDaemon();
  }
  return dn;
}

Initialization

/** Instantiate a single datanode object, along with its secure resources. 
 * This must be run by invoking{@link DataNode#runDatanodeDaemon()} 
 * subsequently. 
 */
 // Instantiate the DataNode; runDatanodeDaemon() must be called afterwards to run it as a background daemon
public static DataNode instantiateDataNode(String args [], Configuration conf,
    SecureResources resources) throws IOException {
  if (conf == null)
    conf = new HdfsConfiguration();
  
  if (args != null) {
    // parse generic hadoop options
    GenericOptionsParser hParser = new GenericOptionsParser(conf, args);
    args = hParser.getRemainingArgs();
  }
  
  if (!parseArguments(args, conf)) {
    printUsage(System.err);
    return null;
  }
  Collection<StorageLocation> dataLocations = getStorageLocations(conf);
  UserGroupInformation.setConfiguration(conf);
  SecurityUtil.login(conf, DFS_DATANODE_KEYTAB_FILE_KEY,
      DFS_DATANODE_KERBEROS_PRINCIPAL_KEY);
  // The important part; everything above is mostly argument parsing and login
  return makeInstance(dataLocations, conf, resources);
}

/**
 * Make an instance of DataNode after ensuring that at least one of the
 * given data directories (and their parent directories, if necessary)
 * can be created.
 * @param dataDirs List of directories, where the new DataNode instance should
 * keep its files.
 * @param conf Configuration instance to use.
 * @param resources Secure resources needed to run under Kerberos
 * @return DataNode instance for given list of data dirs and conf, or null if
 * no directory from this directory list can be created.
 * @throws IOException
 */
static DataNode makeInstance(Collection<StorageLocation> dataDirs,
    Configuration conf, SecureResources resources) throws IOException {
  LocalFileSystem localFS = FileSystem.getLocal(conf);
  FsPermission permission = new FsPermission(
      conf.get(DFS_DATANODE_DATA_DIR_PERMISSION_KEY,
               DFS_DATANODE_DATA_DIR_PERMISSION_DEFAULT));
  DataNodeDiskChecker dataNodeDiskChecker =
      new DataNodeDiskChecker(permission);
  List<StorageLocation> locations =
      checkStorageLocations(dataDirs, localFS, dataNodeDiskChecker);
  DefaultMetricsSystem.initialize("DataNode");
  assert locations.size() > 0 : "number of data directories should be > 0";
  // The important part; same pattern as NameNode
  return new DataNode(conf, locations, resources);
}

/**
 * Create the DataNode given a configuration, an array of dataDirs,
 * and a namenode proxy
 */
 // Read the DataNode settings from the configuration (hdfs-site.xml and core-site.xml), then start the DataNode
DataNode(final Configuration conf,
         final List<StorageLocation> dataDirs,
         final SecureResources resources) throws IOException {
  super(conf);
  this.blockScanner = new BlockScanner(this, conf);
  this.lastDiskErrorCheck = 0;
  this.maxNumberOfBlocksToLog = conf.getLong(DFS_MAX_NUM_BLOCKS_TO_LOG_KEY,
      DFS_MAX_NUM_BLOCKS_TO_LOG_DEFAULT);
  this.usersWithLocalPathAccess = Arrays.asList(
      conf.getTrimmedStrings(DFSConfigKeys.DFS_BLOCK_LOCAL_PATH_ACCESS_USER_KEY));
  this.connectToDnViaHostname = conf.getBoolean(
      DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME,
      DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME_DEFAULT);
  this.getHdfsBlockLocationsEnabled = conf.getBoolean(
      DFSConfigKeys.DFS_HDFS_BLOCKS_METADATA_ENABLED, 
      DFSConfigKeys.DFS_HDFS_BLOCKS_METADATA_ENABLED_DEFAULT);
  this.supergroup = conf.get(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY,
      DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT);
  this.isPermissionEnabled = conf.getBoolean(
      DFSConfigKeys.DFS_PERMISSIONS_ENABLED_KEY,
      DFSConfigKeys.DFS_PERMISSIONS_ENABLED_DEFAULT);
  this.pipelineSupportECN = conf.getBoolean(
      DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED,
      DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED_DEFAULT);
  confVersion = "core-" +
      conf.get("hadoop.common.configuration.version", "UNSPECIFIED") +
      ",hdfs-" +
      conf.get("hadoop.hdfs.configuration.version", "UNSPECIFIED");
  // Determine whether we should try to pass file descriptors to clients.
  if (conf.getBoolean(DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_KEY,
            DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_DEFAULT)) {
    String reason = DomainSocket.getLoadingFailureReason();
    if (reason != null) {
      LOG.warn("File descriptor passing is disabled because " + reason);
      this.fileDescriptorPassingDisabledReason = reason;
    } else {
      LOG.info("File descriptor passing is enabled.");
      this.fileDescriptorPassingDisabledReason = null;
    }
  } else {
    this.fileDescriptorPassingDisabledReason =
        "File descriptor passing was not configured.";
    LOG.debug(this.fileDescriptorPassingDisabledReason);
  }
  try {
    hostName = getHostName(conf);
    LOG.info("Configured hostname is " + hostName);
    // Start the DataNode
    startDataNode(conf, dataDirs, resources);
  } catch (IOException ie) {
    shutdown();
    throw ie;
  }
  final int dncCacheMaxSize =
      conf.getInt(DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_KEY,
          DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_DEFAULT) ;
  // The builder pattern, one of the 23 classic design patterns
  datanodeNetworkCounts =
      CacheBuilder.newBuilder()
          .maximumSize(dncCacheMaxSize)
          .build(new CacheLoader<String, Map<String, Long>>() {
            @Override
            public Map<String, Long> load(String key) throws Exception {
              final Map<String, Long> ret = new HashMap<String, Long>();
              ret.put("networkErrors", 0L);
              return ret;
            }
          });
}
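The builder pattern called out in the constructor above can be reduced to a minimal sketch. The class `ServerSettings`, its fields, and its defaults are hypothetical and chosen only for illustration; they are not Hadoop or Guava APIs. The point is the shape: chained setters that return the builder, and a `build()` step that validates and produces an immutable object.

```java
// Hypothetical example of the builder pattern; names are illustrative only.
final class ServerSettings {
    final String bindAddress;
    final int port;
    final int numHandlers;

    private ServerSettings(Builder b) {
        this.bindAddress = b.bindAddress;
        this.port = b.port;
        this.numHandlers = b.numHandlers;
    }

    static Builder newBuilder() {
        return new Builder();
    }

    static final class Builder {
        // Defaults apply to every option the caller does not set explicitly.
        private String bindAddress = "0.0.0.0";
        private int port = 0;
        private int numHandlers = 10;

        Builder setBindAddress(String bindAddress) {
            this.bindAddress = bindAddress;
            return this; // returning this is what enables call chaining
        }

        Builder setPort(int port) {
            this.port = port;
            return this;
        }

        Builder setNumHandlers(int numHandlers) {
            this.numHandlers = numHandlers;
            return this;
        }

        // Validation happens once, at build time, so the result is always consistent.
        ServerSettings build() {
            if (port < 0 || port > 65535) {
                throw new IllegalArgumentException("port out of range: " + port);
            }
            if (numHandlers < 1) {
                throw new IllegalArgumentException("numHandlers must be >= 1");
            }
            return new ServerSettings(this);
        }
    }

    public static void main(String[] args) {
        ServerSettings s = ServerSettings.newBuilder()
                .setBindAddress("127.0.0.1")
                .setPort(50020)
                .build();
        System.out.println(s.bindAddress + ":" + s.port + " handlers=" + s.numHandlers);
    }
}
```

The benefit over a long constructor is that optional settings stay optional and each call site reads like a list of named parameters.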

Starting the DataNode

/**
 * This method starts the data node with the specified conf.
 * 
 * @param conf - the configuration
 *  if conf's CONFIG_PROPERTY_SIMULATED property is set
 *  then a simulated storage based data node is created.
 * 
 * @param dataDirs - only for a non-simulated storage data node
 * @throws IOException
 */
void startDataNode(Configuration conf, 
                   List<StorageLocation> dataDirs,
                   SecureResources resources
                   ) throws IOException {
  // settings global for all BPs in the Data Node
  this.secureResources = resources;
  synchronized (this) {
    this.dataDirs = dataDirs;
  }
  this.conf = conf;
  this.dnConf = new DNConf(conf);
  checkSecureConfig(dnConf, conf, resources);
  this.spanReceiverHost = SpanReceiverHost.getInstance(conf);
  if (dnConf.maxLockedMemory > 0) {
    if (!NativeIO.POSIX.getCacheManipulator().verifyCanMlock()) {
      throw new RuntimeException(String.format(
          "Cannot start datanode because the configured max locked memory" +
          " size (%s) is greater than zero and native code is not available.",
          DFS_DATANODE_MAX_LOCKED_MEMORY_KEY));
    }
    if (Path.WINDOWS) {
      NativeIO.Windows.extendWorkingSetSize(dnConf.maxLockedMemory);
    } else {
      long ulimit = NativeIO.POSIX.getCacheManipulator().getMemlockLimit();
      if (dnConf.maxLockedMemory > ulimit) {
        throw new RuntimeException(String.format(
          "Cannot start datanode because the configured max locked memory" +
          " size (%s) of %d bytes is more than the datanode's available" +
          " RLIMIT_MEMLOCK ulimit of %d bytes.",
          DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
          dnConf.maxLockedMemory,
          ulimit));
      }
    }
  }
  LOG.info("Starting DataNode with maxLockedMemory = " +
      dnConf.maxLockedMemory);
  storage = new DataStorage();
  
  // global DN settings
  registerMXBean();
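  // Stepping into this method shows that it mainly does the following:
  // 1. Initializes a TcpPeerServer to accept TCP requests
  // 2. Instantiates a DataXceiverServer, which serves data transfers for clients and other DataNodes, and sets it to run as a daemon thread in the background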
  initDataXceiver(conf);
  
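  // As in NameNode, this starts the HTTP service (HttpServer2)
  // to accept HTTP requests; stepping in shows that the method also uses the builder pattern.
  // It initializes HttpServer2 and binds a number of servlets: infoServer.addInternalServlet(...)
  // and then starts the HTTP service: this.infoServer.start();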
  startInfoServer(conf);
  
  pauseMonitor = new JvmPauseMonitor(conf);
  pauseMonitor.start();
  // BlockPoolTokenSecretManager is required to create ipc server.
  this.blockPoolTokenSecretManager = new BlockPoolTokenSecretManager();
  // Login is done by now. Set the DN user name.
  dnUserName = UserGroupInformation.getCurrentUser().getShortUserName();
  LOG.info("dnUserName = " + dnUserName);
  LOG.info("supergroup = " + supergroup);
  
  // Initialize the RPC server and register the protocols that the DataNode implements, to handle RPC requests from clients and other DataNodes
  /**
  * The builder pattern again:
  *ipcServer = new RPC.Builder(conf)
  *  .setProtocol(ClientDatanodeProtocolPB.class)
  *  .setInstance(service)
  *  .setBindAddress(ipcAddr.getHostName())
  *  .setPort(ipcAddr.getPort())
  *  .setNumHandlers(
  *      conf.getInt(DFS_DATANODE_HANDLER_COUNT_KEY,
  *         DFS_DATANODE_HANDLER_COUNT_DEFAULT)).setVerbose(false)
  *  .setSecretManager(blockPoolTokenSecretManager).build();
  */
  initIpcServer(conf);
  
  metrics = DataNodeMetrics.create(conf, getDisplayName());
  metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);
  
  // Create the BlockPoolManager
  // A block pool belongs to one namespace, so there is one block pool per nameservice (one per federated namespace)
  blockPoolManager = new BlockPoolManager(this);
  
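  // Set up the threads that register with the NameNodes and then communicate with them periodically (heartbeats, block reports)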
  blockPoolManager.refreshNamenodes(conf);
  // Create the ReadaheadPool from the DataNode context so we can
  // exit without having to explicitly shutdown its thread pool.
  readaheadPool = ReadaheadPool.getInstance();
  saslClient = new SaslDataTransferClient(dnConf.conf, 
      dnConf.saslPropsResolver, dnConf.trustedChannelResolver);
  saslServer = new SaslDataTransferServer(dnConf, blockPoolTokenSecretManager);
}

Registration and Heartbeat Mechanism

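As developers, sensitivity to code matters a great deal. Reading a lot of good code, together with the code we write day to day, sharpens that sensitivity and helps us tell the important methods from the incidental ones. A good sense for code saves a lot of unnecessary effort when reading high-quality source.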

void refreshNamenodes(Configuration conf)
    throws IOException {
  LOG.info("Refresh request received for nameservices: " + conf.get
          (DFSConfigKeys.DFS_NAMESERVICES));
  Map<String, Map<String, InetSocketAddress>> newAddressMap = DFSUtil
          .getNNServiceRpcAddressesForCluster(conf);
  synchronized (refreshNamenodesLock) {
   // The important part
    doRefreshNamenodes(newAddressMap);
  }
}

/**
 * @param addrMap
 * @throws IOException
 */
private void doRefreshNamenodes(
    Map<String, Map<String, InetSocketAddress>> addrMap) throws IOException {
  assert Thread.holdsLock(refreshNamenodesLock);
  Set<String> toRefresh = Sets.newLinkedHashSet();
  Set<String> toAdd = Sets.newLinkedHashSet();
  Set<String> toRemove;
  
  synchronized (this) {
    // Step 1. For each of the new nameservices, figure out whether
    // it's an update of the set of NNs for an existing NS,
    // or an entirely new nameservice.
 
    // Iterate over addrMap. Usually there is only one nameservice (an HA pair still counts as a single nameservice); a federation deployment has several.
    for (String nameserviceId : addrMap.keySet()) {
      if (bpByNameserviceId.containsKey(nameserviceId)) {
        toRefresh.add(nameserviceId);
      } else {
        // toAdd collects the newly configured nameservices; each federated namespace is one nameservice
        toAdd.add(nameserviceId);
      }
    }
    
    // Step 2. Any nameservices we currently have but are no longer present
    // need to be removed.
    toRemove = Sets.newHashSet(Sets.difference(
        bpByNameserviceId.keySet(), addrMap.keySet()));
    
    assert toRefresh.size() + toAdd.size() ==
      addrMap.size() :
        "toAdd: " + Joiner.on(",").useForNull("<default>").join(toAdd) +
        "  toRemove: " + Joiner.on(",").useForNull("<default>").join(toRemove) +
        "  toRefresh: " + Joiner.on(",").useForNull("<default>").join(toRefresh);
    
    // Step 3. Start new nameservices
    if (!toAdd.isEmpty()) {
      LOG.info("Starting BPOfferServices for nameservices: " +
          Joiner.on(",").useForNull("<default>").join(toAdd));
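      // Iterate over the nameservices (federated namespaces); with HA, one nameservice has two NameNodes.
      // With two federated nameservices, toAdd holds two values here.
      // One BPOfferService corresponds to one nameservice.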
      for (String nsToAdd : toAdd) {
        ArrayList<InetSocketAddress> addrs =
        // With HA, this holds the addresses of both NameNodes, e.g. hadoop1 and hadoop2
        Lists.newArrayList(addrMap.get(nsToAdd).values());
        
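        // One nameservice maps to one BPOfferService.
        // Each NameNode within that nameservice maps to one BPServiceActor,
        // so with HA a BPOfferService normally has two BPServiceActors.
        // The addresses come from hdfs-site.xml and core-site.xml.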
        BPOfferService bpos = createBPOS(addrs);
        bpByNameserviceId.put(nsToAdd, bpos);
        offerServices.add(bpos);
      }
    }
    // The DataNode registers with the NameNodes and starts heartbeating
    startAll();
  }
  // Step 4. Shut down old nameservices. This happens outside
  // of the synchronized(this) lock since they need to call
  // back to .remove() from another thread
  if (!toRemove.isEmpty()) {
    LOG.info("Stopping BPOfferServices for nameservices: " +
        Joiner.on(",").useForNull("<default>").join(toRemove));
    
    for (String nsToRemove : toRemove) {
      BPOfferService bpos = bpByNameserviceId.get(nsToRemove);
      bpos.stop();
      bpos.join();
      // they will call remove on their own
    }
  }
  
  // Step 5. Update nameservices whose NN list has changed
  if (!toRefresh.isEmpty()) {
    LOG.info("Refreshing list of NNs for nameservices: " +
        Joiner.on(",").useForNull("<default>").join(toRefresh));
    
    for (String nsToRefresh : toRefresh) {
      BPOfferService bpos = bpByNameserviceId.get(nsToRefresh);
      ArrayList<InetSocketAddress> addrs =
        Lists.newArrayList(addrMap.get(nsToRefresh).values());
      bpos.refreshNNList(addrs);
    }
  }
}
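Steps 1 and 2 of doRefreshNamenodes amount to a three-way set reconciliation between what is currently running and what the configuration now says. A minimal sketch using only the JDK follows; the class `NameserviceDiff` and its method are hypothetical names for illustration, and the real code uses Guava's `Sets` helpers instead.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper that mirrors the toAdd / toRefresh / toRemove bookkeeping.
class NameserviceDiff {
    final Set<String> toAdd = new LinkedHashSet<>();
    final Set<String> toRefresh = new LinkedHashSet<>();
    final Set<String> toRemove = new LinkedHashSet<>();

    // current: nameservices already running; configured: nameservices in the new config.
    static NameserviceDiff compute(Set<String> current, Set<String> configured) {
        NameserviceDiff diff = new NameserviceDiff();
        for (String ns : configured) {
            if (current.contains(ns)) {
                diff.toRefresh.add(ns); // already known: its NameNode list may have changed
            } else {
                diff.toAdd.add(ns);     // entirely new nameservice
            }
        }
        for (String ns : current) {
            if (!configured.contains(ns)) {
                diff.toRemove.add(ns);  // no longer configured: must be shut down
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Set<String> current = new LinkedHashSet<>(Arrays.asList("ns1", "ns2"));
        Set<String> configured = new LinkedHashSet<>(Arrays.asList("ns2", "ns3"));
        NameserviceDiff d = compute(current, configured);
        System.out.println("toAdd=" + d.toAdd + " toRefresh=" + d.toRefresh
                + " toRemove=" + d.toRemove);
    }
}
```

By construction, `toAdd.size() + toRefresh.size()` equals the number of configured nameservices, which is the same invariant the assert in the original method checks.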

// synchronized for thread safety
synchronized void startAll() throws IOException {
  try {
    UserGroupInformation.getLoginUser().doAs(
        new PrivilegedExceptionAction<Object>() {
          @Override
          public Object run() throws Exception {
           // Iterate over all BPOfferService instances
            for (BPOfferService bpos : offerServices) {
              // Important
              bpos.start();
            }
            return null;
          }
        });
  } catch (InterruptedException ex) {
    IOException ioe = new IOException();
    ioe.initCause(ex.getCause());
    throw ioe;
  }
}
// This start() is the BPOfferService method, as distinct from the BPServiceActor start() further down
//This must be called only by blockPoolManager
void start() {
 // One BPOfferService can have several actors
  for (BPServiceActor actor : bpServices) {
   // Each actor registers the DataNode and sends heartbeats
    actor.start();
  }
}


// This start() is the BPServiceActor method; it starts the actor's thread
//This must be called only by BPOfferService
void start() {
  if ((bpThread != null) && (bpThread.isAlive())) {
    //Thread is started already
    return;
  }
  bpThread = new Thread(this, formatThreadName());
  //run
  bpThread.setDaemon(true); // needed for JUnit testing
  // Start the thread, so the next place to look is run()
  bpThread.start();
}

/**
 * No matter what kind of exception we get, keep retrying to offerService().
 * That's the loop that connects to the NameNode and provides basic DataNode
 * functionality.
 *
 * Only stop when "shouldRun" or "shouldServiceRun" is turned off, which can
 * happen either at shutdown or due to refreshNamenodes.
 */
@Override
public void run() {
  LOG.info(this + " starting to offer service");
  
  // Registration + heartbeat
  
  try {
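  // The design here is neat: an infinite while loop keeps trying until the DataNode registers. If registration throws, the exception is caught and, as long as shouldRetryInit() allows, the thread sleeps for 5 seconds and retries; once registration succeeds, break leaves the loop and execution continues.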
    while (true) {
      // init stuff
      try {
        // Core registration code
        connectToNNAndHandshake();
        break;
      } catch (IOException ioe) {
        // Initial handshake, storage recovery or registration failed
        runningState = RunningState.INIT_FAILED;
        if (shouldRetryInit()) {
          // Retry until all namenode's of BPOS failed initialization
          LOG.error("Initialization failed for " + this + " "
              + ioe.getLocalizedMessage());
          // On failure, sleep for 5 seconds
          sleepAndLogInterrupts(5000, "initializing");
        } else {
          runningState = RunningState.FAILED;
          LOG.fatal("Initialization failed for " + this + ". Exiting. ", ioe);
          return;
        }
      }
    }
    // Registration is done
    runningState = RunningState.RUNNING;
    while (shouldRun()) {
      try {
       // Main service loop: heartbeats and block reports
        offerService();
      } catch (Exception ex) {
        LOG.error("Exception in BPOfferService for " + this, ex);
        sleepAndLogInterrupts(5000, "offering service");
      }
    }
    runningState = RunningState.EXITED;
  } catch (Throwable ex) {
    LOG.warn("Unexpected exception in block pool " + this, ex);
    runningState = RunningState.FAILED;
  } finally {
    LOG.warn("Ending block pool service for: " + this);
    cleanUp();
  }
}
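The retry-until-connected structure at the top of run() can be isolated into a small sketch. All names here (`ActorLoopSketch`, `Handshake`, `connectWithRetry`) are hypothetical, and a plain attempt budget is assumed in place of the real shouldRetryInit() check, whose logic is not shown in this article.

```java
import java.io.IOException;

// Illustrative sketch only; the names below are hypothetical, not Hadoop classes.
class ActorLoopSketch {
    enum State { RUNNING, FAILED }

    /** A handshake step that may fail with an I/O error. */
    interface Handshake {
        void run() throws IOException;
    }

    /**
     * Retries the handshake until it succeeds or maxAttempts is used up.
     * Returns RUNNING on success and FAILED when the budget is exhausted.
     */
    static State connectWithRetry(Handshake handshake, int maxAttempts, long retrySleepMs) {
        int attempts = 0;
        while (true) {
            try {
                handshake.run();
                break; // handshake succeeded, leave the retry loop
            } catch (IOException e) {
                attempts++;
                // Assumption: a simple attempt budget stands in for shouldRetryInit().
                if (attempts >= maxAttempts) {
                    return State.FAILED;
                }
                try {
                    Thread.sleep(retrySleepMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return State.FAILED;
                }
            }
        }
        return State.RUNNING;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulate a server that only becomes reachable on the third attempt.
        State state = connectWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("server not ready");
            }
        }, 5, 10L);
        System.out.println(state + " after " + calls[0] + " attempts");
    }
}
```

Keeping the retry loop separate from the steady-state loop, as run() does, means a failure during startup and a failure during normal service can be handled with different policies.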

Next, let's focus on two methods: connectToNNAndHandshake() and offerService().

private void connectToNNAndHandshake() throws IOException {
  // Obtain a proxy to the NameNode.
  // This is the RPC client side:
  // the DataNode gets a proxy for the NameNode at (hostname, port).
  bpNamenode = dn.connectToNN(nnAddr);
  // First phase of the handshake with NN - get the namespace
  // info.
  // Make the first request to the NameNode to fetch its namespace info
  NamespaceInfo nsInfo = retrieveNamespaceInfo();
  
  // Verify that this matches the other NN in this HA pair.
  // This also initializes our block pool in the DN if we are
  // the first NN connection for this BP.
  // Verify the NamespaceInfo.
  // In an HA pair, both NameNodes must report the same namespace.
  bpos.verifyAndSetNamespaceInfo(nsInfo);
  
  // Second phase of the handshake with the NN.
  // Register
  register(nsInfo);
}


/**
 * Register one bp with the corresponding NameNode
 * <p>
 * The bpDatanode needs to register with the namenode on startup in order
 * 1) to report which storage it is serving now and 
 * 2) to receive a registrationID
 *  
 * issued by the namenode to recognize registered datanodes.
 * 
 * @param nsInfo current NamespaceInfo
 * @see FSNamesystem#registerDatanode(DatanodeRegistration)
 * @throws IOException
 */
void register(NamespaceInfo nsInfo) throws IOException {
  // The handshake() phase loaded the block pool storage
  // off disk - so update the bpRegistration object from that info
 
  // Create the registration info
  bpRegistration = bpos.createRegistration();
  LOG.info(this + " beginning handshake with NN");
  while (shouldRun()) {
    try {
      // Use returned registration from namenode with updated fields
      // Call registerDatanode on the NameNode's RPC server
      
      bpRegistration = bpNamenode.registerDatanode(bpRegistration);
      // Reaching this point means registration has completed.
      bpRegistration.setNamespaceInfo(nsInfo);
      break;
    } catch(EOFException e) {  // namenode might have just restarted
      LOG.info("Problem connecting to server: " + nnAddr + " :"
          + e.getLocalizedMessage());
      sleepAndLogInterrupts(1000, "connecting to server");
    } catch(SocketTimeoutException e) {  // namenode is busy
      LOG.info("Problem connecting to server: " + nnAddr);
      sleepAndLogInterrupts(1000, "connecting to server");
    }
  }
  
  LOG.info("Block pool " + this + " successfully registered with NN");
  bpos.registrationSucceeded(this, bpRegistration);
  // random short delay - helps scatter the BR from all DNs
  scheduleBlockReport(dnConf.initialBlockReportDelay);
}

/**
 * Main loop for each BP thread. Run until shutdown,
 * forever calling remote NameNode functions.
 */
private void offerService() throws Exception {
  LOG.info("For namenode " + nnAddr + " using"
      + " DELETEREPORT_INTERVAL of " + dnConf.deleteReportInterval + " msec "
      + " BLOCKREPORT_INTERVAL of " + dnConf.blockReportInterval + "msec"
      + " CACHEREPORT_INTERVAL of " + dnConf.cacheReportInterval + "msec"
      + " Initial delay: " + dnConf.initialBlockReportDelay + "msec"
      + "; heartBeatInterval=" + dnConf.heartBeatInterval);
  //
  // Now loop for a long time....
  // Periodic heartbeat
  while (shouldRun()) {
    try {
      final long startTime = monotonicNow();
      //
      // Every so often, send heartbeat or block-report
      // By default a heartbeat is sent every 3 seconds
      /**
       *heartBeatInterval = conf.getLong(DFS_HEARTBEAT_INTERVAL_KEY,
       *DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000L;
       */
      if (startTime - lastHeartbeat >= dnConf.heartBeatInterval) {
        //
        // All heartbeat messages include following info:
        // -- Datanode name
        // -- data transfer port
        // -- Total capacity
        // -- Bytes remaining
        //
        lastHeartbeat = startTime;
        if (!dn.areHeartbeatsDisabledForTests()) {
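         // The NameNode never connects to a DataNode directly.
         // The DataNode sends a heartbeat to the NameNode;
         // the NameNode replies to the heartbeat with commands;
         // the DataNode then carries out those commands.
          
          // Send the heartbeat; the response carries the NameNode's commands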
          HeartbeatResponse resp = sendHeartBeat();
          assert resp != null;
          dn.getMetrics().addHeartbeat(monotonicNow() - startTime);
          // If the state of this NN has changed (eg STANDBY->ACTIVE)
          // then let the BPOfferService update itself.
          //
          // Important that this happens before processCommand below,
          // since the first heartbeat to a new active might have commands
          // that we should actually process.
          bpos.updateActorStatesFromHeartbeat(
              this, resp.getNameNodeHaState());
          state = resp.getNameNodeHaState().getState();
          if (state == HAServiceState.ACTIVE) {
            handleRollingUpgradeStatus(resp);
          }
          long startProcessCommands = monotonicNow();
          // Process the commands sent back by the NameNode
          if (!processCommand(resp.getCommands()))
            continue;
          long endProcessCommands = monotonicNow();
          if (endProcessCommands - startProcessCommands > 2000) {
            LOG.info("Took " + (endProcessCommands - startProcessCommands)
                + "ms to process " + resp.getCommands().length
                + " commands from NN");
          }
        }
      }
      if (sendImmediateIBR ||
          (startTime - lastDeletedReport > dnConf.deleteReportInterval)) {
        reportReceivedDeletedBlocks();
        lastDeletedReport = startTime;
      }
      List<DatanodeCommand> cmds = blockReport();
      processCommand(cmds == null ? null : cmds.toArray(new DatanodeCommand[cmds.size()]));
      DatanodeCommand cmd = cacheReport();
      processCommand(new DatanodeCommand[]{ cmd });
      //
      // There is no work to do;  sleep until hearbeat timer elapses, 
      // or work arrives, and then iterate again.
      //
      long waitTime = dnConf.heartBeatInterval - 
      (monotonicNow() - lastHeartbeat);
      synchronized(pendingIncrementalBRperStorage) {
        if (waitTime > 0 && !sendImmediateIBR) {
          try {
            pendingIncrementalBRperStorage.wait(waitTime);
          } catch (InterruptedException ie) {
            LOG.warn("BPOfferService for " + this + " interrupted");
          }
        }
      } // synchronized
    } catch(RemoteException re) {
      String reClass = re.getClassName();
      if (UnregisteredNodeException.class.getName().equals(reClass) ||
          DisallowedDatanodeException.class.getName().equals(reClass) ||
          IncorrectVersionException.class.getName().equals(reClass)) {
        LOG.warn(this + " is shutting down", re);
        shouldServiceRun = false;
        return;
      }
      LOG.warn("RemoteException in offerService", re);
      try {
        long sleepTime = Math.min(1000, dnConf.heartBeatInterval);
        Thread.sleep(sleepTime);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    } catch (IOException e) {
      LOG.warn("IOException in offerService", e);
    }
    processQueueMessages();
  } // while (shouldRun())
} // offerService
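The key idea in offerService() is that the server never calls the worker; commands travel back in the heartbeat response. The sketch below shows that shape in isolation. The `Master` interface, the string commands, and the fixed number of rounds are all assumptions made for illustration; the real loop runs until shutdown and also handles block reports and cache reports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of "commands piggyback on heartbeat responses"; not Hadoop code.
class HeartbeatSketch {
    /** Server side: answers a heartbeat with zero or more commands for the worker. */
    interface Master {
        List<String> heartbeat(String workerId);
    }

    /** Worker side: heartbeats for a fixed number of rounds and runs what comes back. */
    static List<String> runHeartbeats(Master master, String workerId,
                                      int rounds, long intervalMs) {
        List<String> executed = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime(); // monotonic clock, immune to wall-clock jumps
            List<String> commands = master.heartbeat(workerId);
            for (String command : commands) {
                executed.add(command); // a real worker would dispatch on the command type
            }
            // Sleep only for the part of the interval that is still left.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            long waitMs = intervalMs - elapsedMs;
            if (waitMs > 0) {
                try {
                    Thread.sleep(waitMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        // A master that asks for one deletion on the first heartbeat and nothing afterwards.
        final boolean[] sent = {false};
        Master master = id -> {
            if (!sent[0]) {
                sent[0] = true;
                return Arrays.asList("DELETE blk_1");
            }
            return Collections.<String>emptyList();
        };
        System.out.println(runHeartbeats(master, "dn-1", 3, 20L));
    }
}
```

Because every exchange is initiated by the worker, the master needs no outbound connections to workers, and the heartbeat interval bounds how quickly a command can be delivered.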

Summary

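By walking through the core DataNode source code, this article covered DataNode initialization, registration, and the heartbeat mechanism. For reasons of length, some of the code was not expanded in detail. Throughout, HDFS makes heavy use of the builder pattern, and the design also draws on the command pattern and on thread blocking; these are worth borrowing in our day-to-day development. I hope you find it useful.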