HDFS Core Source Code Analysis (Part 2): DataNode Startup and Registration Flow

Note: this article is based on a reading of the Hadoop 2.7 source code.

1. The DataNode Class Javadoc

The class-level Javadoc is reproduced below; after each paragraph I add a short note restating and expanding on it:

/**********************************************************
 * DataNode is a class (and program) that stores a set of
 * blocks for a DFS deployment.  A single deployment can
 * have one or many DataNodes.  Each DataNode communicates
 * regularly with a single NameNode.  It also communicates
 * with client code and other DataNodes from time to time.
 *
 * Note 1:
 *  DataNode is a class (and also a program) that stores blocks for an HDFS deployment. A single
 *  deployment can have one or many DataNodes. Each DataNode communicates regularly with a single
 *  NameNode, and also talks to clients and to other DataNodes from time to time.
 *
 * DataNodes store a series of named blocks.  The DataNode
 * allows client code to read these blocks, or to write new
 * block data.  The DataNode may also, in response to instructions
 * from its NameNode, delete blocks or copy blocks to/from other
 * DataNodes.
 *
 * Note 2:
 *  DataNodes store a series of named blocks and let clients read them or write new block data.
 *  A DataNode also acts on instructions from its NameNode, for example deleting blocks or
 *  copying blocks to/from other DataNodes.
 *
 * The DataNode maintains just one critical table:
 *   block-> stream of bytes (of BLOCK_SIZE or less)
 *
 * Note 3:
 *  The DataNode maintains just one critical table: block -> stream of bytes (of BLOCK_SIZE or
 *  less). On disk each block is also accompanied by its metadata (checksum) file.
 *
 * This info is stored on a local disk.  The DataNode
 * reports the table's contents to the NameNode upon startup
 * and every so often afterwards.
 *
 * Note 4:
 *  This information lives on local disk. The DataNode reports the table's contents to the
 *  NameNode at startup and keeps reporting them periodically afterwards.
 *
 * DataNodes spend their lives in an endless loop of asking
 * the NameNode for something to do.  A NameNode cannot connect
 * to a DataNode directly; a NameNode simply returns values from
 * functions invoked by a DataNode.
 *
 * Note 5:
 *  DataNodes spend their lives in an endless loop of asking the NameNode what to do (the
 *  heartbeat). The NameNode never connects to a DataNode directly; instead it piggybacks
 *  commands on the return values of the RPCs the DataNode invokes, and the DataNode then
 *  executes those commands locally.
 *
 * DataNodes maintain an open server socket so that client code 
 * or other DataNodes can read/write data.  The host/port for
 * this server is reported to the NameNode, which then sends that
 * information to clients or other DataNodes that might be interested.
 *
 * Note 6:
 *  Each DataNode keeps an open server socket so that clients and other DataNodes can read and
 *  write data. The host/port of this server is reported to the NameNode at startup; a client or
 *  another DataNode that wants to reach a given DataNode first asks the NameNode for that
 *  DataNode's host/port.
 *
 **********************************************************/

2. Starting from DataNode's main() Method

2.1 Creating a DataNode

public static void main(String args[]) {
    if (DFSUtil.parseHelpArgument(args, DataNode.USAGE, System.out, true)) {
      System.exit(0);
    }
	// 2.1.1 Core startup method: secureMain()
    secureMain(args, null);
}

//2.1.1 The core startup method
  public static void secureMain(String args[], SecureResources resources) {
    int errorCode = 0;
    try {
      StringUtils.startupShutdownMessage(DataNode.class, args, LOG);
      // 2.1.2 Create a DataNode instance
      DataNode datanode = createDataNode(args, null, resources);
      if (datanode != null) {
        datanode.join();
      } else {
        errorCode = 1;
      }
    } catch (Throwable e) {
      LOG.fatal("Exception in secureMain", e);
      terminate(1, e);
    } finally {
      // We need to terminate the process here because either shutdown was called
      // or some disk related conditions like volumes tolerated or volumes required
      // condition was not met. Also, In secure mode, control will go to Jsvc
      // and Datanode process hangs if it does not exit.
      LOG.warn("Exiting Datanode");
      terminate(errorCode);
    }
  }

// 2.1.2 Create a DataNode instance (createDataNode() delegates to instantiateDataNode(), shown here)
public static DataNode instantiateDataNode(String args [], Configuration conf,
      SecureResources resources) throws IOException {
    if (conf == null)
      conf = new HdfsConfiguration();
    
    if (args != null) {
      // parse generic hadoop options
      GenericOptionsParser hParser = new GenericOptionsParser(conf, args);
      args = hParser.getRemainingArgs();
    }
    
    if (!parseArguments(args, conf)) {
      printUsage(System.err);
      return null;
    }
    Collection<StorageLocation> dataLocations = getStorageLocations(conf);
    UserGroupInformation.setConfiguration(conf);
    SecurityUtil.login(conf, DFS_DATANODE_KEYTAB_FILE_KEY,
        DFS_DATANODE_KERBEROS_PRINCIPAL_KEY);
    // 2.1.3 makeInstance() actually constructs the DataNode
    return makeInstance(dataLocations, conf, resources);
  }
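
The getStorageLocations(conf) call above turns the comma-separated dfs.datanode.data.dir setting into StorageLocation objects, one per directory. A minimal sketch of that relationship (the directory paths are examples only):

// Example only: how the data directories consumed by makeInstance() are configured.
// DFSConfigKeys.DFS_DATANODE_DATA_DIR_KEY is the constant for "dfs.datanode.data.dir".
Configuration conf = new HdfsConfiguration();
conf.set(DFSConfigKeys.DFS_DATANODE_DATA_DIR_KEY, "/data/1/dfs/dn,/data/2/dfs/dn");
// Each comma-separated entry becomes one StorageLocation
Collection<StorageLocation> locations = DataNode.getStorageLocations(conf);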

// 2.1.3 Construct the DataNode instance
static DataNode makeInstance(Collection<StorageLocation> dataDirs,
      Configuration conf, SecureResources resources) throws IOException {
    LocalFileSystem localFS = FileSystem.getLocal(conf);
    FsPermission permission = new FsPermission(
        conf.get(DFS_DATANODE_DATA_DIR_PERMISSION_KEY,
                 DFS_DATANODE_DATA_DIR_PERMISSION_DEFAULT));
    DataNodeDiskChecker dataNodeDiskChecker =
        new DataNodeDiskChecker(permission);
    List<StorageLocation> locations =
        checkStorageLocations(dataDirs, localFS, dataNodeDiskChecker);
    DefaultMetricsSystem.initialize("DataNode");

    assert locations.size() > 0 : "number of data directories should be > 0";
     // 2.2 See the DataNode constructor
    return new DataNode(conf, locations, resources);
  }

2.2 Starting the DataNode

2.2.1 The DataNode Constructor

DataNode(final Configuration conf,
           final List<StorageLocation> dataDirs,
           final SecureResources resources) throws IOException {
    // ... 
    try {
      hostName = getHostName(conf);
      LOG.info("Configured hostname is " + hostName);
      // startDataNode() performs the actual initialization (shown below)
      startDataNode(conf, dataDirs, resources);
    } catch (IOException ie) {
      shutdown();
      throw ie;
    }
    final int dncCacheMaxSize =
        conf.getInt(DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_KEY,
            DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_DEFAULT) ;
    datanodeNetworkCounts =
        CacheBuilder.newBuilder()
            .maximumSize(dncCacheMaxSize)
            .build(new CacheLoader<String, Map<String, Long>>() {
              @Override
              public Map<String, Long> load(String key) throws Exception {
                final Map<String, Long> ret = new HashMap<String, Long>();
                ret.put("networkErrors", 0L);
                return ret;
              }
            });
  }
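
For readers not familiar with Guava's CacheBuilder: the datanodeNetworkCounts built above is a LoadingCache, so the first get() for a host runs the CacheLoader and returns a fresh counter map, which the DataNode then mutates to count network errors per peer. A tiny usage sketch (the host name is made up):

// First access for a host loads {"networkErrors": 0L} via the CacheLoader above;
// later accesses return the same map until the entry is evicted (maximumSize).
Map<String, Long> counters = datanodeNetworkCounts.getUnchecked("dn-host-1.example.com");
counters.put("networkErrors", counters.get("networkErrors") + 1);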


 // startDataNode(): the bulk of DataNode initialization
 void startDataNode(Configuration conf, 
                     List<StorageLocation> dataDirs,
                     SecureResources resources
                     ) throws IOException {
    // ...
	// Create the DataStorage object that manages the DataNode's on-disk storage
    storage = new DataStorage();
    
    // global DN settings
    registerMXBean();
    // 2.2.2 Initialize the DataXceiverServer
    initDataXceiver(conf);
    // 2.2.3 Start the HTTP server
    startInfoServer(conf);
    pauseMonitor = new JvmPauseMonitor(conf);
    pauseMonitor.start();
  
    // BlockPoolTokenSecretManager is required to create ipc server.
    this.blockPoolTokenSecretManager = new BlockPoolTokenSecretManager();

    // Login is done by now. Set the DN user name.
    dnUserName = UserGroupInformation.getCurrentUser().getShortUserName();
    LOG.info("dnUserName = " + dnUserName);
    LOG.info("supergroup = " + supergroup);
    // 2.2.4 Initialize the RPC server
    initIpcServer(conf);

    metrics = DataNodeMetrics.create(conf, getDisplayName());
    metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);
   /**
     *  Create a BlockPoolManager.
     *  A block pool is the set of blocks belonging to one namespace; in a normal (non-federated)
     *  cluster there is exactly one block pool. With federation there are multiple NameNodes, and
     *  each nameservice has its own block pool. For example, with 4 NameNodes in 2 nameservices:
     *  nameservice 1: namenode1 (Active), namenode2 (Standby) -> one block pool
     *  nameservice 2: namenode3 (Active), namenode4 (Standby) -> another block pool
     *  (see the configuration sketch right after this method)
     */
    blockPoolManager = new BlockPoolManager(this);

	// 2.2.5 Initialize the BlockPoolManager:
	// (1) register with the NameNodes, (2) start sending heartbeats
    blockPoolManager.refreshNamenodes(conf);

    // Create the ReadaheadPool from the DataNode context so we can
    // exit without having to explicitly shutdown its thread pool.
    readaheadPool = ReadaheadPool.getInstance();
    saslClient = new SaslDataTransferClient(dnConf.conf, 
        dnConf.saslPropsResolver, dnConf.trustedChannelResolver);
    saslServer = new SaslDataTransferServer(dnConf, blockPoolTokenSecretManager);
  }
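
refreshNamenodes() learns which NameNodes to talk to from the configuration. A minimal sketch of what a two-nameservice (federated, HA) setup looks like; the nameservice ids, NameNode ids and addresses below are examples only:

// Example only: two federated nameservices, each an HA pair of NameNodes.
// These are the keys that DFSUtil.getNNServiceRpcAddressesForCluster(conf) reads
// inside BlockPoolManager.refreshNamenodes().
Configuration conf = new HdfsConfiguration();
conf.set("dfs.nameservices", "ns1,ns2");
conf.set("dfs.ha.namenodes.ns1", "nn1,nn2");
conf.set("dfs.namenode.rpc-address.ns1.nn1", "namenode1:8020");
conf.set("dfs.namenode.rpc-address.ns1.nn2", "namenode2:8020");
conf.set("dfs.ha.namenodes.ns2", "nn3,nn4");
conf.set("dfs.namenode.rpc-address.ns2.nn3", "namenode3:8020");
conf.set("dfs.namenode.rpc-address.ns2.nn4", "namenode4:8020");
// Result: two BPOfferServices (one per nameservice), each with two BPServiceActors.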

2.2.2 Initializing the DataXceiverServer

// 2.2.2 Initialize the DataXceiverServer
  private void initDataXceiver(Configuration conf) throws IOException {
    // find free port or use privileged port provided
    TcpPeerServer tcpPeerServer;
    if (secureResources != null) {
      tcpPeerServer = new TcpPeerServer(secureResources);
    } else {
      tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout,
          DataNode.getStreamingAddr(conf));
    }
    tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
    streamingAddr = tcpPeerServer.getStreamingAddr();
    LOG.info("Opened streaming server at " + streamingAddr);
    this.threadGroup = new ThreadGroup("dataXceiverServer");
    // Create the DataXceiverServer, which receives block data from clients and from other DataNodes
    xserver = new DataXceiverServer(tcpPeerServer, conf, this);
    // Wrap it in a daemon thread
    this.dataXceiverServer = new Daemon(threadGroup, xserver);
    this.threadGroup.setDaemon(true); // auto destroy when empty

    if (conf.getBoolean(DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_KEY,
              DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_DEFAULT) ||
        conf.getBoolean(DFSConfigKeys.DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC,
              DFSConfigKeys.DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC_DEFAULT)) {
      DomainPeerServer domainPeerServer =
                getDomainPeerServer(conf, streamingAddr.getPort());
      if (domainPeerServer != null) {
        this.localDataXceiverServer = new Daemon(threadGroup,
            new DataXceiverServer(domainPeerServer, conf, this));
        LOG.info("Listening on UNIX domain socket: " +
            domainPeerServer.getBindPath());
      }
    }
    this.shortCircuitRegistry = new ShortCircuitRegistry(conf);
  }
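
The localDataXceiverServer branch above only runs when short-circuit reads or domain-socket data traffic are enabled. A minimal configuration sketch (the socket path is an example):

// Example only: settings that make initDataXceiver() also listen on a UNIX domain socket.
Configuration conf = new HdfsConfiguration();
conf.setBoolean(DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_KEY, true);          // "dfs.client.read.shortcircuit"
conf.set(DFSConfigKeys.DFS_DOMAIN_SOCKET_PATH_KEY,
    "/var/lib/hadoop-hdfs/dn_socket");                                          // "dfs.domain.socket.path"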

2.2.3 Starting the HTTP Server

// 2.2.3 Start the HTTP server
  private void startInfoServer(Configuration conf)
    throws IOException {
    Configuration confForInfoServer = new Configuration(conf);
    confForInfoServer.setInt(HttpServer2.HTTP_MAX_THREADS, 10);
    HttpServer2.Builder builder = new HttpServer2.Builder()
      .setName("datanode")
      .setConf(conf).setACL(new AccessControlList(conf.get(DFS_ADMIN, " ")))
      .addEndpoint(URI.create("http://localhost:0"))
      .setFindPort(true);
    // Builder pattern: construct an HttpServer2 instance
    this.infoServer = builder.build();

    // Register several servlets on the HTTP server
    this.infoServer.addInternalServlet(null, "/streamFile/*", StreamFile.class);
    this.infoServer.addInternalServlet(null, "/getFileChecksum/*",
        FileChecksumServlets.GetServlet.class);
    
    this.infoServer.setAttribute("datanode", this);
    this.infoServer.setAttribute(JspHelper.CURRENT_CONF, conf);
    this.infoServer.addServlet(null, "/blockScannerReport",
                               BlockScanner.Servlet.class);
    // Start the internal HTTP service
    this.infoServer.start();
    InetSocketAddress jettyAddr = infoServer.getConnectorAddress(0);

    // SecureDataNodeStarter will bind the privileged port to the channel if
    // the DN is started by JSVC, pass it along.
    ServerSocketChannel httpServerChannel = secureResources != null ?
      secureResources.getHttpServerChannel() : null;
    this.httpServer = new DatanodeHttpServer(conf, jettyAddr, httpServerChannel);
    httpServer.start();
    if (httpServer.getHttpAddress() != null) {
      infoPort = httpServer.getHttpAddress().getPort();
    }
    if (httpServer.getHttpsAddress() != null) {
      infoSecurePort = httpServer.getHttpsAddress().getPort();
    }
  }
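
Note that the internal Jetty server above binds to localhost:0 (an ephemeral, local-only port); the externally visible address is served by DatanodeHttpServer and comes from configuration. A minimal sketch (the value shown is the 2.7 default):

// Example only: the outward-facing HTTP address used by DatanodeHttpServer.
Configuration conf = new HdfsConfiguration();
conf.set(DFSConfigKeys.DFS_DATANODE_HTTP_ADDRESS_KEY, "0.0.0.0:50075");   // "dfs.datanode.http.address"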

2.2.4 Initializing the RPC Server

//2.2.4 Initialize the RPC server
  private void initIpcServer(Configuration conf) throws IOException {
    InetSocketAddress ipcAddr = NetUtils.createSocketAddr(
        conf.getTrimmed(DFS_DATANODE_IPC_ADDRESS_KEY));
    
    // Add all the RPC protocols that the Datanode implements    
    RPC.setProtocolEngine(conf, ClientDatanodeProtocolPB.class,
        ProtobufRpcEngine.class);
    ClientDatanodeProtocolServerSideTranslatorPB clientDatanodeProtocolXlator = 
          new ClientDatanodeProtocolServerSideTranslatorPB(this);
    // Service handling ClientDatanodeProtocol requests from clients
    BlockingService service = ClientDatanodeProtocolService
        .newReflectiveBlockingService(clientDatanodeProtocolXlator);
    // Builder pattern: construct the RPC server
    ipcServer = new RPC.Builder(conf)
        .setProtocol(ClientDatanodeProtocolPB.class)
        .setInstance(service)
        .setBindAddress(ipcAddr.getHostName())
        .setPort(ipcAddr.getPort())
        .setNumHandlers(
            conf.getInt(DFS_DATANODE_HANDLER_COUNT_KEY,
                DFS_DATANODE_HANDLER_COUNT_DEFAULT)).setVerbose(false)
        .setSecretManager(blockPoolTokenSecretManager).build();
    
    InterDatanodeProtocolServerSideTranslatorPB interDatanodeProtocolXlator = 
        new InterDatanodeProtocolServerSideTranslatorPB(this);
    service = InterDatanodeProtocolService
        .newReflectiveBlockingService(interDatanodeProtocolXlator);
    DFSUtil.addPBProtocol(conf, InterDatanodeProtocolPB.class, service,
        ipcServer);

    TraceAdminProtocolServerSideTranslatorPB traceAdminXlator =
        new TraceAdminProtocolServerSideTranslatorPB(this);
    BlockingService traceAdminService = TraceAdminService
        .newReflectiveBlockingService(traceAdminXlator);
    DFSUtil.addPBProtocol(conf, TraceAdminProtocolPB.class, traceAdminService,
        ipcServer);

    LOG.info("Opened IPC server at " + ipcServer.getListenerAddress());

    // set service-level authorization security policy
    if (conf.getBoolean(
        CommonConfigurationKeys.HADOOP_SECURITY_AUTHORIZATION, false)) {
      ipcServer.refreshServiceAcl(conf, new HDFSPolicyProvider());
    }
  }

The RPC server is owned directly by the DataNode class, which implements several protocol interfaces:

public class DataNode extends ReconfigurableBase implements InterDatanodeProtocol, ClientDatanodeProtocol,
TraceAdminProtocol, DataNodeMXBean

2.2.5 Initializing the BlockPoolManager

The most important job of BlockPoolManager is to hold references to all BPOfferService objects on the DataNode and to provide several ways of looking them up: by nameservice id (nameserviceId), by block pool id (blockPoolId), and so on (a sketch of such lookups follows the field listing below).

Member variables of the BlockPoolManager class:

/**
 * Manages the BPOfferService objects for the data node.
 * Creation, removal, starting, stopping, shutdown on BPOfferService
 * objects must be done via APIs in this class.
 */
@InterfaceAudience.Private
class BlockPoolManager {
  private static final Log LOG = DataNode.LOG;

  // Map from nameserviceId -> BPOfferService
  private final Map<String, BPOfferService> bpByNameserviceId =
    Maps.newHashMap();
  // Map from blockPoolId -> BPOfferService
  private final Map<String, BPOfferService> bpByBlockPoolId =
    Maps.newHashMap();
  private final List<BPOfferService> offerServices =
    Lists.newArrayList();

  private final DataNode dn;
}
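
Given those two maps, looking a BPOfferService up is just a synchronized map read. A simplified sketch of what such lookup methods look like (the first follows the 2.7 sources, the second is a hypothetical convenience variant):

  // Simplified sketch: lookups are plain synchronized map reads.
  synchronized BPOfferService get(String bpid) {
    // by block pool id, e.g. "BP-526805057-192.168.1.10-1411980876842"
    return bpByBlockPoolId.get(bpid);
  }

  // hypothetical convenience lookup by nameservice id
  synchronized BPOfferService getByNameserviceId(String nameserviceId) {
    return bpByNameserviceId.get(nameserviceId);
  }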

The core method of BlockPoolManager is refreshNamenodes():

  void refreshNamenodes(Configuration conf)
      throws IOException {
    LOG.info("Refresh request received for nameservices: " + conf.get
            (DFSConfigKeys.DFS_NAMESERVICES));

    Map<String, Map<String, InetSocketAddress>> newAddressMap = DFSUtil
            .getNNServiceRpcAddressesForCluster(conf);

    synchronized (refreshNamenodesLock) {
      // 2.2.5.1 (1) register with the NameNodes, (2) start sending heartbeats
      doRefreshNamenodes(newAddressMap);
    }
  }


// 2.2.5.1 (1) Register with the NameNodes, (2) start sending heartbeats
  private void doRefreshNamenodes(
      Map<String, Map<String, InetSocketAddress>> addrMap) throws IOException {
    assert Thread.holdsLock(refreshNamenodesLock);

    Set<String> toRefresh = Sets.newLinkedHashSet();
    Set<String> toAdd = Sets.newLinkedHashSet();
    Set<String> toRemove;
    
    synchronized (this) {
      // Step 1. For each of the new nameservices, figure out whether
      // it's an update of the set of NNs for an existing NS,
      // or an entirely new nameservice.
      // nameservice / HA layout
      /**
       *  With federation there are multiple nameservices, e.g.:
       *  namenode1, namenode2 -> nameservice1
       *  namenode3, namenode4 -> nameservice2
       */

      for (String nameserviceId : addrMap.keySet()) {
        if (bpByNameserviceId.containsKey(nameserviceId)) {
          toRefresh.add(nameserviceId);
        } else {
          // toAdd is a Set<String> of nameservice ids
          toAdd.add(nameserviceId);
        }
      }
      
      // Step 2. Any nameservices we currently have but are no longer present
      // need to be removed.
      toRemove = Sets.newHashSet(Sets.difference(
          bpByNameserviceId.keySet(), addrMap.keySet()));
      
      assert toRefresh.size() + toAdd.size() ==
        addrMap.size() :
          "toAdd: " + Joiner.on(",").useForNull("<default>").join(toAdd) +
          "  toRemove: " + Joiner.on(",").useForNull("<default>").join(toRemove) +
          "  toRefresh: " + Joiner.on(",").useForNull("<default>").join(toRefresh);

      
      // Step 3. Start new nameservices
      if (!toAdd.isEmpty()) {
        LOG.info("Starting BPOfferServices for nameservices: " +
            Joiner.on(",").useForNull("<default>").join(toAdd));

        // Iterate over the new nameservices; with HA each nameservice typically has two NameNodes
        // toAdd is the set of nameservice ids
        for (String nsToAdd : toAdd) {
          ArrayList<InetSocketAddress> addrs =
                  // all NameNode addresses of this nameservice, e.g. namenode1 and namenode2
            Lists.newArrayList(addrMap.get(nsToAdd).values());
          /**
           *  One nameservice corresponds to one BPOfferService, and each NameNode within that
           *  nameservice gets its own BPServiceActor. For example:
           *    namenode1, namenode2 -> nameservice1 -> one BPOfferService with two BPServiceActors
           *    namenode3, namenode4 -> nameservice2 -> one BPOfferService with two BPServiceActors
           *  (see the createBPOS sketch right after this method)
           */
          BPOfferService bpos = createBPOS(addrs);
          bpByNameserviceId.put(nsToAdd, bpos);
          // also remember the BPOfferService in the offerServices list
          offerServices.add(bpos);
        }
      }
      // 2.2.6 startAll(): register with the NameNodes and start heartbeats
      startAll();
    }

    // Step 4. Shut down old nameservices. This happens outside
    // of the synchronized(this) lock since they need to call
    // back to .remove() from another thread
    if (!toRemove.isEmpty()) {
      LOG.info("Stopping BPOfferServices for nameservices: " +
          Joiner.on(",").useForNull("<default>").join(toRemove));

      for (String nsToRemove : toRemove) {
        BPOfferService bpos = bpByNameserviceId.get(nsToRemove);
        bpos.stop();
        bpos.join();
        // they will call remove on their own
      }
    }
    
    // Step 5. Update nameservices whose NN list has changed
    if (!toRefresh.isEmpty()) {
      LOG.info("Refreshing list of NNs for nameservices: " +
          Joiner.on(",").useForNull("<default>").join(toRefresh));
      
      for (String nsToRefresh : toRefresh) {
        BPOfferService bpos = bpByNameserviceId.get(nsToRefresh);
        ArrayList<InetSocketAddress> addrs =
          Lists.newArrayList(addrMap.get(nsToRefresh).values());
        bpos.refreshNNList(addrs);
      }
    }
  }
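
createBPOS() is where the structure described above ("one BPOfferService per nameservice, one BPServiceActor per NameNode") is actually built. A simplified sketch (abridged; field and constructor details differ slightly in the real sources):

  // Simplified sketch of createBPOS() and the BPOfferService constructor (abridged).
  protected BPOfferService createBPOS(List<InetSocketAddress> nnAddrs) {
    // nnAddrs holds every NameNode address of one nameservice (e.g. active + standby)
    return new BPOfferService(nnAddrs, dn);
  }

  BPOfferService(List<InetSocketAddress> nnAddrs, DataNode dn) {
    this.dn = dn;
    for (InetSocketAddress addr : nnAddrs) {
      // one BPServiceActor (worker thread) per NameNode of this nameservice
      this.bpServices.add(new BPServiceActor(addr, this));
    }
  }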

2.2.6 DataNode Registration and Heartbeats with the NameNode

// 2.2.6 Register with the NameNodes and start heartbeats
  synchronized void startAll() throws IOException {
    try {
      UserGroupInformation.getLoginUser().doAs(
          new PrivilegedExceptionAction<Object>() {
            @Override
            public Object run() throws Exception {
              // Iterate over all BPOfferServices (one per nameservice)
              for (BPOfferService bpos : offerServices) {
                // 2.2.6.1 Start each BPOfferService
                bpos.start();
              }
              return null;
            }
          });
    } catch (InterruptedException ex) {
      IOException ioe = new IOException();
      ioe.initCause(ex.getCause());
      throw ioe;
    }
  }


//2.2.6.1 Start one BPOfferService (BPOfferService class)
  //This must be called only by blockPoolManager
  //start() loops over the BPServiceActor threads and starts them, so that each actor can
  //register and heartbeat with its corresponding NameNode.
  void start() {
    // One BPOfferService (nameservice) contains one BPServiceActor per NameNode
    for (BPServiceActor actor : bpServices) {
      // 2.2.6.2 Each actor registers and heartbeats with its NameNode
      actor.start();
    }
  }

//2.2.6.2 Registration and heartbeats (BPServiceActor class)
//This must be called only by BPOfferService
  void start() {
    if ((bpThread != null) && (bpThread.isAlive())) {
      //Thread is started already
      return;
    }
    //2.2.6.3 The Runnable target passed to the Thread is this, so BPServiceActor implements Runnable
    bpThread = new Thread(this, formatThreadName());
    bpThread.setDaemon(true); // needed for JUnit testing
    // 2.2.6.4 Thread.start() is invoked, so look at this class's run() method
    bpThread.start();
  }

// 2.2.6.3 Sure enough, BPServiceActor implements Runnable
class BPServiceActor implements Runnable {}

// 2.2.6.4 BPServiceActor's run() method
public void run() {
    LOG.info(this + " starting to offer service");

    // Registration + heartbeats
    try {
      while (true) {
        // init stuff
        try {
          // setup storage
          //2.2.6.5 Connect to the NameNode and perform the handshake
          connectToNNAndHandshake();
          break;
        } catch (IOException ioe) {
          // Initial handshake, storage recovery or registration failed
          runningState = RunningState.INIT_FAILED;
          if (shouldRetryInit()) {
            // Retry until all namenode's of BPOS failed initialization
            LOG.error("Initialization failed for " + this + " "
                + ioe.getLocalizedMessage());
            // On failure, sleep for 5 seconds and retry
            sleepAndLogInterrupts(5000, "initializing");
          } else {
            runningState = RunningState.FAILED;
            LOG.fatal("Initialization failed for " + this + ". Exiting. ", ioe);
            return;
          }
        }
      }

      runningState = RunningState.RUNNING;

      while (shouldRun()) {
        try {
          // Send heartbeats periodically, every 3 seconds by default (see the offerService sketch after this method)
          offerService();
        } catch (Exception ex) {
          LOG.error("Exception in BPOfferService for " + this, ex);
          sleepAndLogInterrupts(5000, "offering service");
        }
      }
      runningState = RunningState.EXITED;
    } catch (Throwable ex) {
      LOG.warn("Unexpected exception in block pool " + this, ex);
      runningState = RunningState.FAILED;
    } finally {
      LOG.warn("Ending block pool service for: " + this);
      cleanUp();
    }
  }
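
offerService() is the heartbeat loop itself. A heavily simplified sketch of its structure (abridged from the 2.7 sources; block reports, cache reports and most error handling omitted):

  // Heavily simplified sketch of BPServiceActor.offerService() (most logic omitted).
  private void offerService() throws Exception {
    while (shouldRun()) {
      long startTime = monotonicNow();
      if (startTime - lastHeartbeat >= dnConf.heartBeatInterval) {   // default: 3 seconds
        lastHeartbeat = startTime;
        // The heartbeat carries storage usage reports; the response carries NameNode commands
        HeartbeatResponse resp = sendHeartBeat();
        processCommand(resp.getCommands());
      }
      // (incremental and full block reports, plus cache reports, are also triggered here)
      long waitTime = dnConf.heartBeatInterval - (monotonicNow() - lastHeartbeat);
      if (waitTime > 0) {
        Thread.sleep(waitTime);   // the real code waits on a monitor instead of sleeping
      }
    }
  }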

// 2.2.6.5 Obtain the NameNode proxy and register (a two-phase handshake)
// BPServiceActor.connectToNNAndHandshake(): registration
  private void connectToNNAndHandshake() throws IOException {
    // get NN proxy: obtain the NameNode's proxy object
    bpNamenode = dn.connectToNN(nnAddr);

    // First phase of the handshake with the NameNode: fetch the namespace info
    // (a simplified sketch of retrieveNamespaceInfo follows this method)
    NamespaceInfo nsInfo = retrieveNamespaceInfo();
    
    // Validate and record the namespace info
    bpos.verifyAndSetNamespaceInfo(nsInfo);
    
    // Second phase of the handshake with the NN.
    // Send the second phase of the handshake to the NameNode: registration
    // 2.2.6.6 The DataNode registers with the NameNode
    register(nsInfo);
  }
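
The first handshake phase is retrieveNamespaceInfo(), which keeps asking the NameNode for its NamespaceInfo via versionRequest() until it gets an answer. A simplified sketch (abridged from the 2.7 sources):

  // Simplified sketch of BPServiceActor.retrieveNamespaceInfo() (abridged).
  NamespaceInfo retrieveNamespaceInfo() throws IOException {
    NamespaceInfo nsInfo = null;
    while (shouldRun()) {
      try {
        nsInfo = bpNamenode.versionRequest();   // DatanodeProtocol.versionRequest()
        break;
      } catch (IOException e) {                 // NameNode busy or not yet available
        LOG.warn("Problem connecting to server: " + nnAddr);
        sleepAndLogInterrupts(5000, "requesting version info from NN");
      }
    }
    checkNNVersion(nsInfo);   // make sure the NameNode software version is acceptable
    return nsInfo;
  }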

 // 2.2.6.6 The DataNode registers with the NameNode
 void register(NamespaceInfo nsInfo) throws IOException {
    // The handshake() phase loaded the block pool storage
    // off disk - so update the bpRegistration object from that info
    // Build the registration object, which wraps the hostname, StorageInfo, DatanodeID, etc.
    bpRegistration = bpos.createRegistration();

    LOG.info(this + " beginning handshake with NN");

    while (shouldRun()) {
      try {
        // bpNamenode is the NameNode proxy object
        // 2.2.6.7 this call is handled on the NameNode side by NameNodeRpcServer.registerDatanode()
        bpRegistration = bpNamenode.registerDatanode(bpRegistration);
        bpRegistration.setNamespaceInfo(nsInfo);
        break;
      } catch(EOFException e) {  // namenode might have just restarted
        LOG.info("Problem connecting to server: " + nnAddr + " :"
            + e.getLocalizedMessage());
        sleepAndLogInterrupts(1000, "connecting to server");
      } catch(SocketTimeoutException e) {  // namenode is busy
        LOG.info("Problem connecting to server: " + nnAddr);
        sleepAndLogInterrupts(1000, "connecting to server");
      }
    }
    
    LOG.info("Block pool " + this + " successfully registered with NN");
    bpos.registrationSucceeded(this, bpRegistration);

    // random short delay - helps scatter the BR from all DNs
    scheduleBlockReport(dnConf.initialBlockReportDelay);
  }

// 2.2.6.7 Handled on the NameNode side by NameNodeRpcServer.registerDatanode()
  public DatanodeRegistration registerDatanode(DatanodeRegistration nodeReg)
      throws IOException {
      // Check that the NameNode has fully started
    checkNNStartup();
    verifySoftwareVersion(nodeReg); // verify the software version
    // 2.2.6.8 Register the DataNode with the namesystem
    namesystem.registerDatanode(nodeReg);
    return nodeReg;
  }
 
// 2.2.6.8 Register the DataNode (FSNamesystem.registerDatanode)
void registerDatanode(DatanodeRegistration nodeReg) throws IOException {
    writeLock();
    try {
      // Get the DatanodeManager from the BlockManager and call its registerDatanode()
      // nodeReg is the registration info sent over by the DataNode
      // 2.2.6.9 DatanodeManager.registerDatanode()
      getBlockManager().getDatanodeManager().registerDatanode(nodeReg);
      checkSafeMode();
    } finally {
      writeUnlock();
    }
  }

// 2.2.6.9 registerDatanode
public void registerDatanode(DatanodeRegistration nodeReg)
      throws DisallowedDatanodeException, UnresolvedTopologyException {
    InetAddress dnAddress = Server.getRemoteIp(); // IP address of the remote DataNode that is registering
    if (dnAddress != null) {
      // Mostly called inside an RPC, update ip and peer hostname
      String hostname = dnAddress.getHostName();
      String ip = dnAddress.getHostAddress();
     //...
      nodeReg.setIpAddr(ip);
      nodeReg.setPeerHostName(hostname);
    }
     
	//...
	 //1. Check that the DataNode is running an HDFS software version compatible with the NameNode
	 //2. Check the dfs.hosts list (hosts allowed to connect to the NameNode) and the
	 //   dfs.hosts.exclude list (hosts not allowed to connect)

      //3. From datanodeMap (storage ID -> DatanodeDescriptor), look up the descriptor by the
      //   DataNode's UUID; call it nodeS
      DatanodeDescriptor nodeS = getDatanode(nodeReg.getDatanodeUuid());
      
      //4. From host2DatanodeMap (host name -> DatanodeDescriptor), look up the descriptor by
      //   IP and transfer port; call it nodeN
      DatanodeDescriptor nodeN = host2DatanodeMap.getDatanodeByXferAddr(nodeReg.getIpAddr(), nodeReg.getXferPort());
       
      //5. If nodeN != null and nodeN != nodeS, the same transfer address previously served a
      //   different data storage; remove the stale node and set nodeN to null
	  if (nodeN != null && nodeN != nodeS) {
        NameNode.LOG.info("BLOCK* registerDatanode: " + nodeN);
        // nodeN previously served a different data storage, 
        // which is not served by anybody anymore.
        removeDatanode(nodeN);
        // physically remove node from datanodeMap
        wipeDatanode(nodeN);
        nodeN = null;
      }
		
	  //6. If nodeS exists, this DataNode has registered before: remove nodeS from the NetworkTopology,
	  //   update its fields, call resolveNetworkLocation to resolve its rack, and re-add it

	 //7. If nodeS does not exist (no descriptor for this storage ID), this is a brand-new DataNode:
	 //   assign it a cluster-wide storage ID if necessary and create a new DatanodeDescriptor
    
     //8. The DataNode is mapped to the appropriate rack based on its IP address, and the
     //   storage-ID -> DatanodeDescriptor mapping is recorded for the new descriptor

	
      // Add the DataNode (datanodeMap, host2DatanodeMap and the network topology)
      addDatanode(nodeDescr);
     // Put it under heartbeat management; the HeartbeatMonitor background thread will later
     // track whether the DataNode is still alive
      heartbeatManager.addDatanode(nodeDescr);
  }

3. Summary

3.1 Classes Involved in DataNode Startup and Their Relationships

[Figure: classes involved in DataNode startup and their relationships]

3.2 Registration Flow Summary

[Figure: DataNode registration flow]

  1. Create a DataNode object.
  2. Start the DataNode and run its initialization.
  3. Initialize the DataStorage.
  4. Initialize the DataXceiverServer.
  5. Start the HTTP server and register its servlets.
  6. Initialize the RPC server that handles requests from clients and from other DataNodes.
  7. Create the BlockPoolManager; for every BPOfferService, call start() on each of its BPServiceActor objects, which actually register and heartbeat with every NameNode in the cluster.
  8. Each actor obtains a NameNode proxy and calls registerDatanode() on NameNodeRpcServer; the NameNode fetches the BlockManager, whose DatanodeManager.registerDatanode() adds the DataNode to the NameNode's in-memory structures and places it under heartbeat management. The whole path can be exercised locally with a mini cluster, as sketched below.
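
To watch the whole startup-and-registration path execute, the easiest approach is an in-process mini cluster from the hadoop-hdfs test artifact, with breakpoints in BPServiceActor.register() and DatanodeManager.registerDatanode(). A minimal sketch, assuming the hadoop-hdfs tests jar is on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class DataNodeRegistrationDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1)   // one DataNode is enough to trace the registration path
        .build();
    try {
      cluster.waitActive();  // returns once the DataNode has registered and reported in
      System.out.println("Live DataNodes: "
          + cluster.getNameNode().getNamesystem().getNumLiveDataNodes());
    } finally {
      cluster.shutdown();
    }
  }
}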