3. Hadoop source code analysis: DataNode registration

In the earlier analysis of DataNode startup, we saw that the DataNode ends up starting several BPServiceActor threads, one per NameNode it needs to talk to. DataNode registration happens inside this thread, in the first while loop of BPServiceActor's run method:

 

while (true) {
  // init stuff
  try {
    // the DataNode's first-time registration with the NameNode happens here
    connectToNNAndHandshake();
    break;
  } catch (IOException ioe) {
    // Initial handshake, storage recovery or registration failed
    runningState = RunningState.INIT_FAILED;
    if (shouldRetryInit()) {
      // Retry until all namenode's of BPOS failed initialization
      LOG.error("Initialization failed for " + this + " "
          + ioe.getLocalizedMessage());
      sleepAndLogInterrupts(5000, "initializing");
    } else {
      runningState = RunningState.FAILED;
      LOG.error("Initialization failed for " + this + ". Exiting. ", ioe);
      return;
    }
  }
}

The registration process lives in connectToNNAndHandshake():

private void connectToNNAndHandshake() throws IOException {
  // get NN proxy
  // returns a DatanodeProtocolClientSideTranslatorPB
  // obtain the NameNode's RPC proxy: a DatanodeProtocolClientSideTranslatorPB is created,
  // and the DataNode talks to the NameNode through it
  // nnAddr was passed in when this BPServiceActor was constructed
  bpNamenode = dn.connectToNN(nnAddr);

  // First phase of the handshake with NN - get the namespace
  // info.
  // connect to the NameNode through the proxy and fetch its NamespaceInfo
  NamespaceInfo nsInfo = retrieveNamespaceInfo();

  // Verify that this matches the other NN in this HA pair.
  // This also initializes our block pool in the DN if we are
  // the first NN connection for this BP.
  // verify that this info matches the other NameNode of the HA pair;
  // if this is the first NameNode connection for this block pool, the DataNode-side
  // block pool is also initialized here
  bpos.verifyAndSetNamespaceInfo(this, nsInfo);

  /* set thread name again to include NamespaceInfo when it's available. */
  this.bpThread.setName(formatThreadName("heartbeating", nnAddr));

  // Second phase of the handshake with the NN.
  register(nsInfo);
}

First, look at bpNamenode = dn.connectToNN(nnAddr):

DatanodeProtocolClientSideTranslatorPB connectToNN(
    InetSocketAddress nnAddr) throws IOException {
  return new DatanodeProtocolClientSideTranslatorPB(nnAddr, getConf());
}

DatanodeProtocolClientSideTranslatorPB is essentially the DataNode's proxy for the NameNode: DataNode registration, heartbeats, block reports and so on are all wrapped up in it, and it is the main class the DataNode uses to communicate with the NameNode.
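As a quick preview, here is a minimal usage sketch limited to the two calls that appear later in this article (other DatanodeProtocol methods are omitted; nnAddr, getConf() and bpRegistration are the same names used in the surrounding code):

// handshake: ask the NameNode for its namespace info
DatanodeProtocolClientSideTranslatorPB bpNamenode =
    new DatanodeProtocolClientSideTranslatorPB(nnAddr, getConf());
NamespaceInfo nsInfo = bpNamenode.versionRequest();

// registration: send our DatanodeRegistration and keep the updated copy the NameNode returns
bpRegistration = bpNamenode.registerDatanode(bpRegistration);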

Next comes NamespaceInfo nsInfo = retrieveNamespaceInfo(). This method fetches the NameNode's namespace information and wraps it in a NamespaceInfo, which contains buildVersion, blockPoolID, softwareVersion and so on. The retry-until-success loop used to fetch the info is a pattern worth noting:

 

@VisibleForTesting
NamespaceInfo retrieveNamespaceInfo() throws IOException {
  // NamespaceInfo carries the NameNode's buildVersion, blockPoolID, softwareVersion, etc.
  NamespaceInfo nsInfo = null;
  while (shouldRun()) { // retry-until-success loop
    try {
      // RPC call to the NameNode's RPC server: invoke its versionRequest method
      nsInfo = bpNamenode.versionRequest();
      LOG.debug(this + " received versionRequest response: " + nsInfo);
      break;
    } catch(SocketTimeoutException e) {  // namenode is busy
      LOG.warn("Problem connecting to server: " + nnAddr);
    } catch(IOException e ) {  // namenode is not available
      LOG.warn("Problem connecting to server: " + nnAddr);
    }
    
    // try again in a second
    // on failure, sleep for a while and retry
    sleepAndLogInterrupts(5000, "requesting version info from NN");
  }
  
  if (nsInfo != null) {
    // compare the NameNode's version with the DataNode's version
    checkNNVersion(nsInfo);
  } else {
    throw new IOException("DN shut down before block pool connected");
  }
  return nsInfo;
}
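Stripped of the HDFS specifics, the same retry pattern looks roughly like this (a sketch only; fetch, shouldRun and retryIntervalMs are placeholder names, not Hadoop code):

// generic retry-until-success loop, modeled on retrieveNamespaceInfo above
<T> T fetchWithRetry(java.util.function.Supplier<T> fetch,
                     java.util.function.BooleanSupplier shouldRun,
                     long retryIntervalMs) throws java.io.IOException {
  T result = null;
  while (shouldRun.getAsBoolean()) {
    try {
      result = fetch.get();            // the remote call that may fail
      break;                           // success: leave the loop
    } catch (RuntimeException e) {     // treat failures as retriable
      // log the problem and fall through to the back-off below
    }
    try {
      Thread.sleep(retryIntervalMs);   // back off before the next attempt
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      break;                           // stop retrying if interrupted
    }
  }
  if (result == null) {
    throw new java.io.IOException("gave up before receiving a successful response");
  }
  return result;
}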

In this RPC exchange the DataNode is the client and the NameNode is the server. Let's look at how the NameNode, acting as the RPC server, responds to the DataNode's request: see the versionRequest() method of the NameNodeRpcServer class:

@Override // DatanodeProtocol, NamenodeProtocol
public NamespaceInfo versionRequest() throws IOException {
  checkNNStartup();
  return namesystem.getNamespaceInfo();
}

It calls into the FSNamesystem held by the RPC server; FSNamesystem holds the NameNode's view of the file system.

NamespaceInfo getNamespaceInfo() {
  readLock();
  try {
    return unprotectedGetNamespaceInfo();
  } finally {
    readUnlock("getNamespaceInfo");
  }
}

Finally the NameNode side builds a NamespaceInfo and returns it to the client:

NamespaceInfo unprotectedGetNamespaceInfo() {
  return new NamespaceInfo(getFSImage().getStorage().getNamespaceID(),
      getClusterId(), getBlockPoolId(),
      getFSImage().getStorage().getCTime(), getState());
}

So the flow of the DataNode's request to the NameNode through the RPC proxy is:

1. When the NameNode starts, it brings up its RPC server, which binds to a specific port and implements the server-side methods; each method is bound to a particular kind of request.

2. When the DataNode starts, it creates a remote RPC proxy for the NameNode. Calling the proxy looks just like calling a local method; the remoting is transparent to the DataNode. Under the covers the proxy performs the network request and serialization needed to invoke the RPC server on the remote NameNode (a minimal illustration follows the list below).

3. The DataNode calls the NameNode through the proxy; the NameNode dispatches to a method on the FSNamesystem it holds, serves the request, and returns the result to the DataNode.
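To make point 2 concrete, here is a tiny, self-contained illustration of the proxy idea using the JDK's dynamic proxies. This is not Hadoop's RPC engine (which generates protobuf stubs over sockets); it only sketches why calling the proxy feels like a local method call. VersionService and ProxyDemo are made-up names for illustration:

import java.lang.reflect.Proxy;

interface VersionService {
  String versionRequest();   // stands in for DatanodeProtocol.versionRequest()
}

public class ProxyDemo {
  public static void main(String[] args) {
    // The InvocationHandler plays the role of the RPC layer: in real Hadoop RPC it would
    // serialize the method name and arguments, send them over the network, and
    // deserialize the reply.
    VersionService proxy = (VersionService) Proxy.newProxyInstance(
        VersionService.class.getClassLoader(),
        new Class<?>[] { VersionService.class },
        (p, method, methodArgs) -> "remote answer to " + method.getName());

    // To the caller this is just a local method call.
    System.out.println(proxy.versionRequest());
  }
}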

After the DataNode has obtained the NamespaceInfo from the NameNode, it moves on to verification: bpos.verifyAndSetNamespaceInfo(this, nsInfo);

/**
 * Called by the BPServiceActors when they handshake to a NN.
 * If this is the first NN connection, this sets the namespace info
 * for this BPOfferService. If it's a connection to a new NN, it
 * verifies that this namespace matches (eg to prevent a misconfiguration
 * where a StandbyNode from a different cluster is specified)
 */
void verifyAndSetNamespaceInfo(BPServiceActor actor, NamespaceInfo nsInfo)
  throws IOException {
  writeLock();
  // if this actor is connected to the active NameNode and no actor has been bound yet,
  // bind this actor to the active NameNode
  if(nsInfo.getState() == HAServiceState.ACTIVE
      && bpServiceToActive == null) {
    LOG.info("Acknowledging ACTIVE Namenode during handshake" + actor);
    bpServiceToActive = actor;
  }

  try {
    if (setNamespaceInfo(nsInfo) == null) {
      boolean success = false;

      // Now that we know the namespace ID, etc, we can pass this to the DN.
      // The DN can now initialize its local storage if we are the
      // first BP to handshake, etc.
      try {
        dn.initBlockPool(this);
        success = true;
      } finally {
        if (!success) {
          // The datanode failed to initialize the BP. We need to reset
          // the namespace info so that other BPService actors still have
          // a chance to set it, and re-initialize the datanode.
          setNamespaceInfo(null);
        }
      }
    }
  } finally {
    writeUnlock();
  }
}

There are two cases here. When the DataNode starts, its actors each talk to their own NameNode. If this is the first actor to finish the handshake and come back with NameNode metadata, the if branch above is taken (setNamespaceInfo returns null) and local storage is initialized via dn.initBlockPool(this). If this is the second actor to come back, its NameNode's metadata is compared against what the first actor recorded, i.e. the info returned by the active NameNode must agree with the info returned by the standby. The second case is just a comparison (see the sketch below); let's analyze the first case, where the first actor has pulled bpNSInfo from its NameNode and dn.initBlockPool(this) is called on the DataNode object. The initBlockPool method is shown after the sketch:
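The comparison in the second case happens inside setNamespaceInfo, whose body is not shown here. Going by the javadoc above, its semantics are roughly "set if absent, otherwise verify against the existing value and return it"; a minimal sketch under that assumption (the method name and the single blockPoolID check are illustrative, not the actual BPOfferService code):

private NamespaceInfo bpNSInfo;

// sketch only: set-if-absent, otherwise verify, returning the previous value
NamespaceInfo setNamespaceInfoSketch(NamespaceInfo nsInfo) throws IOException {
  NamespaceInfo previous = bpNSInfo;
  if (previous == null) {
    // first actor to finish the handshake: record the namespace info
    bpNSInfo = nsInfo;
  } else if (!previous.getBlockPoolID().equals(nsInfo.getBlockPoolID())) {
    // second actor: both NameNodes of the HA pair must describe the same namespace
    throw new IOException("Namespace mismatch: " + previous.getBlockPoolID()
        + " vs " + nsInfo.getBlockPoolID());
  }
  // null means "we were first", non-null means "already set by the other actor"
  return previous;
}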

 

/**
 * One of the Block Pools has successfully connected to its NN.
 * This initializes the local storage for that block pool,
 * checks consistency of the NN's cluster ID, etc.
 * 
 * If this is the first block pool to register, this also initializes
 * the datanode-scoped storage.
 * 
 * @param bpos Block pool offer service
 * @throws IOException if the NN is inconsistent with the local storage.
 */
void initBlockPool(BPOfferService bpos) throws IOException {
  // the NamespaceInfo this actor obtained during the handshake
  NamespaceInfo nsInfo = bpos.getNamespaceInfo();
  if (nsInfo == null) {
    throw new IOException("NamespaceInfo not found: Block pool " + bpos
        + " should have retrieved namespace info before initBlockPool.");
  }
  // record the clusterID and blockPoolID
  setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID());

  // Register the new block pool with the BP manager.
  // add this BPOfferService to the blockPoolManager; one block pool maps to one BPOfferService
  blockPoolManager.addBlockPool(bpos);
  
  // In the case that this is the first block pool to connect, initialize
  // the dataset, block scanners, etc.
  // initialize storage; if this is the first block pool, also initialize the dataset, block scanners, etc.
  initStorage(nsInfo);

  // Exclude failed disks before initializing the block pools to avoid startup
  // failures.
  // disk check: exclude failed disks
  checkDiskError();
  // data is an FsDatasetSpi<? extends FsVolumeSpi>; pass the blockPoolID down to the dataset,
  // which mainly initializes some metadata kept in in-memory maps
  data.addBlockPool(nsInfo.getBlockPoolID(), conf);
  // initialize the periodic scanners
  initPeriodicScanners(conf);
}

First, the BPOfferService is registered with the DataNode's blockPoolManager: blockPoolManager.addBlockPool(bpos)

synchronized void addBlockPool(BPOfferService bpos) {
  Preconditions.checkArgument(offerServices.contains(bpos),
      "Unknown BPOS: %s", bpos);
  if (bpos.getBlockPoolId() == null) {
    throw new IllegalArgumentException("Null blockpool id");
  }
  bpByBlockPoolId.put(bpos.getBlockPoolId(), bpos);
}

Here bpByBlockPoolId is a map:

private final Map<String, BPOfferService> bpByBlockPoolId=Maps.newHashMap();

In other words, the blockPoolManager just records this BPOfferService.

Next comes initStorage(nsInfo), which mainly creates the FsDatasetImpl object; FsDatasetImpl manages the block files actually stored on the DataNode's disks.

 

/**
 * Initializes the {@link #data}. The initialization is done only once, when
 * handshake with the the first namenode is completed.
 * (i.e. only called once, when the DataNode's handshake with its first NameNode completes)
 */
private void initStorage(final NamespaceInfo nsInfo) throws IOException {
  // create the FsDatasetSpi.Factory
  final FsDatasetSpi.Factory<? extends FsDatasetSpi<?>> factory
      = FsDatasetSpi.Factory.getFactory(conf);
  
  if (!factory.isSimulated()) {
    final StartupOption startOpt = getStartupOption(conf);
    if (startOpt == null) {
      throw new IOException("Startup option not set.");
    }
    final String bpid = nsInfo.getBlockPoolID();
    //read storage info, lock data dirs and transition fs state if necessary
    synchronized (this) {
      storage.recoverTransitionRead(this, nsInfo, dataDirs, startOpt);
    }
    final StorageInfo bpStorage = storage.getBPStorage(bpid);
    LOG.info("Setting up storage: nsid=" + bpStorage.getNamespaceID()
        + ";bpid=" + bpid + ";lv=" + storage.getLayoutVersion()
        + ";nsInfo=" + nsInfo + ";dnuuid=" + storage.getDatanodeUuid());
  }

  // If this is a newly formatted DataNode then assign a new DatanodeUuid.
  // assign the DataNode UUID
  checkDatanodeUuid();

  synchronized(this)  {
    // data is the FsDatasetSpi<? extends FsVolumeSpi> instance
    if (data == null) {
      // returns new FsDatasetImpl(datanode, storage, conf); FsDatasetImpl manages the block files actually stored on this DataNode's disks
      data = factory.newInstance(this, storage, conf);
    }
  }
}

The last call, data.addBlockPool(...), mainly records some block-pool metadata in memory:

public void addBlockPool(String bpid, Configuration conf)
    throws IOException {
  LOG.info("Adding block pool " + bpid);
  synchronized(this) {
    // add the bpid to the FsVolumeList
    volumes.addBlockPool(bpid, conf);
    // initialize this block pool in the replica map (volumeMap)
    volumeMap.initBlockPool(bpid);
  }
  // populate the replica map for this block pool from all volumes
  volumes.getAllVolumesMap(bpid, volumeMap, ramDiskReplicaTracker);
}

Scanner initialization just sets up two scanners:

 

private void initPeriodicScanners(Configuration conf) {
  initDataBlockScanner(conf);
  initDirectoryScanner(conf);
}

Back to the original connectToNNAndHandshake method: after the NameNode's info has been obtained and bpos.verifyAndSetNamespaceInfo(this, nsInfo) has run, the registration proper begins:

void register() throws IOException {
  // The handshake() phase loaded the block pool storage
  // off disk - so update the bpRegistration object from that info
  // ask the BPOfferService to build the DataNode's registration object: mainly the DataNode's
  // IDs and version plus the namespace/NameNode IDs
  // step 1: build the registration object
  bpRegistration = bpos.createRegistration();

  LOG.info(this + " beginning handshake with NN");

  while (shouldRun()) {
    try {
      // Use returned registration from namenode with updated fields
      // step 2: register with the NameNode
      bpRegistration = bpNamenode.registerDatanode(bpRegistration);
      break;
    } catch(EOFException e) {  // namenode might have just restarted
      LOG.info("Problem connecting to server: " + nnAddr + " :"
          + e.getLocalizedMessage());
      sleepAndLogInterrupts(1000, "connecting to server");
    } catch(SocketTimeoutException e) {  // namenode is busy
      LOG.info("Problem connecting to server: " + nnAddr);
      sleepAndLogInterrupts(1000, "connecting to server");
    }
  }
  
  LOG.info("Block pool " + this + " successfully registered with NN");
  // post-registration processing
  bpos.registrationSucceeded(this, bpRegistration);

  // random short delay - helps scatter the BR from all DNs
  // after a successful registration, schedule the first block report
  scheduleBlockReport(dnConf.initialBlockReportDelay);
}

There are three steps here. Step one: before registering, build a registration object. Step two: send that object to the NameNode through the proxy. Step three: on success, do the follow-up processing and schedule the first block report.

Step one first: bpRegistration is a DatanodeRegistration object:

 

/** 
 * DatanodeRegistration class contains all information the name-node needs
 * to identify and verify a data-node when it contacts the name-node.
 * This information is sent by data-node with each communication request.
 */
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class DatanodeRegistration extends DatanodeID
    implements NodeRegistration {

  private final StorageInfo storageInfo;
  private ExportedBlockKeys exportedKeys;
  private final String softwareVersion;
  ...
  }

This object contains the information the NameNode needs to identify and verify a DataNode, including the storage info storageInfo, exportedKeys and softwareVersion. Now look at how it is built: bpRegistration = bpos.createRegistration();

DatanodeRegistration createRegistration() {
  writeLock();
  try {
    Preconditions.checkState(bpNSInfo != null,
        "getRegistration() can only be called after initial handshake");
    return dn.createBPRegistration(bpNSInfo);
  } finally {
    writeUnlock();
  }
}

This delegates to the DataNode's createBPRegistration(bpNSInfo), where the argument bpNSInfo is exactly what the DataNode obtained from the NameNode during the first handshake, including the NameNode's blockPoolId, software version and so on. Here is createBPRegistration(bpNSInfo):

 

DatanodeRegistration createBPRegistration(NamespaceInfo nsInfo) {
  // fetch the storage info; it actually comes from an in-memory map called bpStorageMap,
  // a Map<String, BlockPoolSliceStorage> keyed by blockPoolId
  StorageInfo storageInfo = storage.getBPStorage(nsInfo.getBlockPoolID());
  if (storageInfo == null) {
    // it's null in the case of SimulatedDataSet
    storageInfo = new StorageInfo(
        DataNodeLayoutVersion.CURRENT_LAYOUT_VERSION,
        nsInfo.getNamespaceID(), nsInfo.clusterID, nsInfo.getCTime(),
        NodeType.DATA_NODE);
  }
  // build a DatanodeID carrying this metadata (address, hostname, uuid, ports)
  DatanodeID dnId = new DatanodeID(
      streamingAddr.getAddress().getHostAddress(), hostName, 
      storage.getDatanodeUuid(), getXferPort(), getInfoPort(),
          infoSecurePort, getIpcPort());
  // assemble a DatanodeRegistration from these pieces
  return new DatanodeRegistration(dnId, storageInfo, 
      new ExportedBlockKeys(), VersionInfo.getVersion());
}

So what is gathered here is DataNode metadata, including storage info, hostname and address, and port numbers, used to construct the registration object. With bpRegistration built, it is time to send it to the NameNode to register.

bpRegistration = bpNamenode.registerDatanode(bpRegistration); here bpNamenode is the DatanodeProtocolClientSideTranslatorPB instance, which uses RPC to call the remote NameNode's RPC server:

 

public DatanodeRegistration registerDatanode(DatanodeRegistration registration
    ) throws IOException {
  RegisterDatanodeRequestProto.Builder builder = RegisterDatanodeRequestProto
      .newBuilder().setRegistration(PBHelper.convert(registration));
  RegisterDatanodeResponseProto resp;
  try {
    // rpcProxy is a DatanodeProtocolPB object
    resp = rpcProxy.registerDatanode(NULL_CONTROLLER, builder.build());
  } catch (ServiceException se) {
    throw ProtobufHelper.getRemoteException(se);
  }
  return PBHelper.convert(resp.getRegistration());
}

Under the hood this goes through rpcProxy, the client-side proxy for the NameNode's RPC server; once rpcProxy.registerDatanode(NULL_CONTROLLER, builder.build()) is invoked, the request is sent over RPC to the NameNode server.

 

Now switch to the NameNode's RPC server side. The RPC server keeps listening on its port, and when the registration request from the DataNode (the RPC client) arrives, it is dispatched, matching the rpcProxy.registerDatanode call above, to the registerDatanode method of the NameNodeRpcServer class:

 

@Override // DatanodeProtocol
public DatanodeRegistration registerDatanode(DatanodeRegistration nodeReg)
    throws IOException {
  checkNNStartup();
  verifySoftwareVersion(nodeReg);
  // register
  namesystem.registerDatanode(nodeReg);
  return nodeReg;
}

The RPC server only receives the request; the actual work is done by FSNamesystem's registerDatanode: namesystem.registerDatanode(nodeReg);

 

void registerDatanode(DatanodeRegistration nodeReg) throws IOException {
  writeLock();
  try {
    getBlockManager().getDatanodeManager().registerDatanode(nodeReg);
    checkSafeMode();
  } finally {
    writeUnlock();
  }
}

The purpose of registration is to recognize that this DataNode is bringing a new storage, or that it is reporting block replicas the NameNode has no record of.

Different data storages are distinguished by storageIDs, and there are three cases:

1. An existing node registering with a new storage ID.

2. A repeated registration of an existing node: the cluster already holds this information, so only its network location needs to be updated.

3. A node that has never registered before: it is simply assigned a new storage ID and registered.

When a new storage reports to the NameNode for the first time it is given a new storageID. After a successful registration the NameNode returns a namespaceID-based registration ID (registrationID) to the DataNode; if a later request carries a registrationID that does not match, the NameNode refuses to talk to that DataNode. When the NameNode dies and restarts, the registrationID is loaded back in and DataNode requests are accepted again, without restarting the whole cluster. A compressed sketch of the three-case decision logic follows, before the full method.
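Compressed into pseudocode-style Java, the decision in DatanodeManager.registerDatanode looks roughly like this (nodeS and nodeN follow the source below; updateExistingNode and addNewNode are hypothetical stand-ins for the corresponding blocks of the real method):

// nodeS: descriptor looked up by the registering node's DatanodeUuid (storage identity)
// nodeN: descriptor looked up by the registering node's transfer address (network identity)
if (nodeN != null && nodeN != nodeS) {
  // this address previously served a different storage, which nobody serves anymore:
  // drop the stale descriptor
  removeDatanode(nodeN);
  wipeDatanode(nodeN);
  nodeN = null;
}
if (nodeS != null) {
  // the storage is already known (a restarted node, or a replacement node with the same
  // storage): refresh its registration info, network location and heartbeat state
  updateExistingNode(nodeS, nodeReg);   // stands in for the update block in the real method
  return;
}
// a never-before-seen node: build a DatanodeDescriptor, resolve its rack, add it to
// datanodeMap / host2DatanodeMap and hand it to the heartbeatManager
addNewNode(nodeReg);                    // stands in for the final block in the real method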

The registration above ultimately lands in DatanodeManager's registerDatanode: getBlockManager().getDatanodeManager().registerDatanode(nodeReg)

 

public void registerDatanode(DatanodeRegistration nodeReg)
    throws DisallowedDatanodeException, UnresolvedTopologyException {
  // address of the remote client making this RPC call
  InetAddress dnAddress = Server.getRemoteIp();
  if (dnAddress != null) {
    // Mostly called inside an RPC, update ip and peer hostname
    // record the client's hostname and IP
    String hostname = dnAddress.getHostName();
    String ip = dnAddress.getHostAddress();
    // an address that cannot be resolved triggers an exception
    if (checkIpHostnameInRegistration && !isNameResolved(dnAddress)) {
      // Reject registration of unresolved datanode to prevent performance
      // impact of repetitive DNS lookups later.
      final String message = "hostname cannot be resolved (ip="
          + ip + ", hostname=" + hostname + ")";
      LOG.warn("Unresolved datanode registration: " + message);
      throw new DisallowedDatanodeException(nodeReg, message);
    }
    // update node registration with the ip and hostname from rpc request
    // overwrite the ip and hostname in the incoming DatanodeRegistration with the values
    // observed on the RPC connection (presumably so the client cannot fake them)
    nodeReg.setIpAddr(ip);
    nodeReg.setPeerHostName(hostname);
  }
  
  try {
    nodeReg.setExportedKeys(blockManager.getBlockKeys());

    // Checks if the node is not on the hosts list.  If it is not, then
    // it will be disallowed from registering. 
    if (!hostFileManager.isIncluded(nodeReg)) {
      throw new DisallowedDatanodeException(nodeReg);
    }
      
    NameNode.stateChangeLog.info("BLOCK* registerDatanode: from "
        + nodeReg + " storage " + nodeReg.getDatanodeUuid());
    // look up the DatanodeDescriptor in datanodeMap, a NameNode-side
    // NavigableMap<String(DatanodeUuid), DatanodeDescriptor>
    DatanodeDescriptor nodeS = getDatanode(nodeReg.getDatanodeUuid());
    // host2DatanodeMap is an in-memory map from host/transfer address to DatanodeDescriptor
    DatanodeDescriptor nodeN = host2DatanodeMap.getDatanodeByXferAddr(
        nodeReg.getIpAddr(), nodeReg.getXferPort());
    // on a first registration, nodeN is null
    if (nodeN != null && nodeN != nodeS) {
      NameNode.LOG.info("BLOCK* registerDatanode: " + nodeN);
      // nodeN previously served a different data storage, 
      // which is not served by anybody anymore.
      removeDatanode(nodeN);
      // physically remove node from datanodeMap
      wipeDatanode(nodeN);
      nodeN = null;
    }

    if (nodeS != null) {
      if (nodeN == nodeS) {
        // The same datanode has been just restarted to serve the same data 
        // storage. We do not need to remove old data blocks, the delta will
        // be calculated on the next block report from the datanode
        if(NameNode.stateChangeLog.isDebugEnabled()) {
          NameNode.stateChangeLog.debug("BLOCK* registerDatanode: "
              + "node restarted.");
        }
      } else {
        // nodeS is found
        /* The registering datanode is a replacement node for the existing 
          data storage, which from now on will be served by a new node.
          If this message repeats, both nodes might have same storageID 
          by (insanely rare) random chance. User needs to restart one of the
          nodes with its data cleared (or user can just remove the StorageID
          value in "VERSION" file under the data directory of the datanode,
          but this is might not work if VERSION file format has changed 
       */        
        NameNode.stateChangeLog.info("BLOCK* registerDatanode: " + nodeS
            + " is replaced by " + nodeReg + " with the same storageID "
            + nodeReg.getDatanodeUuid());
      }
      
      boolean success = false;
      try {
        // update cluster map
        getNetworkTopology().remove(nodeS);
        if(shouldCountVersion(nodeS)) {
          decrementVersionCount(nodeS.getSoftwareVersion());
        }
        nodeS.updateRegInfo(nodeReg);

        nodeS.setSoftwareVersion(nodeReg.getSoftwareVersion());
        nodeS.setDisallowed(false); // Node is in the include list

        // resolve network location
        if(this.rejectUnresolvedTopologyDN) {
          nodeS.setNetworkLocation(resolveNetworkLocation(nodeS));
          nodeS.setDependentHostNames(getNetworkDependencies(nodeS));
        } else {
          nodeS.setNetworkLocation(
              resolveNetworkLocationWithFallBackToDefaultLocation(nodeS));
          nodeS.setDependentHostNames(
              getNetworkDependenciesWithDefault(nodeS));
        }
        getNetworkTopology().add(nodeS);
          
        // also treat the registration message as a heartbeat
        heartbeatManager.register(nodeS);
        incrementVersionCount(nodeS.getSoftwareVersion());
        checkDecommissioning(nodeS);
        success = true;
      } finally {
        if (!success) {
          removeDatanode(nodeS);
          wipeDatanode(nodeS);
          countSoftwareVersions();
        }
      }
      return;
    }
    // on a first registration neither if above runs and we reach here: build a new DatanodeDescriptor
    DatanodeDescriptor nodeDescr 
      = new DatanodeDescriptor(nodeReg, NetworkTopology.DEFAULT_RACK);
    boolean success = false;
    try {
      // resolve network location
      if(this.rejectUnresolvedTopologyDN) {
        nodeDescr.setNetworkLocation(resolveNetworkLocation(nodeDescr));
        nodeDescr.setDependentHostNames(getNetworkDependencies(nodeDescr));
      } else {
        nodeDescr.setNetworkLocation(
            resolveNetworkLocationWithFallBackToDefaultLocation(nodeDescr));
        nodeDescr.setDependentHostNames(
            getNetworkDependenciesWithDefault(nodeDescr));
      }
      networktopology.add(nodeDescr);
      nodeDescr.setSoftwareVersion(nodeReg.getSoftwareVersion());

      // register new datanode
      // register: put nodeDescr into the in-memory structures
      addDatanode(nodeDescr);
      checkDecommissioning(nodeDescr);
      
      // also treat the registration message as a heartbeat
      // no need to update its timestamp
      // because its is done when the descriptor is created
      // hand nodeDescr to the heartbeatManager, which manages this DataNode's subsequent heartbeats
      heartbeatManager.addDatanode(nodeDescr);
      success = true;
      incrementVersionCount(nodeReg.getSoftwareVersion());
    } finally {
      if (!success) {
        removeDatanode(nodeDescr);
        wipeDatanode(nodeDescr);
        countSoftwareVersions();
      }
    }
  } catch (InvalidTopologyException e) {
    // If the network location is invalid, clear the cached mappings
    // so that we have a chance to re-add this DataNode with the
    // correct network location later.
    List<String> invalidNodeNames = new ArrayList<String>(3);
    // clear cache for nodes in IP or Hostname
    invalidNodeNames.add(nodeReg.getIpAddr());
    invalidNodeNames.add(nodeReg.getHostName());
    invalidNodeNames.add(nodeReg.getPeerHostName());
    dnsToSwitchMapping.reloadCachedMappings(invalidNodeNames);
    throw e;
  }
}

This method is long, but for a first registration most of it does not run; registration boils down to putting the DataNode's information into a few of the NameNode's in-memory structures so it can be managed from now on. The two important calls are addDatanode(nodeDescr) and heartbeatManager.addDatanode(nodeDescr). The latter hands nodeDescr to the heartbeatManager, which will manage this DataNode's subsequent heartbeats; we leave that aside for now and look at addDatanode(nodeDescr):

 

void addDatanode(final DatanodeDescriptor node) {
  // To keep host2DatanodeMap consistent with datanodeMap,
  // remove  from host2DatanodeMap the datanodeDescriptor removed
  // from datanodeMap before adding node to host2DatanodeMap.
  synchronized(datanodeMap) {
    host2DatanodeMap.remove(datanodeMap.put(node.getDatanodeUuid(), node));
  }

  networktopology.add(node); // may throw InvalidTopologyException
  host2DatanodeMap.add(node);
  checkIfClusterIsNowMultiRack(node);

  if (LOG.isDebugEnabled()) {
    LOG.debug(getClass().getSimpleName() + ".addDatanode: "
        + "node " + node + " is added to datanodeMap.");
  }
}

So it really just adds the node to a few in-memory structures.

Back to the NameNodeRpcServer handler: after the chain of calls above it finally returns the registration info to the DataNode. Here is that code once more:

 

@Override // DatanodeProtocol
public DatanodeRegistration registerDatanode(DatanodeRegistration nodeReg)
    throws IOException {
  checkNNStartup();
  verifySoftwareVersion(nodeReg);
  // FSNamesystem's registerDatanode
  namesystem.registerDatanode(nodeReg);
  return nodeReg;
}

During registration this object has been filled with information such as exportedKeys, and it is returned to the DataNode. Back to the DataNode side.

 

void register() throws IOException {
  // The handshake() phase loaded the block pool storage
  // off disk - so update the bpRegistration object from that info
  // ask the BPOfferService to build the DataNode's registration object (DataNode IDs, version, namespace info)
  bpRegistration = bpos.createRegistration();

  LOG.info(this + " beginning handshake with NN");

  while (shouldRun()) {
    try {
      // Use returned registration from namenode with updated fields
      bpRegistration = bpNamenode.registerDatanode(bpRegistration);
      break;
    } catch(EOFException e) {  // namenode might have just restarted
      LOG.info("Problem connecting to server: " + nnAddr + " :"
          + e.getLocalizedMessage());
      sleepAndLogInterrupts(1000, "connecting to server");
    } catch(SocketTimeoutException e) {  // namenode is busy
      LOG.info("Problem connecting to server: " + nnAddr);
      sleepAndLogInterrupts(1000, "connecting to server");
    }
  }
  
  LOG.info("Block pool " + this + " successfully registered with NN");
  bpos.registrationSucceeded(this, bpRegistration);

  // random short delay - helps scatter the BR from all DNs
  // after a successful registration, schedule the first block report (with a random delay)
  scheduleBlockReport(dnConf.initialBlockReportDelay);
}

So bpRegistration = bpNamenode.registerDatanode(bpRegistration); receives the NameNode's registration response, and execution proceeds to bpos.registrationSucceeded(this, bpRegistration):

 

synchronized void bpRegistrationSucceeded(DatanodeRegistration bpRegistration,
    String blockPoolId) throws IOException {
  // Set the ID if we haven't already
  if (null == id) {
    // on the first successful registration, record the id
    id = bpRegistration;
  }

  if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) {
    throw new IOException("Inconsistent Datanode IDs. Name-node returned "
        + bpRegistration.getDatanodeUuid()
        + ". Expecting " + storage.getDatanodeUuid());
  }
  // register this block pool with the secret manager (block token keys)
  registerBlockPoolWithSecretManager(bpRegistration, blockPoolId);
}

registrationSucceeded ultimately calls the DataNode's bpRegistrationSucceeded above to do the post-registration bookkeeping and update a few in-memory structures.

The last step of registration is scheduleBlockReport(dnConf.initialBlockReportDelay), which schedules the first block report (with a short random delay, so reports from different DataNodes are scattered). With that, register() returns, connectToNNAndHandshake() returns, and registration is complete.
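The point of the random delay is jitter: if every DataNode reported the moment it registered, a NameNode restart would be hit by a burst of full block reports at once. A hypothetical sketch of the idea (the names are illustrative, not the actual BPServiceActor fields):

// sketch: pick the first block report time uniformly inside [now, now + initialDelayMs]
long scheduleFirstBlockReport(long initialDelayMs) {
  long now = System.currentTimeMillis();
  if (initialDelayMs <= 0) {
    return now;                        // report as soon as possible
  }
  long jitter = java.util.concurrent.ThreadLocalRandom.current().nextLong(initialDelayMs);
  return now + jitter;                 // DataNodes spread themselves over the window
}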

 

To summarize the registration process:

Registration happens in the BPServiceActor thread's run method, i.e. the actor tries to register as soon as it starts. The flow is:

1. First obtain an RPC proxy for a NameNode and fetch the NameNode's metadata; this is the DataNode's first handshake with the NameNode after startup.

2. With that metadata there are two cases. (1) If this is the second actor of a BPServiceActor pair to return, its result only needs to be compared against the first actor's; this case is simple. (2) If this is the first actor to return, the DataNode-side storage is initialized first, then the NameNode metadata from step 1 is combined with the DataNode's own storage information to build a registration object, which is sent through the RPC proxy to the corresponding registration method on the NameNode's RPC server.

3. The NameNode's RPC server receives the request and hands it to FSNamesystem, which updates some metadata, records or refreshes this DataNode's information, adds it to the managers that will track the DataNode, writes some information back into the registration object, and returns that object to the DataNode.

4. The DataNode receives the registration object returned by the NameNode, does the post-success processing, and updates some in-memory data.

 
