Source code analysis: what the Hadoop DataNode does under the hood at startup


Using Hadoop 3.x as an example.

At startup, the DataNode roughly does the following:

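  • DataNode initialization
    • 1. Initialize a DataXceiverServer
    • 2. Initialize the HTTP service behind the DataNode web UI
    • 3. Initialize the DataNode's RPC service, which accepts client requests
    • 4. Register with the NameNode
    • 5. Send heartbeats to the NameNode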
Search for the DataNode class and open its main method:
public static void main(String args[]) {
    secureMain(args, null);
}

Step into secureMain(args, null), which creates the DataNode:

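public static void secureMain(String args[], SecureResources resources) {
    // ... (exception handling and exit-code logic omitted)
    DataNode datanode = createDataNode(args, null, resources);
    // ... then waits for the DataNode to finish: if (datanode != null) datanode.join();
}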

Step into createDataNode(args, null, resources). It instantiates and initializes the DataNode and, once initialization is done, starts it with dn.runDatanodeDaemon():

public static DataNode createDataNode(String args[], Configuration conf,
      SecureResources resources) throws IOException {
    DataNode dn = instantiateDataNode(args, conf, resources);
    if (dn != null) {
      dn.runDatanodeDaemon();
    }
    return dn;
  }

Initialization

Step into the initialization method instantiateDataNode(args, conf, resources). It calls makeInstance(dataLocations, conf, resources) to create the DataNode instance:

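public static DataNode instantiateDataNode(String args [], Configuration conf,
      SecureResources resources) throws IOException {
    // ... parse args and set up the Configuration (omitted)
    // data directories configured in dfs.datanode.data.dir
    Collection<StorageLocation> dataLocations = getStorageLocations(conf);
    // ... security login (omitted)
    return makeInstance(dataLocations, conf, resources);
  }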

Step into makeInstance(dataLocations, conf, resources), which constructs and returns the DataNode:

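static DataNode makeInstance(Collection<StorageLocation> dataDirs,
      Configuration conf, SecureResources resources) throws IOException {
    // ... a StorageLocationChecker checks dataDirs and keeps the usable ones in
    //     "locations"; it throws if too many directories fail the check (omitted)
    return new DataNode(conf, locations, storageLocationChecker, resources);
  }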

Step into the constructor DataNode(conf, locations, storageLocationChecker, resources):

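DataNode(final Configuration conf,
         final List<StorageLocation> dataDirs,
         final StorageLocationChecker storageLocationChecker,
         final SecureResources resources) throws IOException {
    // ... field initialization omitted
    try {
        startDataNode(dataDirs, resources);
    } catch (IOException ie) {
        // clean up whatever was already started, then propagate the failure
        shutdown();
        throw ie;
    }
}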

Step into startDataNode(dataDirs, resources); this is where the DataNode is actually initialized:

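void startDataNode(List<StorageLocation> dataDirectories,
                     SecureResources resources
                     ) throws IOException {
    // ... storage, dataset and other setup omitted

    // global DN settings
    registerMXBean();    // JMX bean for monitoring
    initDataXceiver();   // 1. block data transfer server
    startInfoServer();   // 2. HTTP (web UI) server
    initIpcServer();     // 3. RPC server
    // ... 4./5. registration and heartbeats start from
    //     blockPoolManager.refreshNamenodes(getConf()), shown later
  }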
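The entry chain and the order of the initialization steps can be summarized in a small, self-contained Java model. This is a sketch, not Hadoop code: the hypothetical class StartupTrace only records, in order, the names of the real methods discussed in this article (including blockPoolManager.refreshNamenodes(), which step 4 covers); everything else startDataNode() does is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each method only records which real DataNode method it stands for.
class StartupTrace {
    final List<String> calls = new ArrayList<>();

    void secureMain() {
        calls.add("secureMain");
        createDataNode();
    }

    void createDataNode() {
        calls.add("createDataNode");
        instantiateDataNode();           // build and initialize the DataNode object
        calls.add("runDatanodeDaemon");  // only then start its daemon threads
    }

    void instantiateDataNode() {
        calls.add("instantiateDataNode");
        makeInstance();
    }

    void makeInstance() {
        calls.add("makeInstance");
        calls.add("new DataNode");
        startDataNode();                 // invoked from the DataNode constructor
    }

    void startDataNode() {
        calls.add("startDataNode");
        // the initialization steps in the order this article follows them
        calls.add("registerMXBean");
        calls.add("initDataXceiver");
        calls.add("startInfoServer");
        calls.add("initIpcServer");
        calls.add("refreshNamenodes");
    }

    static List<String> run() {
        StartupTrace trace = new StartupTrace();
        trace.secureMain();
        return trace.calls;
    }
}
```

StartupTrace.run() returns a sequence that starts with secureMain and ends with runDatanodeDaemon, because createDataNode() starts the daemon only after instantiateDataNode() has returned.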

1. Initializing the DataXceiverServer

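Step into initDataXceiver(). It creates the DataXceiverServer, the server that handles block reads and writes, and wraps it in a Daemon thread; the thread itself is started later, in runDatanodeDaemon():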

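private void initDataXceiver() throws IOException {
    // ... create tcpPeerServer, listening on the streaming address (dfs.datanode.address)
    xserver = new DataXceiverServer(tcpPeerServer, getConf(), this);
    this.dataXceiverServer = new Daemon(threadGroup, xserver);
    // ...
    // optional second server for short-circuit reads over a UNIX domain socket
    if (domainPeerServer != null) {
      this.localDataXceiverServer = new Daemon(threadGroup,
          new DataXceiverServer(domainPeerServer, getConf(), this));
    }
}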
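DataXceiverServer uses a one-thread-per-connection design: its run() loop accepts a peer and starts a new DataXceiver daemon thread to serve it. The sketch below shows only that pattern and rests on several assumptions: the hypothetical XceiverServerSketch takes "peers" (plain strings) from a BlockingQueue instead of accepting sockets, and a poison-pill string stops the loop so the demo can finish.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for DataXceiverServer: one worker thread per accepted "peer".
class XceiverServerSketch implements Runnable {
    static final String POISON = "__shutdown__";   // marker that ends the accept loop
    final BlockingQueue<String> pendingPeers = new LinkedBlockingQueue<>();
    final List<Thread> workers = new CopyOnWriteArrayList<>();
    final List<String> served = new CopyOnWriteArrayList<>();

    @Override
    public void run() {
        try {
            while (true) {
                String peer = pendingPeers.take();   // plays the role of accept()
                if (POISON.equals(peer)) {
                    return;
                }
                // one daemon worker per connection, like a DataXceiver thread
                Thread worker = new Thread(() -> served.add("handled " + peer));
                worker.setDaemon(true);
                workers.add(worker);
                worker.start();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Feeds the peers through a server thread and waits for every worker to finish. */
    static List<String> serve(List<String> peers) {
        XceiverServerSketch server = new XceiverServerSketch();
        Thread acceptor = new Thread(server);
        acceptor.setDaemon(true);                    // Hadoop's Daemon threads are daemon threads
        acceptor.start();
        server.pendingPeers.addAll(peers);
        server.pendingPeers.add(POISON);
        try {
            acceptor.join();
            for (Thread w : server.workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return server.served;
    }
}
```

Spawning a thread per connection keeps each block transfer's streaming logic simple and sequential, at the cost of one thread per active transfer.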

2. Initializing the HTTP service

Go back to startDataNode(dataDirs, resources) and step into startInfoServer():

void startDataNode(List<StorageLocation> dataDirectories,
                     SecureResources resources
                     ) throws IOException {
    
    // global DN settings
    registerMXBean();
    initDataXceiver();
    startInfoServer();
    
  }

startInfoServer() creates a DatanodeHttpServer and starts it:

private void startInfoServer()
    throws IOException {
    httpServer = new DatanodeHttpServer(getConf(), this, httpServerChannel);
    httpServer.start();
  }

Step into the DatanodeHttpServer(getConf(), this, httpServerChannel) constructor, which builds an HttpServer2 instance:

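public DatanodeHttpServer(final Configuration conf,
      final DataNode datanode,
      final ServerSocketChannel externalHttpChannel)
    throws IOException {
    // ... confForInfoServer (a copy of conf) and proxyPort
    //     (the internal port; 0 means any free port) are set up here
    HttpServer2.Builder builder = new HttpServer2.Builder()
        .setName("datanode")
        .setConf(confForInfoServer)
        .setACL(new AccessControlList(conf.get(DFS_ADMIN, " ")))
        .hostName(getHostnameForSpnegoPrincipal(confForInfoServer))
        .addEndpoint(URI.create("http://localhost:" + proxyPort))
        .setFindPort(true);   // if the port is taken, try the next one
    // ... servlets are registered, then:
    this.infoServer = builder.build();
    // ... external clients are served by a Netty front end in this class,
    //     which forwards requests to this internal server on localhost
}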
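setFindPort(true) tells HttpServer2 that, if the configured port is already in use, it may keep trying the next port number instead of failing. The following is a minimal sketch of that idea with plain java.net sockets, assuming a hypothetical FindPortSketch.bind() helper and an attempt limit; it is not HttpServer2's implementation. When the port is 0, the operating system already picks a free port, so the search only matters for a fixed port.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper illustrating the idea behind setFindPort(true).
class FindPortSketch {
    /**
     * Binds a server socket on the loopback address, starting at startPort.
     * With findPort=true, a BindException makes it try startPort + 1, + 2, ...
     */
    static ServerSocket bind(int startPort, boolean findPort, int maxAttempts)
            throws IOException {
        int port = startPort;
        for (int attempt = 1; ; attempt++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
                return socket;
            } catch (BindException e) {
                socket.close();
                // port 0 already means "any free port", so there is nothing to search
                boolean canRetry = findPort && port != 0 && port < 65535
                        && attempt < maxAttempts;
                if (!canRetry) {
                    throw e;
                }
                port++;   // try the next port number
            } catch (IOException e) {
                socket.close();   // other I/O errors are not retried
                throw e;
            }
        }
    }
}
```

Without findPort the helper fails fast on a busy port, which is usually preferable in production where the port must match what clients expect.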

3. Initializing the DataNode RPC server

Back in startDataNode(dataDirs, resources), initIpcServer() initializes the RPC service.


initIpcServer() builds the RPC server:

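private void initIpcServer() throws IOException {
    // dfs.datanode.ipc.address (0.0.0.0:9867 by default in Hadoop 3)
    InetSocketAddress ipcAddr = NetUtils.createSocketAddr(
        getConf().getTrimmed(DFS_DATANODE_IPC_ADDRESS_KEY));

    // ... "service" is the protobuf BlockingService wrapping
    //     ClientDatanodeProtocolServerSideTranslatorPB (omitted)
    ipcServer = new RPC.Builder(getConf())
        .setProtocol(ClientDatanodeProtocolPB.class)
        .setInstance(service)
        .setBindAddress(ipcAddr.getHostName())
        .setPort(ipcAddr.getPort())
        .setNumHandlers(   // dfs.datanode.handler.count, 10 by default
            getConf().getInt(DFS_DATANODE_HANDLER_COUNT_KEY,
                DFS_DATANODE_HANDLER_COUNT_DEFAULT)).setVerbose(false)
        .setSecretManager(blockPoolTokenSecretManager).build();
    // ... InterDatanodeProtocol and others are then added to the same ipcServer
  }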
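setNumHandlers() fixes how many handler threads the DataNode's IPC server runs. Conceptually, incoming calls are placed on a call queue, and a fixed pool of handler threads takes calls off it and executes them. The sketch below models only that queue-plus-handlers split; HandlerPoolSketch, the string "calls" and the service function are hypothetical, and unlike real handlers (which block waiting for the next call) these threads exit once the queue is empty so the example terminates.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical model of an IPC server: calls go into one queue, N handler threads serve them.
class HandlerPoolSketch {
    static Map<String, String> serve(int numHandlers, List<String> calls,
                                     Function<String, String> service) {
        BlockingQueue<String> callQueue = new LinkedBlockingQueue<>(calls);
        Map<String, String> responses = new ConcurrentHashMap<>();
        CountDownLatch finished = new CountDownLatch(numHandlers);
        for (int i = 0; i < numHandlers; i++) {
            Thread handler = new Thread(() -> {
                String call;
                // each handler keeps taking calls until the queue is drained
                while ((call = callQueue.poll()) != null) {
                    responses.put(call, service.apply(call));
                }
                finished.countDown();
            }, "IPC handler " + i);
            handler.setDaemon(true);
            handler.start();
        }
        try {
            finished.await();   // wait until every handler has exited
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return responses;
    }
}
```

A larger handler count lets more slow calls run concurrently, while the queue absorbs short bursts.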

4. Registering the DataNode with the NameNode

Back in startDataNode(dataDirs, resources), the call blockPoolManager.refreshNamenodes(getConf()) registers the DataNode with the NameNode(s):

void startDataNode(List<StorageLocation> dataDirectories,
                     SecureResources resources
                     ) throws IOException {
    
    // global DN settings
    registerMXBean();
    initDataXceiver();
    startInfoServer();
    initIpcServer();
    blockPoolManager.refreshNamenodes(getConf());
  }

refreshNamenodes(getConf()) reads the NameNode addresses from the configuration and calls doRefreshNamenodes(newAddressMap, newLifelineAddressMap):

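void refreshNamenodes(Configuration conf)
      throws IOException {
    // ... newAddressMap / newLifelineAddressMap are read from conf:
    //     nameservice ID -> (NameNode ID -> RPC address)
    synchronized (refreshNamenodesLock) {
      doRefreshNamenodes(newAddressMap, newLifelineAddressMap);
    }
  }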

Follow doRefreshNamenodes(newAddressMap, newLifelineAddressMap):

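  • The for loop iterates over the nameservices (ns) that are new to this DataNode. In an HA setup a single nameservice has several NameNodes (active and standby), and with federation there can be several nameservices.
  • One BPOfferService (BPOS) is created per nameservice, i.e. per block pool; inside it, one BPServiceActor is created for each NameNode of that nameservice.
  • Finally, startAll() starts all the services that were created.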
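private void doRefreshNamenodes(
    Map<String, Map<String, InetSocketAddress>> addrMap,
    Map<String, Map<String, InetSocketAddress>> lifelineAddrMap)
    throws IOException {
    // ... toAdd = nameservice IDs in addrMap that have no BPOfferService yet

    if (!toAdd.isEmpty()) {

        for (String nsToAdd : toAdd) {
            // ... addrs / lifelineAddrs: the NameNode addresses of nsToAdd
            BPOfferService bpos = createBPOS(nsToAdd, addrs, lifelineAddrs);
            bpByNameserviceId.put(nsToAdd, bpos);
            offerServices.add(bpos);
        }
    }
    startAll();
}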

Step into createBPOS(nsToAdd, addrs, lifelineAddrs):

protected BPOfferService createBPOS(
      final String nameserviceId,
      List<InetSocketAddress> nnAddrs,
      List<InetSocketAddress> lifelineNnAddrs) {
    return new BPOfferService(nameserviceId, nnAddrs, lifelineNnAddrs, dn);
  }

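Follow the BPOfferService(nameserviceId, nnAddrs, lifelineNnAddrs, dn) constructor. It creates one BPServiceActor per NameNode address of this nameservice, so there are as many BPServiceActors as there are NameNodes, while there is only one BPOfferService per nameservice: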

BPOfferService(
      final String nameserviceId,
      List<InetSocketAddress> nnAddrs,
      List<InetSocketAddress> lifelineNnAddrs,
      DataNode dn) {
 
    for (int i = 0; i < nnAddrs.size(); ++i) {
      this.bpServices.add(new BPServiceActor(nnAddrs.get(i), lifelineNnAddrs.get(i), this));
    }
  }
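The one-to-many structure can be modeled in a few lines: one BPOfferService per nameservice, holding one BPServiceActor per NameNode of that nameservice. This sketch assumes the configuration has already been parsed into a map from nameservice ID to NameNode addresses (the role of addrMap in doRefreshNamenodes()); OfferServiceSketch and ActorSketch are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for BPServiceActor: talks to exactly one NameNode.
class ActorSketch {
    final String nnAddr;
    ActorSketch(String nnAddr) { this.nnAddr = nnAddr; }
}

// Hypothetical stand-in for BPOfferService: one per nameservice (block pool).
class OfferServiceSketch {
    final String nameserviceId;
    final List<ActorSketch> actors = new ArrayList<>();

    OfferServiceSketch(String nameserviceId, List<String> nnAddrs) {
        this.nameserviceId = nameserviceId;
        // one actor per NameNode of this nameservice (active and standby under HA)
        for (String addr : nnAddrs) {
            actors.add(new ActorSketch(addr));
        }
    }

    // mirrors the loop in doRefreshNamenodes(): one offer service per nameservice
    static List<OfferServiceSketch> createAll(Map<String, List<String>> addrMap) {
        List<OfferServiceSketch> offerServices = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : addrMap.entrySet()) {
            offerServices.add(new OfferServiceSketch(e.getKey(), e.getValue()));
        }
        return offerServices;
    }
}
```

For example, a map with an HA nameservice ns1 (two NameNodes) and a non-HA nameservice ns2 (one NameNode) yields two offer services holding three actors in total.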

Go back to doRefreshNamenodes(newAddressMap, newLifelineAddressMap) and step into its final call, startAll().


startAll() calls bpos.start() on every BPOfferService, running as the DataNode's login user:

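synchronized void startAll() throws IOException {
    try {
      // start every BPOfferService as the DataNode's login user
      UserGroupInformation.getLoginUser().doAs(
          new PrivilegedExceptionAction<Object>() {

            @Override
            public Object run() throws Exception {
              for (BPOfferService bpos : offerServices) {
                bpos.start();
              }
              return null;
            }
          });
    } catch (InterruptedException ex) {
      IOException ioe = new IOException();
      ioe.initCause(ex.getCause());
      throw ioe;
    }
  }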

Step into bpos.start(), which starts each of its BPServiceActors:

void start() {
    for (BPServiceActor actor : bpServices) {
        actor.start();
    }
}

BPServiceActor.start() starts a thread (bpThread); each actor registers with its own NameNode from this thread:

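void start() {
    // BPServiceActor.start(): the thread runs this actor's run() method
    bpThread = new Thread(this);
    bpThread.setDaemon(true);
    bpThread.start();
}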

Since the thread was created with the actor itself as its Runnable, it executes BPServiceActor.run(), which calls connectToNNAndHandshake() to connect to the NameNode:

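public void run() {
    try {
      while (true) {
        // init stuff
        try {
          // setup storage
          connectToNNAndHandshake();
          break;   // registered: leave the init loop
        } catch (IOException ioe) {
          // handshake or registration failed: sleep 5 s and retry
          // (the real code returns instead when shouldRetryInit() is false)
          sleepAndLogInterrupts(5000, "initializing");
        }
      }
      // ... offerService() loop, shown further below
    } finally {
      cleanUp();
    }
  }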
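The outer while (true) loop means a DataNode whose NameNode is not up yet simply keeps trying. A minimal sketch of this retry-until-registered pattern, assuming a hypothetical Handshake interface in place of connectToNNAndHandshake() and a maxAttempts cap that the real loop does not have (it is added only so the sketch always terminates):

```java
import java.io.IOException;

// Hypothetical model of the "retry the handshake until it works" loop in BPServiceActor.run().
class HandshakeRetrySketch {
    interface Handshake {
        void attempt() throws IOException;   // stands in for connectToNNAndHandshake()
    }

    /** Returns the number of attempts needed, or -1 if maxAttempts was exhausted. */
    static int connectWithRetry(Handshake handshake, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handshake.attempt();
                return attempt;               // success: like "break" out of the init loop
            } catch (IOException e) {
                // back off before the next try
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

With a handshake that fails twice before succeeding, connectWithRetry() reports that three attempts were needed.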

Follow connectToNNAndHandshake(). dn.connectToNN(nnAddr) returns a proxy for the NameNode, i.e. an RPC client for it:

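private void connectToNNAndHandshake() throws IOException {
    // get NN proxy
    bpNamenode = dn.connectToNN(nnAddr);
    // first phase of the handshake: get the namespace info from the NN
    NamespaceInfo nsInfo = retrieveNamespaceInfo();
    // ... nsInfo is verified against the other NNs of this block pool
    // second phase of the handshake: register with the NN
    register(nsInfo);
}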

Follow dn.connectToNN(nnAddr):

DatanodeProtocolClientSideTranslatorPB connectToNN(
      InetSocketAddress nnAddr) throws IOException {
    return new DatanodeProtocolClientSideTranslatorPB(nnAddr, getConf());
  }

Follow the constructor new DatanodeProtocolClientSideTranslatorPB(nnAddr, getConf()):

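public DatanodeProtocolClientSideTranslatorPB(InetSocketAddress nameNodeAddr,
      Configuration conf) throws IOException {
    // ... the protobuf RPC engine is registered for DatanodeProtocolPB
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    rpcProxy = createNamenode(nameNodeAddr, conf, ugi);
  }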

createNamenode(nameNodeAddr, conf, ugi) obtains the NameNode proxy object through RPC.getProxy():

private static DatanodeProtocolPB createNamenode(...) throws IOException {
    return RPC.getProxy(DatanodeProtocolPB.class,
        RPC.getProtocolVersion(DatanodeProtocolPB.class), nameNodeAddr, ugi,
        conf, NetUtils.getSocketFactory(conf, DatanodeProtocolPB.class));
  }
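RPC.getProxy() returns an object that implements DatanodeProtocolPB, yet every method call on it is turned into a request sent to the NameNode. The protobuf RPC engine builds such objects with the JDK's java.lang.reflect.Proxy and an invocation handler. The sketch below shows only that JDK mechanism: the NamenodeApi interface, its two methods and the in-memory transport function are hypothetical, and nothing goes over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical protocol interface standing in for DatanodeProtocolPB.
interface NamenodeApi {
    String registerDatanode(String datanodeId);
    String sendHeartbeat(String datanodeId, long capacity);
}

class RpcProxySketch {
    /** Builds a proxy that turns every method call into a "request" string for the transport. */
    static NamenodeApi getProxy(Function<String, String> transport, List<String> wireLog) {
        InvocationHandler handler = (proxy, method, args) -> {
            // a real RPC client would serialize the call and wait for the server's reply
            String request = method.getName() + Arrays.toString(args);
            wireLog.add(request);
            return transport.apply(request);
        };
        return (NamenodeApi) Proxy.newProxyInstance(
                NamenodeApi.class.getClassLoader(),
                new Class<?>[] {NamenodeApi.class},
                handler);
    }
}
```

The caller only sees a NamenodeApi; whether the answer comes from memory or from a NameNode across the network is hidden inside the handler, which is exactly why bpNamenode can be used like a local object.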

Back in connectToNNAndHandshake(): once the NameNode proxy has been obtained, the registration method register(nsInfo) is called.


Step into register(nsInfo). It uses the NameNode proxy obtained above to call registerDatanode(newBpRegistration), which registers the DataNode with the NameNode:

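void register(NamespaceInfo nsInfo) throws IOException {
    DatanodeRegistration newBpRegistration = bpos.createRegistration();

    while (shouldRun()) {
      try {
        // Use returned registration from namenode with updated fields
        newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
        newBpRegistration.setNamespaceInfo(nsInfo);
        bpRegistration = newBpRegistration;
        break;
      } catch (EOFException | SocketTimeoutException e) {
        // NameNode restarting or busy: wait 1 s and retry
        sleepAndLogInterrupts(1000, "connecting to server");
      }
    }
    // ... (block report scheduling etc. omitted)
  }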

Next, switch to the NameNode side: open NameNodeRpcServer (in IntelliJ IDEA, press Shift twice to search for it) and look at its registerDatanode() method, which receives newBpRegistration as nodeReg:

public DatanodeRegistration registerDatanode(DatanodeRegistration nodeReg)
    throws IOException {
    // ... checkNNStartup(); verifySoftwareVersion(nodeReg);
    namesystem.registerDatanode(nodeReg);
    return nodeReg;
}

Follow namesystem.registerDatanode(nodeReg) into FSNamesystem:

void registerDatanode(DatanodeRegistration nodeReg) throws IOException {
    // (runs under the FSNamesystem write lock in the real code)
    blockManager.registerDatanode(nodeReg);
}

Continue into blockManager.registerDatanode(nodeReg):

public void registerDatanode(DatanodeRegistration nodeReg)
    throws IOException {
    datanodeManager.registerDatanode(nodeReg);
}

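In datanodeManager.registerDatanode(nodeReg), a DatanodeDescriptor (nodeDescr) is built from the incoming nodeReg and passed to addDatanode(nodeDescr), which registers the new DataNode. The same descriptor is also added to the heartbeat manager. At this point the new DataNode is registered.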

public void registerDatanode(DatanodeRegistration nodeReg)
      throws DisallowedDatanodeException, UnresolvedTopologyException {
    // ... handling of already-known (re-registering) nodes omitted
    DatanodeDescriptor nodeDescr
        = new DatanodeDescriptor(nodeReg, NetworkTopology.DEFAULT_RACK);
    // register new datanode
    addDatanode(nodeDescr);
    blockManager.getBlockReportLeaseManager().register(nodeDescr);
    // also treat the registration message as a heartbeat
    // no need to update its timestamp
    // because it is done when the descriptor is created
    heartbeatManager.addDatanode(nodeDescr);
    heartbeatManager.updateDnStat(nodeDescr);
}
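In essence, registration on the NameNode side stores a descriptor for the DataNode and hands the same descriptor to the heartbeat manager, counting the registration as a first heartbeat. A simplified model under these assumptions: DataNodes are keyed by a datanodeUuid string, time is passed in explicitly, and DatanodeRegistrySketch with its fields is hypothetical; re-registration is reduced to refreshing the timestamp, and topology resolution and block report leases are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of DatanodeManager + HeartbeatManager bookkeeping at registration.
class DatanodeRegistrySketch {
    static final class Descriptor {
        final String datanodeUuid;
        final String rack;
        long lastUpdate;                   // time of the last heartbeat, in ms
        Descriptor(String uuid, String rack, long now) {
            this.datanodeUuid = uuid;
            this.rack = rack;
            this.lastUpdate = now;         // set when the descriptor is created
        }
    }

    final Map<String, Descriptor> datanodeMap = new HashMap<>();   // known DataNodes
    final List<Descriptor> heartbeatNodes = new ArrayList<>();      // nodes tracked for heartbeats

    Descriptor registerDatanode(String datanodeUuid, long now) {
        Descriptor existing = datanodeMap.get(datanodeUuid);
        if (existing != null) {
            existing.lastUpdate = now;     // simplified re-registration
            return existing;
        }
        Descriptor d = new Descriptor(datanodeUuid, "/default-rack", now);
        datanodeMap.put(datanodeUuid, d);  // like addDatanode(nodeDescr)
        heartbeatNodes.add(d);             // like heartbeatManager.addDatanode(nodeDescr)
        return d;
    }
}
```

Registering the same DataNode twice keeps a single descriptor in both collections and only moves its lastUpdate forward.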

5. The DataNode sends heartbeats to the NameNode

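Now go back to the run() method of the BPServiceActor thread started by start(). After the steps above, the DataNode is registered with the NameNode, and run() moves on to calling offerService() in a loop, which sends heartbeats to the NameNode: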

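public void run() {
    try {
      while (true) {
        // init stuff
        try {
          // setup storage
          connectToNNAndHandshake();
          break;
        } catch (IOException ioe) {
          // handshake failed: sleep 5 s and retry
          sleepAndLogInterrupts(5000, "initializing");
        }
      }

      // registered: keep offering service (heartbeats, block reports)
      while (shouldRun()) {
        try {
          offerService();
        } catch (Exception ex) {
          // log and retry after 5 s
          sleepAndLogInterrupts(5000, "offering service");
        }
      }
    } finally {
      cleanUp();
    }
  }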

Follow offerService() to see how the DataNode sends a heartbeat:

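private void offerService() throws Exception {
    // ... loops while shouldRun(); a heartbeat is sent only when scheduler.isHeartbeatDue(now)
    //     holds (every dfs.heartbeat.interval, 3 s by default)
    resp = sendHeartBeat(requestBlockReportLease);
    // ... then processes resp.getCommands(), sends block reports, etc.
}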

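In sendHeartBeat(), bpNamenode is the NameNode proxy obtained earlier; the heartbeat is sent by calling sendHeartbeat() on it, with the DataNode's storage reports and load figures as arguments: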

HeartbeatResponse sendHeartBeat(boolean requestBlockReportLease)
    throws IOException {
    // ... reports (one per storage volume), numFailedVolumes, volumeFailureSummary,
    //     slowPeers and slowDisks are collected here
    HeartbeatResponse response = bpNamenode.sendHeartbeat(bpRegistration,
        reports,
        dn.getFSDataset().getCacheCapacity(),
        dn.getFSDataset().getCacheUsed(),
        dn.getXmitsInProgress(),
        dn.getXceiverCount(),
        numFailedVolumes,
        volumeFailureSummary,
        requestBlockReportLease,
        slowPeers,
        slowDisks);
    // ... (metrics updates omitted)
    return response;
}
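offerService() does not send a heartbeat on every pass through its loop; it first asks whether one is due, based on dfs.heartbeat.interval (3 seconds by default). A small sketch of that check with the clock passed in as a parameter, so it can be simulated without sleeping; HeartbeatSchedulerSketch is hypothetical and only follows the general idea of BPServiceActor's scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scheduler: decides when the next heartbeat is due, using monotonic time in ms.
class HeartbeatSchedulerSketch {
    private final long intervalMs;
    private long nextHeartbeatTime;

    HeartbeatSchedulerSketch(long intervalMs, long nowMs) {
        this.intervalMs = intervalMs;
        this.nextHeartbeatTime = nowMs;   // the first heartbeat is due immediately
    }

    boolean isHeartbeatDue(long nowMs) {
        return nowMs >= nextHeartbeatTime;
    }

    /** Called right after a heartbeat is sent; schedules the next one. */
    void scheduleNextHeartbeat(long nowMs) {
        nextHeartbeatTime = nowMs + intervalMs;
    }

    /** Simulates a loop that wakes up every stepMs; returns the times heartbeats were sent. */
    static List<Long> simulate(long intervalMs, long stepMs, long durationMs) {
        List<Long> sent = new ArrayList<>();
        HeartbeatSchedulerSketch s = new HeartbeatSchedulerSketch(intervalMs, 0);
        for (long now = 0; now <= durationMs; now += stepMs) {
            if (s.isHeartbeatDue(now)) {
                sent.add(now);            // here the real loop would call sendHeartBeat(...)
                s.scheduleNextHeartbeat(now);
            }
        }
        return sent;
    }
}
```

Simulating a loop that wakes up every second for 10 seconds with a 3-second interval sends heartbeats at 0, 3000, 6000 and 9000 ms.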

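Calling sendHeartbeat() on the proxy results in an RPC to the NameNode. On the NameNode side, NameNodeRpcServer.sendHeartbeat() passes the call to FSNamesystem.handleHeartbeat(), which delegates the heartbeat processing to DatanodeManager.handleHeartbeat():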

// NameNodeRpcServer
public HeartbeatResponse sendHeartbeat(...) throws IOException {
    return namesystem.handleHeartbeat(...);
}
// FSNamesystem
HeartbeatResponse handleHeartbeat(...) throws IOException {
    DatanodeCommand[] cmds = blockManager.getDatanodeManager().handleHeartbeat(...);
    // ... cmds are wrapped in a HeartbeatResponse and returned
}

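DatanodeManager.handleHeartbeat() has the heartbeat manager (heartbeatManager) update the heartbeat information via updateHeartbeat(); the call passes through BlockManager down to the node's DatanodeDescriptor: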

// DatanodeManager
public DatanodeCommand[] handleHeartbeat(...) throws IOException {
    heartbeatManager.updateHeartbeat(nodeinfo, reports, cacheCapacity,
        cacheUsed, xceiverCount, failedVolumes, volumeFailureSummary);
    // ... then collects the commands for this DataNode and returns them
}
// HeartbeatManager
synchronized void updateHeartbeat(...) {
    blockManager.updateHeartbeat(node, reports, cacheCapacity, cacheUsed,
        xceiverCount, failedVolumes, volumeFailureSummary);
}
// BlockManager
void updateHeartbeat(...) {
    node.updateHeartbeat(reports, cacheCapacity, cacheUsed, xceiverCount,
                         failedVolumes, volumeFailureSummary);
}
// DatanodeDescriptor
void updateHeartbeat(...) {
    updateHeartbeatState(reports, cacheCapacity, cacheUsed, xceiverCount,
                         volFailures, volumeFailureSummary);
}

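Here the time of the last heartbeat is set to the current time with setLastUpdate(Time.now()) (plus a monotonic timestamp), and the storage statistics are refreshed with updateStorageStats():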

void updateHeartbeatState(...) {
    updateStorageStats(reports, cacheCapacity, cacheUsed, xceiverCount,
        volFailures, volumeFailureSummary);
    setLastUpdate(Time.now());
    setLastUpdateMonotonic(Time.monotonicNow());
  }
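The timestamp matters because the NameNode's heartbeat monitor compares it with the current time: a DataNode whose last heartbeat is older than the heartbeat expire interval is declared dead. That interval is computed as 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval; with the defaults (300000 ms and 3 s) this gives 2 × 300000 + 10 × 3000 = 630000 ms, i.e. 10.5 minutes. The class and method names below are hypothetical; only the arithmetic follows the NameNode's.

```java
// Hypothetical helpers reproducing the NameNode's dead-node arithmetic.
class HeartbeatExpirySketch {
    /** heartbeatExpireInterval = 2 * recheckInterval + 10 * 1000 * heartbeatIntervalSeconds */
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSeconds) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSeconds;
    }

    /** A node counts as dead once its last heartbeat is older than the expire interval. */
    static boolean isDead(long lastUpdateMonotonicMs, long nowMonotonicMs, long expireIntervalMs) {
        return lastUpdateMonotonicMs < nowMonotonicMs - expireIntervalMs;
    }
}
```

Using the monotonic timestamp (setLastUpdateMonotonic) for this comparison keeps the check correct even if the wall clock is adjusted.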

Back in FSNamesystem.handleHeartbeat() (called from NameNodeRpcServer.sendHeartbeat()): once the heartbeat has been processed, a HeartbeatResponse is returned to the DataNode:

HeartbeatResponse handleHeartbeat(...) throws IOException {
    DatanodeCommand[] cmds = blockManager.getDatanodeManager().handleHeartbeat(...);
    return new HeartbeatResponse(cmds, haState, rollingUpgradeInfo,
          blockReportLeaseId);
}
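The cmds array in the HeartbeatResponse is how the NameNode hands work to the DataNode, for example replicating blocks to other DataNodes or deleting blocks. On the DataNode side, BPServiceActor passes the commands to its BPOfferService, which dispatches on each command's action code (DatanodeProtocol.DNA_TRANSFER, DNA_INVALIDATE, DNA_REGISTER and others). A hedged sketch of that dispatch, in which the Action enum, the Command class and the logged strings are hypothetical simplifications of the real DatanodeCommand types:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of processing the commands carried by a heartbeat response.
class HeartbeatCommandSketch {
    // loosely modeled on a subset of the DatanodeProtocol.DNA_* action codes
    enum Action { TRANSFER, INVALIDATE, REGISTER }

    static final class Command {
        final Action action;
        final List<String> blocks;
        Command(Action action, List<String> blocks) {
            this.action = action;
            this.blocks = blocks;
        }
    }

    /** Handles every command from one heartbeat response; returns a log of what was done. */
    static List<String> processCommands(Command[] cmds) {
        List<String> done = new ArrayList<>();
        if (cmds == null) {
            return done;                                   // no work in this heartbeat
        }
        for (Command cmd : cmds) {
            switch (cmd.action) {
                case TRANSFER:
                    done.add("replicate " + cmd.blocks);   // copy blocks to other DataNodes
                    break;
                case INVALIDATE:
                    done.add("delete " + cmd.blocks);      // remove blocks from local disks
                    break;
                case REGISTER:
                    done.add("re-register");               // NameNode asks for re-registration
                    break;
                default:
                    done.add("ignored " + cmd.action);
            }
        }
        return done;
    }
}
```

Piggybacking commands on heartbeat responses means the NameNode never has to open a connection to a DataNode; it only answers the calls the DataNodes make.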