2. Hadoop Source Code Analysis: DataNode Startup

The DataNode code lives in the org.apache.hadoop.hdfs.server.datanode package under hadoop-hdfs-project. Open the class and start with its Javadoc, which says roughly the following:

A DataNode is the block-storage component. It communicates with the NameNode, and also with clients and other DataNodes. A DataNode manages a set of blocks and allows clients to read and write them; it also responds to NameNode requests to delete blocks and to replicate blocks between DataNodes. The DataNode keeps block metadata on local disk. At startup it reports its local blocks to the NameNode, and it keeps sending periodic reports afterwards. Throughout its lifetime the DataNode continually initiates requests to the NameNode; the NameNode never contacts a DataNode directly to assign work, but instead piggybacks tasks on its replies to DataNode heartbeats. The DataNode maintains a server socket to handle requests from clients and other DataNodes, and reports that host/port to the NameNode; clients and other DataNodes obtain this service address from the NameNode.
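
Note the pull model in that description: the NameNode never opens a connection to a DataNode, and every piece of work a DataNode performs arrives as the reply to one of its own heartbeats. Below is a minimal sketch of this pattern. All names in it (HeartbeatClient, NamenodeCommand, NamenodeRpc) are hypothetical; in HDFS the real loop lives in BPServiceActor#offerService().

// Minimal sketch of the heartbeat "pull" pattern described above.
// Hypothetical names -- not the actual BPServiceActor code.
public class HeartbeatClient implements Runnable {
  interface NamenodeCommand { void execute(); }          // e.g. delete/replicate a block
  interface NamenodeRpc {
    NamenodeCommand[] sendHeartbeat(String datanodeId);  // the reply carries the work
  }

  private final NamenodeRpc namenode;
  private final String datanodeId;
  private volatile boolean shouldRun = true;

  public HeartbeatClient(NamenodeRpc namenode, String datanodeId) {
    this.namenode = namenode;
    this.datanodeId = datanodeId;
  }

  @Override
  public void run() {
    while (shouldRun) {
      try {
        // The DataNode always calls first; the NameNode only answers.
        NamenodeCommand[] commands = namenode.sendHeartbeat(datanodeId);
        for (NamenodeCommand cmd : commands) {
          cmd.execute();            // act on the piggybacked instructions
        }
        Thread.sleep(3000);         // cf. dfs.heartbeat.interval, 3 seconds by default
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        shouldRun = false;
      }
    }
  }
}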

 

Locate the main method:

public static void main(String args[]) {
  if (DFSUtil.parseHelpArgument(args, DataNode.USAGE, System.out, true)) {
    System.exit(0);
  }

  secureMain(args, null);
}

Step into secureMain(args, null):

public static void secureMain(String args[], SecureResources resources) {
  int errorCode = 0;
  try {
    StringUtils.startupShutdownMessage(DataNode.class, args, LOG);
    //create the DataNode instance
    DataNode datanode = createDataNode(args, null, resources);
    if (datanode != null) {
      //join the DataNode's threads and wait for them to exit
      datanode.join();
    } else {
      errorCode = 1;
    }
  } catch (Throwable e) {
    LOG.error("Exception in secureMain", e);
    terminate(1, e);
  } finally {
    // We need to terminate the process here because either shutdown was called
    // or some disk related conditions like volumes tolerated or volumes required
    // condition was not met. Also, In secure mode, control will go to Jsvc
    // and Datanode process hangs if it does not exit.
    LOG.warn("Exiting Datanode");
    terminate(errorCode);
  }
}

The most important call is of course DataNode datanode = createDataNode(args, null, resources);. Step into it:

public static DataNode createDataNode(String args[], Configuration conf,
    SecureResources resources) throws IOException {
  //initialize the DataNode
  DataNode dn = instantiateDataNode(args, conf, resources);
  if (dn != null) {
    //start the DataNode
    dn.runDatanodeDaemon();
  }
  return dn;
}

There are clearly two important steps here:

1. Initialize the DataNode.

2. Start the DataNode.

Start with step one, initializing the DataNode:

public static DataNode instantiateDataNode(String args [], Configuration conf,
    SecureResources resources) throws IOException {
  //load the DataNode configuration
  if (conf == null)
    conf = new HdfsConfiguration();
  
  if (args != null) {
    // parse generic hadoop options
    GenericOptionsParser hParser = new GenericOptionsParser(conf, args);
    args = hParser.getRemainingArgs();
  }
  
  if (!parseArguments(args, conf)) {
    printUsage(System.err);
    return null;
  }
  //all directories this DataNode stores data in
  Collection<StorageLocation> dataLocations = getStorageLocations(conf);
  UserGroupInformation.setConfiguration(conf);
  SecurityUtil.login(conf, DFS_DATANODE_KEYTAB_FILE_KEY,
      DFS_DATANODE_KERBEROS_PRINCIPAL_KEY, getHostName(conf));
  return makeInstance(dataLocations, conf, resources);
}

Step into the last line, return makeInstance(dataLocations, conf, resources):

static DataNode makeInstance(Collection<StorageLocation> dataDirs,
    Configuration conf, SecureResources resources) throws IOException {
  List<StorageLocation> locations;
  //set up the volume-check parameters, e.g. how many failed volumes
  //can be tolerated before startup is treated as a fatal error
  StorageLocationChecker storageLocationChecker =
      new StorageLocationChecker(conf, new Timer());
  try {
    //check the data directories against the configured thresholds,
    //returning only the healthy ones
    locations = storageLocationChecker.check(conf, dataDirs);
  } catch (InterruptedException ie) {
    //if the check fails, startup fails with an exception
    throw new IOException("Failed to instantiate DataNode", ie);
  }
  //initialize the metrics system
  DefaultMetricsSystem.initialize("DataNode");

  assert locations.size() > 0 : "number of data directories should be > 0";
  //construct the DataNode object
  return new DataNode(conf, locations, storageLocationChecker, resources);
}

This part first checks the DataNode's data directories, and a failed check aborts startup immediately; the sketch below illustrates the idea.
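
Before stepping into the constructor, here is a simplified, illustrative sketch of what such a tolerated-failure directory check boils down to. The real logic lives in StorageLocationChecker.check(), which additionally runs the per-directory probes in parallel with timeouts; everything below is made up for illustration.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Illustration only: filter the configured data directories down to the
// healthy ones, failing fast when too many are broken.
final class SimpleLocationCheck {
  static List<File> check(Collection<File> dataDirs, int failedVolumesTolerated)
      throws IOException {
    List<File> healthy = new ArrayList<>();
    int failed = 0;
    for (File dir : dataDirs) {
      // a directory counts as healthy if it exists (or can be created)
      // and is both readable and writable
      boolean ok = (dir.isDirectory() || dir.mkdirs())
          && dir.canRead() && dir.canWrite();
      if (ok) {
        healthy.add(dir);
      } else {
        failed++;
      }
    }
    if (failed > failedVolumesTolerated || healthy.isEmpty()) {
      throw new IOException("Too many failed volumes: " + failed
          + " failed, " + failedVolumesTolerated + " tolerated");
    }
    return healthy;
  }
}

If the check passes, execution reaches the last line and constructs the DataNode object: return new DataNode(conf, locations, storageLocationChecker, resources);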

/**
 * Create the DataNode given a configuration, an array of dataDirs,
 * and a namenode proxy.
 */
DataNode(final Configuration conf,
         final List<StorageLocation> dataDirs,
         final StorageLocationChecker storageLocationChecker,
         final SecureResources resources) throws IOException {
  //initialize configuration: the superclass constructor keeps the Configuration object (config files read into memory)
  super(conf);
  this.tracer = createTracer(conf);
  this.tracerConfigurationManager =
      new TracerConfigurationManager(DATANODE_HTRACE_PREFIX, conf);


  this.fileIoProvider = new FileIoProvider(conf, this);
  this.blockScanner = new BlockScanner(this);
  this.lastDiskErrorCheck = 0;
  this.maxNumberOfBlocksToLog = conf.getLong(DFS_MAX_NUM_BLOCKS_TO_LOG_KEY,
      DFS_MAX_NUM_BLOCKS_TO_LOG_DEFAULT);

  this.usersWithLocalPathAccess = Arrays.asList(
      conf.getTrimmedStrings(DFSConfigKeys.DFS_BLOCK_LOCAL_PATH_ACCESS_USER_KEY));
  this.connectToDnViaHostname = conf.getBoolean(
      DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME,
      DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME_DEFAULT);
  this.supergroup = conf.get(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY,
      DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT);
  this.isPermissionEnabled = conf.getBoolean(
      DFSConfigKeys.DFS_PERMISSIONS_ENABLED_KEY,
      DFSConfigKeys.DFS_PERMISSIONS_ENABLED_DEFAULT);
  this.pipelineSupportECN = conf.getBoolean(
      DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED,
      DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED_DEFAULT);

  confVersion = "core-" +
      conf.get("hadoop.common.configuration.version", "UNSPECIFIED") +
      ",hdfs-" +
      conf.get("hadoop.hdfs.configuration.version", "UNSPECIFIED");

  this.volumeChecker = new DatasetVolumeChecker(conf, new Timer());
  //initialize the xferService thread pool
  this.xferService =
      HadoopExecutors.newCachedThreadPool(new Daemon.DaemonFactory());

  // Determine whether we should try to pass file descriptors to clients.
  if (conf.getBoolean(HdfsClientConfigKeys.Read.ShortCircuit.KEY,
            HdfsClientConfigKeys.Read.ShortCircuit.DEFAULT)) {
    String reason = DomainSocket.getLoadingFailureReason();
    if (reason != null) {
      LOG.warn("File descriptor passing is disabled because {}", reason);
      this.fileDescriptorPassingDisabledReason = reason;
    } else {
      LOG.info("File descriptor passing is enabled.");
      this.fileDescriptorPassingDisabledReason = null;
    }
  } else {
    this.fileDescriptorPassingDisabledReason =
        "File descriptor passing was not configured.";
    LOG.debug(this.fileDescriptorPassingDisabledReason);
  }

  this.socketFactory = NetUtils.getDefaultSocketFactory(conf);

  try {
    hostName = getHostName(conf);
    LOG.info("Configured hostname is {}", hostName);
    //initialize and start the DataNode's many components
    startDataNode(dataDirs, resources);
  } catch (IOException ie) {
    shutdown();
    throw ie;
  }
  final int dncCacheMaxSize =
      conf.getInt(DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_KEY,
          DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_DEFAULT) ;
  datanodeNetworkCounts =
      CacheBuilder.newBuilder()
          .maximumSize(dncCacheMaxSize)
          .build(new CacheLoader<String, Map<String, Long>>() {
            @Override
            public Map<String, Long> load(String key) throws Exception {
              final Map<String, Long> ret = new HashMap<String, Long>();
              ret.put("networkErrors", 0L);
              return ret;
            }
          });

  initOOBTimeout();
  this.storageLocationChecker = storageLocationChecker;
}

The most important call here is startDataNode(dataDirs, resources):

void startDataNode(List<StorageLocation> dataDirectories,
                   SecureResources resources
                   ) throws IOException {

  // settings global for all BPs in the Data Node
  this.secureResources = resources;
  synchronized (this) {
    this.dataDirs = dataDirectories;
  }
  this.dnConf = new DNConf(this);
  checkSecureConfig(dnConf, getConf(), resources);

  if (dnConf.maxLockedMemory > 0) {
    if (!NativeIO.POSIX.getCacheManipulator().verifyCanMlock()) {
      throw new RuntimeException(String.format(
          "Cannot start datanode because the configured max locked memory" +
          " size (%s) is greater than zero and native code is not available.",
          DFS_DATANODE_MAX_LOCKED_MEMORY_KEY));
    }
    if (Path.WINDOWS) {
      NativeIO.Windows.extendWorkingSetSize(dnConf.maxLockedMemory);
    } else {
      long ulimit = NativeIO.POSIX.getCacheManipulator().getMemlockLimit();
      if (dnConf.maxLockedMemory > ulimit) {
        throw new RuntimeException(String.format(
          "Cannot start datanode because the configured max locked memory" +
          " size (%s) of %d bytes is more than the datanode's available" +
          " RLIMIT_MEMLOCK ulimit of %d bytes.",
          DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
          dnConf.maxLockedMemory,
          ulimit));
      }
    }
  }
  LOG.info("Starting DataNode with maxLockedMemory = {}",
      dnConf.maxLockedMemory);

  int volFailuresTolerated = dnConf.getVolFailuresTolerated();
  int volsConfigured = dnConf.getVolsConfigured();
  if (volFailuresTolerated < MAX_VOLUME_FAILURE_TOLERATED_LIMIT
      || volFailuresTolerated >= volsConfigured) {
    throw new HadoopIllegalArgumentException("Invalid value configured for "
        + "dfs.datanode.failed.volumes.tolerated - " + volFailuresTolerated
        + ". Value configured is either less than -1 or >= "
        + "to the number of configured volumes (" + volsConfigured + ").");
  }
  //construct the DataStorage
  storage = new DataStorage();
  
  // global DN settings
  registerMXBean();
  //initialize the DataXceiver server
  initDataXceiver();
  //start the info (HTTP) server
  startInfoServer();
  pauseMonitor = new JvmPauseMonitor();
  pauseMonitor.init(getConf());
  pauseMonitor.start();

  // BlockPoolTokenSecretManager is required to create ipc server.
  this.blockPoolTokenSecretManager = new BlockPoolTokenSecretManager();

  // Login is done by now. Set the DN user name.
  dnUserName = UserGroupInformation.getCurrentUser().getUserName();
  LOG.info("dnUserName = {}", dnUserName);
  LOG.info("supergroup = {}", supergroup);
  //initialize the IPC server
  initIpcServer();

  metrics = DataNodeMetrics.create(getConf(), getDisplayName());
  peerMetrics = dnConf.peerStatsEnabled ?
      DataNodePeerMetrics.create(getDisplayName(), getConf()) : null;
  metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);

  ecWorker = new ErasureCodingWorker(getConf(), this);
  blockRecoveryWorker = new BlockRecoveryWorker(this);
  //initialize the BlockPoolManager
  blockPoolManager = new BlockPoolManager(this);
  blockPoolManager.refreshNamenodes(getConf());

  // Create the ReadaheadPool from the DataNode context so we can
  // exit without having to explicitly shutdown its thread pool.
  readaheadPool = ReadaheadPool.getInstance();
  saslClient = new SaslDataTransferClient(dnConf.getConf(),
      dnConf.saslPropsResolver, dnConf.trustedChannelResolver);
  saslServer = new SaslDataTransferServer(dnConf, blockPoolTokenSecretManager);
  startMetricsLogger();

  if (dnConf.diskStatsEnabled) {
    diskMetrics = new DataNodeDiskMetrics(this,
        dnConf.outliersReportIntervalMs);
  }
}

The startDataNode method performs several important operations:

1. Construct the DataStorage: storage = new DataStorage();

2. Initialize the DataXceiver server: initDataXceiver();

3. Start the InfoServer: startInfoServer();

4. Initialize the IPC server: initIpcServer();

5. Start the BPOfferService threads that communicate with the NameNodes: blockPoolManager.refreshNamenodes(conf);

1.storage = new DataStorage();

DataStorage() {
  super(NodeType.DATA_NODE);
  trashEnabledBpids = Collections.newSetFromMap(
      new ConcurrentHashMap<String, Boolean>());
}

Further up the chain, overloaded constructors initialize a few fields:

public StorageInfo(int layoutV, int nsID, String cid, long cT, NodeType type) {
  layoutVersion = layoutV;
  clusterID = cid;
  namespaceID = nsID;
  cTime = cT;
  storageType = type;
}

2.initDataXceiver()

private void initDataXceiver(Configuration conf) throws IOException {
  // find a free port or use the privileged port provided;
  // construct a TcpPeerServer instance: it implements the PeerServer
  // interface and wraps a ServerSocket
  TcpPeerServer tcpPeerServer;

  if (secureResources != null) {
    // if secureResources exist, build the tcpPeerServer from them
    tcpPeerServer = new TcpPeerServer(secureResources);
  } else {
    // otherwise, build it from the configuration
    tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout,
        DataNode.getStreamingAddr(conf));
  }

  // set the receive buffer size, 128KB by default
  tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);

  // fetch the InetSocketAddress and store it in the DataNode field streamingAddr
  streamingAddr = tcpPeerServer.getStreamingAddr();
  LOG.info("Opened streaming server at " + streamingAddr);

  // construct a thread group named dataXceiverServer
  this.threadGroup = new ThreadGroup("dataXceiverServer");

  // construct the DataXceiverServer instance xserver around tcpPeerServer
  xserver = new DataXceiverServer(tcpPeerServer, conf, this);

  // wrap xserver in a daemon thread and add it to threadGroup
  this.dataXceiverServer = new Daemon(threadGroup, xserver);

  // mark every thread in the group as a daemon so the JVM can exit cleanly
  this.threadGroup.setDaemon(true); // auto destroy when empty

  // if dfs.client.read.shortcircuit is true (default false), or
  // dfs.client.domain.socket.data.traffic is true (default false),
  // also listen on a UNIX domain socket
  if (conf.getBoolean(DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_KEY,
            DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_DEFAULT) ||
      conf.getBoolean(DFSConfigKeys.DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC,
            DFSConfigKeys.DFS_CLIENT_DOMAIN_SOCKET_DATA_TRAFFIC_DEFAULT)) {
    DomainPeerServer domainPeerServer =
              getDomainPeerServer(conf, streamingAddr.getPort());
    if (domainPeerServer != null) {
      this.localDataXceiverServer = new Daemon(threadGroup,
          new DataXceiverServer(domainPeerServer, conf, this));
      LOG.info("Listening on UNIX domain socket: " +
          domainPeerServer.getBindPath());
    }
  }

  // construct the short-circuit registry
  this.shortCircuitRegistry = new ShortCircuitRegistry(conf);
}

DataXceiverServer is a background worker thread on the DataNode that accepts data read/write requests, spawning a separate thread to handle each request.

/**
 * Server used for receiving/sending a block of data.
 * This is created to listen for requests from clients or 
 * other DataNodes.  This small server does not use the 
 * Hadoop IPC mechanism.
 */
class DataXceiverServer implements Runnable 

DataXceiverServer is a thread; first look at its member fields:

  // PeerServer is an interface; its implementation TcpPeerServer wraps a
  // ServerSocket and provides Java socket-server functionality
  private final PeerServer peerServer;
  // the DataNode instance this DataXceiverServer belongs to
  private final DataNode datanode;
  // map from each Peer to the thread serving it
  private final HashMap<Peer, Thread> peers = new HashMap<Peer, Thread>();
  // map from each Peer to its DataXceiver
  private final HashMap<Peer, DataXceiver> peersXceiver = new HashMap<Peer, DataXceiver>();
  // flag recording whether this DataXceiverServer has been closed
  private boolean closed = false;
  /**
   * Maximal number of concurrent xceivers per node.
   * Enforcing the limit is required in order to avoid the data-node
   * running out of memory. The default is 4096.
   */
  int maxXceiverCount =
    DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT;
  // throttler for cluster block balancing
  final BlockBalanceThrottler balanceThrottler;
  /**
   * We need an estimate of the block size to check whether a disk partition
   * has enough space. Newer clients pass the expected block size to the
   * DataNode; for older clients we simply use the server-side default.
   */
  final long estimateBlockSize;

The PeerServer field peerServer is the class that actually does the heavy lifting for DataXceiverServer. What gets passed in at construction time is a TcpPeerServer, which implements the PeerServer interface and internally wraps a ServerSocket, providing the server-side socket functionality that listens for read/write requests from clients and other DataNodes. DataXceiverServer also keeps a reference to its host DataNode instance, so the thread can query the DataNode's state and services at any time. peers and peersXceiver are its two Peer-related data structures: the first maps each Peer to the thread serving it, the second maps each Peer to its DataXceiver; both are HashMaps. And what is a Peer? Essentially a wrapper around a socket.

closed is the flag marking whether the DataXceiverServer has been shut down. maxXceiverCount is the maximum number of concurrent DataXceivers per DataNode; enforcing this limit is necessary to keep the DataNode from running out of memory. balanceThrottler is the throttler for cluster block balancing: it limits both the bandwidth and the number of concurrent block moves.
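
To make the throttling idea concrete, here is a minimal sketch that only caps the number of concurrent block moves. The real BlockBalanceThrottler does more: it extends DataTransferThrottler and also meters bytes-per-second bandwidth, so treat this purely as an illustration of the concept.

import java.util.concurrent.Semaphore;

// Illustration only: admit at most maxConcurrentMoves block-move operations
// at a time. The real BlockBalanceThrottler also throttles bandwidth.
final class SimpleMoveThrottler {
  private final Semaphore slots;

  SimpleMoveThrottler(int maxConcurrentMoves) {
    this.slots = new Semaphore(maxConcurrentMoves);
  }

  /** Try to admit one block move; false means the node is saturated. */
  boolean acquire() {
    return slots.tryAcquire();
  }

  /** Must be called when the block move finishes, success or failure. */
  void release() {
    slots.release();
  }
}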

Now look at its constructor:

DataXceiverServer(PeerServer peerServer, Configuration conf,
    DataNode datanode) {
  
  this.peerServer = peerServer;
  this.datanode = datanode;
  // set the maximum number of DataXceivers per DataNode, maxXceiverCount:
  // dfs.datanode.max.transfer.threads, default 4096 when unset
  this.maxXceiverCount = 
    conf.getInt(DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_KEY,
                DFSConfigKeys.DFS_DATANODE_MAX_RECEIVER_THREADS_DEFAULT);
  //estimated block size: dfs.blocksize, default 128MB (128*1024*1024)
  this.estimateBlockSize = conf.getLongBytes(DFSConfigKeys.DFS_BLOCK_SIZE_KEY,
      DFSConfigKeys.DFS_BLOCK_SIZE_DEFAULT);
  
  //set up parameters for cluster balancing:
  // bandwidth from dfs.datanode.balance.bandwidthPerSec (default 1024*1024)
  // max concurrent moves from dfs.datanode.balance.max.concurrent.moves (default 5)
  this.balanceThrottler = new BlockBalanceThrottler(
      conf.getLong(DFSConfigKeys.DFS_DATANODE_BALANCE_BANDWIDTHPERSEC_KEY,
          DFSConfigKeys.DFS_DATANODE_BALANCE_BANDWIDTHPERSEC_DEFAULT),
      conf.getInt(DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_KEY,
          DFSConfigKeys.DFS_DATANODE_BALANCE_MAX_NUM_CONCURRENT_MOVES_DEFAULT));
}

As seen above, DataXceiverServer implements Runnable, so the code it actually executes is in run():

@Override
public void run() {
  Peer peer = null;
  //shouldRun: DataNode running-state flag, volatile for visibility across threads
  //shutdownForUpgrade: flag set when the DataNode is shutting down for an upgrade/restart
  while (datanode.shouldRun && !datanode.shutdownForUpgrade) {
    try {
      //blocking call: wait for a connection from a client or another DataNode
      peer = peerServer.accept();

      // Make sure the xceiver count is not exceeded
      //the number of xceiver threads must not exceed dfs.datanode.max.transfer.threads (default 4096)
      int curXceiverCount = datanode.getXceiverCount();
      if (curXceiverCount > maxXceiverCount) {
        throw new IOException("Xceiver count " + curXceiverCount
            + " exceeds the limit of concurrent xcievers: "
            + maxXceiverCount);
      }
      //spawn a DataXceiver daemon thread to handle the request
      new Daemon(datanode.threadGroup,
          DataXceiver.create(peer, datanode, this))
          .start();
    } catch (SocketTimeoutException ignored) {
      // wake up to see if should continue to run
    } catch (AsynchronousCloseException ace) {
      // another thread closed our listener socket - that's expected during shutdown,
      // but not in other circumstances
      if (datanode.shouldRun && !datanode.shutdownForUpgrade) {
        LOG.warn(datanode.getDisplayName() + ":DataXceiverServer: ", ace);
      }
    } catch (IOException ie) {
      IOUtils.cleanup(null, peer);
      LOG.warn(datanode.getDisplayName() + ":DataXceiverServer: ", ie);
    } catch (OutOfMemoryError ie) {
      IOUtils.cleanup(null, peer);
      // DataNode can run out of memory if there is too many transfers.
      // Log the event, Sleep for 30 seconds, other transfers may complete by
      // then.
      LOG.warn("DataNode is out of memory. Will retry in 30 seconds.", ie);
      try {
        Thread.sleep(30 * 1000);
      } catch (InterruptedException e) {
        // ignore
      }
    } catch (Throwable te) {
      LOG.error(datanode.getDisplayName()
          + ":DataXceiverServer: Exiting due to: ", te);
      datanode.shouldRun = false;
    }
  }

  // Close the server to stop reception of more requests.
  try {
    peerServer.close();
    closed = true;
  } catch (IOException ie) {
    LOG.warn(datanode.getDisplayName()
        + " :DataXceiverServer: close exception", ie);
  }

  // if in restart prep stage, notify peers before closing them.
  if (datanode.shutdownForUpgrade) {
    restartNotifyPeers();
    // Each thread needs some time to process it. If a thread needs
    // to send an OOB message to the client, but blocked on network for
    // long time, we need to force its termination.
    LOG.info("Shutting down DataXceiverServer before restart");
    // Allow roughly up to 2 seconds.
    for (int i = 0; getNumPeers() > 0 && i < 10; i++) {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        // ignore
      }
    }
  }
  // Close all peers.
  closeAllPeers();
}

In short: run a blocking server that waits for connections; whenever a request from a client (or another DataNode) arrives, spawn a DataXceiver thread to handle it.
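
Stripped of the HDFS-specific types, this is the classic blocking accept loop handing each connection to a fresh daemon thread, with a cap on concurrency. A self-contained sketch using plain java.net in place of PeerServer/Peer (the request handler is deliberately left empty):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Self-contained sketch of the DataXceiverServer pattern: one blocking
// accept loop, one daemon thread per accepted connection, with a cap on
// concurrent handlers.
public class AcceptLoopSketch implements Runnable {
  private final ServerSocket serverSocket;
  private final int maxHandlers;                 // cf. dfs.datanode.max.transfer.threads
  private volatile boolean shouldRun = true;
  private final ThreadGroup handlers = new ThreadGroup("xceiver-sketch");

  public AcceptLoopSketch(int port, int maxHandlers) throws IOException {
    this.serverSocket = new ServerSocket(port);
    this.maxHandlers = maxHandlers;
    this.handlers.setDaemon(true);
  }

  @Override
  public void run() {
    while (shouldRun) {
      try {
        Socket peer = serverSocket.accept();     // blocks, like peerServer.accept()
        if (handlers.activeCount() >= maxHandlers) {
          peer.close();                          // refuse instead of running out of memory
          continue;
        }
        Thread t = new Thread(handlers, () -> handle(peer));
        t.setDaemon(true);
        t.start();                               // cf. new Daemon(..., DataXceiver.create(...))
      } catch (IOException e) {
        // log and keep serving, as the real run() does for IOException
      }
    }
  }

  private void handle(Socket peer) {
    // read the op code, serve the read/write/copy request, then close peer
  }
}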

 

3.startInfoServer

This mainly builds an HttpServer2 service and maps the relevant servlets to their request paths. It is not central to startup, so it is not analyzed in detail here.
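
For a rough feel of what this step involves, here is a hedged sketch of HttpServer2's builder API. The name, port, and endpoint are illustrative assumptions, and in recent Hadoop versions the DataNode actually fronts the web UI with a Netty-based DatanodeHttpServer rather than exposing HttpServer2 directly:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.HttpServer2;

// Hedged sketch of the HttpServer2 builder pattern; parameters are
// illustrative, not the DataNode's actual wiring.
public class InfoServerSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    HttpServer2 server = new HttpServer2.Builder()
        .setName("datanode")
        .setConf(conf)
        // 9864 is the default dfs.datanode.http.address port in Hadoop 3.x
        .addEndpoint(URI.create("http://localhost:9864"))
        .build();
    server.start();   // servlets would be registered before this point
  }
}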

 

4. Initialize the IPC server: initIpcServer();

private void initIpcServer(Configuration conf) throws IOException {
  InetSocketAddress ipcAddr = NetUtils.createSocketAddr(
      conf.get(DFS_DATANODE_IPC_ADDRESS_KEY));
  
  // Add all the RPC protocols that the Datanode implements    
  RPC.setProtocolEngine(conf, ClientDatanodeProtocolPB.class,
      ProtobufRpcEngine.class);
  ClientDatanodeProtocolServerSideTranslatorPB clientDatanodeProtocolXlator = 
        new ClientDatanodeProtocolServerSideTranslatorPB(this);
  BlockingService service = ClientDatanodeProtocolService
      .newReflectiveBlockingService(clientDatanodeProtocolXlator);
  ipcServer = new RPC.Builder(conf)
      .setProtocol(ClientDatanodeProtocolPB.class)
      .setInstance(service)
      .setBindAddress(ipcAddr.getHostName())
      .setPort(ipcAddr.getPort())
      .setNumHandlers(
          conf.getInt(DFS_DATANODE_HANDLER_COUNT_KEY,
              DFS_DATANODE_HANDLER_COUNT_DEFAULT)).setVerbose(false)
      .setSecretManager(blockPoolTokenSecretManager).build();
  
  InterDatanodeProtocolServerSideTranslatorPB interDatanodeProtocolXlator = 
      new InterDatanodeProtocolServerSideTranslatorPB(this);
  service = InterDatanodeProtocolService
      .newReflectiveBlockingService(interDatanodeProtocolXlator);
  DFSUtil.addPBProtocol(conf, InterDatanodeProtocolPB.class, service,
      ipcServer);

  TraceAdminProtocolServerSideTranslatorPB traceAdminXlator =
      new TraceAdminProtocolServerSideTranslatorPB(this);
  BlockingService traceAdminService = TraceAdminService
      .newReflectiveBlockingService(traceAdminXlator);
  DFSUtil.addPBProtocol(conf, TraceAdminProtocolPB.class, traceAdminService,
      ipcServer);

  LOG.info("Opened IPC server at " + ipcServer.getListenerAddress());

  // set service-level authorization security policy
  if (conf.getBoolean(
      CommonConfigurationKeys.HADOOP_SECURITY_AUTHORIZATION, false)) {
    ipcServer.refreshServiceAcl(conf, new HDFSPolicyProvider());
  }
}

Like the NameNode, the DataNode also has to expose some RPC functionality. This code mainly sets up two RPC protocols, ClientDatanodeProtocolPB and InterDatanodeProtocolPB; as the names suggest, they implement client-to-DataNode and DataNode-to-DataNode communication, respectively.
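
Every registration above follows the same translator pattern: a Java protocol interface, a server-side translator that adapts it to the protobuf-generated service, and the resulting BlockingService added to the RPC server. Here is a schematic sketch with hypothetical Foo names; the real counterparts are the *ServerSideTranslatorPB classes shown above, and real protobuf stubs are generated by protoc rather than written by hand:

// Schematic only: every protocol in initIpcServer() is wired up this way.
// "Foo" is a hypothetical stand-in for ClientDatanodeProtocol etc.

// 1. The Java-side protocol interface that the DataNode implements.
interface FooProtocol {
  long getBlockLength(String blockId);
}

// 2. Stand-ins for the protobuf-generated request/response/service types.
final class GetBlockLengthRequestProto { String blockId; }
final class GetBlockLengthResponseProto { long length; }
interface FooProtocolPB {
  GetBlockLengthResponseProto getBlockLength(GetBlockLengthRequestProto req);
}

// 3. Server-side translator: unwrap the proto request, call the real
//    implementation, wrap the result back into a proto response.
final class FooProtocolServerSideTranslatorPB implements FooProtocolPB {
  private final FooProtocol impl;   // in HDFS this is the DataNode itself

  FooProtocolServerSideTranslatorPB(FooProtocol impl) {
    this.impl = impl;
  }

  @Override
  public GetBlockLengthResponseProto getBlockLength(
      GetBlockLengthRequestProto req) {
    GetBlockLengthResponseProto resp = new GetBlockLengthResponseProto();
    resp.length = impl.getBlockLength(req.blockId);
    return resp;
  }
}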

 

5.blockPoolManager.refreshNamenodes(conf);

This calls the refreshNamenodes method of the BlockPoolManager class. First, BlockPoolManager's class comment:

It manages the BPOfferService objects of the DataNode; creating, removing, starting and stopping BPOfferService objects all go through this class.

So BlockPoolManager is essentially the manager of BPOfferService objects. Since BPOfferService just came up, look at its class comment next:

Every DataNode has one or more BPOfferService instances. A BPOfferService manages the active and standby NameNodes this DataNode must heartbeat to for one namespace. Internally it holds BPServiceActor objects; each BPServiceActor is a thread corresponding to one NameNode, which is either active or standby, and the BPOfferService also tracks the active/standby failover between them.

The "one or more BPOfferService instances per DataNode" refers to federation: each NameNode pair (active plus standby) corresponds to one BPOfferService. With BlockPoolManager and BPOfferService roughly understood, back to the refresh method:

void refreshNamenodes(Configuration conf)
    throws IOException {
  LOG.info("Refresh request received for nameservices: " + conf.get
          (DFSConfigKeys.DFS_NAMESERVICES));

  Map<String, Map<String, InetSocketAddress>> newAddressMap = DFSUtil
          .getNNServiceRpcAddressesForCluster(conf);

  synchronized (refreshNamenodesLock) {
    doRefreshNamenodes(newAddressMap);
  }
}

The call Map<String, Map<String, InetSocketAddress>> newAddressMap = DFSUtil.getNNServiceRpcAddressesForCluster(conf);

builds the data structure holding the addresses of the NameNodes this DataNode talks to. The retrieval itself is fairly involved, but in essence it collects the addresses from the various configuration keys. The shape of the resulting Map<String, Map<String, InetSocketAddress>> is:

Map<nameserviceId, Map<namenodeId, NameNode RPC address>> (see the sketch below).
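
To make that shape concrete, here is what the map might contain for a federated cluster with two nameservices, where ns1 is an HA pair and ns2 a single NameNode. The hostnames and the 8020 RPC port are invented for illustration:

import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Hypothetical contents of the nested map for a federated cluster:
// ns1 is an HA pair (nn1/nn2), ns2 has a single NameNode (nn3).
public class AddressMapExample {
  public static Map<String, Map<String, InetSocketAddress>> build() {
    Map<String, Map<String, InetSocketAddress>> addrMap = new HashMap<>();

    Map<String, InetSocketAddress> ns1 = new HashMap<>();
    ns1.put("nn1", new InetSocketAddress("nn1.example.com", 8020));
    ns1.put("nn2", new InetSocketAddress("nn2.example.com", 8020));
    addrMap.put("ns1", ns1);

    Map<String, InetSocketAddress> ns2 = new HashMap<>();
    ns2.put("nn3", new InetSocketAddress("nn3.example.com", 8020));
    addrMap.put("ns2", ns2);

    return addrMap;
  }
}

Each inner map corresponds to one nameservice, and hence to one BPOfferService. The map is then handed to doRefreshNamenodes: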

private void doRefreshNamenodes(
    Map<String, Map<String, InetSocketAddress>> addrMap) throws IOException {
  //assert that the current thread holds the refresh lock
  assert Thread.holdsLock(refreshNamenodesLock);
  //three sets: nameservices to refresh, to add, and to remove
  Set<String> toRefresh = Sets.newLinkedHashSet();
  Set<String> toAdd = Sets.newLinkedHashSet();
  Set<String> toRemove;
  
  synchronized (this) {
    // Step 1. For each of the new nameservices, figure out whether
    // it's an update of the set of NNs for an existing NS,
    // or an entirely new nameservice.
    //a nameservice present in both the config and bpByNameserviceId goes into the refresh list; otherwise it is new and goes into the add list
    for (String nameserviceId : addrMap.keySet()) {
      if (bpByNameserviceId.containsKey(nameserviceId)) {
        toRefresh.add(nameserviceId);
      } else {
        toAdd.add(nameserviceId);
      }
    }
    
    // Step 2. Any nameservices we currently have but are no longer present
    // need to be removed.
    //remove nameservices present in bpByNameserviceId but absent from the new addrMap
    toRemove = Sets.newHashSet(Sets.difference(
        bpByNameserviceId.keySet(), addrMap.keySet()));
    
    assert toRefresh.size() + toAdd.size() ==
      addrMap.size() :
        "toAdd: " + Joiner.on(",").useForNull("<default>").join(toAdd) +
        "  toRemove: " + Joiner.on(",").useForNull("<default>").join(toRemove) +
        "  toRefresh: " + Joiner.on(",").useForNull("<default>").join(toRefresh);

    
    // Step 3. Start new nameservices
    if (!toAdd.isEmpty()) {
      LOG.info("Starting BPOfferServices for nameservices: " +
          Joiner.on(",").useForNull("<default>").join(toAdd));
      //create one BPOfferService per NameNode pair and register it in the bpByNameserviceId map
      for (String nsToAdd : toAdd) {
        ArrayList<InetSocketAddress> addrs =
          Lists.newArrayList(addrMap.get(nsToAdd).values());
        BPOfferService bpos = createBPOS(addrs);
        bpByNameserviceId.put(nsToAdd, bpos);
        //offerServices holds all BPOfferService objects of this DataNode
        offerServices.add(bpos);
      }
    }
    //start the BPServiceActor threads inside every BPOfferService
    startAll();
  }

  // Step 4. Shut down old nameservices. This happens outside
  // of the synchronized(this) lock since they need to call
  // back to .remove() from another thread
  //shut down the nameservices that were removed
  if (!toRemove.isEmpty()) {
    LOG.info("Stopping BPOfferServices for nameservices: " +
        Joiner.on(",").useForNull("<default>").join(toRemove));
    
    for (String nsToRemove : toRemove) {
      BPOfferService bpos = bpByNameserviceId.get(nsToRemove);
      bpos.stop();
      bpos.join();
      // they will call remove on their own
    }
  }
  
  // Step 5. Update nameservices whose NN list has changed
  //update nameservices whose NameNode list has changed
  if (!toRefresh.isEmpty()) {
    LOG.info("Refreshing list of NNs for nameservices: " +
        Joiner.on(",").useForNull("<default>").join(toRefresh));
    
    for (String nsToRefresh : toRefresh) {
      BPOfferService bpos = bpByNameserviceId.get(nsToRefresh);
      ArrayList<InetSocketAddress> addrs =
        Lists.newArrayList(addrMap.get(nsToRefresh).values());
      bpos.refreshNNList(addrs);
    }
  }
}

This method reconciles the BlockPoolManager's in-memory record of NameNodes with the configuration: it records every NameNode pair this DataNode serves and starts a BPServiceActor thread to communicate with each NameNode.
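
The add/remove/refresh bookkeeping in Steps 1 and 2 is a generic reconcile-against-desired-state pattern. Distilled into a self-contained sketch, with plain JDK sets standing in for the Guava helpers used in the real code:

import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Distilled version of the Step 1/2 bookkeeping in doRefreshNamenodes():
// compare the currently running services ("current") against the desired
// configuration ("desired") and split the result into three sets.
final class ReconcileSketch {
  static void reconcile(Set<String> current, Map<String, ?> desired) {
    Set<String> toRefresh = new LinkedHashSet<>();
    Set<String> toAdd = new LinkedHashSet<>();
    Set<String> toRemove = new LinkedHashSet<>(current);

    for (String nameserviceId : desired.keySet()) {
      if (current.contains(nameserviceId)) {
        toRefresh.add(nameserviceId);   // known NS whose NN list may have changed
      } else {
        toAdd.add(nameserviceId);       // brand-new nameservice
      }
      toRemove.remove(nameserviceId);   // whatever is left over gets removed
    }

    // toAdd     -> create a BPOfferService and start its BPServiceActors
    // toRemove  -> stop() and join() the old BPOfferService
    // toRefresh -> refreshNNList() with the new addresses
    System.out.println("add=" + toAdd + " remove=" + toRemove
        + " refresh=" + toRefresh);
  }
}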

With that, startDataNode has finished. Back in createDataNode(String args[], Configuration conf, SecureResources resources), one call remains to execute: dn.runDatanodeDaemon():

public void runDatanodeDaemon() throws IOException {
  blockPoolManager.startAll();

  // start dataXceiveServer
  dataXceiverServer.start();
  if (localDataXceiverServer != null) {
    localDataXceiverServer.start();
  }
  ipcServer.start();
  startPlugins(conf);
}

First, blockPoolManager.startAll(): this ultimately starts the BPServiceActor threads discussed above. In fact those threads were already started during the earlier steps, so the call is effectively redundant here.

Next, dataXceiverServer.start(): this is the server analyzed above that accepts data-transfer requests from clients and other DataNodes. It was initialized earlier; here it is simply started. Likewise, the IPC server was fully initialized in the steps above, and at this point it is only started.

At this point the DataNode startup is essentially complete. To summarize:

 

main method:

    ->secureMain(args, null);

        ->createDataNode(args, null, resources);

            ->instantiateDataNode(args, conf, resources); load configuration, parse arguments

                ->makeInstance(dataLocations, conf, resources); check the data directories

                    ->new DataNode(conf, locations, storageLocationChecker, resources);

                        ->startDataNode(dataDirs, resources);

                            ->storage = new DataStorage();

                            ->registerMXBean();

                            ->initDataXceiver();

                            ->startInfoServer();

                            ->pauseMonitor.start();

                            ->initIpcServer();

                            ->blockPoolManager.refreshNamenodes(conf);

                                ->doRefreshNamenodes

                                    ->start the BPServiceActor threads

            ->dn.runDatanodeDaemon(); start dataXceiverServer and the IPC server

 

 

 
