2021-11-28 DataNode Startup

知识的游牧民族

Category: hadoop3. Tags: kafka, big data, java

Article link: https://blog.csdn.net/cn987654/article/details/121597209

The source code discussed here is from hadoop-3.3.0.

1 Overview

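The DataNode class encapsulates the entire logic of a data node. It manages all blocks on the node's storage through DataStorage and FsDatasetImpl, and it serves block reads, block writes and block replication to clients and other DataNodes over a streaming interface. DataNode also implements InterDatanodeProtocol and ClientDatanodeProtocol, so it can accept RPC requests from other DataNodes and from clients. Through its BlockPoolManager it periodically sends heartbeats, block reports, incremental block reports and cache reports to the NameNode, and it executes the commands the NameNode sends back. DataNode holds a BlockScanner (called DataBlockScanner in older Hadoop releases) that periodically verifies all blocks on storage, and a DirectoryScanner that checks the blocks on disk against the block information held in memory.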

A cluster may contain thousands of DataNodes. Each of them communicates with the NameNode at regular intervals; this is the heartbeat.

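When a DataNode starts, it scans its local disks and reports the blocks it holds to the NameNode. The NameNode keeps each reported block, together with the DataNodes that store it, in memory.
After startup the DataNode registers with the NameNode. Once registration succeeds, it periodically sends a full block report (every 6 hours by default in hadoop-3.3.0, controlled by dfs.blockreport.intervalMsec).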

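After that, the DataNode stays in contact with the NameNode by sending a heartbeat every 3 seconds (dfs.heartbeat.interval). The heartbeat response carries commands from the NameNode,
for example to replicate a block or to delete a block.
If the NameNode receives no heartbeat from a DataNode for about 10 minutes (10 minutes 30 seconds with the default settings), it considers that DataNode lost and re-replicates its blocks to other DataNodes.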
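
The roughly 10-minute timeout is not a single setting. As far as I know, the NameNode derives it as 2 x dfs.namenode.heartbeat.recheck-interval + 10 x dfs.heartbeat.interval. Below is a minimal sketch of that arithmetic, assuming the stock defaults of 5 minutes and 3 seconds. The class and method names are made up for illustration and are not Hadoop APIs.

```java
import java.time.Duration;

/** Sketch only: how long a DataNode can stay silent before it is treated as dead. */
final class HeartbeatExpiry {
    // Assumed defaults (hdfs-default.xml):
    //   dfs.heartbeat.interval                  = 3 seconds
    //   dfs.namenode.heartbeat.recheck-interval = 300000 milliseconds
    static final long HEARTBEAT_INTERVAL_SECONDS = 3;
    static final long RECHECK_INTERVAL_MS = 5 * 60 * 1000;

    /** expiry = 2 * recheck interval + 10 * heartbeat interval, in milliseconds. */
    static long expireIntervalMs(long heartbeatSeconds, long recheckMs) {
        return 2 * recheckMs + 10 * 1000 * heartbeatSeconds;
    }

    public static void main(String[] args) {
        long ms = expireIntervalMs(HEARTBEAT_INTERVAL_SECONDS, RECHECK_INTERVAL_MS);
        System.out.println(ms + " ms = " + Duration.ofMillis(ms)); // 630000 ms = PT10M30S
    }
}
```

Lowering either setting shortens the detection time, at the cost of more false positives when a DataNode is only briefly unreachable.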

2 DataNode startup

DataNode startup begins at the main() method.

public static void main(String args[]) {
    if (DFSUtil.parseHelpArgument(args, DataNode.USAGE, System.out, true)) {
        System.exit(0);
    }

    secureMain(args, null);
}

public static void secureMain(String args[], SecureResources resources) {
    int errorCode = 0;
    try {
        StringUtils.startupShutdownMessage(DataNode.class, args, LOG);
        // Create the DataNode
        DataNode datanode = createDataNode(args, null, resources);
        if (datanode != null) {
            // Block until the DataNode shuts down
            datanode.join();
        } else {
            errorCode = 1;
        }
    } catch (Throwable e) {
        LOG.error("Exception in secureMain", e);
        terminate(1, e);
    } finally {
        // We need to terminate the process here because either shutdown was called
        // or some disk related conditions like volumes tolerated or volumes required
        // condition was not met. Also, In secure mode, control will go to Jsvc
        // and Datanode process hangs if it does not exit.
        LOG.warn("Exiting Datanode");
        terminate(errorCode);
    }
}

createDataNode() first calls the static method instantiateDataNode() to create the DataNode instance, then calls runDatanodeDaemon() to start the services on the DataNode:

/** 
 * Instantiate & Start a single datanode daemon and wait for it to
 * finish.
 *  If this thread is specifically interrupted, it will stop waiting.
 */
@VisibleForTesting
@InterfaceAudience.Private
public static DataNode createDataNode(String args[], Configuration conf,
                                        SecureResources resources) throws IOException {
    // Instantiate the DataNode
    DataNode dn = instantiateDataNode(args, conf, resources);
    if (dn != null) {
        // Start the services on the DataNode
        dn.runDatanodeDaemon();
    }
    return dn;
}
/** 
 * Instantiate a single datanode object, along with its secure resources. 
 * This must be run by invoking{@link DataNode#runDatanodeDaemon()} 
 * subsequently. 
 */
public static DataNode instantiateDataNode(String args [], Configuration conf,
                                           SecureResources resources) throws IOException {
    if (conf == null)
        conf = new HdfsConfiguration();

    if (args != null) {
        // parse generic hadoop options
        GenericOptionsParser hParser = new GenericOptionsParser(conf, args);
        args = hParser.getRemainingArgs();
    }

    if (!parseArguments(args, conf)) {
        printUsage(System.err);
        return null;
    }
    
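    // Build the storage location objects
    Collection<StorageLocation> dataLocations = getStorageLocations(conf);
    
    // Hand the configuration to UserGroupInformation
    UserGroupInformation.setConfiguration(conf);
    
    // Log in with the DataNode keytab and principal, e.g. when Kerberos is enabled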
    SecurityUtil.login(conf, DFS_DATANODE_KEYTAB_FILE_KEY,
                       DFS_DATANODE_KERBEROS_PRINCIPAL_KEY, getHostName(conf));
    return makeInstance(dataLocations, conf, resources);
}

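The static method instantiateDataNode() first parses the DataNode startup arguments and reads all storage directories defined in the DataNode configuration, then calls the static method makeInstance(). makeInstance() makes sure that at least one of the configured storage directories is usable and then calls the DataNode constructor to create the instance.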

/**
   * Make an instance of DataNode after ensuring that at least one of the
   * given data directories (and their parent directories, if necessary)
   * can be created.
   * @param dataDirs List of directories, where the new DataNode instance should
   * keep its files.
   * @param conf Configuration instance to use.
   * @param resources Secure resources needed to run under Kerberos
   * @return DataNode instance for given list of data dirs and conf, or null if
   * no directory from this directory list can be created.
   * @throws IOException
   */
static DataNode makeInstance(Collection<StorageLocation> dataDirs,
                             Configuration conf, SecureResources resources) throws IOException {
    List<StorageLocation> locations;
    StorageLocationChecker storageLocationChecker =
        new StorageLocationChecker(conf, new Timer());
    try {
        locations = storageLocationChecker.check(conf, dataDirs);
    } catch (InterruptedException ie) {
        throw new IOException("Failed to instantiate DataNode", ie);
    }
    DefaultMetricsSystem.initialize("DataNode");

    assert locations.size() > 0 : "number of data directories should be > 0";
    return new DataNode(conf, locations, storageLocationChecker, resources);
}
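
To make the "at least one usable directory" idea concrete, here is a much simplified sketch using only java.nio. It is not how StorageLocationChecker is implemented (the real class also honors the failed-volumes-tolerated setting, among other things). StorageDirCheck and usableDirs are hypothetical names, and the health test is an assumption: the directory can be created and is readable, writable and executable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: keep the data directories that look usable, fail if none are. */
final class StorageDirCheck {
    static List<Path> usableDirs(List<Path> configured) throws IOException {
        List<Path> healthy = new ArrayList<>();
        for (Path dir : configured) {
            try {
                // Create the directory (and missing parents) if it does not exist yet.
                Files.createDirectories(dir);
                if (Files.isReadable(dir) && Files.isWritable(dir) && Files.isExecutable(dir)) {
                    healthy.add(dir);
                }
            } catch (IOException | SecurityException e) {
                System.err.println("Skipping unusable directory " + dir + ": " + e);
            }
        }
        if (healthy.isEmpty()) {
            throw new IOException("No usable data directory among " + configured);
        }
        return healthy;
    }

    public static void main(String[] args) throws IOException {
        Path good = Files.createTempDirectory("dn-data");
        // A path underneath a regular file can never become a directory.
        Path bad = Files.createTempFile("not-a-dir", ".tmp").resolve("child");
        System.out.println("usable: " + usableDirs(List.of(good, bad)));
    }
}
```

The point of the pattern is that one bad disk does not stop the process from starting, while a node with no working storage fails fast.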

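Finally, the DataNode constructor is called to build the DataNode instance:

After initializing a number of parameters defined in the configuration, the DataNode constructor calls startDataNode() to complete initialization. startDataNode() creates the DataStorage, DataXceiverServer and ShortCircuitRegistry objects, starts the HTTP info server, initializes the DataNode's IPC server, and then creates the BlockPoolManager and loads the NameNode list defined for each block pool.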

/**
 * Create the DataNode given a configuration, an array of dataDirs,
 * and a namenode proxy.
 */
DataNode(final Configuration conf,
         final List<StorageLocation> dataDirs,
         final StorageLocationChecker storageLocationChecker,
         final SecureResources resources) throws IOException {
    super(conf);
    this.tracer = createTracer(conf);
    this.tracerConfigurationManager =
        new TracerConfigurationManager(DATANODE_HTRACE_PREFIX, conf);
    this.fileIoProvider = new FileIoProvider(conf, this);
    this.blockScanner = new BlockScanner(this);
    this.lastDiskErrorCheck = 0;
    this.maxNumberOfBlocksToLog = conf.getLong(DFS_MAX_NUM_BLOCKS_TO_LOG_KEY,
                                               DFS_MAX_NUM_BLOCKS_TO_LOG_DEFAULT);

    this.usersWithLocalPathAccess = Arrays.asList(
        conf.getTrimmedStrings(DFSConfigKeys.DFS_BLOCK_LOCAL_PATH_ACCESS_USER_KEY));
    this.connectToDnViaHostname = conf.getBoolean(
        DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME,
        DFSConfigKeys.DFS_DATANODE_USE_DN_HOSTNAME_DEFAULT);
    this.supergroup = conf.get(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY,
                               DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_DEFAULT);
    this.isPermissionEnabled = conf.getBoolean(
        DFSConfigKeys.DFS_PERMISSIONS_ENABLED_KEY,
        DFSConfigKeys.DFS_PERMISSIONS_ENABLED_DEFAULT);
    this.pipelineSupportECN = conf.getBoolean(
        DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED,
        DFSConfigKeys.DFS_PIPELINE_ECN_ENABLED_DEFAULT);

    confVersion = "core-" +
        conf.get("hadoop.common.configuration.version", "UNSPECIFIED") +
        ",hdfs-" +
        conf.get("hadoop.hdfs.configuration.version", "UNSPECIFIED");

    this.volumeChecker = new DatasetVolumeChecker(conf, new Timer());
    this.xferService =
        HadoopExecutors.newCachedThreadPool(new Daemon.DaemonFactory());

    // Determine whether we should try to pass file descriptors to clients
    // (short-circuit reads; the setting defaults to false).
    if (conf.getBoolean(HdfsClientConfigKeys.Read.ShortCircuit.KEY,
                        HdfsClientConfigKeys.Read.ShortCircuit.DEFAULT)) {
        String reason = DomainSocket.getLoadingFailureReason();
        if (reason != null) {
            LOG.warn("File descriptor passing is disabled because {}", reason);
            this.fileDescriptorPassingDisabledReason = reason;
        } else {
            LOG.info("File descriptor passing is enabled.");
            this.fileDescriptorPassingDisabledReason = null;
        }
    } else {
        this.fileDescriptorPassingDisabledReason =
            "File descriptor passing was not configured.";
        LOG.debug(this.fileDescriptorPassingDisabledReason);
    }

    this.socketFactory = NetUtils.getDefaultSocketFactory(conf);

    try {
        hostName = getHostName(conf);
        LOG.info("Configured hostname is {}", hostName);
        
        // Start the DataNode
        startDataNode(dataDirs, resources);
    } catch (IOException ie) {
        shutdown();
        throw ie;
    }
    final int dncCacheMaxSize =
        conf.getInt(DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_KEY,
                    DFS_DATANODE_NETWORK_COUNTS_CACHE_MAX_SIZE_DEFAULT) ;
    // Network error counters
    datanodeNetworkCounts =
        CacheBuilder.newBuilder()
        .maximumSize(dncCacheMaxSize)
        .build(new CacheLoader<String, Map<String, Long>>() {
            @Override
            public Map<String, Long> load(String key) throws Exception {
                final Map<String, Long> ret = new HashMap<String, Long>();
                ret.put("networkErrors", 0L);
                return ret;
            }
        });

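    // OOB (out of band): an extra message that tells clients currently reading from
    // or writing to this DataNode that it is about to restart, so they should wait
    // for a timeout and ignore errors caused by the restart instead of immediately
    // starting a more expensive error recovery.
    // Here we load the timeout configured for each OOB type.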
    initOOBTimeout();
    this.storageLocationChecker = storageLocationChecker;
}

Within the constructor, startDataNode() is the most important call. It starts a DataNode with the specified configuration:

/**
   * This method starts the data node with the specified conf.
   * 
   * If conf's CONFIG_PROPERTY_SIMULATED property is set
   * then a simulated storage based data node is created.
   * 
   * @param dataDirectories - only for a non-simulated storage data node
   * @throws IOException
   */
void startDataNode(List<StorageLocation> dataDirectories,
                   SecureResources resources
                     ) throws IOException {

    // settings global for all BPs in the Data Node
    this.secureResources = resources;
    synchronized (this) {
      this.dataDirs = dataDirectories;
    }
    // Initialize the DataNode configuration
    this.dnConf = new DNConf(this);
    // Validate the security configuration
    checkSecureConfig(dnConf, getConf(), resources);

    if (dnConf.maxLockedMemory > 0) {
      if (!NativeIO.POSIX.getCacheManipulator().verifyCanMlock()) {
        throw new RuntimeException(String.format(
            "Cannot start datanode because the configured max locked memory" +
            " size (%s) is greater than zero and native code is not available.",
            DFS_DATANODE_MAX_LOCKED_MEMORY_KEY));
      }
      if (Path.WINDOWS) {
        NativeIO.Windows.extendWorkingSetSize(dnConf.maxLockedMemory);
      } else {
        long ulimit = NativeIO.POSIX.getCacheManipulator().getMemlockLimit();
        if (dnConf.maxLockedMemory > ulimit) {
          throw new RuntimeException(String.format(
            "Cannot start datanode because the configured max locked memory" +
            " size (%s) of %d bytes is more than the datanode's available" +
            " RLIMIT_MEMLOCK ulimit of %d bytes.",
            DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
            dnConf.maxLockedMemory,
            ulimit));
        }
      }
    }
    LOG.info("Starting DataNode with maxLockedMemory = {}",
        dnConf.maxLockedMemory);

    int volFailuresTolerated = dnConf.getVolFailuresTolerated();
    int volsConfigured = dnConf.getVolsConfigured();
    if (volFailuresTolerated < MAX_VOLUME_FAILURE_TOLERATED_LIMIT
        || volFailuresTolerated >= volsConfigured) {
      throw new HadoopIllegalArgumentException("Invalid value configured for "
          + "dfs.datanode.failed.volumes.tolerated - " + volFailuresTolerated
          + ". Value configured is either less than -1 or >= "
          + "to the number of configured volumes (" + volsConfigured + ").");
    }

    // Create the storage component
    storage = new DataStorage();
    
    // global DN settings
    registerMXBean();
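    // Initialize the DataXceiverServer.
    // It keeps an estimated block size used to check whether a disk partition has
    // enough space: newer clients pass the expected block size to the DataNode, and
    // for older clients the server-side default block size is used.
    initDataXceiver();
    // Start the HTTP server: DatanodeHttpServer
    startInfoServer();
    // Start the JVM pause monitor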
    pauseMonitor = new JvmPauseMonitor();
    pauseMonitor.init(getConf());
    pauseMonitor.start();
  
    // BlockPoolTokenSecretManager is required to create ipc server.
    // Manages the block token secrets of each block pool
    this.blockPoolTokenSecretManager = new BlockPoolTokenSecretManager();

    // Login is done by now. Set the DN user name.
    dnUserName = UserGroupInformation.getCurrentUser().getUserName();
    LOG.info("dnUserName = {}", dnUserName);
    LOG.info("supergroup = {}", supergroup);
    // Initialize the IPC server used between clients and the DataNode
    initIpcServer();

    metrics = DataNodeMetrics.create(getConf(), getDisplayName());
    peerMetrics = dnConf.peerStatsEnabled ?
        DataNodePeerMetrics.create(getDisplayName(), getConf()) : null;
    metrics.getJvmMetrics().setPauseMonitor(pauseMonitor);

    // Handles erasure coding commands
    ecWorker = new ErasureCodingWorker(getConf(), this);
    // Block recovery worker, which handles block recovery
    blockRecoveryWorker = new BlockRecoveryWorker(this);

    // Manager of the block pools
    blockPoolManager = new BlockPoolManager(this);
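    /**
     * refreshNamenodes: the name is misleading, it does much more than refresh the NameNodes:
     *    1. For each new nameservice, decide whether it updates the NN set of an
     *       existing nameservice or is a brand-new nameservice
     *    2. Mark for removal any nameservice we currently hold that no longer exists
     *    3. Start the new nameservices
     *    Steps 1-3 run inside a synchronized block
     *    4. Remove the obsolete nameservices outside the synchronized block, because
     *       this may trigger deletions and take a long time
     *    5. Update the nameservices
     */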
    blockPoolManager.refreshNamenodes(getConf());

    // Create the ReadaheadPool from the DataNode context so we can
    // exit without having to explicitly shutdown its thread pool.
    readaheadPool = ReadaheadPool.getInstance();
    saslClient = new SaslDataTransferClient(dnConf.getConf(),
        dnConf.saslPropsResolver, dnConf.trustedChannelResolver);
    saslServer = new SaslDataTransferServer(dnConf, blockPoolTokenSecretManager);
    startMetricsLogger();

    if (dnConf.diskStatsEnabled) {
      diskMetrics = new DataNodeDiskMetrics(this,
          dnConf.outliersReportIntervalMs);
    }
  }
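
The range check on dfs.datanode.failed.volumes.tolerated in startDataNode() is easy to misread, so here is the same condition isolated as a small sketch. VolumeFailureCheck and isValid are hypothetical names; the lower bound of -1 is taken from the error message in the code above ("less than -1").

```java
/** Sketch only: the validity condition for dfs.datanode.failed.volumes.tolerated. */
final class VolumeFailureCheck {
    // Lowest accepted value, per the error message in startDataNode().
    static final int MAX_VOLUME_FAILURE_TOLERATED_LIMIT = -1;

    /** Valid when -1 <= tolerated < configured; otherwise startDataNode() throws. */
    static boolean isValid(int volFailuresTolerated, int volsConfigured) {
        return volFailuresTolerated >= MAX_VOLUME_FAILURE_TOLERATED_LIMIT
            && volFailuresTolerated < volsConfigured;
    }

    public static void main(String[] args) {
        int volsConfigured = 3;
        for (int tolerated = -2; tolerated <= 3; tolerated++) {
            System.out.println("tolerated=" + tolerated + " configured=" + volsConfigured
                + " valid=" + isValid(tolerated, volsConfigured));
        }
    }
}
```

With three configured volumes, the accepted values are -1, 0, 1 and 2. Tolerating three failures would mean running with no storage at all, which is why the upper bound is exclusive.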

Finally, control returns to createDataNode(), which calls:

dn.runDatanodeDaemon();

runDatanodeDaemon() starts all threads managed by blockPoolManager, then the DataXceiverServer thread, then the DataNode's IPC server, and finally any configured plugins:

/** Start a single datanode daemon and wait for it to finish.
 *  If this thread is specifically interrupted, it will stop waiting.
 */
public void runDatanodeDaemon() throws IOException {
    blockPoolManager.startAll();

    // start dataXceiveServer
    dataXceiverServer.start();
    if (localDataXceiverServer != null) {
        localDataXceiverServer.start();
    }
    ipcServer.setTracer(tracer);
    ipcServer.start();
    startPlugins(getConf());
}
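
Putting the pieces together, the overall shape is: construct the object, start its service threads, then block in join() until shutdown. The following is a toy sketch of that lifecycle only, not of DataNode itself. MiniDaemon and all of its members are hypothetical, and the services here do nothing except wait for a stop signal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Sketch only: the instantiate -> start services -> join lifecycle. */
final class MiniDaemon {
    private final List<Thread> services = new ArrayList<>();
    private final List<String> started = new ArrayList<>();
    private final CountDownLatch stopSignal = new CountDownLatch(1);

    // Phase 1: build the object and its (not yet running) service threads.
    private MiniDaemon(String... names) {
        for (String name : names) {
            Thread t = new Thread(() -> {
                try {
                    stopSignal.await(); // placeholder for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, name);
            t.setDaemon(true);
            services.add(t);
        }
    }

    // Mirrors createDataNode(): instantiate first, then start the services.
    static MiniDaemon create(String... names) {
        MiniDaemon daemon = new MiniDaemon(names);
        daemon.runDaemon();
        return daemon;
    }

    // Phase 2: start every service thread in a fixed order.
    void runDaemon() {
        for (Thread t : services) {
            t.start();
            started.add(t.getName());
        }
    }

    void shutdown() {
        stopSignal.countDown();
    }

    // Blocks the caller until every service thread has finished.
    void join() throws InterruptedException {
        for (Thread t : services) {
            t.join();
        }
    }

    List<String> startedServices() {
        return List.copyOf(started);
    }

    public static void main(String[] args) throws InterruptedException {
        MiniDaemon daemon = create("blockPoolManager", "dataXceiverServer", "ipcServer");
        System.out.println("started: " + daemon.startedServices());
        daemon.shutdown(); // a real daemon would do this on a signal or fatal error
        daemon.join();
        System.out.println("all services stopped");
    }
}
```

Separating construction from starting threads is what lets instantiateDataNode() fail cleanly (returning null or throwing) before any service is running.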