Hadoop 3.2.1 [HDFS] Source Code Analysis: Secondary NameNode

 

1. Introduction

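There is only one Secondary NameNode. Its job is to assist the NameNode in checkpointing the metadata, that is, merging the edit log into the fsimage file.

The Secondary NameNode is a daemon that periodically triggers the checkpoint operation and talks to the NameNode through NamenodeProtocol.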

 

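Parameters:

| No. | Parameter | Default | Description |
| --- | --- | --- | --- |
| 1 | dfs.namenode.checkpoint.check.period | 60s | The SecondaryNameNode and CheckpointNode poll the NameNode every 'dfs.namenode.checkpoint.check.period' seconds to query the number of uncheckpointed transactions. |
| 2 | dfs.namenode.checkpoint.period | 3600s [1 hour] | Maximum delay between two consecutive checkpoints. |
| 3 | dfs.namenode.checkpoint.txns | 1,000,000 | Number of uncheckpointed transactions that forces a checkpoint. |
| 4 | dfs.namenode.checkpoint.max-retries | 3 | Number of retries. |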
    

 

 

2. Checkpoint Workflow

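In a non-HA deployment, the FSImage merge is performed by the Secondary NameNode.

A merge is triggered when either of the following conditions holds:

① The configured checkpoint interval has elapsed (dfs.namenode.checkpoint.period, default: 1 hour);

② The number of transactions since the last checkpoint has exceeded the configured limit (dfs.namenode.checkpoint.txns, default: 1,000,000).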
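
The two conditions can be sketched in isolation. The class and method names below (CheckpointTrigger, shouldCheckpoint) are hypothetical and not part of Hadoop; only the two default values come from the parameter table. The sketch treats reaching the transaction threshold as sufficient, which is an assumption about the exact comparison.

```java
/** Hypothetical helper that mirrors the two trigger conditions; not a Hadoop class. */
final class CheckpointTrigger {
    // Defaults of dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.
    static final long DEFAULT_PERIOD_SECONDS = 3600L;
    static final long DEFAULT_TXN_COUNT = 1_000_000L;

    private final long periodSeconds;
    private final long txnCount;

    CheckpointTrigger(long periodSeconds, long txnCount) {
        this.periodSeconds = periodSeconds;
        this.txnCount = txnCount;
    }

    /**
     * @param uncheckpointedTxns transactions written since the last checkpoint
     * @param nowMs              current monotonic time in milliseconds
     * @param lastCheckpointMs   monotonic time of the last checkpoint in milliseconds
     * @return true if either trigger condition holds
     */
    boolean shouldCheckpoint(long uncheckpointedTxns, long nowMs, long lastCheckpointMs) {
        boolean tooManyTxns = uncheckpointedTxns >= txnCount;
        boolean periodElapsed = nowMs >= lastCheckpointMs + 1000L * periodSeconds;
        return tooManyTxns || periodElapsed;
    }

    public static void main(String[] args) {
        CheckpointTrigger t =
            new CheckpointTrigger(DEFAULT_PERIOD_SECONDS, DEFAULT_TXN_COUNT);
        // One minute after the last checkpoint, only 10 transactions: no checkpoint.
        System.out.println(t.shouldCheckpoint(10L, 60_000L, 0L));        // false
        // Same moment, but one million transactions are pending: checkpoint.
        System.out.println(t.shouldCheckpoint(1_000_000L, 60_000L, 0L)); // true
        // No pending transactions, but a full hour has passed: checkpoint.
        System.out.println(t.shouldCheckpoint(0L, 3_600_000L, 0L));      // true
    }
}
```

Either condition alone is enough, so a busy cluster checkpoints on transaction count well before the hour is up, while an idle one still checkpoints once per period.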

 

Workflow:
 

 

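■ The Secondary NameNode checks whether either of the two conditions that trigger a checkpoint is met. In non-HA mode the Secondary NameNode and the NameNode do not share an editlog directory, so the Secondary NameNode obtains the latest transaction ID by calling the RPC method NamenodeProtocol.getTransactionID().
■ The Secondary NameNode calls the RPC method NamenodeProtocol.rollEditLog() to roll the editlog: the segment currently being written is closed and a new one is opened (the edit.new file of older releases; edits_inprogress_<start txid> in the current storage layout). The call also returns the transaction ID of the current fsimage and of the editlog that was just rolled (seen_id). While the Secondary NameNode reads editlog files from the NameNode, new operations go to the new segment, so editlog recording is not disturbed. In HA mode there is no need to trigger the roll explicitly, because the Standby NameNode rolls the editlog periodically.
■ With the latest txid and seen_id, the Secondary NameNode issues HTTP GET requests to the NameNode's ImageServlet (named GetImageServlet in older releases) to fetch the new fsimage and editlog files. Note that the Secondary NameNode may already hold some of the fsimage and edits files from its previous checkpoint.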

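■ The Secondary NameNode loads the newly downloaded fsimage file to rebuild its own namespace.
■ The Secondary NameNode reads the records in the edits files and applies them to the current namespace, which brings its namespace in sync with the NameNode's.
■ The Secondary NameNode writes the synchronized namespace to a new fsimage file.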
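■ The Secondary NameNode sends the new fsimage to the NameNode's ImageServlet over HTTP. In Hadoop 3.2.1 this is the TransferFsImage.uploadImageFromStorage() call in doCheckpoint() (section 7), which uploads the file with an HTTP PUT; the request carries the transaction ID of the new fsimage. Older releases instead sent an HTTP GET /getimage?putimage=1 that included the transaction ID plus the Secondary NameNode's IP address and port, and the NameNode then downloaded the fsimage from the Secondary NameNode with its own HTTP GET.
■ The NameNode first stores the received file as fsimage.ckpt_, then creates the MD5 checksum, and finally renames fsimage.ckpt_ to fsimage_xxxxx.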
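
The load, replay, and save steps above can be illustrated with a toy model. Everything in it is hypothetical and far simpler than HDFS: the namespace is reduced to a map from path to value, an edit is a (txid, path, value) record in which a null value means delete, and edits are assumed to arrive sorted by txid.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of "fsimage + edits -> new fsimage"; not the HDFS implementation. */
final class ToyCheckpoint {
    /** One edit-log record; a null value stands for a delete. */
    static final class Edit {
        final long txid;
        final String path;
        final String value;

        Edit(long txid, String path, String value) {
            this.txid = txid;
            this.path = path;
            this.value = value;
        }
    }

    /** Snapshot of the namespace plus the last transaction it reflects. */
    static final class Image {
        final long lastTxid;
        final Map<String, String> namespace;

        Image(long lastTxid, Map<String, String> namespace) {
            this.lastTxid = lastTxid;
            this.namespace = new TreeMap<>(namespace); // defensive copy
        }
    }

    /** Replays every edit newer than the image and returns the merged image. */
    static Image merge(Image image, List<Edit> edits) {
        Map<String, String> ns = new TreeMap<>(image.namespace);
        long lastApplied = image.lastTxid;
        for (Edit e : edits) {
            if (e.txid <= lastApplied) {
                continue; // already reflected in the image
            }
            if (e.value == null) {
                ns.remove(e.path);
            } else {
                ns.put(e.path, e.value);
            }
            lastApplied = e.txid;
        }
        return new Image(lastApplied, ns);
    }

    public static void main(String[] args) {
        Map<String, String> base = new TreeMap<>();
        base.put("/a", "v1");
        Image old = new Image(5L, base);
        List<Edit> edits = Arrays.asList(
            new Edit(5L, "/a", "stale"), // skipped: txid 5 is already in the image
            new Edit(6L, "/b", "v2"),    // create /b
            new Edit(7L, "/a", null));   // delete /a
        Image merged = merge(old, edits);
        System.out.println(merged.lastTxid);  // 7
        System.out.println(merged.namespace); // {/b=v2}
    }
}
```

Tracking the last applied txid is what lets a checkpointer reuse files it already fetched during an earlier checkpoint: anything at or below that txid can be skipped.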

 

 

3. Startup

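Start from the main function. There are two startup modes.

Mode 1: run a single command, then terminate.

CHECKPOINT: runs a checkpoint manually. If the trigger condition is not met the checkpoint is still skipped, unless the force option is given.
GETEDITSIZE: prints the number of transactions that have not been checkpointed yet.

Mode 2: run as a daemon [starts the InfoServer and the CheckpointThread, which performs checkpoints periodically].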

 /**
   *
   * main() has some simple utility methods.
   * @param argv Command line parameters.
   * @exception Exception if the filesystem does not exist.
   */
  public static void main(String[] argv) throws Exception {
    CommandLineOpts opts = SecondaryNameNode.parseArgs(argv);
    if (opts == null) {
      LOG.error("Failed to parse options");
      terminate(1);
    } else if (opts.shouldPrintHelp()) {
      opts.usage();
      System.exit(0);
    }

    try {
      StringUtils.startupShutdownMessage(SecondaryNameNode.class, argv, LOG);
      Configuration tconf = new HdfsConfiguration();
      SecondaryNameNode secondary = null;
      secondary = new SecondaryNameNode(tconf, opts);

      // SecondaryNameNode can be started in 2 modes:
      // 1. run a command (i.e. checkpoint or geteditsize) then terminate
      // 2. run as a daemon when {@link #parseArgs} yields no commands
      if (opts != null && opts.getCommand() != null) {
        // mode 1
        int ret = secondary.processStartupCommand(opts);
        terminate(ret);
      } else {
        // mode 2
        secondary.startInfoServer();
        secondary.startCheckpointThread();
        secondary.join();
      }
    } catch (Throwable e) {
      LOG.error("Failed to start secondary namenode", e);
      terminate(1);
    }
  }

We go straight to the second mode.

 

4. startInfoServer

First an HTTP server is started [default: dfs.namenode.secondary.http-address : 0.0.0.0:9868] for communication with the NameNode.

 

/**
   * Start the web server.
   */
  @VisibleForTesting
  public void startInfoServer() throws IOException {
    final InetSocketAddress httpAddr = getHttpAddress(conf);

    // Default: dfs.namenode.secondary.https-address : 0.0.0.0:9869
    final String httpsAddrString = conf.getTrimmed(
        DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_KEY,
        DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_DEFAULT);
    InetSocketAddress httpsAddr = NetUtils.createSocketAddr(httpsAddrString);


    // Build the HTTP server
    HttpServer2.Builder builder = DFSUtil.httpServerTemplateForNNAndJN(conf,
        httpAddr, httpsAddr, "secondary", DFSConfigKeys.
            DFS_SECONDARY_NAMENODE_KERBEROS_INTERNAL_SPNEGO_PRINCIPAL_KEY,
        DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);

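    // dfs.xframe.enabled : default true
    // If true, clickjacking protection is enabled by returning an
    // X_FRAME_OPTIONS header set to SAMEORIGIN.
    // Clickjacking protection prevents an attacker from using transparent or
    // opaque layers to trick a user into clicking a button or link on another page.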
    final boolean xFrameEnabled = conf.getBoolean(
        DFSConfigKeys.DFS_XFRAME_OPTION_ENABLED,
        DFSConfigKeys.DFS_XFRAME_OPTION_ENABLED_DEFAULT);

    // dfs.xframe.value : default SAMEORIGIN; allowed values: DENY, SAMEORIGIN, ALLOW-FROM
    final String xFrameOptionValue = conf.getTrimmed(
        DFSConfigKeys.DFS_XFRAME_OPTION_VALUE,
        DFSConfigKeys.DFS_XFRAME_OPTION_VALUE_DEFAULT);

    builder.configureXFrame(xFrameEnabled).setXFrameOption(xFrameOptionValue);

    infoServer = builder.build();
    infoServer.setAttribute("secondary.name.node", this);
    infoServer.setAttribute("name.system.image", checkpointImage);
    infoServer.setAttribute(JspHelper.CURRENT_CONF, conf);
    infoServer.addInternalServlet("imagetransfer", ImageServlet.PATH_SPEC,
        ImageServlet.class, true);
    infoServer.start();

    LOG.info("Web server init done");

    HttpConfig.Policy policy = DFSUtil.getHttpPolicy(conf);
    int connIdx = 0;
    if (policy.isHttpEnabled()) {


      InetSocketAddress httpAddress =
          infoServer.getConnectorAddress(connIdx++);

      // dfs.namenode.secondary.http-address
      conf.set(DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTP_ADDRESS_KEY,
          NetUtils.getHostPortString(httpAddress));
    }

    if (policy.isHttpsEnabled()) {
      InetSocketAddress httpsAddress =
          infoServer.getConnectorAddress(connIdx);
      conf.set(DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_KEY,
          NetUtils.getHostPortString(httpsAddress));
    }
  }

 

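5. startCheckpointThread

Starts the checkpoint thread. There is not much to it: it simply starts a daemon thread.

SecondaryNameNode implements the Runnable interface, so the thread invokes its run() method directly, and run() in turn calls doWork().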

  public void startCheckpointThread() {
    Preconditions.checkState(checkpointThread == null,
        "Should not already have a thread");
    Preconditions.checkState(shouldRun, "shouldRun should be true");
    
    checkpointThread = new Daemon(this);
    checkpointThread.start();
  }
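
The same pattern can be reproduced with a plain java.lang.Thread. This is a sketch under the assumption that Hadoop's Daemon helper behaves like a Thread with the daemon flag set; the class DaemonSketch and its methods are hypothetical names introduced for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a Runnable service that runs on a daemon thread. */
final class DaemonSketch implements Runnable {
    private final CountDownLatch ran = new CountDownLatch(1);
    private Thread worker;

    @Override
    public void run() {
        // In SecondaryNameNode this is the point where doWork() is entered.
        ran.countDown();
    }

    /** Starts the worker once; a second call is rejected, as in the listing above. */
    synchronized void start() {
        if (worker != null) {
            throw new IllegalStateException("Should not already have a thread");
        }
        worker = new Thread(this, "checkpoint-sketch");
        worker.setDaemon(true); // a daemon thread does not keep the JVM alive
        worker.start();
    }

    synchronized boolean isDaemon() {
        return worker != null && worker.isDaemon();
    }

    /** Waits until run() has executed; returns false on timeout or interruption. */
    boolean awaitRun(long timeoutMs) {
        try {
            return ran.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        DaemonSketch sketch = new DaemonSketch();
        sketch.start();
        System.out.println(sketch.isDaemon());
        System.out.println(sketch.awaitRun(2000L));
    }
}
```

Because the worker is a daemon thread, something else has to keep the process alive, which is why main() calls secondary.join() after starting the checkpoint thread.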

 

6. doWork()

 //
  // The main work loop
  //
  public void doWork() {
    //
    // Poll the Namenode (once every checkpointCheckPeriod seconds) to find the
    // number of transactions in the edit log that haven't yet been checkpointed.
    //
    long period = checkpointConf.getCheckPeriod();
    int maxRetries = checkpointConf.getMaxRetriesOnMergeError();

    while (shouldRun) {
      try {
        Thread.sleep(1000 * period);
      } catch (InterruptedException ie) {
        // do nothing
      }
      if (!shouldRun) {
        break;
      }
      try {
        // We may have lost our ticket since last checkpoint, log in again, just in case
        if(UserGroupInformation.isSecurityEnabled())
          UserGroupInformation.getCurrentUser().checkTGTAndReloginFromKeytab();
        
        final long monotonicNow = Time.monotonicNow();
        final long now = Time.now();

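        // Checkpoint if the uncheckpointed transaction count has reached the limit [default 1,000,000]
        // or if more than the checkpoint period [default 1 hour] has passed since the last checkpoint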
        if (shouldCheckpointBasedOnCount() ||
            monotonicNow >= lastCheckpointTime + 1000 * checkpointConf.getPeriod()) {

          // Run the checkpoint
          doCheckpoint();
          
          lastCheckpointTime = monotonicNow;
          lastCheckpointWallclockTime = now;
        }
      } catch (IOException e) {
        LOG.error("Exception in doCheckpoint", e);
        e.printStackTrace();
        // Prevent a huge number of edits from being created due to
        // unrecoverable conditions and endless retries.
        if (checkpointImage.getMergeErrorCount() > maxRetries) {
          LOG.error("Merging failed " +
              checkpointImage.getMergeErrorCount() + " times.");
          terminate(1);
        }
      } catch (Throwable e) {
        LOG.error("Throwable Exception in doCheckpoint", e);
        e.printStackTrace();
        terminate(1, e);
      }
    }
  }
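
The retry limit in the IOException branch can be sketched on its own. The class below is hypothetical and assumes the worst case in which every attempt fails in the merge step, so the error counter grows by one per attempt; failures elsewhere in doCheckpoint() are not modelled.

```java
/** Hypothetical model of the "give up after too many merge errors" rule. */
final class RetryLimitSketch {
    /**
     * Simulates a run in which every checkpoint attempt fails while merging.
     *
     * @param maxRetries value of dfs.namenode.checkpoint.max-retries
     * @return the number of the attempt on which the loop gives up
     */
    static int attemptsUntilGiveUp(int maxRetries) {
        int mergeErrorCount = 0;
        int attempts = 0;
        while (true) {
            attempts++;
            mergeErrorCount++; // the simulated merge fails
            if (mergeErrorCount > maxRetries) {
                return attempts; // doWork() calls terminate(1) at this point
            }
        }
    }

    public static void main(String[] args) {
        // With the default of 3 retries the process gives up on the 4th failure.
        System.out.println(attemptsUntilGiveUp(3)); // 4
    }
}
```

The comparison is strictly greater-than, so max-retries counts retries after the first failure rather than total attempts.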

7. doCheckpoint [core checkpoint operation]

 


  /**
   * Create a new checkpoint
   * @return if the image is fetched from primary or not
   */
  @VisibleForTesting
  @SuppressWarnings("deprecated")
  public boolean doCheckpoint() throws IOException {
    checkpointImage.ensureCurrentDirExists();
    NNStorage dstStorage = checkpointImage.getStorage();
    
    // Tell the namenode to start logging transactions in a new edit file
    // Returns a token that would be used to upload the merged image.

    // Note: rollEditLog() fails if the NameNode is in safe mode.
    CheckpointSignature sig = namenode.rollEditLog();
    
    boolean loadImage = false;
    boolean isFreshCheckpointer = (checkpointImage.getNamespaceID() == 0);

    boolean isSameCluster =
        (dstStorage.versionSupportsFederation(NameNodeLayoutVersion.FEATURES)
            && sig.isSameCluster(checkpointImage)) ||
        (!dstStorage.versionSupportsFederation(NameNodeLayoutVersion.FEATURES)
            && sig.namespaceIdMatches(checkpointImage));


    if (isFreshCheckpointer ||
        (isSameCluster &&
         !sig.storageVersionMatches(checkpointImage.getStorage()))) {
      // if we're a fresh 2NN, or if we're on the same cluster and our storage
      // needs an upgrade, just take the storage info from the server.
      dstStorage.setStorageInfo(sig);
      dstStorage.setClusterID(sig.getClusterID());
      dstStorage.setBlockPoolID(sig.getBlockpoolID());
      loadImage = true;
    }
    sig.validateStorageInfo(checkpointImage);

    // error simulation code for junit test
    CheckpointFaultInjector.getInstance().afterSecondaryCallsRollEditLog();

    RemoteEditLogManifest manifest =
      namenode.getEditLogManifest(sig.mostRecentCheckpointTxId + 1);

    // Fetch fsimage and edits. Reload the image if previous merge failed.
    loadImage |= downloadCheckpointFiles(
        fsName, checkpointImage, sig, manifest) |
        checkpointImage.hasMergeError();
    try {
      // Run the merge
      doMerge(sig, manifest, loadImage, checkpointImage, namesystem);
    } catch (IOException ioe) {
      // A merge error occurred. The in-memory file system state may be
      // inconsistent, so the image and edits need to be reloaded.
      checkpointImage.setMergeError();
      throw ioe;
    }
    // Clear any error since merge was successful.
    checkpointImage.clearMergeError();

    
    //
    // Upload the new image into the NameNode. Then tell the Namenode
    // to make this new uploaded image as the most current image.
    long txid = checkpointImage.getLastAppliedTxId();
    
    // Upload the image.
    TransferFsImage.uploadImageFromStorage(fsName, conf, dstStorage,
        NameNodeFile.IMAGE, txid);

    // error simulation code for junit test
    CheckpointFaultInjector.getInstance().afterSecondaryUploadsNewImage();

    LOG.warn("Checkpoint done. New Image Size: " 
             + dstStorage.getFsImageName(txid).length());

    if (legacyOivImageDir != null && !legacyOivImageDir.isEmpty()) {
      try {
        checkpointImage.saveLegacyOIVImage(namesystem, legacyOivImageDir,
            new Canceler());
      } catch (IOException e) {
        LOG.warn("Failed to write legacy OIV image: ", e);
      }
    }
    return loadImage;
  }
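
One detail in doCheckpoint() is easy to miss: the expression `loadImage |= downloadCheckpointFiles(...) | checkpointImage.hasMergeError()` uses the non-short-circuit operators `|=` and `|`, so the download runs even when loadImage is already true. The sketch below demonstrates that Java behaviour with hypothetical stand-ins (EagerOrSketch, download) and is not Hadoop code.

```java
/** Demonstrates eager (|, |=) versus short-circuit (||) boolean evaluation. */
final class EagerOrSketch {
    int downloads = 0;

    /** Stand-in for downloadCheckpointFiles(): has a side effect, returns false. */
    boolean download() {
        downloads++;
        return false;
    }

    /** Same shape as the expression in doCheckpoint(): download() always runs. */
    boolean eager(boolean loadImage, boolean hasMergeError) {
        loadImage |= download() | hasMergeError;
        return loadImage;
    }

    /** Short-circuit variant: download() is skipped when loadImage is already true. */
    boolean lazy(boolean loadImage, boolean hasMergeError) {
        return loadImage || download() || hasMergeError;
    }

    public static void main(String[] args) {
        EagerOrSketch s = new EagerOrSketch();
        s.eager(true, false);
        System.out.println(s.downloads); // 1: the download ran
        s.lazy(true, false);
        System.out.println(s.downloads); // still 1: the download was skipped
    }
}
```

With `||` a fresh Secondary NameNode, for which loadImage is already true, would never fetch the files it needs for the merge.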

 


 

 

 

 

 

 

 

 

 

 
