An Analysis of the Ozone Upgrade Framework Model

Preface


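The Ozone community recently released Ozone 1.1, the second release since Ozone reached GA. As more Ozone versions are released, the question of upgrading between versions naturally arises. Some readers may wonder: does Ozone currently support version upgrades? According to the community's current progress, the first phase of this feature is essentially complete and is expected to ship in Ozone 1.2. In this first phase a basic upgrade framework has taken shape, although for now it is a non-rolling upgrade that requires cluster downtime. This article walks through the overall implementation approach of Ozone Upgrade, in more detail than my previous article on the topic.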

Elements of an Ozone Upgrade and How They Relate


First, we need to be clear about which elements are involved in a complete Ozone upgrade.

In terms of upgrade state, an upgrade passes through several stages, each with its own state: before upgrade, finalizing the upgrade, and finalized.

In terms of services, OM, SCM, and Datanode are obviously the first ones affected. This is the first layer, the part that comes to mind immediately.

What lies one level deeper? Here we need to introduce the concept of a version.

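Within a version there are two closely related elements: 1) the features available in that version, and 2) the layout of that version, where a layout can be understood as a way of organizing data.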

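Looking more closely at features, we cannot avoid the question of feature compatibility. Compatibility between new and old features is something the upgrade process must pay particular attention to.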
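To make the relationship between versions, features, and layout more concrete, here is a minimal sketch rather than Ozone's actual implementation. It assumes that each feature carries a layout version number, that the on-disk data records its own layout version, and that a feature becomes usable only once the on-disk version has caught up with it. The types `DemoFeature` and `DemoVersionManager` are hypothetical; the method names loosely echo calls that appear in the code excerpts later in this article, but their bodies are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical features; each one is tied to a layout version number. */
enum DemoFeature {
  INITIAL_VERSION(0),
  FEATURE_A(1),
  FEATURE_B(2);

  private final int layoutVersion;

  DemoFeature(int layoutVersion) {
    this.layoutVersion = layoutVersion;
  }

  int layoutVersion() {
    return layoutVersion;
  }
}

/** Hypothetical manager comparing the on-disk layout with what the software supports. */
final class DemoVersionManager {
  // Layout version recorded with the on-disk data.
  private int metadataLayoutVersion;
  // Highest layout version known to the running software.
  private final int softwareLayoutVersion;

  DemoVersionManager(int metadataLayoutVersion) {
    this.metadataLayoutVersion = metadataLayoutVersion;
    int max = 0;
    for (DemoFeature f : DemoFeature.values()) {
      max = Math.max(max, f.layoutVersion());
    }
    this.softwareLayoutVersion = max;
  }

  /** True while the software knows features the on-disk layout has not adopted. */
  boolean needsFinalization() {
    return metadataLayoutVersion < softwareLayoutVersion;
  }

  /** A feature may be used only once the on-disk layout has reached its version. */
  boolean isAllowed(DemoFeature feature) {
    return feature.layoutVersion() <= metadataLayoutVersion;
  }

  /** Features that still await finalization, in declaration order. */
  List<DemoFeature> unfinalizedFeatures() {
    List<DemoFeature> result = new ArrayList<>();
    for (DemoFeature f : DemoFeature.values()) {
      if (f.layoutVersion() > metadataLayoutVersion) {
        result.add(f);
      }
    }
    return result;
  }

  /** Marks one feature as finalized; features must be finalized in order. */
  void finalized(DemoFeature feature) {
    if (feature.layoutVersion() != metadataLayoutVersion + 1) {
      throw new IllegalStateException("Out-of-order finalization: " + feature);
    }
    metadataLayoutVersion = feature.layoutVersion();
  }
}
```

In this sketch, a manager created with `new DemoVersionManager(0)` reports that finalization is needed and rejects `FEATURE_A` until every entry returned by `unfinalizedFeatures()` has been passed to `finalized(...)`. Gating new features on the on-disk version is one way to keep old data readable until the operator commits to the new layout.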

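Putting these elements together, we arrive at roughly the following relationship diagram for Ozone Upgrade (concepts in the diagram that have not been mentioned yet are explained later in this article).
[Figure: relationship diagram of the Ozone Upgrade elements]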

The Ozone Upgrade Executor Model


Having covered the elements involved in an Ozone upgrade, let us look at how an upgrade actually executes.

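Upgrading a complex cluster is not a matter of running a single upgrade command; it usually involves various operations that must run before and after the upgrade itself. Ozone splits the whole upgrade into the following steps: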

...
  public void execute(T component, BasicUpgradeFinalizer finalizer)
      throws IOException {
    try {
      finalizer.emitStartingMsg();
      finalizer.getVersionManager()
          .setUpgradeState(FINALIZATION_IN_PROGRESS);
      // Steps that must complete before the upgrade
      finalizer.preFinalizeUpgrade(component);
      // Work done during the upgrade itself
      finalizer.finalizeUpgrade(component);
      // Operations performed after the upgrade
      finalizer.postFinalizeUpgrade(component);

      finalizer.emitFinishedMsg();
    } catch (Exception e) {
      LOG.warn("Upgrade Finalization failed with following Exception. ", e);
      if (finalizer.getVersionManager().needsFinalization()) {
        finalizer.getVersionManager()
            .setUpgradeState(FINALIZATION_REQUIRED);
        throw (e);
      }
    } finally {
      finalizer.markFinalizationDone();
    }
  }
}
...

BasicUpgradeFinalizer above plays the role of the upgrade executor, and each component has its own corresponding finalizer.
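The shape of execute() above is essentially a template method: the ordering of the three phases is fixed, while each component supplies its own hooks. The following is a minimal, self-contained sketch of that idea under simplifying assumptions. `DemoFinalizer`, `DemoService`, and `DemoServiceFinalizer` are hypothetical names, and the sketch records progress in an in-memory list instead of emitting messages or persisting state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base finalizer: fixed phase ordering, component-specific hooks. */
abstract class DemoFinalizer<T> {
  // Records what happened, standing in for real logging.
  final List<String> log = new ArrayList<>();

  // Pre and post hooks default to no-ops so a component overrides only what it needs.
  protected void preFinalize(T component) { }

  protected abstract void finalizeFeatures(T component);

  protected void postFinalize(T component) { }

  /** Runs the three phases in order and always records completion. */
  final void run(T component) {
    log.add("start");
    try {
      preFinalize(component);
      finalizeFeatures(component);
      postFinalize(component);
      log.add("finished");
    } catch (RuntimeException e) {
      log.add("failed: " + e.getMessage());
      throw e;
    } finally {
      log.add("done");
    }
  }
}

/** Hypothetical stand-in for a service being upgraded. */
final class DemoService {
  boolean writesBlocked;
}

/** Hypothetical component-specific finalizer. */
final class DemoServiceFinalizer extends DemoFinalizer<DemoService> {
  @Override
  protected void preFinalize(DemoService service) {
    service.writesBlocked = true;
    log.add("pre: writes blocked");
  }

  @Override
  protected void finalizeFeatures(DemoService service) {
    log.add("finalize: features");
  }

  @Override
  protected void postFinalize(DemoService service) {
    service.writesBlocked = false;
    log.add("post: writes resumed");
  }
}
```

Calling `new DemoServiceFinalizer().run(new DemoService())` walks through the pre, finalize, and post hooks in that order. Keeping the ordering in the base class means a new component only has to describe what is specific to it.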

Take the SCM service's UpgradeFinalizer as an example. Its preFinalizeUpgrade step closes the existing pipelines and freezes pipeline creation, so that no writes are allowed.

/**
 * UpgradeFinalizer for the Storage Container Manager service.
 */
public class SCMUpgradeFinalizer extends
    BasicUpgradeFinalizer<StorageContainerManager, HDDSLayoutVersionManager> {

  public SCMUpgradeFinalizer(HDDSLayoutVersionManager versionManager) {
    super(versionManager);
  }

  // This should be called in the context of a separate finalize upgrade thread.
  // This function can block indefinitely till the conditions are met to safely
  // finalize Upgrade.
  @Override
  public void preFinalizeUpgrade(StorageContainerManager scm)
      throws IOException {
    /*
     * Before we can call finalize the feature, we need to make sure that
     * all existing pipelines are closed and pipeline Manager would freeze
     * all new pipeline creation.
     */
    String msg = "  Existing pipelines and containers will be closed " +
        "during Upgrade.";
    msg += "\n  New pipelines creation will remain frozen until Upgrade " +
        "is finalized.";

    PipelineManager pipelineManager = scm.getPipelineManager();

    // Pipeline creation will remain frozen until postFinalizeUpgrade()
    pipelineManager.freezePipelineCreation();

    waitForAllPipelinesToDestroy(pipelineManager);


    // We can not yet move all the existing data nodes to HEALTHY-READONLY
    // state since the next heartbeat will move them back to HEALTHY state.
    // This has to wait till postFinalizeUpgrade, when SCM MLV version is
    // already upgraded as part of finalize processing.
    // While in this state, it should be safe to do finalize processing for
    // all new features. This will also update ondisk mlv version. Any
    // disrupting upgrade can add a hook here to make sure that SCM is in a
    // consistent state while finalizing the upgrade.

    logAndEmit(msg);
  }
  ...

Once the upgrade has completed, the follow-up postFinalizeUpgrade operation runs to re-enable pipeline creation.

  public void postFinalizeUpgrade(StorageContainerManager scm)
      throws IOException {


    // Don't wait for next heartbeat from datanodes in order to move them to
    // Healthy - Readonly state. Force them to Healthy ReadOnly state so that
    // we can resume pipeline creation right away.
    scm.getScmNodeManager().forceNodesToHealthyReadOnly();

    PipelineManager pipelineManager = scm.getPipelineManager();

    pipelineManager.resumePipelineCreation();

    // Wait for at least one pipeline to be created before finishing
    // finalization, so clients can write.
    boolean hasPipeline = false;
    while (!hasPipeline) {
      int pipelineCount = pipelineManager.getPipelines(
          HddsProtos.ReplicationType.RATIS, HddsProtos.ReplicationFactor.THREE,
          Pipeline.PipelineState.OPEN).size();

      hasPipeline = (pipelineCount >= 1);
      if (!hasPipeline) {
        LOG.info("Waiting for at least one pipeline after SCM finalization.");
        try {
          Thread.sleep(5000);
        } catch (InterruptedException e) {
          // Try again on next loop iteration.
        }
      } else {
        LOG.info("Pipeline found after SCM finalization");
      }
    }
    emitFinishedMsg();
  }
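
The loop above polls until a pipeline appears, with no upper bound, and it ignores interruption. As a hedged variation, not what the Ozone code does, the sketch below shows the same polling idea with a maximum number of attempts and with the interrupt flag restored when the thread is interrupted. `DemoWait` and its parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a condition a bounded number of times. */
final class DemoWait {
  private DemoWait() { }

  /**
   * Checks the condition up to maxAttempts times, sleeping between checks.
   * Returns true as soon as the condition holds, false if attempts run out
   * or the thread is interrupted while sleeping.
   */
  static boolean waitFor(BooleanSupplier condition, int maxAttempts,
      long sleepMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (attempt == maxAttempts) {
        break; // No point in sleeping after the last check.
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        // Restore the flag so callers can see the interruption.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}
```

A caller could pass a condition such as "at least one open pipeline exists" and decide what to do when the method returns false. Whether a bounded wait is appropriate for finalization is a design question the original code answers differently, since it chooses to block until clients can write.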

Now let us go back to the operation that actually performs finalization, the finalizeUpgrade method. What this method really does is finalize each feature in turn.

  protected void finalizeUpgrade(Supplier<Storage> storageSuppplier)
      throws UpgradeException {
    // Get the features that have not been finalized yet (the new features introduced by the upgrade)
    for (Object obj : versionManager.unfinalizedFeatures()) {
      LayoutFeature lf = (LayoutFeature) obj;
      Storage layoutStorage = storageSuppplier.get();
      // Get the action the new feature needs to run in the finalize phase
      Optional<? extends UpgradeAction> action = lf.action(ON_FINALIZE);
      // Run that action
      finalizeFeature(lf, layoutStorage, action);
      updateLayoutVersionInVersionFile(lf, layoutStorage);
      // Finalize this feature, which marks it as successfully upgraded
      versionManager.finalized(lf);
    }
    versionManager.completeFinalization();
  }

As the code above shows, an Ozone feature carries the actions to run in different phases, as in the following class:

/**
 * List of OM Layout features / versions.
 */
public enum OMLayoutFeature implements LayoutFeature {
  //  //
  INITIAL_VERSION(0, "Initial Layout Version");


  ///  /
  //    Example OM Layout Feature with Actions
  //      CREATE_EC(1, "",
  //          new ImmutablePair<>(ON_FINALIZE, new OnFinalizeECAction()),
  //          new ImmutablePair<>(FIRST_RUN_ON_UPGRADE,
  //          new OnFirstUpgradeStartECAction());
  //
  //  //

  private int layoutVersion;
  private String description;
  private EnumMap<UpgradeActionType, OmUpgradeAction> actions =
      new EnumMap<>(UpgradeActionType.class);
...
  /**
   * The phase in which an upgrade action runs
   */
  enum UpgradeActionType {

    // Run every time an un-finalized component is started up.
    VALIDATE_IN_PREFINALIZE,

    // Run exactly once when an upgraded cluster is detected with this new
    // layout version.
    // NOTE 1 : This will not be run in a NEW cluster!
    // NOTE 2 : This needs to be a backward compatible action until a DOWNGRADE
    // hook is provided!
    // NOTE 3 : These actions are not submitted through RATIS (TODO)
    ON_FIRST_UPGRADE_START,

    // Run exactly once during finalization of layout feature.
    ON_FINALIZE
  }
...

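As we can see, the phases defined in UpgradeActionType also include a validation step that runs at startup before finalization and an action that runs once on the first start after an upgrade. These actions are a good place for a new feature to implement its compatibility behavior toward existing older features.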
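To illustrate how a feature can carry per-phase actions, here is a minimal sketch that stores actions in an EnumMap keyed by phase and returns them as an Optional. It is an assumption-laden simplification, not the Ozone code: `DemoActionType`, `DemoAction`, `DemoLayoutFeature`, and the `DEMO_EC` feature are hypothetical, and the actions simply append to a StringBuilder.

```java
import java.util.EnumMap;
import java.util.Optional;

/** Hypothetical phases, named after the ones listed above. */
enum DemoActionType {
  VALIDATE_IN_PREFINALIZE,
  ON_FIRST_UPGRADE_START,
  ON_FINALIZE
}

/** Hypothetical action: writes a marker so its execution can be observed. */
interface DemoAction {
  void execute(StringBuilder out);
}

/** Hypothetical feature enum holding at most one action per phase. */
enum DemoLayoutFeature {
  INITIAL_VERSION(0, "Initial layout version"),
  DEMO_EC(1, "Hypothetical feature with two actions");

  // Runs after all constants are constructed, so their maps already exist.
  static {
    DEMO_EC.actions.put(DemoActionType.VALIDATE_IN_PREFINALIZE,
        out -> out.append("validate-ec;"));
    DEMO_EC.actions.put(DemoActionType.ON_FINALIZE,
        out -> out.append("finalize-ec;"));
  }

  private final int layoutVersion;
  private final String description;
  private final EnumMap<DemoActionType, DemoAction> actions =
      new EnumMap<>(DemoActionType.class);

  DemoLayoutFeature(int layoutVersion, String description) {
    this.layoutVersion = layoutVersion;
    this.description = description;
  }

  int layoutVersion() {
    return layoutVersion;
  }

  String description() {
    return description;
  }

  /** Returns the action registered for the given phase, if any. */
  Optional<DemoAction> action(DemoActionType type) {
    return Optional.ofNullable(actions.get(type));
  }
}
```

With this shape, a finalizer can ask every feature for its `ON_FINALIZE` action and run it only when present, for example `DemoLayoutFeature.DEMO_EC.action(DemoActionType.ON_FINALIZE).ifPresent(a -> a.execute(out))`. Features without special needs, such as `INITIAL_VERSION` here, simply return an empty Optional.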

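Finally, let us look at how an UpgradeAction is defined. An UpgradeAction represents the action of a specific feature in a specific upgrade phase. For example, the action below is the check that the SCM HA feature must perform before the upgrade is finalized.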

@UpgradeActionHdds(feature = SCM_HA, component = SCM,
    type = VALIDATE_IN_PREFINALIZE)
public class ScmHAUnfinalizedStateValidationAction
    implements HDDSUpgradeAction<StorageContainerManager> {

  @Override
  public void execute(StorageContainerManager scm) throws Exception {
    boolean isHAEnabled =
        scm.getConfiguration().getBoolean(ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY,
        ScmConfigKeys.OZONE_SCM_HA_ENABLE_DEFAULT);

    if (isHAEnabled) {
      throw new UpgradeException(String.format("Configuration %s cannot be " +
          "used until SCM upgrade has been finalized",
          ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY),
          UpgradeException.ResultCodes.PREFINALIZE_ACTION_VALIDATION_FAILED);
    }
  }
}

Here an annotation is used to declare the metadata of the UpgradeAction.
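As a rough illustration of how annotation metadata like this can be consumed, the sketch below defines a runtime-retained annotation and selects action classes by feature and phase through reflection. Everything in it is hypothetical (`SketchUpgradeAction`, `SketchPhase`, `SketchAction`, `SketchActionRegistry`, and the `DEMO_HA` feature name), and it takes an explicit list of candidate classes; it makes no claim about how Ozone itself discovers annotated actions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical phases an action can be bound to. */
enum SketchPhase { VALIDATE_IN_PREFINALIZE, ON_FINALIZE }

/** Hypothetical annotation; RUNTIME retention makes it visible to reflection. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchUpgradeAction {
  String feature();
  SketchPhase type();
}

/** Hypothetical action contract. */
interface SketchAction {
  void execute() throws Exception;
}

@SketchUpgradeAction(feature = "DEMO_HA",
    type = SketchPhase.VALIDATE_IN_PREFINALIZE)
final class SketchHaValidation implements SketchAction {
  @Override
  public void execute() {
    // A real validation would inspect configuration here.
  }
}

@SketchUpgradeAction(feature = "DEMO_HA", type = SketchPhase.ON_FINALIZE)
final class SketchHaFinalize implements SketchAction {
  @Override
  public void execute() {
    // A real action would migrate state here.
  }
}

/** Hypothetical registry that filters candidates by their annotation values. */
final class SketchActionRegistry {
  private SketchActionRegistry() { }

  static List<Class<? extends SketchAction>> select(
      List<Class<? extends SketchAction>> candidates,
      String feature, SketchPhase phase) {
    List<Class<? extends SketchAction>> result = new ArrayList<>();
    for (Class<? extends SketchAction> candidate : candidates) {
      SketchUpgradeAction meta =
          candidate.getAnnotation(SketchUpgradeAction.class);
      // Classes without the annotation are skipped.
      if (meta != null && meta.feature().equals(feature)
          && meta.type() == phase) {
        result.add(candidate);
      }
    }
    return result;
  }
}
```

The benefit of this style is that the action class declares where it belongs, so the feature definition and the framework code do not need to be edited each time an action is added.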

In summary, Ozone's design of UpgradeAction and Finalizer makes upgrade execution more self-contained and flexible, without intruding too heavily on the existing code logic.

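That covers what this article set out to explain about Ozone Upgrade. Readers interested in the topic can also read another of my earlier posts, "Guaranteeing Request Consistency During an Ozone OM Upgrade" (Ozone OM Upgrade期间请求一致性处理的保证). The code discussed in this article can be found at the links at the end.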

Related Links


[1].https://github.com/apache/ozone/blob/HDDS-3698-nonrolling-upgrade/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/upgrade/BasicUpgradeFinalizer.java
[2].https://github.com/apache/ozone/blob/HDDS-3698-nonrolling-upgrade/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/upgrade/ScmHAUnfinalizedStateValidationAction.java
