Dremio's Reflection Feature

    I have recently been researching the reflection feature of the Dremio data lake. After digging through documentation and source code, I have a few findings. In short, a reflection pre-collects data according to a definition made ahead of time, and when that data is actually needed it directly returns the pre-collected result, much like a callback.

 

    A reflection is defined over data. Once one is defined, Dremio internally pulls the data from the source and stores it locally, organized according to the configured settings such as ORDER BY and PARTITION BY. When a query arrives, reflection matching is performed (a match on the logical plan); if a reflection matches, the data is read from the local cache and returned to the client. This saves the cost of fetching and computing the data directly from the source, greatly speeding up the query.
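The idea above can be sketched in miniature: keep a map from a hash of the normalized logical plan to pre-materialized rows, and serve a query from that map when its plan hashes to a known key. This is a hypothetical illustration, not Dremio's actual classes or API; the class, method names, and the rolling hash are all made up for the sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reflection cache idea: materialized results are
// keyed by a hash of the normalized logical plan; an incoming query whose plan
// hashes to the same key is served locally instead of re-reading the source.
public class ReflectionCacheSketch {
  private final Map<Long, List<String>> materialized = new HashMap<>();

  // Store pre-computed rows under the plan's hash (done when a reflection refreshes).
  void materialize(String normalizedPlan, List<String> rows) {
    materialized.put(hash(normalizedPlan), rows);
  }

  // At query time: hash the incoming plan; return cached rows on a match, null on a miss.
  List<String> match(String normalizedPlan) {
    return materialized.get(hash(normalizedPlan));
  }

  // Stand-in for Dremio's murmur3 plan hash; any stable string hash works for the sketch.
  static long hash(String plan) {
    long h = 1125899906842597L;
    for (int i = 0; i < plan.length(); i++) {
      h = 31 * h + plan.charAt(i);
    }
    return h;
  }

  public static void main(String[] args) {
    ReflectionCacheSketch cache = new ReflectionCacheSketch();
    cache.materialize("Scan(table=[sales])", List.of("row1", "row2"));
    // Same normalized plan -> served from cache; a different plan -> miss.
    assert cache.match("Scan(table=[sales])") != null;
    assert cache.match("Scan(table=[orders])") == null;
    System.out.println("cache hit rows: " + cache.match("Scan(table=[sales])"));
  }
}
```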

 

    Speaking of logical-plan matching: from /sabot/kernel/src/main/java/com/dremio/exec/planner/acceleration/PlanHasher.java one can see that the query's table references are first converted into generic scan nodes, and the string form of that scan plan is then hashed; matching is done by comparing the hash values. The two snippets below show the materialization expansion code that re-checks the stripped plan's hash, followed by PlanHasher itself.

public DremioMaterialization expand(MaterializationDescriptor descriptor) {
    RelNode queryRel = deserializePlan(descriptor.getPlan(), parent, catalogService);

    // used for old reflections where we stripped before plan persistence.
    final boolean preStripped = descriptor.getStrippedPlanHash() == null;
    final StrippingFactory factory = new StrippingFactory(parent.getSettings().getOptions(), parent.getConfig());

    StripResult stripResult = preStripped
        ? StrippingFactory.noStrip(queryRel)
        : factory.strip(queryRel, descriptor.getReflectionType(),
            descriptor.getIncrementalUpdateSettings().isIncremental(), descriptor.getStripVersion());

    // we need to make sure that the persisted version of the plan after applying the stripping is
    // consistent with what we got when materializing. We'll do this again during substitution in
    // various forms and are doing it here for checking the validity of the expansion.
    if(!preStripped) {
      long strippedHash = PlanHasher.hash(stripResult.getNormalized());
      if(strippedHash != descriptor.getStrippedPlanHash()) {
        throw new ExpansionException(String.format("Stripped hash doesn't match expect stripped hash. Stripped logic likely changed. Non-matching plan: %s.", RelOptUtil.toString(stripResult.getNormalized())));
      }
    }

    // if this is an incremental update, we need to do some changes to support the incremental. These need to be applied after incremental update completes.
    final RelTransformer postStripNormalizer = getPostStripNormalizer(descriptor);
    stripResult = stripResult.transformNormalized(postStripNormalizer);

    logger.debug("Query rel:{}", RelOptUtil.toString(queryRel));

    RelNode tableRel = expandSchemaPath(descriptor.getPath());

    if (tableRel == null) {
      throw new ExpansionException("Unable to find read metadata for materialization.");
    }

    BatchSchema schema = ((ScanCrel) tableRel).getBatchSchema();
    final RelDataType strippedQueryRowType = stripResult.getNormalized().getRowType();
    tableRel = tableRel.accept(new IncrementalUpdateUtils.RemoveDirColumn(strippedQueryRowType));

    // Namespace table removes UPDATE_COLUMN from scans, but for incremental materializations, we need to add it back
    // to the table scan
    if (descriptor.getIncrementalUpdateSettings().isIncremental()) {
      tableRel = tableRel.accept(IncrementalUpdateUtils.ADD_MOD_TIME_SHUTTLE);
    }

    // if the row types don't match, ignoring the nullability, fail immediately
    if (!areRowTypesEqual(tableRel.getRowType(), strippedQueryRowType, true)) {
      throw new ExpansionException(String.format("Materialization %s have different row types for its table and query rels.%n" +
        "table row type %s%nquery row type %s", descriptor.getMaterializationId(), tableRel.getRowType(), strippedQueryRowType));
    }

    try {
      // Check that the table rel row type matches that of the query rel,
      // if so, cast the table rel row types to the query rel row types.
      tableRel = MoreRelOptUtil.createCastRel(tableRel, strippedQueryRowType);
    } catch (Exception | AssertionError e) {
      throw UserException.planError(e)
        .message("Failed to cast table rel row types to the query rel row types for materialization %s.%n" +
          "table schema %s%nquery schema %s", descriptor.getMaterializationId(),
          CalciteArrowHelper.fromCalciteRowType(tableRel.getRowType()),
          CalciteArrowHelper.fromCalciteRowType(strippedQueryRowType))
        .build(logger);
    }

    return new DremioMaterialization(
      tableRel,
      queryRel,
      descriptor.getIncrementalUpdateSettings(),
      descriptor.getJoinDependencyProperties(),
      descriptor.getLayoutInfo(),
      descriptor.getMaterializationId(),
      schema,
      descriptor.getExpirationTimestamp(),
      preStripped,
      descriptor.getStripVersion(), // Should use the strip version of the materialization we are expanding
      postStripNormalizer
    );
  }

public class PlanHasher {
  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(PlanHasher.class);

  public static long hash(RelNode node) {
    RelNode cleansed = node.accept(new RoutingShuttle() {

      @Override
      public RelNode visit(RelNode other) {
        if( !(other instanceof ScanCrel) ) {
          return super.visit(other);
        }

        ScanCrel sc = (ScanCrel) other;
        return new GenericScan(sc.getTableMetadata().getName(), sc.getRowType(), sc.getCluster(), sc.getTraitSet());
      }});
    long hash = Hashing.murmur3_128().hashBytes(RelOptUtil.toString(cleansed).getBytes(StandardCharsets.UTF_8)).asLong();

    if(logger.isDebugEnabled()) {
      logger.debug("Hashed Plan {} to value {}", RelOptUtil.toString(cleansed), hash);
    }

    return hash;
  }
}
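The essence of PlanHasher can be reproduced outside Calcite: normalize the plan's textual form so that only the generic scan shape remains (the ScanCrel-to-GenericScan rewrite above), then hash the resulting string. The sketch below is a dependency-free stand-in: it uses a regex where Dremio uses a RelShuttle, and SHA-256 from the JDK where Dremio uses Guava's murmur3_128, so the hash values will not match Dremio's.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch of the PlanHasher idea: strip scan-specific details from the
// plan text, then hash the normalized string so equivalent plans collide.
public class PlanHashSketch {

  // Replace every concrete scan with a generic "GenericScan(table=[...])" form,
  // mimicking the ScanCrel -> GenericScan rewrite done by the RoutingShuttle.
  static String normalize(String planText) {
    return planText.replaceAll("ScanCrel\\(table=\\[([^\\]]*)\\][^)]*\\)", "GenericScan(table=[$1])");
  }

  static long hash(String planText) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(normalize(planText).getBytes(StandardCharsets.UTF_8));
      // Fold the first 8 digest bytes into a long, analogous to HashCode.asLong().
      long h = 0;
      for (int i = 0; i < 8; i++) {
        h = (h << 8) | (digest[i] & 0xFF);
      }
      return h;
    } catch (NoSuchAlgorithmException e) {
      throw new AssertionError(e); // SHA-256 is always available in the JDK
    }
  }

  public static void main(String[] args) {
    String a = "Project(x)\n  ScanCrel(table=[sales], splits=3)";
    String b = "Project(x)\n  ScanCrel(table=[sales], splits=7)";
    // Scan details irrelevant to matching are stripped, so the hashes agree.
    assert hash(a) == hash(b);
    System.out.println("normalized: " + normalize(a));
  }
}
```

Because only the table name and overall plan shape survive normalization, two queries over the same table whose plans differ in scan internals (split counts, snapshot ids) still hash to the same value and can reuse the same reflection.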

     This matching mechanism requires an exact match on the table query plan; in addition, the reflection's own conditions must also match.

     

      This research is still not very deep; if anything here is wrong, feel free to leave a comment and I will verify and correct it promptly.

 

 
