I have recently been looking into the reflection feature of the Dremio data lake. After digging through documentation and source code, I have a few findings. Simply put, a reflection collects data ahead of time according to a predefined specification and, when the data is actually needed, returns the previously collected result directly, much like a callback that hands back a precomputed answer.
A reflection is defined on data. Once a reflection is defined, Dremio internally pulls the data from the remote end and stores it locally, organized according to the configured ORDER BY, PARTITION BY, and so on. When a query runs, the system performs reflection matching (matching of logical plans); if a plan matches, the data is read from the local cache and returned to the client. This saves the cost of pulling the raw data and computing over it, which greatly improves query speed.
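Conceptually, the serving path behaves like a local cache keyed by a hash of the normalized logical plan. The sketch below illustrates only that idea; ReflectionCache, MaterializedData, and planHash are hypothetical names invented for illustration, not Dremio APIs, and the real system matches relational plans rather than strings:

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// A minimal sketch of the "match logical plan -> serve from local cache" idea.
public class ReflectionCache {

  // Stands in for the locally materialized reflection data.
  public static final class MaterializedData {
    final String description;
    MaterializedData(String description) { this.description = description; }
  }

  private final Map<Long, MaterializedData> byPlanHash = new HashMap<>();

  // Hypothetical stand-in for hashing a normalized plan's string form.
  private long planHash(String normalizedPlan) {
    long h = 1125899906842597L; // simple polynomial hash, for the sketch only
    for (byte b : normalizedPlan.getBytes(StandardCharsets.UTF_8)) {
      h = 31 * h + b;
    }
    return h;
  }

  public void register(String normalizedPlan, MaterializedData data) {
    byPlanHash.put(planHash(normalizedPlan), data);
  }

  // On a hash match, the query is answered from local data instead of
  // re-reading and re-computing from the source.
  public Optional<MaterializedData> lookup(String normalizedPlan) {
    return Optional.ofNullable(byPlanHash.get(planHash(normalizedPlan)));
  }
}

The point is only the shape of the lookup: compute a stable key from the normalized plan, and a hit means the answer can be served from locally materialized data.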
Speaking of logical plan matching: from /sabot/kernel/src/main/java/com/dremio/exec/planner/acceleration/PlanHasher.java you can see that the tables in a query are first converted into a scan plan, and a hash code is then computed from the plan's string form (via RelOptUtil.toString) for matching. The expand method below, which turns a persisted MaterializationDescriptor back into a DremioMaterialization, calls PlanHasher.hash to verify that the stripped plan still matches the hash recorded when the reflection was materialized:
public DremioMaterialization expand(MaterializationDescriptor descriptor) {
  RelNode queryRel = deserializePlan(descriptor.getPlan(), parent, catalogService);

  // used for old reflections where we stripped before plan persistence.
  final boolean preStripped = descriptor.getStrippedPlanHash() == null;
  final StrippingFactory factory = new StrippingFactory(parent.getSettings().getOptions(), parent.getConfig());

  StripResult stripResult = preStripped
      ? StrippingFactory.noStrip(queryRel)
      : factory.strip(queryRel, descriptor.getReflectionType(), descriptor.getIncrementalUpdateSettings().isIncremental(), descriptor.getStripVersion());

  // we need to make sure that the persisted version of the plan after applying the stripping is
  // consistent with what we got when materializing. We'll do this again during substitution in
  // various forms and are doing it here for checking the validity of the expansion.
  if (!preStripped) {
    long strippedHash = PlanHasher.hash(stripResult.getNormalized());
    if (strippedHash != descriptor.getStrippedPlanHash()) {
      throw new ExpansionException(String.format("Stripped hash doesn't match expect stripped hash. Stripped logic likely changed. Non-matching plan: %s.",
          RelOptUtil.toString(stripResult.getNormalized())));
    }
  }

  // if this is an incremental update, we need to do some changes to support the incremental. These need to be applied after incremental update completes.
  final RelTransformer postStripNormalizer = getPostStripNormalizer(descriptor);
  stripResult = stripResult.transformNormalized(postStripNormalizer);

  logger.debug("Query rel:{}", RelOptUtil.toString(queryRel));

  RelNode tableRel = expandSchemaPath(descriptor.getPath());
  if (tableRel == null) {
    throw new ExpansionException("Unable to find read metadata for materialization.");
  }

  BatchSchema schema = ((ScanCrel) tableRel).getBatchSchema();
  final RelDataType strippedQueryRowType = stripResult.getNormalized().getRowType();
  tableRel = tableRel.accept(new IncrementalUpdateUtils.RemoveDirColumn(strippedQueryRowType));

  // Namespace table removes UPDATE_COLUMN from scans, but for incremental materializations, we need to add it back
  // to the table scan
  if (descriptor.getIncrementalUpdateSettings().isIncremental()) {
    tableRel = tableRel.accept(IncrementalUpdateUtils.ADD_MOD_TIME_SHUTTLE);
  }

  // if the row types don't match, ignoring the nullability, fail immediately
  if (!areRowTypesEqual(tableRel.getRowType(), strippedQueryRowType, true)) {
    throw new ExpansionException(String.format("Materialization %s have different row types for its table and query rels.%n" +
        "table row type %s%nquery row type %s", descriptor.getMaterializationId(), tableRel.getRowType(), strippedQueryRowType));
  }

  try {
    // Check that the table rel row type matches that of the query rel,
    // if so, cast the table rel row types to the query rel row types.
    tableRel = MoreRelOptUtil.createCastRel(tableRel, strippedQueryRowType);
  } catch (Exception | AssertionError e) {
    throw UserException.planError(e)
        .message("Failed to cast table rel row types to the query rel row types for materialization %s.%n" +
            "table schema %s%nquery schema %s", descriptor.getMaterializationId(),
            CalciteArrowHelper.fromCalciteRowType(tableRel.getRowType()),
            CalciteArrowHelper.fromCalciteRowType(strippedQueryRowType))
        .build(logger);
  }

  return new DremioMaterialization(
      tableRel,
      queryRel,
      descriptor.getIncrementalUpdateSettings(),
      descriptor.getJoinDependencyProperties(),
      descriptor.getLayoutInfo(),
      descriptor.getMaterializationId(),
      schema,
      descriptor.getExpirationTimestamp(),
      preStripped,
      descriptor.getStripVersion(), // Should use the strip version of the materialization we are expanding
      postStripNormalizer);
}
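PlanHasher itself is short: it replaces every ScanCrel in the plan with a GenericScan, then murmur3-hashes the plan's string form.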
public class PlanHasher {
  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(PlanHasher.class);

  public static long hash(RelNode node) {
    RelNode cleansed = node.accept(new RoutingShuttle() {
      @Override
      public RelNode visit(RelNode other) {
        if (!(other instanceof ScanCrel)) {
          return super.visit(other);
        }
        ScanCrel sc = (ScanCrel) other;
        return new GenericScan(sc.getTableMetadata().getName(), sc.getRowType(), sc.getCluster(), sc.getTraitSet());
      }
    });
    long hash = Hashing.murmur3_128().hashBytes(RelOptUtil.toString(cleansed).getBytes(StandardCharsets.UTF_8)).asLong();
    if (logger.isDebugEnabled()) {
      logger.debug("Hashed Plan {} to value {}", RelOptUtil.toString(cleansed), hash);
    }
    return hash;
  }
}
This matching mechanism requires an exact match on the table's query plan, and beyond that, the reflection's own conditions must also match.
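To get a feel for how strict "exact match" is, here is a small self-contained sketch of the hashing step. It reuses the same Guava call as PlanHasher (Hashing.murmur3_128), but the plan strings are made-up stand-ins for RelOptUtil.toString output:

import java.nio.charset.StandardCharsets;
import com.google.common.hash.Hashing;

// Mirrors PlanHasher's hashing step: murmur3-hash the string form of a plan
// and compare hashes to decide whether two plans match.
public class PlanHashDemo {

  static long hash(String planString) {
    return Hashing.murmur3_128()
        .hashBytes(planString.getBytes(StandardCharsets.UTF_8))
        .asLong();
  }

  public static void main(String[] args) {
    // Made-up plan strings standing in for RelOptUtil.toString(plan) output.
    String reflectionPlan = "GenericScan(table=[sales], columns=[region, amount])";
    String sameQueryPlan  = "GenericScan(table=[sales], columns=[region, amount])";
    String filteredPlan   = "Filter(condition=[region = 'EU'])\n"
        + "  GenericScan(table=[sales], columns=[region, amount])";

    System.out.println(hash(reflectionPlan) == hash(sameQueryPlan)); // true: exact match
    System.out.println(hash(reflectionPlan) == hash(filteredPlan));  // false: plans differ
  }
}

Because the key is a hash of the full plan text, any difference in the plan, even an extra filter or a reordered column, produces a different hash and the reflection is not matched.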
My investigation does not go very deep yet; if anything here is wrong, feel free to leave a comment, and I will verify and correct it promptly.