Flink Catalog

1.Basics

  Flink provides a base Catalog interface; a custom catalog only needs to implement it.
  There is also a CatalogFactory interface, which is used to create Catalog instances.

2.CreateCatalog

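  The statement goes through the normal SQL parsing flow. In TableEnvironmentImpl.executeInternal(), the branch is chosen by operation type; here it is the CreateCatalogOperation branch.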

} else if (operation instanceof CreateCatalogOperation) {
    return createCatalog((CreateCatalogOperation) operation);

  CreateCatalogOperation has two fields: the catalog name and the properties.

public class CreateCatalogOperation implements CreateOperation {
    private final String catalogName;
    private final Map<String, String> properties;

  Inside createCatalog, the core is two steps: load (instantiate) the Catalog, then register it.

Catalog catalog =
        FactoryUtil.createCatalog(
                catalogName, properties, tableConfig, userClassLoader);
catalogManager.registerCatalog(catalogName, catalog);

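  Loading is generic: it looks up CatalogFactory implementations on the classpath and calls createCatalog on the matching one.
  A Catalog that is not on the classpath presumably goes through the NoMatchingTableFactoryException branch and is loaded by userClassLoader.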
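
  The load-then-register steps can be sketched in plain Java. This is a simplified model, not Flink's code: the nested Catalog and CatalogFactory interfaces, memoryFactory, and the explicit candidate list (standing in for classpath discovery) are all hypothetical names introduced for illustration. The only detail taken from Flink is that the factory is matched on the "type" property.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are Flink's real API.
class CatalogLoadingSketch {
    interface Catalog {
        String name();
    }

    interface CatalogFactory {
        String type(); // matched against the "type" property
        Catalog createCatalog(String name, Map<String, String> properties);
    }

    // Example factory for an imaginary "memory" catalog type.
    static CatalogFactory memoryFactory() {
        return new CatalogFactory() {
            @Override public String type() { return "memory"; }
            @Override public Catalog createCatalog(String name, Map<String, String> p) {
                return () -> name;
            }
        };
    }

    // Step 1: find the factory whose type matches, then let it build the catalog.
    static Catalog createCatalog(
            String name, Map<String, String> properties, List<CatalogFactory> candidates) {
        String type = properties.get("type");
        for (CatalogFactory f : candidates) {
            if (f.type().equals(type)) {
                return f.createCatalog(name, properties);
            }
        }
        throw new IllegalArgumentException("No catalog factory for type: " + type);
    }

    public static void main(String[] args) {
        Map<String, Catalog> registry = new HashMap<>();
        Catalog c = createCatalog("my_cat", Map.of("type", "memory"), List.of(memoryFactory()));
        registry.put(c.name(), c); // Step 2: register under the catalog name
        System.out.println(registry.keySet()); // prints [my_cat]
    }
}
```

  In the real code path the candidate list comes from service discovery rather than from an argument; the sketch passes it in only to stay self-contained.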

3.UseCatalog

  The same generic SQL parsing flow ends in executeInternal(), this time in the UseCatalogOperation branch. USE is simple: it just sets the current catalog on catalogManager.

} else if (operation instanceof UseCatalogOperation) {
    UseCatalogOperation useCatalogOperation = (UseCatalogOperation) operation;
    catalogManager.setCurrentCatalog(useCatalogOperation.getCatalogName());
    return TableResultImpl.TABLE_RESULT_OK;

4.DropCatalog

  This also ends in executeInternal(), in the DropCatalogOperation branch. The catalog is removed from catalogManager's list; this is only a logical delete at the Flink level.

} else if (operation instanceof DropCatalogOperation) {
    DropCatalogOperation dropCatalogOperation = (DropCatalogOperation) operation;
    String exMsg = getDDLOpExecuteErrorMsg(dropCatalogOperation.asSummaryString());
    try {
        catalogManager.unregisterCatalog(
                dropCatalogOperation.getCatalogName(), dropCatalogOperation.isIfExists());
        return TableResultImpl.TABLE_RESULT_OK;
    } catch (CatalogException e) {
        throw new ValidationException(exMsg, e);
    }
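
  To make the "logical delete" point concrete, here is a minimal sketch of a registry that supports register, use, and drop. CatalogRegistrySketch and its methods are hypothetical and much simpler than Flink's CatalogManager; the sketch only shows that dropping removes an entry from an in-memory map and does nothing to whatever the catalog object points at.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a catalog manager; not Flink's CatalogManager.
class CatalogRegistrySketch {
    private final Map<String, Object> catalogs = new LinkedHashMap<>();
    private String currentCatalog;

    void register(String name, Object catalog) {
        if (catalogs.putIfAbsent(name, catalog) != null) {
            throw new IllegalStateException("Catalog " + name + " already exists");
        }
    }

    // USE CATALOG: only switches the current-catalog pointer.
    void setCurrentCatalog(String name) {
        if (!catalogs.containsKey(name)) {
            throw new IllegalStateException("Catalog " + name + " does not exist");
        }
        currentCatalog = name;
    }

    // DROP CATALOG: removes the registry entry; the catalog object itself is untouched.
    void unregister(String name, boolean ifExists) {
        if (catalogs.remove(name) == null && !ifExists) {
            throw new IllegalStateException("Catalog " + name + " does not exist");
        }
    }

    boolean contains(String name) {
        return catalogs.containsKey(name);
    }

    String getCurrentCatalog() {
        return currentCatalog;
    }

    public static void main(String[] args) {
        CatalogRegistrySketch manager = new CatalogRegistrySketch();
        manager.register("a", new Object());
        manager.register("b", new Object());
        manager.setCurrentCatalog("a");
        manager.unregister("b", false);
        System.out.println(manager.getCurrentCatalog() + " " + manager.contains("b"));
    }
}
```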

5.Operations under a Catalog

5.1.CreateDatabase

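  The same parsing flow reaches executeInternal and takes the CreateDatabaseOperation branch; the operation is finally executed by calling the catalog's createDatabase method directly.
  Currently the in-memory catalog (GenericInMemoryCatalog) and HiveCatalog support this operation; JdbcCatalog does not.
  dropDatabase follows the same flow and calls a different method.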

} else if (operation instanceof CreateDatabaseOperation) {
    CreateDatabaseOperation createDatabaseOperation = (CreateDatabaseOperation) operation;
    Catalog catalog = getCatalogOrThrowException(createDatabaseOperation.getCatalogName());
    String exMsg = getDDLOpExecuteErrorMsg(createDatabaseOperation.asSummaryString());
    try {
        catalog.createDatabase(
                createDatabaseOperation.getDatabaseName(),
                createDatabaseOperation.getCatalogDatabase(),
                createDatabaseOperation.isIgnoreIfExists());
        return TableResultImpl.TABLE_RESULT_OK;

  createDatabaseOperation carries the catalog name, which is filled in during the SqlToOperationConverter step.

} else if (validated instanceof SqlCreateDatabase) {
    return Optional.of(converter.convertCreateDatabase((SqlCreateDatabase) validated));

  If the statement specifies a catalog, that one is used; otherwise the current catalog is used.

String catalogName =
        (fullDatabaseName.length == 1)
                ? catalogManager.getCurrentCatalog()
                : fullDatabaseName[0];
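
  The fallback rule can be tried out in isolation. The helper below is a hypothetical sketch that mirrors the shape of the excerpt: a one-part name ("db") takes the current catalog, and a two-part name ("catalog.db") takes its first part. The class name, the method name, and the error message are illustrative assumptions.

```java
// Hypothetical helper; it only mirrors the shape of the excerpt above.
class DatabaseNameSketch {
    // Returns {catalogName, databaseName} for "db" or "catalog.db".
    static String[] resolve(String[] fullDatabaseName, String currentCatalog) {
        if (fullDatabaseName.length == 0 || fullDatabaseName.length > 2) {
            throw new IllegalArgumentException("Expected [catalog.]database");
        }
        boolean onlyDatabase = fullDatabaseName.length == 1;
        String catalogName = onlyDatabase ? currentCatalog : fullDatabaseName[0];
        String databaseName = onlyDatabase ? fullDatabaseName[0] : fullDatabaseName[1];
        return new String[] {catalogName, databaseName};
    }

    public static void main(String[] args) {
        String current = "default_catalog";
        System.out.println(String.join(".", resolve(new String[] {"db1"}, current)));
        System.out.println(String.join(".", resolve(new String[] {"hive", "db1"}, current)));
    }
}
```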

5.2.CreateTable

  Tables come in two kinds, temporary and regular; the flow is the same for both.

} else if (operation instanceof CreateTableOperation) {
    CreateTableOperation createTableOperation = (CreateTableOperation) operation;
    if (createTableOperation.isTemporary()) {
        catalogManager.createTemporaryTable(
                createTableOperation.getCatalogTable(),
                createTableOperation.getTableIdentifier(),
                createTableOperation.isIgnoreIfExists());
    } else {
        catalogManager.createTable(
                createTableOperation.getCatalogTable(),
                createTableOperation.getTableIdentifier(),
                createTableOperation.isIgnoreIfExists());
    }
    return TableResultImpl.TABLE_RESULT_OK;

  The catalog-related information is likewise resolved in the SqlToOperationConverter step.

} else if (validated instanceof SqlCreateTable) {
    return Optional.of(
            converter.createTableConverter.convertCreateTable((SqlCreateTable) validated));

  At this point the identifier is qualified with the catalog and database.

public ObjectIdentifier qualifyIdentifier(UnresolvedIdentifier identifier) {
    return ObjectIdentifier.of(
            identifier.getCatalogName().orElseGet(this::getCurrentCatalog),
            identifier.getDatabaseName().orElseGet(this::getCurrentDatabase),
            identifier.getObjectName());
}
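
  The same Optional-with-fallback idea, sketched without Flink types. QualifySketch is a hypothetical helper that splits a dotted path into one to three parts and fills in the missing catalog and database from the current session values. It assumes plain identifiers without backtick escaping, which a real SQL parser would have to handle.

```java
import java.util.Optional;

// Hypothetical helper; assumes unescaped identifiers separated by dots.
class QualifySketch {
    // Accepts "t", "db.t", or "cat.db.t" and returns the fully qualified "cat.db.t".
    static String qualify(String path, String currentCatalog, String currentDatabase) {
        String[] parts = path.split("\\.");
        if (parts.length < 1 || parts.length > 3) {
            throw new IllegalArgumentException("Expected [[catalog.]database.]object: " + path);
        }
        Optional<String> catalog =
                parts.length == 3 ? Optional.of(parts[0]) : Optional.empty();
        Optional<String> database =
                parts.length >= 2 ? Optional.of(parts[parts.length - 2]) : Optional.empty();
        String objectName = parts[parts.length - 1];
        // Missing parts fall back to the session's current catalog and database.
        return catalog.orElse(currentCatalog)
                + "." + database.orElse(currentDatabase)
                + "." + objectName;
    }

    public static void main(String[] args) {
        System.out.println(qualify("orders", "default_catalog", "default_database"));
        System.out.println(qualify("sales.orders", "default_catalog", "default_database"));
        System.out.println(qualify("hive.sales.orders", "default_catalog", "default_database"));
    }
}
```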

5.3.Query

  In PlannerBase.translateToRel, every branch uses getRelBuilder.

case s: UnregisteredSinkModifyOperation[_] =>
  val input = getRelBuilder.queryOperation(s.getChild).build()

  getRelBuilder is a method that sets the current catalog and database. It is heavily wrapped; underneath it is presumably built on Calcite.

/** Returns the [[FlinkRelBuilder]] of this TableEnvironment. */
private[flink] def getRelBuilder: FlinkRelBuilder = {
  val currentCatalogName = catalogManager.getCurrentCatalog
  val currentDatabase = catalogManager.getCurrentDatabase
  plannerContext.createRelBuilder(currentCatalogName, currentDatabase)
}

  getRelBuilder.queryOperation goes through the call chain QueryOperationConverter.visit -> FlinkRelBuilder.scan -> CatalogSourceTable.toRel -> createDynamicTableSource, which is where the Catalog is used.

final Optional<DynamicTableSourceFactory> factoryFromCatalog =
        schemaTable
                .getContextResolvedTable()
                .getCatalog()
                .flatMap(Catalog::getFactory)
                .map(
                        f ->
                                f instanceof DynamicTableSourceFactory
                                        ? (DynamicTableSourceFactory) f
                                        : null);

  The factory held by the Catalog is what creates the table. Taking jdbc as an example:

public DynamicTableSource createDynamicTableSource(Context context) {
    final FactoryUtil.TableFactoryHelper helper =
            FactoryUtil.createTableFactoryHelper(this, context);
    final ReadableConfig config = helper.getOptions();

    helper.validate();
    validateConfigOptions(config);
    validateDataTypeWithJdbcDialect(context.getPhysicalRowDataType(), config.get(URL));
    return new JdbcDynamicTableSource(
            getJdbcOptions(helper.getOptions()),
            getJdbcReadOptions(helper.getOptions()),
            getJdbcLookupOptions(helper.getOptions()),
            context.getPhysicalRowDataType());
}

  Currently only the jdbc and hive catalogs provide a factory. If the Catalog has no factory, a DynamicTableSourceFactory implementation is discovered from the classpath instead.

try {
    final DynamicTableSourceFactory factory =
            preferredFactory != null
                    ? preferredFactory
                    : discoverTableFactory(DynamicTableSourceFactory.class, context);
    return factory.createDynamicTableSource(context);

  The key piece here is the Context: it must carry the options that identify the table, namely the connector option.
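
  The selection order (catalog-provided factory first, otherwise discovery keyed on the connector option) can be sketched as follows. SourceFactory, named, and select are hypothetical names, and the discovered list stands in for classpath discovery; only the "connector" option key comes from the text.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of "preferred factory, else discover by connector".
class SourceFactorySketch {
    interface SourceFactory {
        String identifier(); // compared with the table's "connector" option
    }

    static SourceFactory named(String id) {
        return () -> id;
    }

    static SourceFactory select(
            SourceFactory preferred, Map<String, String> options, List<SourceFactory> discovered) {
        if (preferred != null) {
            return preferred; // the catalog supplied its own factory
        }
        String connector = options.get("connector");
        if (connector == null) {
            throw new IllegalArgumentException("Table options do not contain 'connector'");
        }
        return discovered.stream()
                .filter(f -> f.identifier().equals(connector))
                .findFirst()
                .orElseThrow(() ->
                        new IllegalArgumentException("No factory for connector: " + connector));
    }

    public static void main(String[] args) {
        List<SourceFactory> discovered = List.of(named("jdbc"), named("kafka"));
        SourceFactory viaOption = select(null, Map.of("connector", "kafka"), discovered);
        SourceFactory viaCatalog = select(named("iceberg"), Map.of(), discovered);
        System.out.println(viaOption.identifier() + " " + viaCatalog.identifier());
    }
}
```

  When the catalog supplies a factory, the connector option is never consulted in this sketch, which is why a table in such a catalog can omit it.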

6.Type and inheritance relationships between Catalog and Table

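  FactoryUtil defines the connector option: tables are identified by connector, while catalogs are identified by type.
  Where FactoryUtil resolves the table factory (the last snippet in 5.3), it first checks whether preferredFactory exists; for Iceberg it does.
  In CatalogSourceTable.createDynamicTableSource, the factory is extracted from the catalog.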

final Optional<DynamicTableSourceFactory> factoryFromCatalog =
        schemaTable
                .getContextResolvedTable()
                .getCatalog()
                .flatMap(Catalog::getFactory)
                .map(
                        f ->
                                f instanceof DynamicTableSourceFactory
                                        ? (DynamicTableSourceFactory) f
                                        : null);

  The core is the factory created in the flatMap step. Iceberg's implementation is:

public Optional<Factory> getFactory() {
  return Optional.of(new FlinkDynamicTableFactory(this));
}

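  FlinkDynamicTableFactory implements both DynamicTableSinkFactory and DynamicTableSourceFactory.
  So this depends on Catalog discovery, which happens when the Catalog is created, in FactoryUtil.createCatalog. Because Iceberg is on the classpath, it takes the non-exception branch.
  Discovery first looks for TableFactory implementations on the classpath. Iceberg's FlinkCatalogFactory is one of them, so it is found. For Hive, only 2.2 implements this, so this step does not find it.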

private static List<TableFactory> discoverFactories(Optional<ClassLoader> classLoader) {
    try {
        List<TableFactory> result = new LinkedList<>();
        ClassLoader cl = classLoader.orElse(Thread.currentThread().getContextClassLoader());
        ServiceLoader.load(TableFactory.class, cl).iterator().forEachRemaining(result::add);
        return result;
    } catch (ServiceConfigurationError e) {
        LOG.error("Could not load service provider for table factories.", e);
        throw new TableException("Could not load service provider for table factories.", e);
    }
}

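  This yields a list, which is then filtered in two steps. The first filter keeps only CatalogFactory subclasses. The built-in catalogs are just Hive, Jdbc, and the default in-memory catalog, and none of these three are in the list from the previous find, so after this filter only Iceberg's catalog factory remains.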

List<T> classFactories = filterByFactoryClass(factoryClass, properties, foundFactories);

List<T> contextFactories = filterByContext(factoryClass, properties, classFactories);
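
  A rough model of the two filters, for intuition only. The interfaces and method signatures here are hypothetical simplifications (the calls above also take the factory class and properties in a different order), and the assumption that the context filter compares each factory's required key/value pairs against the supplied properties is stated as an assumption, not as the exact legacy matching logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of the two-stage filtering; not Flink's code.
class FactoryFilterSketch {
    interface TableFactoryLike {
        Map<String, String> requiredContext(); // key/value pairs the factory insists on
    }

    interface CatalogFactoryLike extends TableFactoryLike {}

    // Stage 1: keep only factories that are instances of the requested class.
    static <T> List<T> filterByFactoryClass(
            Class<T> factoryClass, List<? extends TableFactoryLike> found) {
        List<T> result = new ArrayList<>();
        for (TableFactoryLike f : found) {
            if (factoryClass.isInstance(f)) {
                result.add(factoryClass.cast(f));
            }
        }
        return result;
    }

    // Stage 2: keep only factories whose required context is satisfied by the properties.
    static <T extends TableFactoryLike> List<T> filterByContext(
            Map<String, String> properties, List<T> factories) {
        List<T> result = new ArrayList<>();
        for (T f : factories) {
            boolean matches = f.requiredContext().entrySet().stream()
                    .allMatch(e -> e.getValue().equals(properties.get(e.getKey())));
            if (matches) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TableFactoryLike tableOnly = () -> Map.of("connector", "x");
        CatalogFactoryLike iceberg = () -> Map.of("type", "iceberg");
        List<TableFactoryLike> found = List.of(tableOnly, iceberg);
        List<CatalogFactoryLike> byClass = filterByFactoryClass(CatalogFactoryLike.class, found);
        List<CatalogFactoryLike> byContext = filterByContext(Map.of("type", "iceberg"), byClass);
        System.out.println(byClass.size() + " " + byContext.size());
    }
}
```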