1.Basics
Flink exposes a base Catalog interface; a custom catalog only needs to implement it.
There is also a CatalogFactory interface, which is responsible for creating Catalog instances.
2.CreateCatalog
The statement goes through the regular SQL parsing flow; in TableEnvironmentImpl.executeInternal(), the CreateCatalog branch is selected based on the operation type:
} else if (operation instanceof CreateCatalogOperation) {
return createCatalog((CreateCatalogOperation) operation);
CreateCatalogOperation carries two fields: the catalog name and its properties.
public class CreateCatalogOperation implements CreateOperation {
private final String catalogName;
private final Map<String, String> properties;
Inside the createCatalog method there are essentially two steps: instantiate the Catalog, then register it.
Catalog catalog =
FactoryUtil.createCatalog(
catalogName, properties, tableConfig, userClassLoader);
catalogManager.registerCatalog(catalogName, catalog);
Instantiation uses the generic discovery path: find CatalogFactory implementations on the classpath and call createCatalog on the matching one.
For catalogs whose factory is not on the classpath, the call presumably ends in the NoMatchingTableFactoryException path, with userClassLoader doing the loading.
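The discovery-then-create step can be illustrated with a toy model. This is not Flink's actual API: `CatalogFactoryLookup`, its nested interfaces, and the use of the `type` option as the match key are simplifying assumptions standing in for `FactoryUtil.createCatalog`.

```java
import java.util.List;
import java.util.Map;

// Toy model of FactoryUtil.createCatalog: each discovered factory advertises
// an identifier, and the "type" property of CREATE CATALOG selects the match.
public class CatalogFactoryLookup {

    interface Catalog {}

    interface CatalogFactory {
        String factoryIdentifier(); // e.g. "generic_in_memory", "jdbc"
        Catalog createCatalog(String name, Map<String, String> options);
    }

    static class InMemoryCatalog implements Catalog {
        final String name;
        InMemoryCatalog(String name) { this.name = name; }
    }

    static Catalog createCatalog(String name,
                                 Map<String, String> options,
                                 List<CatalogFactory> discovered) {
        String type = options.get("type");
        return discovered.stream()
                .filter(f -> f.factoryIdentifier().equals(type))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException(
                        "Could not find a CatalogFactory for type '" + type + "'"))
                .createCatalog(name, options);
    }

    public static void main(String[] args) {
        CatalogFactory memoryFactory = new CatalogFactory() {
            public String factoryIdentifier() { return "generic_in_memory"; }
            public Catalog createCatalog(String n, Map<String, String> o) {
                return new InMemoryCatalog(n);
            }
        };
        Catalog c = createCatalog("my_cat",
                Map.of("type", "generic_in_memory"), List.of(memoryFactory));
        System.out.println(c.getClass().getSimpleName()); // prints "InMemoryCatalog"
    }
}
```

In the real code the `discovered` list comes from ServiceLoader-based classpath scanning rather than being passed in.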
3.UseCatalog
The same SQL parsing flow reaches executeInternal() and takes the UseCatalogOperation branch. USE is trivial: it simply updates the catalogManager:
} else if (operation instanceof UseCatalogOperation) {
UseCatalogOperation useCatalogOperation = (UseCatalogOperation) operation;
catalogManager.setCurrentCatalog(useCatalogOperation.getCatalogName());
return TableResultImpl.TABLE_RESULT_OK;
4.DropCatalog
This likewise reaches executeInternal() and takes the DropCatalog branch: the catalog is removed from catalogManager's registry, so it is a logical delete at the Flink level.
} else if (operation instanceof DropCatalogOperation) {
DropCatalogOperation dropCatalogOperation = (DropCatalogOperation) operation;
String exMsg = getDDLOpExecuteErrorMsg(dropCatalogOperation.asSummaryString());
try {
catalogManager.unregisterCatalog(
dropCatalogOperation.getCatalogName(), dropCatalogOperation.isIfExists());
return TableResultImpl.TABLE_RESULT_OK;
} catch (CatalogException e) {
throw new ValidationException(exMsg, e);
}
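Taken together, USE and DROP are pure in-memory state changes. The following toy class (hypothetical, not Flink's CatalogManager) models that: USE flips the current-catalog pointer, DROP removes a map entry, and neither touches the external system backing the catalog.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model showing why USE CATALOG and DROP CATALOG are cheap operations:
// both only mutate in-memory state inside the manager.
public class MiniCatalogManager {
    private final Map<String, Object> catalogs = new HashMap<>();
    private String currentCatalog;

    void registerCatalog(String name, Object catalog) {
        catalogs.put(name, catalog);
    }

    // USE CATALOG: just remember which catalog is current.
    void setCurrentCatalog(String name) {
        if (!catalogs.containsKey(name)) {
            throw new IllegalArgumentException(
                    "A catalog with name [" + name + "] does not exist.");
        }
        currentCatalog = name;
    }

    // DROP CATALOG: logical delete -- remove the registry entry.
    void unregisterCatalog(String name, boolean ifExists) {
        if (catalogs.remove(name) == null && !ifExists) {
            throw new IllegalArgumentException("Catalog " + name + " does not exist.");
        }
    }

    public static void main(String[] args) {
        MiniCatalogManager m = new MiniCatalogManager();
        m.registerCatalog("default_catalog", new Object());
        m.setCurrentCatalog("default_catalog");
        m.unregisterCatalog("default_catalog", false);
        m.unregisterCatalog("gone", true); // IF EXISTS: silently ignored
        System.out.println("ok");
    }
}
```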
5.Operations under a Catalog
5.1.CreateDatabase
The same parsing flow reaches executeInternal and takes the CreateDatabase branch; executing the operation boils down to calling the catalog's createDatabase method directly.
Currently GenericInMemoryCatalog and HiveCatalog support this operation; JdbcCatalog does not.
dropDatabase follows the same flow, just calling a different method.
} else if (operation instanceof CreateDatabaseOperation) {
CreateDatabaseOperation createDatabaseOperation = (CreateDatabaseOperation) operation;
Catalog catalog = getCatalogOrThrowException(createDatabaseOperation.getCatalogName());
String exMsg = getDDLOpExecuteErrorMsg(createDatabaseOperation.asSummaryString());
try {
catalog.createDatabase(
createDatabaseOperation.getDatabaseName(),
createDatabaseOperation.getCatalogDatabase(),
createDatabaseOperation.isIgnoreIfExists());
return TableResultImpl.TABLE_RESULT_OK;
createDatabaseOperation carries the catalog name, which is filled in during the SqlToOperationConverter step:
} else if (validated instanceof SqlCreateDatabase) {
return Optional.of(converter.convertCreateDatabase((SqlCreateDatabase) validated));
If the statement specifies a catalog it is used; otherwise the current (default) catalog is taken:
String catalogName =
(fullDatabaseName.length == 1)
? catalogManager.getCurrentCatalog()
: fullDatabaseName[0];
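The resolution rule above can be sketched as a runnable one-liner (class and method names here are hypothetical): a single-part database name falls back to the current catalog, a two-part name carries its catalog explicitly.

```java
// Sketch of the catalog resolution in SqlToOperationConverter:
// "mydb" -> current catalog, "hive.mydb" -> explicit catalog "hive".
public class DatabaseNameResolution {
    static String resolveCatalog(String[] fullDatabaseName, String currentCatalog) {
        return (fullDatabaseName.length == 1) ? currentCatalog : fullDatabaseName[0];
    }

    public static void main(String[] args) {
        System.out.println(resolveCatalog("mydb".split("\\."), "default_catalog"));
        System.out.println(resolveCatalog("hive.mydb".split("\\."), "default_catalog"));
    }
}
```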
5.2.CreateTable
Tables come in two kinds here, temporary and regular, but the flow is the same:
} else if (operation instanceof CreateTableOperation) {
CreateTableOperation createTableOperation = (CreateTableOperation) operation;
if (createTableOperation.isTemporary()) {
catalogManager.createTemporaryTable(
createTableOperation.getCatalogTable(),
createTableOperation.getTableIdentifier(),
createTableOperation.isIgnoreIfExists());
} else {
catalogManager.createTable(
createTableOperation.getCatalogTable(),
createTableOperation.getTableIdentifier(),
createTableOperation.isIgnoreIfExists());
}
return TableResultImpl.TABLE_RESULT_OK;
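The essence of the branch above is where the table ends up. A toy illustration (not Flink's API; both maps are stand-ins): temporary tables live only in a session-local map inside the manager, while permanent tables are handed to the target catalog.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the temporary/permanent branch in table creation.
public class TableRegistration {
    final Map<String, String> temporaryTables = new HashMap<>(); // session-local
    final Map<String, String> catalogTables = new HashMap<>();   // stands in for a real Catalog

    void createTable(String identifier, String table, boolean temporary) {
        (temporary ? temporaryTables : catalogTables).put(identifier, table);
    }

    public static void main(String[] args) {
        TableRegistration r = new TableRegistration();
        r.createTable("default_catalog.default_database.t1", "schema-a", true);
        r.createTable("default_catalog.default_database.t2", "schema-b", false);
        System.out.println(r.temporaryTables.size() + "," + r.catalogTables.size()); // prints "1,1"
    }
}
```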
The catalog-related parts are again resolved in the SqlToOperationConverter step:
} else if (validated instanceof SqlCreateTable) {
return Optional.of(
converter.createTableConverter.convertCreateTable((SqlCreateTable) validated));
The identifier is qualified with the catalog and database information:
public ObjectIdentifier qualifyIdentifier(UnresolvedIdentifier identifier) {
return ObjectIdentifier.of(
identifier.getCatalogName().orElseGet(this::getCurrentCatalog),
identifier.getDatabaseName().orElseGet(this::getCurrentDatabase),
identifier.getObjectName());
}
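The qualification logic is directly reproducible with stdlib Optional (the helper below is a hypothetical stand-in for qualifyIdentifier): missing catalog or database parts fall back to the manager's current values.

```java
import java.util.Optional;

// Runnable sketch of identifier qualification: absent parts default to the
// current catalog and database of the manager.
public class IdentifierQualification {
    static String qualify(Optional<String> catalog, Optional<String> database,
                          String object, String currentCatalog, String currentDatabase) {
        return catalog.orElse(currentCatalog) + "."
                + database.orElse(currentDatabase) + "."
                + object;
    }

    public static void main(String[] args) {
        System.out.println(qualify(Optional.empty(), Optional.empty(), "t1",
                "default_catalog", "default_database")); // default_catalog.default_database.t1
        System.out.println(qualify(Optional.of("hive"), Optional.of("ods"), "t1",
                "default_catalog", "default_database")); // hive.ods.t1
    }
}
```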
5.3.Query
In PlannerBase's translateToRel, every branch makes use of getRelBuilder:
case s: UnregisteredSinkModifyOperation[_] =>
val input = getRelBuilder.queryOperation(s.getChild).build()
getRelBuilder is a method that bakes in the current catalog and database. The layering here is deep; ultimately it builds on Calcite:
/** Returns the [[FlinkRelBuilder]] of this TableEnvironment. */
private[flink] def getRelBuilder: FlinkRelBuilder = {
val currentCatalogName = catalogManager.getCurrentCatalog
val currentDatabase = catalogManager.getCurrentDatabase
plannerContext.createRelBuilder(currentCatalogName, currentDatabase)
}
getRelBuilder.queryOperation goes through the call chain QueryOperationConverter.visit -> FlinkRelBuilder.scan -> CatalogSourceTable.toRel -> createDynamicTableSource, which is where the Catalog comes into play:
final Optional<DynamicTableSourceFactory> factoryFromCatalog =
schemaTable
.getContextResolvedTable()
.getCatalog()
.flatMap(Catalog::getFactory)
.map(
f ->
f instanceof DynamicTableSourceFactory
? (DynamicTableSourceFactory) f
: null);
The factory held by the Catalog is what creates the table source. Taking JDBC as an example:
public DynamicTableSource createDynamicTableSource(Context context) {
final FactoryUtil.TableFactoryHelper helper =
FactoryUtil.createTableFactoryHelper(this, context);
final ReadableConfig config = helper.getOptions();
helper.validate();
validateConfigOptions(config);
validateDataTypeWithJdbcDialect(context.getPhysicalRowDataType(), config.get(URL));
return new JdbcDynamicTableSource(
getJdbcOptions(helper.getOptions()),
getJdbcReadOptions(helper.getOptions()),
getJdbcLookupOptions(helper.getOptions()),
context.getPhysicalRowDataType());
}
At present only the JDBC and Hive catalogs provide such a factory. If the catalog has no factory, a DynamicTableSourceFactory implementation is discovered from the classpath instead:
try {
final DynamicTableSourceFactory factory =
preferredFactory != null
? preferredFactory
: discoverTableFactory(DynamicTableSourceFactory.class, context);
return factory.createDynamicTableSource(context);
The key input here is the Context: it must carry the options that identify the table, i.e. the connector options.
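The selection order in the two snippets above can be condensed into a small model (types and the `choose` helper are simplifications, not Flink's API): a suitable factory exposed by the catalog wins; only when the catalog offers none, or one of the wrong kind, does the classpath discovery via the connector option kick in.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Toy model of table-source factory selection: catalog-provided factory is
// preferred; otherwise fall back to classpath discovery.
public class SourceFactorySelection {

    interface Factory {}
    interface DynamicTableSourceFactory extends Factory {}

    static DynamicTableSourceFactory choose(Optional<Factory> factoryFromCatalog,
                                            Supplier<DynamicTableSourceFactory> classpathDiscovery) {
        DynamicTableSourceFactory preferred = factoryFromCatalog
                .map(f -> f instanceof DynamicTableSourceFactory
                        ? (DynamicTableSourceFactory) f : null)
                .orElse(null);
        return preferred != null ? preferred : classpathDiscovery.get();
    }

    public static void main(String[] args) {
        DynamicTableSourceFactory fromCatalog = new DynamicTableSourceFactory() {};
        DynamicTableSourceFactory fromClasspath = new DynamicTableSourceFactory() {};
        // Catalog provides a suitable factory (e.g. Iceberg): it is preferred.
        System.out.println(choose(Optional.of((Factory) fromCatalog), () -> fromClasspath) == fromCatalog);
        // Catalog provides none: fall back to classpath discovery.
        System.out.println(choose(Optional.empty(), () -> fromClasspath) == fromClasspath);
    }
}
```

Both lines print `true`, mirroring the preferredFactory check quoted from FactoryUtil above.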
6.On the type relationships between Catalog and Table factories
In FactoryUtil, tables are identified by the connector option while catalogs are identified by the type option.
At the spot in FactoryUtil where the table factory is resolved (the last code snippet in 5.3), it first checks whether preferredFactory exists; for Iceberg it does.
In CatalogSourceTable's createDynamicTableSource, that factory is extracted from the catalog:
final Optional<DynamicTableSourceFactory> factoryFromCatalog =
schemaTable
.getContextResolvedTable()
.getCatalog()
.flatMap(Catalog::getFactory)
.map(
f ->
f instanceof DynamicTableSourceFactory
? (DynamicTableSourceFactory) f
: null);
The crux is the flatMap step that obtains the factory. Iceberg's implementation:
public Optional<Factory> getFactory() {
return Optional.of(new FlinkDynamicTableFactory(this));
}
FlinkDynamicTableFactory implements both DynamicTableSinkFactory and DynamicTableSourceFactory.
So this path relies on catalog discovery, which happens at catalog creation time in FactoryUtil's createCatalog method. Because the Iceberg jar sits on the classpath, the non-exception branch is taken.
Discovery first looks up TableFactory implementations on the classpath. Iceberg's FlinkCatalogFactory is such an implementation, so it is found; the Hive one only implemented this in 2.2, so it cannot be found at this step.
private static List<TableFactory> discoverFactories(Optional<ClassLoader> classLoader) {
try {
List<TableFactory> result = new LinkedList<>();
ClassLoader cl = classLoader.orElse(Thread.currentThread().getContextClassLoader());
ServiceLoader.load(TableFactory.class, cl).iterator().forEachRemaining(result::add);
return result;
} catch (ServiceConfigurationError e) {
LOG.error("Could not load service provider for table factories.", e);
throw new TableException("Could not load service provider for table factories.", e);
}
}
This yields a list of candidates, which is then filtered in two steps. The first filter keeps only CatalogFactory subclasses; the built-in catalogs are Hive, JDBC, and the default in-memory one, none of which appear in the discovered list, so after this filter only Iceberg's catalog factory remains:
List<T> classFactories = filterByFactoryClass(factoryClass, properties, foundFactories);
List<T> contextFactories = filterByContext(factoryClass, properties, classFactories);
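The two-stage filtering can be sketched as follows (interfaces and helpers are simplified stand-ins for Flink's legacy TableFactoryService, not its real signatures): first filter by factory class, then by whether each factory's required context matches the given properties.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of the two-stage factory filtering: by class, then by context.
public class FactoryFiltering {

    interface TableFactory {
        Map<String, String> requiredContext();
    }
    interface CatalogFactory extends TableFactory {}

    // Keep only factories of the requested class (e.g. CatalogFactory).
    static <T> List<T> filterByFactoryClass(Class<T> factoryClass, List<TableFactory> found) {
        return found.stream()
                .filter(factoryClass::isInstance)
                .map(factoryClass::cast)
                .collect(Collectors.toList());
    }

    // Keep only factories whose required context matches the given properties.
    static <T extends TableFactory> List<T> filterByContext(Map<String, String> properties,
                                                            List<T> factories) {
        return factories.stream()
                .filter(f -> f.requiredContext().entrySet().stream()
                        .allMatch(e -> e.getValue().equals(properties.get(e.getKey()))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TableFactory icebergCatalog = new CatalogFactory() {
            public Map<String, String> requiredContext() { return Map.of("type", "iceberg"); }
        };
        TableFactory someSourceFactory = () -> Map.of("connector", "kafka");

        List<CatalogFactory> byClass = filterByFactoryClass(CatalogFactory.class,
                List.of(icebergCatalog, someSourceFactory));
        List<CatalogFactory> byContext = filterByContext(Map.of("type", "iceberg"), byClass);
        System.out.println(byClass.size() + "," + byContext.size()); // prints "1,1"
    }
}
```

The class filter drops the non-catalog factory, and the context filter then matches on `type`, leaving a single candidate.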