1. Background
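We need a custom Redis sink, which is a good opportunity to learn how to write a user-defined Flink sink connector, and how Flink uses the properties in a DDL CREATE TABLE statement to locate a specific TableFactory and then create a StreamTableSink.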
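The approach described here is the legacy one used in Flink 1.10 and earlier. Flink 1.11 and later remain compatible with it, but they have reworked the logic for user-defined sources and sinks; see the official documentation for details.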
2. User-defined Redis sink
For the full code discussed in this article, see the project on GitHub.
If you only need a sink connector for the DataStream API, implement the SinkFunction interface and call datastream.addSink(redisSink). See the RedisSinkITCase#testRedisListDataType test case in the source for details.
To use a custom sink connector from the Table API / SQL DDL, you need to implement the following, in order: TableSinkFactory -> TableSink -> SinkFunction.
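The hand-off between these three layers can be pictured with a small dependency-free model. This is only a sketch: the types SketchFactory, SketchTableSink and SketchSinkFunction and the property key connector.command are hypothetical stand-ins, not Flink's API. It shows who creates whom, nothing more.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Dependency-free model of the factory -> table sink -> sink function chain.
// All type names here are hypothetical stand-ins, not Flink classes.
class SinkChainSketch {

    /** Stand-in for SinkFunction: handles one record at a time. */
    interface SketchSinkFunction {
        void invoke(String key, String value);
    }

    /** Stand-in for TableSink: knows how to build the sink function. */
    interface SketchTableSink {
        SketchSinkFunction createSinkFunction(List<String> target);
    }

    /** Stand-in for TableSinkFactory: builds a table sink from DDL properties. */
    interface SketchFactory {
        SketchTableSink createTableSink(Map<String, String> properties);
    }

    /** A factory whose sinks record "<command> <key> <value>" lines in memory. */
    static SketchFactory redisLikeFactory() {
        return properties -> {
            // "connector.command" is an assumed key name for this sketch.
            String command = properties.getOrDefault("connector.command", "SET");
            return target -> (key, value) -> target.add(command + " " + key + " " + value);
        };
    }

    /** Runs the whole chain for two records and returns what the sink wrote. */
    static List<String> run(Map<String, String> ddlProperties) {
        List<String> written = new ArrayList<>();
        SketchTableSink tableSink = redisLikeFactory().createTableSink(ddlProperties);
        SketchSinkFunction sinkFunction = tableSink.createSinkFunction(written);
        sinkFunction.invoke("user:1", "alice");
        sinkFunction.invoke("user:2", "bob");
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("connector.command", "RPUSH");
        run(props).forEach(System.out::println);
    }
}
```

The point of the layering is that only the factory sees the raw DDL properties; the sink function at the end of the chain just receives records.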
2.1 RedisTableSinkFactory
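Flink uses SPI to discover every TableSinkFactory class on the classpath (covered in detail in Section 3), and then uses the properties defined in the DDL to decide which concrete TableSinkFactory to instantiate.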
/**
 * Redis table sink factory for creating a Redis table sink.
 */
public class RedisTableSinkFactory implements StreamTableSinkFactory<Tuple2<Boolean, Row>> {

    // Receives the properties parsed from the DDL and creates the TableSink.
    @Override
    public StreamTableSink<Tuple2<Boolean, Row>> createStreamTableSink(Map<String, String> properties) {
        return new RedisTableSink(properties);
    }

    // The properties defined here are used to pick the right TableSinkFactory
    // out of all the discovered TableSinkFactory instances.
    @Override
    public Map<String, String> requiredContext() {
        Map<String, String> require = new HashMap<>();
        require.put(CONNECTOR_TYPE, REDIS);
        return require;
    }

    // Declares the properties that may appear in the DDL. During lookup, a factory is
    // accepted only if every property in the DDL belongs to the set declared here.
    // In effect, supportedProperties() and requiredContext() together pinpoint
    // the TableSinkFactory implementation.
    @Override
    public List<String> supportedProperties() {
        List<String> properties = new ArrayList<>();
        properties.add(REDIS_MODE);
        properties.add(REDIS_COMMAND);
        properties.add(REDIS_NODES);
        properties.add(REDIS_MASTER_NAME);
        properties.add(REDIS_SENTINEL);
        properties.add(REDIS_KEY_TTL);
        // schema
        properties.add(SCHEMA + ".#." + SCHEMA_DATA_TYPE);
        properties.add(SCHEMA + ".#." + SCHEMA_NAME);
        // connector wildcard
        properties.add(CONNECTOR + ".*");
        // standalone
        properties.add(REDIS_SERVER_IP);
        properties.add(REDIS_SERVER_PORT);
        return properties;
    }
}
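To make the role of requiredContext() concrete, here is a minimal sketch of the matching rule, assuming that a factory matches when every required key appears in the DDL properties with an equal value. The literal values "connector.type" and "redis" are assumptions about what CONNECTOR_TYPE and REDIS resolve to, and connector.command is a made-up extra property. Flink's own filterByContext may apply additional normalization that this sketch leaves out.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of matching a factory's requiredContext() against DDL properties.
class RequiredContextSketch {

    /** True when every required key is present in the DDL properties with an equal value. */
    static boolean matchesContext(Map<String, String> required, Map<String, String> ddlProperties) {
        for (Map.Entry<String, String> entry : required.entrySet()) {
            if (!entry.getValue().equals(ddlProperties.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed literal values for CONNECTOR_TYPE and REDIS.
        Map<String, String> required = new HashMap<>();
        required.put("connector.type", "redis");

        Map<String, String> ddl = new HashMap<>();
        ddl.put("connector.type", "redis");
        ddl.put("connector.command", "SET"); // hypothetical extra property

        System.out.println(matchesContext(required, ddl));

        // A DDL that declares a different connector type no longer matches.
        ddl.put("connector.type", "kafka");
        System.out.println(matchesContext(required, ddl));
    }
}
```

Extra DDL properties do not affect this step; whether they are allowed is decided later by supportedProperties().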
2.2 RedisTableSink
Once SPI has located the concrete TableFactory, the factory creates the TableSink, which consumes the upstream stream data.
public class RedisTableSink implements UpsertStreamTableSink<Row> {
    ...
    // Creates the SinkFunction and attaches it to the upstream stream.
    @Override
    public DataStreamSink<?> consumeDataStream(DataStream<Tuple2<Boolean, Row>> dataStream) {
        return dataStream.addSink(new RedisSink(flinkJedisConfigBase, redisMapper));
    }
    ...
}
2.3 RedisSink
The actual logic of the sink connector is implemented in a RichSinkFunction.
public class RedisSink<IN> extends RichSinkFunction<IN> {

    // Initialization hook, called once per sink instance before any record is processed.
    @Override
    public void open(Configuration parameters) throws Exception {
        try {
            // Create the Redis client.
            this.redisCommandsContainer = RedisCommandsContainerBuilder.build(this.flinkJedisConfigBase);
            // Check connectivity to Redis.
            this.redisCommandsContainer.open();
        } catch (Exception e) {
            LOG.error("Redis has not been properly initialized: ", e);
            throw e;
        }
    }

    // Called for every incoming record; processes it and writes it to the sink.
    @Override
    public void invoke(IN input, Context context) throws Exception {
        String key = redisSinkMapper.getKeyFromData(input);
        String value = redisSinkMapper.getValueFromData(input);
        Optional<String> optAdditionalKey = redisSinkMapper.getAdditionalKey(input);
        Optional<Integer> optAdditionalTTL = redisSinkMapper.getAdditionalTTL(input);
        switch (redisCommand) {
            case RPUSH:
                this.redisCommandsContainer.rpush(key, value);
                break;
            case LPUSH:
                this.redisCommandsContainer.lpush(key, value);
                break;
            case SET:
                this.redisCommandsContainer.set(key, value);
                break;
            ...
        }
    }
}
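The effect of the three commands shown in invoke() can be illustrated with an in-memory stand-in. This is a sketch under simplifying assumptions, not a Redis client: the class CommandDispatchSketch and its helper methods are hypothetical, and lists and plain strings live in separate maps. It follows the standard Redis semantics of RPUSH (append to the tail), LPUSH (prepend to the head) and SET (overwrite).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the Redis commands dispatched in invoke(); not a Redis client.
class CommandDispatchSketch {

    enum Command { RPUSH, LPUSH, SET }

    private final Map<String, Deque<String>> lists = new HashMap<>();
    private final Map<String, String> strings = new HashMap<>();

    /** Mirrors the switch in invoke(): one record in, one command executed. */
    void invoke(Command command, String key, String value) {
        switch (command) {
            case RPUSH:
                // Append to the tail of the list stored at key.
                lists.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(value);
                break;
            case LPUSH:
                // Prepend to the head of the list stored at key.
                lists.computeIfAbsent(key, k -> new ArrayDeque<>()).addFirst(value);
                break;
            case SET:
                // Overwrite the string stored at key.
                strings.put(key, value);
                break;
            default:
                throw new IllegalArgumentException("Unsupported command: " + command);
        }
    }

    /** Snapshot of the list at key, head first. */
    List<String> list(String key) {
        return new ArrayList<>(lists.getOrDefault(key, new ArrayDeque<>()));
    }

    /** Current string value at key, or null if none was set. */
    String get(String key) {
        return strings.get(key);
    }

    public static void main(String[] args) {
        CommandDispatchSketch sink = new CommandDispatchSketch();
        sink.invoke(Command.RPUSH, "events", "a");
        sink.invoke(Command.RPUSH, "events", "b");
        sink.invoke(Command.LPUSH, "events", "z");
        System.out.println(sink.list("events"));
    }
}
```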
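3. Locating the TableFactory
TableFactory lookup relies on SPI, so when you write a custom TableSinkFactory you must add a file named org.apache.flink.table.factories.TableFactory under META-INF/services and list the fully qualified name of your implementation class in it.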
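The registration mechanism can be tried out without Flink. The sketch below is only an illustration: DemoFactory and DemoRedisFactory are hypothetical stand-ins for TableFactory and a custom factory, and the provider file is generated in a temporary directory at run time so that the example is self-contained. In a real project the file would be a static resource packaged with the connector.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Self-contained SPI round trip: write a provider file, then discover the provider.
class SpiRegistrationSketch {

    /** Hypothetical service interface, standing in for TableFactory. */
    public interface DemoFactory {
        String type();
    }

    /** Hypothetical provider; ServiceLoader needs a public no-arg constructor. */
    public static class DemoRedisFactory implements DemoFactory {
        public DemoRedisFactory() {}

        @Override
        public String type() {
            return "redis";
        }
    }

    static List<String> discoverTypes() {
        try {
            // Build <tmp>/META-INF/services/<service interface name>, whose content
            // is the class name of the provider.
            Path root = Files.createTempDirectory("spi-sketch");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            Files.write(
                    services.resolve(DemoFactory.class.getName()),
                    DemoRedisFactory.class.getName().getBytes(StandardCharsets.UTF_8));

            // Expose the temp directory as an extra classpath root on top of the current loader.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {root.toUri().toURL()},
                    SpiRegistrationSketch.class.getClassLoader())) {
                List<String> types = new ArrayList<>();
                for (DemoFactory factory : ServiceLoader.load(DemoFactory.class, loader)) {
                    types.add(factory.type());
                }
                return types;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discoverTypes());
    }
}
```

The file name is the service interface's class name and each line inside it names one implementation, which is the same convention the TableFactory registration file follows.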
The entry point in the source code for locating the TableFactory is as follows:
public class TableFactoryUtil {

    // Finds the matching factory and creates the TableSink.
    @SuppressWarnings("unchecked")
    public static <T> TableSink<T> findAndCreateTableSink(TableSinkFactory.Context context) {
        try {
            // context.getTable().toProperties() holds the properties parsed from the DDL.
            return TableFactoryService
                .find(TableSinkFactory.class, context.getTable().toProperties())
                .createTableSink(context);
        } catch (Throwable t) {
            throw new TableException("findAndCreateTableSink failed.", t);
        }
    }
}
public class TableFactoryService {

    public static <T extends TableFactory> T find(Class<T> factoryClass, Map<String, String> propertyMap) {
        return findSingleInternal(factoryClass, propertyMap, Optional.empty());
    }

    private static <T extends TableFactory> T findSingleInternal(
            Class<T> factoryClass,
            Map<String, String> properties,
            Optional<ClassLoader> classLoader) {
        // Load all TableFactory implementations via SPI.
        List<TableFactory> tableFactories = discoverFactories(classLoader);
        // Use the DDL properties to filter down to the concrete TableFactory implementation.
        List<T> filtered = filter(tableFactories, factoryClass, properties);
        // If more than one TableFactory implementation matches, throw an exception.
        if (filtered.size() > 1) {
            throw new AmbiguousTableFactoryException(
                filtered,
                factoryClass,
                tableFactories,
                properties);
        } else {
            return filtered.get(0);
        }
    }
    private static List<TableFactory> discoverFactories(Optional<ClassLoader> classLoader) {
        try {
            List<TableFactory> result = new LinkedList<>();
            // Thread context class loader; by default this is the AppClassLoader.
            ClassLoader cl = classLoader.orElse(Thread.currentThread().getContextClassLoader());
            // Use SPI to find and instantiate every TableFactory.
            ServiceLoader
                .load(TableFactory.class, cl)
                .iterator()
                .forEachRemaining(result::add);
            return result;
        } catch (ServiceConfigurationError e) {
            LOG.error("Could not load service provider for table factories.", e);
            throw new TableException("Could not load service provider for table factories.", e);
        }
    }
    // Uses the DDL properties to filter down to the concrete TableFactory implementation.
    private static <T extends TableFactory> List<T> filter(
            List<TableFactory> foundFactories,
            Class<T> factoryClass,
            Map<String, String> properties) {
        Preconditions.checkNotNull(factoryClass);
        Preconditions.checkNotNull(properties);
        // When factoryClass == TableSinkFactory.class, this step drops every
        // TableSourceFactory and keeps only the TableSinkFactory instances.
        List<T> classFactories = filterByFactoryClass(
            factoryClass,
            properties,
            foundFactories);
        // Narrow the candidates using the properties returned by TableFactory#requiredContext.
        // See the source for details.
        List<T> contextFactories = filterByContext(
            factoryClass,
            properties,
            classFactories);
        // Check that every property parsed from the DDL is covered by TableFactory#supportedProperties().
        // See the source for details.
        return filterBySupportedProperties(
            factoryClass,
            properties,
            classFactories,
            contextFactories);
    }
    private static <T extends TableFactory> List<T> filterBySupportedProperties(
            Class<T> factoryClass,
            Map<String, String> properties,
            List<T> classFactories,
            List<T> contextFactories) {
        final List<String> plainGivenKeys = new LinkedList<>();
        properties.keySet().forEach(k -> {
            // replace arrays with wildcard
            String key = k.replaceAll(".\\d+", ".#");
            // ignore duplicates
            if (!plainGivenKeys.contains(key)) {
                // plainGivenKeys holds the property keys parsed from the DDL.
                plainGivenKeys.add(key);
            }
        });
        List<T> supportedFactories = new LinkedList<>();
        Tuple2<T, List<String>> bestMatched = null;
        for (T factory: contextFactories) {
            // requiredContextKeys: the keys from RedisTableSinkFactory#requiredContext.
            Set<String> requiredContextKeys = normalizeContext(factory).keySet();
            // tuple2.f0: the properties from RedisTableSinkFactory#supportedProperties.
            // tuple2.f1: the wildcard (.*) entries among them, e.g. connector.*
            Tuple2<List<String>, List<String>> tuple2 = normalizeSupportedProperties(factory);
            // ignore context keys
            List<String> givenContextFreeKeys = plainGivenKeys.stream()
                .filter(p -> !requiredContextKeys.contains(p))
                .collect(Collectors.toList());
            List<String> givenFilteredKeys = filterSupportedPropertiesFactorySpecific(
                factory,
                givenContextFreeKeys);
            boolean allTrue = true;
            List<String> unsupportedKeys = new ArrayList<>();
            // Check whether every property from the DDL is declared by the TableSinkFactory.
            for (String k : givenFilteredKeys) {
                if (!(tuple2.f0.contains(k) || tuple2.f1.stream().anyMatch(k::startsWith))) {
                    allTrue = false;
                    unsupportedKeys.add(k);
                }
            }
            // If every DDL property is declared by the TableSinkFactory, add the factory
            // to the result list; otherwise filter it out.
            if (allTrue) {
                supportedFactories.add(factory);
            } else {
                if (bestMatched == null || unsupportedKeys.size() < bestMatched.f1.size()) {
                    bestMatched = new Tuple2<>(factory, unsupportedKeys);
                }
            }
        }
        if (supportedFactories.isEmpty()) {
            String bestMatchedMessage = null;
            if (bestMatched != null) {
                bestMatchedMessage = String.format(
                    "%s\nUnsupported property keys:\n%s",
                    bestMatched.f0.getClass().getName(),
                    String.join("\n", bestMatched.f1)
                );
            }
            //noinspection unchecked
            throw new NoMatchingTableFactoryException(
                "No factory supports all properties.",
                bestMatchedMessage,
                factoryClass,
                (List<TableFactory>) classFactories,
                properties);
        }
        return supportedFactories;
    }
}
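The core of filterBySupportedProperties can be condensed into a small dependency-free sketch. The class SupportedPropertiesSketch and its method are hypothetical, and the example keys (schema.0.name, connector.command, format.type) are made up for illustration. The sketch assumes that a supported entry ending in * is treated as a prefix, which is what the startsWith check above suggests, and it escapes the dot in the index-normalizing regex for strictness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the supported-properties check; not Flink code.
class SupportedPropertiesSketch {

    /** Returns the DDL keys that the factory does not support (empty means a match). */
    static List<String> unsupportedKeys(
            Set<String> ddlKeys, Set<String> requiredContextKeys, List<String> supported) {
        // Split supported entries into exact keys and wildcard prefixes,
        // e.g. "connector.*" becomes the prefix "connector.".
        List<String> exact = new ArrayList<>();
        List<String> prefixes = new ArrayList<>();
        for (String s : supported) {
            if (s.endsWith("*")) {
                prefixes.add(s.substring(0, s.length() - 1));
            } else {
                exact.add(s);
            }
        }

        // Replace array indexes with '#' and drop duplicates: schema.0.name -> schema.#.name.
        Set<String> normalized = new LinkedHashSet<>();
        for (String key : ddlKeys) {
            normalized.add(key.replaceAll("\\.\\d+", ".#"));
        }

        List<String> unsupported = new ArrayList<>();
        for (String key : normalized) {
            if (requiredContextKeys.contains(key)) {
                continue; // context keys were already checked in the previous step
            }
            boolean covered = exact.contains(key) || prefixes.stream().anyMatch(key::startsWith);
            if (!covered) {
                unsupported.add(key);
            }
        }
        return unsupported;
    }

    public static void main(String[] args) {
        Set<String> ddlKeys = new LinkedHashSet<>(Arrays.asList(
                "connector.type", "connector.command",
                "schema.0.name", "schema.1.name", "format.type"));
        Set<String> required = Collections.singleton("connector.type");
        List<String> supported = Arrays.asList("schema.#.name", "schema.#.data-type", "connector.*");
        System.out.println(unsupportedKeys(ddlKeys, required, supported));
    }
}
```

A factory is kept only when this list is empty. Otherwise the factory with the fewest unsupported keys is reported in the NoMatchingTableFactoryException message as the best match.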
Reference:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/zh/dev/table/sourceSinks.html#defining-a-streamtablesource