Flink Source Code Analysis: Custom TableSink and the TableFactory Lookup Process

1. Background

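We have a requirement for a custom Redis sink, so we use it as an opportunity to learn how to write a custom Flink sink connector, and how Flink uses the properties in a DDL CREATE TABLE statement to locate the concrete TableFactory and then create the StreamTableSink.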

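The approach described in this article is the one used up to and including Flink 1.10. Starting with Flink 1.11, Flink still supports this approach but has refactored the logic for user-defined sources and sinks (the new DynamicTableSourceFactory / DynamicTableSinkFactory interfaces); see the official documentation for details.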

2. User-defined Redis sink

For the complete code used in this article, see GitHub.

If you only need a sink connector for the DataStream API, implement the SinkFunction interface and call datastream.addSink(redisSink); see the RedisSinkITCase#testRedisListDataType test case in the source code for details.

To use a custom sink connector from the Table API or a SQL DDL, you need to implement, in order:
TableSinkFactory -> TableSink -> SinkFunction
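The three layers can be pictured with a small Flink-free model. The interfaces MiniTableSinkFactory, MiniTableSink and MiniSinkFunction below are hypothetical stand-ins, not Flink's API, and the option key connector.key-prefix is made up. The sketch only shows how configuration flows from the DDL properties into the factory, then into the table sink, and finally into the per-record sink function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Flink-free model of TableSinkFactory -> TableSink -> SinkFunction.
// None of these interfaces are Flink's real API.
public class SinkChainSketch {

    // Stand-in for SinkFunction: handles one record at a time.
    interface MiniSinkFunction<T> {
        void invoke(T value);
    }

    // Stand-in for TableSink: provides the function that consumes the stream.
    interface MiniTableSink<T> {
        MiniSinkFunction<T> createSinkFunction();
    }

    // Stand-in for TableSinkFactory: builds a table sink from DDL properties.
    interface MiniTableSinkFactory<T> {
        MiniTableSink<T> createTableSink(Map<String, String> properties);
    }

    // Collects what the sink function "wrote" so the chain can be observed.
    static final List<String> WRITTEN = new ArrayList<>();

    static MiniTableSinkFactory<String> redisLikeFactory() {
        return properties -> {
            // The factory reads its configuration from the DDL properties
            // ("connector.key-prefix" is a hypothetical key)...
            String prefix = properties.getOrDefault("connector.key-prefix", "");
            // ...and hands it down to the table sink, which hands it to the sink function.
            return () -> value -> WRITTEN.add(prefix + value);
        };
    }

    public static void main(String[] args) {
        Map<String, String> ddlProperties = new HashMap<>();
        ddlProperties.put("connector.type", "redis");
        ddlProperties.put("connector.key-prefix", "user:");

        MiniSinkFunction<String> fn = redisLikeFactory()
                .createTableSink(ddlProperties)
                .createSinkFunction();
        for (String record : new String[] {"a", "b"}) {
            fn.invoke(record);
        }
        System.out.println(WRITTEN); // [user:a, user:b]
    }
}
```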

2.1 RedisTableSinkFactory

Flink uses SPI to discover all TableSinkFactory classes on the classpath (covered in detail in Section 3), and uses the properties defined in the DDL to pick the concrete TableSinkFactory to instantiate.

/**
 * Redis table sink factory for creating a Redis table sink.
 */
public class RedisTableSinkFactory implements StreamTableSinkFactory<Tuple2<Boolean, Row>> {

    // Receives the properties parsed from the DDL and creates the TableSink
    @Override
    public StreamTableSink<Tuple2<Boolean, Row>> createStreamTableSink(Map<String, String> properties) {
        return new RedisTableSink(properties);
    }

    // Context properties used to single out this factory among all TableSinkFactory implementations:
    // the DDL must contain these key/value pairs (here, connector.type = redis)
    @Override
    public Map<String, String> requiredContext() {
        Map<String, String> require = new HashMap<>();
        require.put(CONNECTOR_TYPE, REDIS);
        return require;
    }

    // Declares every property key that may appear in the DDL. During the factory lookup, this factory
    // is kept only if every DDL key that is not part of requiredContext() is listed here
    // (either exactly or through a wildcard such as connector.*).
    // In other words, requiredContext() and supportedProperties() together pin down the TableSinkFactory implementation to use.
    @Override
    public List<String> supportedProperties() {
        List<String> properties = new ArrayList<>();
        properties.add(REDIS_MODE);
        properties.add(REDIS_COMMAND);
        properties.add(REDIS_NODES);
        properties.add(REDIS_MASTER_NAME);
        properties.add(REDIS_SENTINEL);
        properties.add(REDIS_KEY_TTL);
        // schema
        properties.add(SCHEMA + ".#." + SCHEMA_DATA_TYPE);
        properties.add(SCHEMA + ".#." + SCHEMA_NAME);
        // connector wildcard: accepts any connector.* key
        properties.add(CONNECTOR + ".*");
        // standalone mode
        properties.add(REDIS_SERVER_IP);
        properties.add(REDIS_SERVER_PORT);
        return properties;
    }
}
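To make "the properties in the DDL" concrete, the sketch below builds the kind of flat map the lookup works on: indexed schema keys (schema.0.name, schema.0.data-type, ...) plus the connector options copied from the WITH clause, roughly what a DDL such as CREATE TABLE redis_sink (k ..., v ...) WITH ('connector.type' = 'redis', ...) turns into. It is an approximation written for illustration, not Flink's DescriptorProperties; the Redis option key connector.command and the type strings are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch approximating the flat key/value layout of a table's properties:
// schema.<i>.name / schema.<i>.data-type for each column, plus the connector options.
public class DdlPropertiesSketch {

    static Map<String, String> toProperties(List<String[]> columns, Map<String, String> withOptions) {
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            // Each column becomes an indexed pair of keys.
            props.put("schema." + i + ".name", columns.get(i)[0]);
            props.put("schema." + i + ".data-type", columns.get(i)[1]);
        }
        // Options from the WITH clause are copied unchanged.
        props.putAll(withOptions);
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> with = new LinkedHashMap<>();
        with.put("connector.type", "redis");   // matched by requiredContext()
        with.put("connector.command", "SET");  // hypothetical key, covered by "connector.*"

        Map<String, String> props = toProperties(
                Arrays.asList(new String[] {"k", "VARCHAR(2147483647)"},
                              new String[] {"v", "VARCHAR(2147483647)"}),
                with);
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```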

2.2 RedisTableSink

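Once SPI discovery and property matching have located the concrete TableFactory, Flink uses it to create the TableSink, which consumes the upstream data stream.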

public class RedisTableSink implements UpsertStreamTableSink<Row> {
...
    // Creates the SinkFunction and attaches it to the upstream stream
    @Override
    public DataStreamSink<?> consumeDataStream(DataStream<Tuple2<Boolean, Row>> dataStream) {
        return dataStream.addSink(new RedisSink<>(flinkJedisConfigBase, redisMapper));
    }
...
}
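An UpsertStreamTableSink receives Tuple2<Boolean, Row> records, where the Boolean flag is true for an upsert (insert or update) and false for a delete. The excerpt above does not show how RedisSink treats that flag, so the following is only a hypothetical sketch: a HashMap stands in for Redis, true overwrites the key and false deletes it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an in-memory map stands in for Redis, and (flag, key, value)
// stands in for the Tuple2<Boolean, Row> an UpsertStreamTableSink receives.
public class UpsertFlagSketch {

    static final Map<String, String> STORE = new HashMap<>();

    // flag == true: upsert (insert or update); flag == false: delete the key.
    static void apply(boolean flag, String key, String value) {
        if (flag) {
            STORE.put(key, value);
        } else {
            STORE.remove(key);
        }
    }

    public static void main(String[] args) {
        apply(true, "user:1", "alice");
        apply(true, "user:1", "alice-v2"); // same key again: overwritten
        apply(true, "user:2", "bob");
        apply(false, "user:2", null);      // delete message for user:2
        System.out.println(STORE);         // {user:1=alice-v2}
    }
}
```

Mapping a delete message to a key deletion keeps Redis consistent with the upsert semantics of the table; an append-only query would simply never produce false flags.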

2.3 RedisSink

The actual write logic of the sink connector lives in a RichSinkFunction.

public class RedisSink<IN> extends RichSinkFunction<IN> {
    // Initialization hook, called once per parallel instance before any record is processed
    @Override
    public void open(Configuration parameters) throws Exception {
        try {
            // Create the Redis client
            this.redisCommandsContainer = RedisCommandsContainerBuilder.build(this.flinkJedisConfigBase);
            // Check that Redis is reachable
            this.redisCommandsContainer.open();
        } catch (Exception e) {
            LOG.error("Redis has not been properly initialized: ", e);
            throw e;
        }
    }

    // Called once for every incoming record; writes the record to Redis
    @Override
    public void invoke(IN input, Context context) throws Exception {
        String key = redisSinkMapper.getKeyFromData(input);
        String value = redisSinkMapper.getValueFromData(input);

        Optional<String> optAdditionalKey = redisSinkMapper.getAdditionalKey(input);
        Optional<Integer> optAdditionalTTL = redisSinkMapper.getAdditionalTTL(input);

        switch (redisCommand) {
            case RPUSH:
                this.redisCommandsContainer.rpush(key, value);
                break;
            case LPUSH:
                this.redisCommandsContainer.lpush(key, value);
                break;
            case SET:
                this.redisCommandsContainer.set(key, value);
                break;
            ...
        }
    }
}
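The lifecycle described in the comments (open() once per parallel instance, invoke() once per record) together with the command dispatch can be modeled without Flink or Redis. In this hypothetical sketch, in-memory maps replace the Redis client and only RPUSH, LPUSH and SET are modeled; it is not the author's RedisSink.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Flink-free model of the RichSinkFunction lifecycle;
// in-memory maps stand in for Redis lists and string values.
public class RedisSinkLifecycleSketch {

    enum Command { RPUSH, LPUSH, SET }

    private final Command command;
    private Map<String, Deque<String>> lists; // stands in for Redis lists
    private Map<String, String> strings;      // stands in for Redis string values
    private boolean opened;

    RedisSinkLifecycleSketch(Command command) {
        this.command = command;
    }

    // Mirrors open(): set up the "client" once, before any record arrives.
    void open() {
        lists = new HashMap<>();
        strings = new HashMap<>();
        opened = true;
    }

    // Mirrors invoke(): called once per record, dispatching on the configured command.
    void invoke(String key, String value) {
        if (!opened) {
            throw new IllegalStateException("invoke() called before open()");
        }
        switch (command) {
            case RPUSH:
                lists.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(value);
                break;
            case LPUSH:
                lists.computeIfAbsent(key, k -> new ArrayDeque<>()).addFirst(value);
                break;
            case SET:
                strings.put(key, value);
                break;
            default:
                throw new IllegalStateException("Unsupported command: " + command);
        }
    }

    // Mirrors close(): a real sink would release the Redis connection here.
    void close() {
        opened = false;
    }

    Deque<String> list(String key) {
        return lists.get(key);
    }

    String get(String key) {
        return strings.get(key);
    }

    public static void main(String[] args) {
        RedisSinkLifecycleSketch sink = new RedisSinkLifecycleSketch(Command.LPUSH);
        sink.open();
        for (String v : new String[] {"a", "b", "c"}) {
            sink.invoke("events", v);
        }
        sink.close();
        System.out.println(sink.list("events")); // [c, b, a]
    }
}
```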

3. Locating the TableFactory

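TableFactory discovery relies on Java SPI, so when you implement a custom TableSinkFactory you must add a file named org.apache.flink.table.factories.TableFactory under META-INF/services and write the fully qualified name of your implementation class in it.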
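When the lookup fails with a "no matching factory" error, a useful first check is whether the service file is actually visible to the class loader. The diagnostic sketch below uses only the JDK: it reads every META-INF/services/org.apache.flink.table.factories.TableFactory resource a class loader can see and prints the provider class names, skipping # comments and blank lines the way ServiceLoader does. The class name TableFactoryServiceFiles is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Diagnostic sketch: lists the provider entries that ServiceLoader would read
// for org.apache.flink.table.factories.TableFactory on a given class loader.
public class TableFactoryServiceFiles {

    static final String SERVICE_FILE =
            "META-INF/services/org.apache.flink.table.factories.TableFactory";

    static List<String> listProviders(ClassLoader cl) throws IOException {
        List<String> providers = new ArrayList<>();
        Enumeration<URL> files = cl.getResources(SERVICE_FILE);
        while (files.hasMoreElements()) {
            URL url = files.nextElement();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Like ServiceLoader: '#' starts a comment, surrounding whitespace is ignored.
                    int hash = line.indexOf('#');
                    String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
                    if (!entry.isEmpty()) {
                        providers.add(entry + "  (from " + url + ")");
                    }
                }
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        List<String> providers = listProviders(cl);
        if (providers.isEmpty()) {
            System.out.println("No TableFactory service files found on the classpath.");
        }
        providers.forEach(System.out::println);
    }
}
```

If your RedisTableSinkFactory does not show up in this list, ServiceLoader cannot see it either, and the property matching described below never gets a chance to select it.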

The entry point in the Flink source code for locating a TableFactory is:

public class TableFactoryUtil {
	// Find the matching TableSinkFactory and create the TableSink
	@SuppressWarnings("unchecked")
	public static <T> TableSink<T> findAndCreateTableSink(TableSinkFactory.Context context) {
		try {
			// context.getTable().toProperties() is the property map parsed from the DDL
			return TableFactoryService
					.find(TableSinkFactory.class, context.getTable().toProperties())
					.createTableSink(context);
		} catch (Throwable t) {
			throw new TableException("findAndCreateTableSink failed.", t);
		}
	}
}
public class TableFactoryService {
	public static <T extends TableFactory> T find(Class<T> factoryClass, Map<String, String> propertyMap) {
		return findSingleInternal(factoryClass, propertyMap, Optional.empty());
	}

	private static <T extends TableFactory> T findSingleInternal(
			Class<T> factoryClass,
			Map<String, String> properties,
			Optional<ClassLoader> classLoader) {
		// Load all TableFactory implementations via SPI
		List<TableFactory> tableFactories = discoverFactories(classLoader);
		// Use the DDL properties to filter down to the concrete TableFactory implementation
		List<T> filtered = filter(tableFactories, factoryClass, properties);
		
		// If more than one TableFactory implementation matches, throw an exception
		if (filtered.size() > 1) {
			throw new AmbiguousTableFactoryException(
				filtered,
				factoryClass,
				tableFactories,
				properties);
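		} else {
			// filter() already throws NoMatchingTableFactoryException when nothing matches,
			// so exactly one factory is left at this point
			return filtered.get(0);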
		}
	}

	private static List<TableFactory> discoverFactories(Optional<ClassLoader> classLoader) {
		try {
			List<TableFactory> result = new LinkedList<>();
			// Fall back to the thread context class loader (typically the application class loader)
			ClassLoader cl = classLoader.orElse(Thread.currentThread().getContextClassLoader());
			// Use SPI to discover and instantiate every TableFactory
			ServiceLoader
				.load(TableFactory.class, cl)
				.iterator()
				.forEachRemaining(result::add);
			return result;
		} catch (ServiceConfigurationError e) {
			LOG.error("Could not load service provider for table factories.", e);
			throw new TableException("Could not load service provider for table factories.", e);
		}

	}

	// Filter the discovered factories down to the concrete TableFactory implementation(s) using the DDL properties
	private static <T extends TableFactory> List<T> filter(
			List<TableFactory> foundFactories,
			Class<T> factoryClass,
			Map<String, String> properties) {

		Preconditions.checkNotNull(factoryClass);
		Preconditions.checkNotNull(properties);

		// When factoryClass == TableSinkFactory.class, this step drops TableSourceFactory implementations and keeps only TableSinkFactory ones
		List<T> classFactories = filterByFactoryClass(
			factoryClass,
			properties,
			foundFactories);
		// Narrow down the TableFactory implementations using the properties returned by TableFactory#requiredContext()
		// See the Flink source for details
		List<T> contextFactories = filterByContext(
			factoryClass,
			properties,
			classFactories);
		// Check whether every property parsed from the DDL is declared in TableFactory#supportedProperties()
		// See the Flink source for details
		return filterBySupportedProperties(
			factoryClass,
			properties,
			classFactories,
			contextFactories);
	}
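The three stages of filter() can be reproduced in a few lines of plain Java. This is a simplified, hypothetical model: a Factory object describes what requiredContext() and supportedProperties() would return, stage 3 uses plain exact-or-prefix matching without the array-index normalization done in filterBySupportedProperties(), and errors are only reported at the end instead of after each stage.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, Flink-free model of filterByFactoryClass -> filterByContext -> filterBySupportedProperties.
public class FactoryFilterSketch {

    enum Kind { SOURCE, SINK }

    // Describes a factory: its kind plus what requiredContext() and supportedProperties() would return.
    static final class Factory {
        final String name;
        final Kind kind;
        final Map<String, String> requiredContext;
        final List<String> supportedProperties;

        Factory(String name, Kind kind, Map<String, String> requiredContext, List<String> supportedProperties) {
            this.name = name;
            this.kind = kind;
            this.requiredContext = requiredContext;
            this.supportedProperties = supportedProperties;
        }
    }

    // A key is supported if it is listed exactly or starts with the prefix of an "xxx.*" entry.
    static boolean supports(Factory f, String key) {
        for (String p : f.supportedProperties) {
            boolean match = p.endsWith("*")
                    ? key.startsWith(p.substring(0, p.length() - 1))
                    : p.equals(key);
            if (match) {
                return true;
            }
        }
        return false;
    }

    static Factory findSingle(List<Factory> all, Kind kind, Map<String, String> props) {
        // Stage 1 (like filterByFactoryClass): keep factories of the requested kind.
        List<Factory> byKind = all.stream()
                .filter(f -> f.kind == kind)
                .collect(Collectors.toList());
        // Stage 2 (like filterByContext): every requiredContext entry must be present with the same value.
        List<Factory> byContext = byKind.stream()
                .filter(f -> f.requiredContext.entrySet().stream()
                        .allMatch(e -> e.getValue().equals(props.get(e.getKey()))))
                .collect(Collectors.toList());
        // Stage 3 (like filterBySupportedProperties): every non-context key must be supported.
        List<Factory> bySupported = byContext.stream()
                .filter(f -> props.keySet().stream()
                        .filter(k -> !f.requiredContext.containsKey(k))
                        .allMatch(k -> supports(f, k)))
                .collect(Collectors.toList());
        if (bySupported.isEmpty()) {
            throw new IllegalStateException("No factory supports all properties.");
        }
        if (bySupported.size() > 1) {
            throw new IllegalStateException("Ambiguous factories: "
                    + bySupported.stream().map(f -> f.name).collect(Collectors.toList()));
        }
        return bySupported.get(0);
    }

    static List<Factory> demoFactories() {
        List<String> connectorWildcard = Collections.singletonList("connector.*");
        return Arrays.asList(
                new Factory("redis-sink", Kind.SINK, Collections.singletonMap("connector.type", "redis"), connectorWildcard),
                new Factory("redis-source", Kind.SOURCE, Collections.singletonMap("connector.type", "redis"), connectorWildcard),
                new Factory("kafka-sink", Kind.SINK, Collections.singletonMap("connector.type", "kafka"), connectorWildcard));
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("connector.type", "redis");
        props.put("connector.command", "SET"); // hypothetical Redis option key
        System.out.println(findSingle(demoFactories(), Kind.SINK, props).name); // redis-sink
    }
}
```

Adding a key that no factory declares, for example format.type, makes stage 3 reject every candidate, which corresponds to the NoMatchingTableFactoryException path in filterBySupportedProperties().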

	private static <T extends TableFactory> List<T> filterBySupportedProperties(
			Class<T> factoryClass,
			Map<String, String> properties,
			List<T> classFactories,
			List<T> contextFactories) {

		final List<String> plainGivenKeys = new LinkedList<>();
		properties.keySet().forEach(k -> {
			// replace arrays with wildcard
			String key = k.replaceAll(".\\d+", ".#");
			// ignore duplicates
			if (!plainGivenKeys.contains(key)) {
				// plainGivenKeys holds the property keys parsed from the DDL
				plainGivenKeys.add(key);
			}
		});

		List<T> supportedFactories = new LinkedList<>();
		Tuple2<T, List<String>> bestMatched = null;
		for (T factory: contextFactories) {
			// requiredContextKeys: the keys returned by RedisTableSinkFactory#requiredContext()
			Set<String> requiredContextKeys = normalizeContext(factory).keySet();

			// tuple2.f0: the properties returned by RedisTableSinkFactory#supportedProperties()
			// tuple2.f1: the wildcard (.*) entries among them, e.g. connector.*
			Tuple2<List<String>, List<String>> tuple2 = normalizeSupportedProperties(factory);
			// ignore context keys
			List<String> givenContextFreeKeys = plainGivenKeys.stream()
				.filter(p -> !requiredContextKeys.contains(p))
				.collect(Collectors.toList());
			List<String> givenFilteredKeys = filterSupportedPropertiesFactorySpecific(
				factory,
				givenContextFreeKeys);

			boolean allTrue = true;
			List<String> unsupportedKeys = new ArrayList<>();
			// Check whether every DDL property is among the properties declared by the TableSinkFactory
			for (String k : givenFilteredKeys) {
				if (!(tuple2.f0.contains(k) || tuple2.f1.stream().anyMatch(k::startsWith))) {
					allTrue = false;
					unsupportedKeys.add(k);
				}
			}
			// If every DDL property is declared, keep this factory; otherwise drop it, but remember the
			// candidate with the fewest unsupported keys for the error message
			if (allTrue) {
				supportedFactories.add(factory);
			} else {
				if (bestMatched == null || unsupportedKeys.size() < bestMatched.f1.size()) {
					bestMatched = new Tuple2<>(factory, unsupportedKeys);
				}
			}
		}

		if (supportedFactories.isEmpty()) {
			String bestMatchedMessage = null;
			if (bestMatched != null) {
				bestMatchedMessage = String.format(
						"%s\nUnsupported property keys:\n%s",
						bestMatched.f0.getClass().getName(),
						String.join("\n", bestMatched.f1)
				);
			}

			//noinspection unchecked
			throw new NoMatchingTableFactoryException(
				"No factory supports all properties.",
				bestMatchedMessage,
				factoryClass,
				(List<TableFactory>) classFactories,
				properties);
		}

		return supportedFactories;
	}
}
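The two string operations in filterBySupportedProperties(), collapsing array indices into # and prefix-matching wildcard entries such as connector.*, are easy to try in isolation. The sketch reuses the regex from the code above; the assumption that the wildcard list (tuple2.f1) stores entries with the trailing * stripped concerns normalizeSupportedProperties(), which is not quoted here, so check it against the Flink version you use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of key normalization and wildcard matching as used in filterBySupportedProperties().
public class SupportedKeyMatchingSketch {

    // Same regex as the quoted Flink code: array indices become '#'.
    // The leading '.' is not escaped, so it matches any character, not only a literal dot.
    static String normalize(String key) {
        return key.replaceAll(".\\d+", ".#");
    }

    // Approximates: tuple2.f0.contains(k) || tuple2.f1.stream().anyMatch(k::startsWith),
    // assuming f0 = declared keys and f1 = wildcard entries with the trailing '*' removed.
    static boolean isSupported(String givenKey, List<String> supportedProperties) {
        List<String> wildcardPrefixes = supportedProperties.stream()
                .filter(p -> p.endsWith("*"))
                .map(p -> p.substring(0, p.length() - 1))
                .collect(Collectors.toList());
        String k = normalize(givenKey);
        return supportedProperties.contains(k) || wildcardPrefixes.stream().anyMatch(k::startsWith);
    }

    public static void main(String[] args) {
        List<String> supported = Arrays.asList("schema.#.name", "schema.#.data-type", "connector.*");
        System.out.println(normalize("schema.10.data-type"));            // schema.#.data-type
        System.out.println(isSupported("schema.0.name", supported));     // true
        System.out.println(isSupported("connector.command", supported)); // true
        System.out.println(isSupported("format.type", supported));       // false
        System.out.println(normalize("connector.db1"));                  // connector.d.#
    }
}
```

Note the last line: because the leading dot in the regex is unescaped, a key such as connector.db1 is rewritten to connector.d.#. It still passes here through the connector.* wildcard, but in this sketch a key listed exactly (without a wildcard) that contains a digit after a non-dot character would no longer match.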

References:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/zh/dev/table/sourceSinks.html#defining-a-streamtablesource
