Background:
A data stream needs to be joined against data stored in Redis. Normally Flink can open a client connection to an external system inside a RichMapFunction and use it to read and write data. However, this synchronous way of querying the database lets network latency and similar issues drag down processing throughput, and the usual remedy of raising the operator parallelism costs extra resources. Another motivation for this post: Java examples of this pattern are scarce online!
Approach:
Flink 1.2 introduced Asynchronous I/O, which allows external storage systems to be accessed asynchronously and thereby improves the performance and throughput of Flink's interaction with external databases, provided that the database offers an asynchronous client.
Solution:
1. Add the dependencies for an asynchronous Redis client (Lettuce)
<dependency>
<groupId>io.lettuce</groupId>
<artifactId>lettuce-core</artifactId>
<version>5.0.5.RELEASE</version>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.24.Final</version>
</dependency>
2. Implement the Flink asynchronous function
import io.lettuce.core.RedisFuture;
import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.cluster.api.async.RedisAdvancedClusterAsyncCommands;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class RedisSide extends RichAsyncFunction<String, String> {
    // Asynchronous Redis cluster client
    private RedisClusterClient redisClusterClient;
    private StatefulRedisClusterConnection<String, String> connection;
    private RedisAdvancedClusterAsyncCommands<String, String> async;
    private List<String> nodes;

    public RedisSide(List<String> nodes) {
        this.nodes = nodes;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        // Initialize the asynchronous client
        List<RedisURI> uriList = new ArrayList<>();
        nodes.forEach(node -> {
            String[] addrStr = node.split(":");
            String host = addrStr[0];
            int port = Integer.parseInt(addrStr[1]);
            RedisURI redisUri = RedisURI.Builder.redis(host).withPort(port).build();
            uriList.add(redisUri);
        });
        redisClusterClient = RedisClusterClient.create(uriList);
        connection = redisClusterClient.connect();
        // Use the asynchronous command API; connection.sync() would block the operator thread
        // and defeat the purpose of Async I/O
        async = connection.async();
    }

    @Override
    public void close() throws Exception {
        super.close();
        if (connection != null) { connection.close(); }
        if (redisClusterClient != null) { redisClusterClient.shutdown(); }
    }

    // Per-record processing
    @Override
    public void asyncInvoke(String input, ResultFuture<String> resultFuture) throws Exception {
        // Non-blocking Redis lookup; in Lettuce 5, RedisFuture implements CompletionStage,
        // so a callback can be chained directly without CompletableFuture.supplyAsync
        RedisFuture<Map<String, String>> hgetall = async.hgetall(input);
        hgetall.thenAccept(map -> {
            // Join logic: pass the input through when no hash exists,
            // otherwise append the value stored under "key2"
            if (map == null || map.isEmpty()) {
                resultFuture.complete(Collections.singleton(input));
                return;
            }
            String value = map.get("key2");
            if (value != null) {
                resultFuture.complete(Collections.singleton(input + value));
            } else {
                // Always complete the future, otherwise the request times out
                resultFuture.complete(Collections.singleton(input));
            }
        });
    }
}
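The branching logic inside asyncInvoke can be exercised without a Redis cluster. Below is a minimal, self-contained sketch (the EnrichSketch class and its enrich helper are hypothetical names, not part of the job above) that applies the same rules to plain maps: an empty hash passes the input through unchanged, while a hash containing key2 appends that field's value. The non-blocking lookup is simulated with an already-completed CompletableFuture.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class EnrichSketch {
    // Same decision rules as asyncInvoke: empty map -> pass input through,
    // otherwise append the value stored under "key2" (or pass through if absent).
    static String enrich(String input, Map<String, String> hash) {
        if (hash == null || hash.isEmpty()) {
            return input;
        }
        String v = hash.get("key2");
        return v == null ? input : input + v;
    }

    public static void main(String[] args) {
        Map<String, String> hash = new HashMap<>();
        hash.put("key1", "value1");
        hash.put("key2", "value2");
        // Simulate the async lookup with an already-completed future.
        CompletableFuture.completedFuture(hash)
                .thenAccept(m -> System.out.println(enrich("1", m))); // prints 1value2
        CompletableFuture.completedFuture(Collections.<String, String>emptyMap())
                .thenAccept(m -> System.out.println(enrich("3", m))); // prints 3
    }
}
```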
3. Build the test code
public class KafkaAndRedisDemo {
    public static void main(String[] args) {
        StreamExecutionEnvironment sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        sEnv.setParallelism(1);
        String srcBootstrapServers = "kafka-cluster-ip:port";
        String srcGroupId = "group123";
        String topic1 = "event_topic_1";
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", srcBootstrapServers);
        properties.setProperty("group.id", srcGroupId);
        FlinkKafkaConsumer010<String> consumer = new FlinkKafkaConsumer010<>(topic1, new SimpleStringSchema(), properties);
        DataStreamSource<String> source = sEnv.addSource(consumer);
        source.print("kafka data");
        // Comma-separated "host:port" pairs of the Redis cluster nodes
        String hostAndPort = "redis-cluster-ip:port";
        List<String> list = Arrays.asList(hostAndPort.split(","));
        RedisSide redisSide = new RedisSide(list);
        // Unordered results: the last two parameters are the request timeout and its time unit.
        // A capacity parameter (maximum number of in-flight requests) is left unset here;
        // when it is exceeded, Flink applies backpressure to throttle the upstream input
        // and keep the job running normally.
        AsyncDataStream.unorderedWait(source, redisSide, 5L, TimeUnit.SECONDS).print("result joined with redis");
        try {
            sEnv.execute();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
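Note that unorderedWait emits each record as soon as its future completes, not in arrival order. The effect can be sketched without Flink or Redis using plain CompletableFutures; the CompletionOrderDemo class, record names, and delays below are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CompletionOrderDemo {
    // Two simulated async lookups: the first record's lookup is slower,
    // so an unordered stream emits the second record first.
    public static List<String> run() throws Exception {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(1);
        List<String> emitted = new CopyOnWriteArrayList<>();
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();
        // "Emit" each record the moment its future completes.
        CompletableFuture<Void> d1 = first.thenAccept(emitted::add);
        CompletableFuture<Void> d2 = second.thenAccept(emitted::add);
        // record-1 arrives first but its lookup takes 200 ms; record-2's takes 50 ms.
        timer.schedule(() -> { first.complete("record-1"); }, 200, TimeUnit.MILLISECONDS);
        timer.schedule(() -> { second.complete("record-2"); }, 50, TimeUnit.MILLISECONDS);
        CompletableFuture.allOf(d1, d2).join(); // wait until both have been emitted
        timer.shutdown();
        return emitted;
    }

    public static void main(String[] args) throws Exception {
        // record-2 is emitted before record-1, despite arriving second
        System.out.println(run());
    }
}
```

With the orderedWait variant, Flink would instead buffer record-2 until record-1's future finished, trading latency for arrival order.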
For details on using Flink's Asynchronous I/O, see my other article.
4. Prepare test data:
Redis input:
hmset 1 key1 value1 key2 value2
hmset 2 key1 value3 key2 value4
Kafka input:
1
2
3
Result:
1value2
2value4
3