paimon-Lookup Joins

官方解释

Lookup Joins

Lookup Joins are a type of join in streaming queries. It is used to enrich a table with data that is queried from Paimon. The join requires one table to have a processing time attribute and the other table to be backed by a lookup source connector.

数据准备

//customers
{"id":1,"name":"001","country":"中国","zip":"zip01"}
{"id":2,"name":"002","country":"加拿大","zip":"zip02"}
{"id":3,"name":"003","country":"新加坡","zip":"zip03"}
{"id":4,"name":"004","country":"日本","zip":"zip04"}
{"id":5,"name":"005","country":"英国","zip":"zip05"}
{"id":6,"name":"006","country":"法国","zip":"zip06"}
//Orders
{"order_id":10001,"total":1000001,"customer_id":1}
{"order_id":10002,"total":1000002,"customer_id":2}
{"order_id":10003,"total":1000003,"customer_id":3}
{"order_id":10004,"total":1000004,"customer_id":4}
{"order_id":10005,"total":1000005,"customer_id":5}
{"order_id":10006,"total":1000006,"customer_id":6}

DDL、DML

CREATE TABLE if not exists  customers (
    id INT PRIMARY KEY NOT ENFORCED,
    name STRING,
    country STRING,
    zip STRING,
    create_time TIMESTAMP_LTZ(3) -- now()
);
insert into customers (
	id,
	name,
	country,
	zip,
	create_time) 
select 
	id,
	name,
	country,
	zip,
	now() 
from InputTable;
CREATE TEMPORARY TABLE Orders (
	order_id INT,
	total INT,
	customer_id INT,
	order_time String, -- now()
	proc_time AS PROCTIME()
) WITH (
'connector' = 'kafka',
'topic' = 'test-paimon',
'properties.bootstrap.servers' = 'localhost:9092',
'properties.group.id' = 'testGroup',
'scan.startup.mode' = 'latest-offset',
'format' = 'json'
);

查询

//1:先在维表插入3条数据
+----+-----+------+---------+-------+-------------------------+
| op |  id | name | country |   zip |             create_time |
+----+-----+------+---------+-------+-------------------------+
| +I |   1 |  001 |    中国 | zip01 | 2023-07-07 16:37:24.257 |
| +I |   2 |  002 |  加拿大 | zip02 | 2023-07-07 16:37:24.257 |
| +I |   3 |  003 |  新加坡 | zip03 | 2023-07-07 16:37:26.260 |
+----+-----+------+---------+-------+-------------------------+
/*
查询语句
SELECT o.order_id, 
o.total, 
c.country, 
c.zip,
o.order_time,
c.create_time 
FROM Orders AS o
JOIN customers
FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id
*/
//2:插入数据{"order_id":10001,"total":1000001,"customer_id":1}
//3:查询结果
+----+---------+-------------+---------+-------+--------------------+-------------------------+
| op |order_id |       total | country |   zip |         order_time |             create_time |
+----+---------+-------------+---------+-------+--------------------+-------------------------+
| +I |   10001 |     1000001 |    中国 | zip01 |2023-07-07 16:47:16 | 2023-07-07 16:37:24.257 |
+----+---------+-------------+---------+-------+--------------------+-------------------------+
//4:order 表继续插入数据
/*
{"order_id":10002,"total":1000002,"customer_id":2}
{"order_id":10003,"total":1000003,"customer_id":3}
{"order_id":10004,"total":1000004,"customer_id":4}
{"order_id":10005,"total":1000005,"customer_id":5}
*/
//5:查询结果
+----+----------+---------+---------+-------+---------------------+-------------------------+
| op | order_id |   total | country |   zip |          order_time |             create_time |
+----+----------+---------+---------+-------+---------------------+-------------------------+
| +I |    10001 | 1000001 |    中国 | zip01 | 2023-07-07 16:47:16 | 2023-07-07 16:37:24.257 |
| +I |    10003 | 1000003 |  新加坡 | zip03 | 2023-07-07 16:53:52 | 2023-07-07 16:37:26.260 |
| +I |    10002 | 1000002 |  加拿大 | zip02 | 2023-07-07 16:53:52 | 2023-07-07 16:37:24.257 |
+----+----------+---------+---------+-------+---------------------+-------------------------+
// 6:维表插入数据 4、5
+----+----+------+----------+------+-------------------------+
| op | id | name |  country |  zip |             create_time |
+----+----+------+----------+------+-------------------------+
| +I |  1 |  001 |     中国 |zip01 | 2023-07-07 16:37:24.257 |
| +I |  2 |  002 |   加拿大 |zip02 | 2023-07-07 16:37:24.257 |
| +I |  3 |  003 |   新加坡 |zip03 | 2023-07-07 16:37:26.260 |
| +I |  4 |  004 |     日本 |zip04 | 2023-07-07 16:56:03.695 |
| +I |  5 |  005 |     英国 |zip05 | 2023-07-07 16:56:04.455 |
+----+----+------+----------+------+-------------------------+
//发现步骤5的数据没变化

结论

lookup join 关联不上的数据不会显示,即使后面数据又进到lookup表,同样不会再重新显示
如:id 4,5 两条数据先进到order表,此时customers没有4,5数据关联不上,不会输出4,5 数据。接下来
customers 表插入customer_id 4,5的数据发现控制台无变化

If the records of Orders (main table) join missing because the corresponding data of customers (lookup table) is not ready. You can consider using Flink’s Delayed Retry Strategy For Lookup.

开启 Delayed Retry Strategy For Lookup

SELECT /*+ LOOKUP('table'='c', 'async'='true', 'retry-predicate'='lookup_miss', 'retry-strategy'='fixed_delay', 'fixed-delay'='10s','max-attempts'='3')*/o.order_id, 
o.total, 
c.country, 
c.zip,
o.order_time,
c.create_time 
FROM Orders AS o
JOIN customers
FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id

1:清空数据
2:Orders 表插入数据{“order_id”:10001,“total”:1000001,“customer_id”:1}
3:customers表插入数据{“id”:1,“name”:“001”,“country”:“中国”,“zip”:“zip01”}
4:Orders 表插入数据{“order_id”:10002,“total”:1000002,“customer_id”:2}
5:10s之后customers表插入数据{“id”:2,“name”:“002”,“country”:“加拿大”,“zip”:“zip02”}
6:Orders 表插入数据{“order_id”:10003,“total”:1000003,“customer_id”:3}
7:30s之后customers表插入数据{“id”:3,“name”:“003”,“country”:“新加坡”,“zip”:“zip03”}
8:查询结果

+----+---------+--------+--------+------+--------------------+-------------------------+
| op |order_id |  total |country |  zip |         order_time |             create_time |
+----+---------+--------+--------+------+--------------------+-------------------------+
| +I |   10001 |1000001 |   中国 |zip01 |2023-07-07 17:29:03 | 2023-07-07 17:29:12.981 |
| +I |   10002 |1000002 | 加拿大 |zip02 |2023-07-07 17:30:11 | 2023-07-07 17:30:28.257 |
+----+---------+--------+--------+------+--------------------+-------------------------+

9:结论
在指定Delayed Retry Strategy之后可以在规则内正常处理晚到的数据,按照上述配置两条数据相差超过30s就会被丢弃

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值