Flink SQL:Queries(Joins)

本文介绍了Flink SQL中的各种Join操作,包括常规Join、间隔Join、临时Join(事件时间和处理时间)以及Lookup Join。Flink支持复杂的join操作,但需要注意不同类型的join在性能和状态管理上的差异,如常规Join可能导致无限增长的状态,而间隔Join适用于时间约束的场景。此外,还提到了Temporal Table Function Join和Lookup Join,它们分别用于连接动态表和外部系统的数据。
摘要由CSDN通过智能技术生成

Joins

Batch Streaming

Flink SQL supports complex and flexible join operations over dynamic tables. There are several different types of joins to account for the wide variety of semantics queries may require.
Flink SQL支持动态表上复杂而灵活的join操作。有几种不同类型的join来解释可能需要的各种语义查询。

By default, the order of joins is not optimized. Tables are joined in the order in which they are specified in the FROM clause. You can tweak the performance of your join queries, by listing the tables with the lowest update frequency first and the tables with the highest update frequency last. Make sure to specify tables in an order that does not yield a cross join (Cartesian product), which are not supported and would cause a query to fail.
默认情况下,join顺序未优化。表按照FROM子句中指定的顺序join。您可以调整join查询的性能,方法是首先列出更新频率最低的表,然后列出更新频率最高的表。请确保以不产生交叉join(笛卡尔积)的顺序指定表,交叉join是不受支持的,会导致查询失败。

Regular Joins

Regular joins are the most generic type of join in which any new record, or changes to either side of the join, are visible and affect the entirety of the join result. For example, if there is a new record on the left side, it will be joined with all the previous and future records on the right side when the product id equals.
常规join是最通用的join类型,其中任何新记录或join任一侧的更改都是可见的,并且会影响整个join结果。例如,如果左侧有一条新记录,当product id相等时,它将与右侧所有先前和未来的记录join。

SELECT * FROM Orders
INNER JOIN Product
ON Orders.productId = Product.id

For streaming queries, the grammar of regular joins is the most flexible and allow for any kind of updating (insert, update, delete) input table. However, this operation has important operational implications: it requires to keep both sides of the join input in Flink state forever. Thus, the required state for computing the query result might grow infinitely depending on the number of distinct input rows of all input tables and intermediate join results. You can provide a query configuration with an appropriate state time-to-live (TTL) to prevent excessive state size. Note that this might affect the correctness of the query result. See query configuration for details.
对于流式查询,常规join的语法是最灵活的,允许任何类型的更新(insert, update, delete)输入表。然而,此操作具有重要的操作含义:它需要将join输入的两侧永远保持在Flink状态。因此,根据所有输入表和中间join结果的不同输入行数,计算查询结果所需的状态可能会无限增长。您可以为查询配置提供适当的状态生存时间(TTL),以防止状态大小过大。请注意,这可能会影响查询结果的正确性。有关详细信息,请参阅查询配置。

For streaming queries the required state to compute the query result might grow infinitely depending on the type of aggregation and the number of distinct grouping keys. Please provide an idle state retention time to prevent excessive state size. See Idle State Retention Time for details.
对于流式查询,根据聚合类型和不同分组键的数量,计算查询结果所需的状态可能会无限增长。请提供空闲状态保留时间以防止状态大小过大。有关详细信息,请参阅空闲状态保留时间。

INNER Equi-JOIN

Returns a simple Cartesian product restricted by the join condition. Currently, only equi-joins are supported, i.e., joins that have at least one conjunctive condition with an equality predicate. Arbitrary cross or theta joins are not supported.
返回受join条件限制的简单笛卡尔积。目前,只支持等式连接,即至少有一个使用等式谓词的连接条件。不支持任意交叉或θ join。

SELECT *
FROM Orders
INNER JOIN Product
ON Orders.product_id = Product.id

OUTER Equi-JOIN

Returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in an outer table for which the join condition did not match with any row of the other table. Flink supports LEFT, RIGHT, and FULL outer joins. Currently, only equi-joins are supported, i.e., joins with at least one conjunctive condition with an equality predicate. Arbitrary cross or theta joins are not supported.
返回合格笛卡尔积中的所有行(即通过其join条件的所有联结行),以及外部表中join条件与其他表的任何行不匹配的每一行的一个副本。Flink支持LEFT、RIGHT和FULL outer joins。目前,只支持等式联接,即至少有一个使用等式谓词的join条件。不支持任意交叉或θ连接。

SELECT *
FROM Orders
LEFT JOIN Product
ON Orders.product_id = Product.id

SELECT *
FROM Orders
RIGHT JOIN Product
ON Orders.product_id = Product.id

SELECT *
FROM Orders
FULL OUTER JOIN Product
ON Orders.product_id = Product.id

Interval Joins

Returns a simple Cartesian product restricted by the join condition and a time constraint. An interval join requires at least one equi-join predicate and a join condition that bounds the time on both sides. Two appropriate range predicates can define such a condition (<, <=, >=, >), a BETWEEN predicate, or a single equality predicate that compares time attributes of the same type (i.e., processing time or event time) of both input tables.
返回受连接条件和时间约束限制的简单笛卡尔积。间隔连接至少需要一个相等连接谓词和一个限制双方时间的连接条件。两个适当的范围谓词可以这样定义: (<, <=, >=, >)、BETWEE谓词或一个比较两个输入表的相同类型(即处理时间或事件时间)的时间属性的相等谓词。

For example, this query will join all orders with their corresponding shipments if the order was shipped four hours after the order was received.
例如,如果在收到订单四小时后发货,此查询将连接所有订单及其相应的发货。

SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.order_id
AND o.order_time BETWEEN s.ship_time - INTERVAL '4' HOUR AND s.ship_time

The following predicates are examples of valid interval join conditions:
以下谓词是有效的interval join条件的示例:

  • ltime = rtime
  • ltime >= rtime AND ltime < rtime + INTERVAL ‘10’ MINUTE
  • ltime BETWEEN rtime - INTERVAL ‘10’ SECOND AND rtime + INTERVAL ‘5’ SECOND
  • </
  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值