flink图表 sql query流程图

1、分析sql query我们以tpc-ds生成的query1.sql为例子进行分析。Query1.sql的流程图如下所示,为了方便说具体那个方块,我给他们编了编号,(从外部读取文件数据的方块都没有编号,不具备详细说明的价值)。Query1.sql的具体语句也贴在了下面。

2、1方块detail: CsvTableSource(read fields: sr_returned_date_sk, sr_customer_sk, sr_store_sk, sr_return_amt) -> SourceConversion(table:Buffer(default_catalog, default_database, store_returns, source: [CsvTableSource(read fields: sr_returned_date_sk, sr_customer_sk, sr_store_sk, sr_return_amt)]), fields:(sr_returned_date_sk, sr_customer_sk, sr_store_sk, sr_return_amt))。1方块读取了字段名称:store_returns里的四个字段(不是全部字段),仅读取了必要字段。

3、2方块中的detail显示如下:CsvTableSource(read fields: d_date_sk, d_year) -> SourceConversion(table:Buffer(default_catalog, default_database, date_dim, source: [CsvTableSource(read fields: d_date_sk, d_year)]), fields:(d_date_sk, d_year)) -> Calc(where: (d_year = 1998:BIGINT), select: (d_date_sk))读取了date_dim中的d_date_sk,d_year字段,并且进行了筛选d_year=1998(query1的第9行)。

4、3方块中与1方块基本相同,detail如下:CsvTableSource(read fields: sr_returned_date_sk, sr_customer_sk, sr_store_sk, sr_return_amt) -> SourceConversion(table:Buffer(default_catalog, default_database, store_returns, source: [CsvTableSource(read fields: sr_returned_date_sk, sr_customer_sk, sr_store_sk, sr_return_amt)]), fields:(sr_returned_date_sk, sr_customer_sk, sr_store_sk, sr_return_amt)) -> Calc(where: sr_store_sk IS NOT NULL, select: (sr_returned_date_sk, sr_customer_sk, sr_store_sk, sr_return_amt)),唯一的不同在于它确认了sr_store_sk不为空值。、

5、4方块由1、2而来,detail如下:HashJoin(where: (sr_returned_date_sk = d_date_sk), buildRight) -> Calc(select: (sr_customer_sk AS ctr_customer_sk, sr_store_sk AS ctr_store_sk, sr_return_amt))。Hashjoin是它使用一个内部混合散列连接以匹配具有相同键值的记录。散列的构建端是匹配项的第一个输入。筛选条件为sr_returned_date_sk=d_date_sk,然后将sr_customer_sk记为ctr_customer_sk,sr_store_sk记为ctr_store_sk,sr_return_amt,这三个字段进行存储。

6、5方块由2、3而来,与4方块一模一样(具体原因不是因为冗余操作,是因为query1中有一句语句为ctrl1.ctr_store_sk=ctrl2.ctr_store_sk).

7、6方块的detail显示如下:CsvTableSource(read fields: s_store_sk, s_state) -> SourceConversion(table:Buffer(default_catalog, default_database, store, source: [CsvTableSource(read fields: s_store_sk, s_state)]), fields:(s_store_sk, s_state)) -> Calc(where: (s_state = _UTF-16LE'TN':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), select: (s_store_sk)),读取了store中的s_store_sk和s_state,筛选条件为s_state=”TN”。

8、7方块由4而来,detail如下:CompleteHashAggregate(groupBy:(ctr_customer_sk, ctr_store_sk),select:(ctr_customer_sk, ctr_store_sk, SUM(sr_return_amt) AS ctr_total_return),)根据ctr_customer_sk,ctr_store_sk机型groupby操作()。然后筛选出ctr_customer,ctr_store_sk,把sr_return_amt的和记为ctr_total_retrn。(注意这里的sum是根据groupby结果进行的sum求和)(这一步完成了with中的语句的所有操作)

9、8方块由5而来,detail如下:CompleteHashAggregate(groupBy:(ctr_customer_sk, ctr_store_sk),select:(ctr_customer_sk, ctr_store_sk, SUM(sr_return_amt) AS ctr_total_return),) -> Calc(select: (ctr_store_sk, ctr_total_return)) -> LocalHashAggregate(groupBy:(ctr_store_sk),select:(ctr_store_sk, Partial_AVG(ctr_total_return) AS (sum$0, count$1)),),前半部分与7完全相同,然后他还计算了ctr_total_return的平均值。

10、9方块由6、7而来,detail如下:NestedLoopJoin(where: (s_store_sk = ctr_store_sk), buildRight) -> Calc(select: (ctr_customer_sk, ctr_store_sk, ctr_total_return))。筛选条件为s_store_sk=ctr_store_sk,储存ctr_customer_sk, ctr_store_sk, ctr_total_return。

11、10方块由8而来,detail如下:GlobalHashAggregate(groupBy:(ctr_store_sk),select:(ctr_store_sk, Final_AVG(sum$0, count$1) AS $f1),)。Final_AVG(sum$0, count$1)=9中的Partial_AVG(ctr_total_return) AS (sum$0, count$1)。

12、11方块由9、10而来,detail:NestedLoopJoin(where: ((ctr_total_return > ($f1 * 1.2:DECIMAL(2, 1))) AND (ctr_store_sk = ctr_store_sk0)), buildRight) -> Calc(select: (ctr_customer_sk)),执行了大部分query1语句where中的操作,筛选了avg(ctr_total_return)*1.2<ctr_total_return和ctr_store_sk = ctr_store_sk0两个条件。存储ctr_customer_sk。

13、12方块detail:CsvTableSource(read fields: c_customer_sk, c_customer_id) -> SourceConversion(table:Buffer(default_catalog, default_database, customer, source: [CsvTableSource(read fields: c_customer_sk, c_customer_id)]), fields:(c_customer_sk, c_customer_id))。读取了customer的c_customer_sk, c_customer_id。

14、13方块由11、12而来,detail:HashJoin(where: (ctr_customer_sk = c_customer_sk), buildRight) -> LocalSortLimit(orderBy: [c_customer_id ASC], offset: 0, fetch: 100)。筛选ctr_customer_sk = c_customer_sk,并且以limit100,orderby c_customer_id ASC的方式输出存储。

15、14方块由13而来,detail:GlobalSortLimit(orderBy: [c_customer_id ASC], offset: 0, fetch: 100) -> Calc(select: (c_customer_id)) -> SinkConversionToRow -> Sink: collect,select出c_customer_id并sink出。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值