impala查询优化和explain学习

是因为看到了这篇文章,想自己再继续研究下。

Impala Join策略与执行计划生成作为铺垫,本文首先对Broadcast Join和Partitioned Join进行简要介绍。 Broadcast Join 顾名思义,Broadcast Join就是广播的方式进行Join。以下图为例,假设Join操作为SELECT A JOIN B ON A.idB.id,Broadcast Join就是将B表&#…icon-default.png?t=N2N8https://betheme.net/a/5260296.html?action=onClickimpala版本 3.4.0

官网地址

https://impala.apache.org/docs/build/impala-3.4.pdficon-default.png?t=N2N8https://impala.apache.org/docs/build/impala-3.4.pdf 

The EXPLAIN output reminds you if table or column statistics are missing from any table involved in the query. These statistics are important for optimizing queries involving large tables or multi-table joins. See COMPUTE STATS Statement on page 226 for how to gather statistics, and Table and Column Statistics on page 636 for how to use this information for query tuning

翻译:  COMPUTE STATS 对于查询优化非常重要!!。

那就学习 compute stats

COMPUTE STATS [db_name.]table_name [ ( column_list ) ] [TABLESAMPLE SYSTEM(percentage) [REPEATABLE(seed)]] column_list ::= column_name [ , column_name, ... ]

COMPUTE INCREMENTAL STATS [db_name.]table_name [PARTITION (partition_spec)] partition_spec ::= simple_partition_spec | complex_partition_spec simple_partition_spec ::= partition_col=constant_value complex_partition_spec ::= comparison_expression_on_partition_col

这里多说点为什么这种统计信息有好处,因为提前知道了你有多少行,数据量大概有多少,,join的列最大值null值distinc等一系列信息,对于后面是broadcast还是 partition join 就省略了步骤

说下hive也可以分析,这些数据都是存储在mysql的。但是impala好像不能用hive分析的分区数据。

impala常见的用法

compute stats  odserpjdata_kd.gl_balances 

compute INCREMENTAL  stats  odserpjdata_kd.gl_balances 

drop stats xxxx

show table stats odserpjdata_kd.gl_balances 

show column stats odserpjdata_kd.gl_balances 

接着学习explain

Read the EXPLAIN plan from bottom to top:

• The last part of the plan shows the low-level details such as the expected amount of data that will be read, where you can judge the effectiveness of your partitioning strategy and estimate how long it will take to scan a table based on total data size and the size of the cluster.

• As you work your way up, next you see the operations that will be parallelized and performed on each Impala node.

• At the higher levels, you see how data flows when intermediate result sets are combined and transmitted from one node to another.

• See EXPLAIN_LEVEL Query Option on page 346 for details about the EXPLAIN_LEVEL query option, which lets you customize how much detail to show in the EXPLAIN plan depending on whether you are doing high-level or low-level tuning, dealing with logical or physical aspects of the query 

我们explain的时候,要从下往上去看

例如下面的explain。可以看到00和02 04 属于同一级别 01和03属于同一级别

所以我们先看 01 扫描gcc表,然后03 把该表广播出去 然后看00扫描gb表,再看02 此时两个开始inner join 再看04 

注意文章里提到一个set 属性 就是explain level 我们显示的计划级别 默认是1 

Default: 1
Arguments:
The allowed range of numeric values for this option is 0 to 3:
 0 or MINIMAL: A barebones list, one line per operation. Primarily useful for checking the join order in very long
queries where the regular EXPLAIN output is too long to read easily.
 1 or STANDARD: The default level of detail, showing the logical way that work is split up for the distributed
query.
 2 or EXTENDED: Includes additional detail about how the query planner uses statistics in its decision-making
process, to understand how a query could be tuned by gathering statistics, using query hints, adding or removing
predicates, and so on. In Impala 3.2 and higher, the output also includes the analyzed query with the cast
information in the output header, and the implicit cast info in the Predicate section.
 3 or VERBOSE: The maximum level of detail, showing how work is split up within each node into “query
fragments” that are connected in a pipeline. This extra detail is primarily useful for low-level performance testing
and tuning within Impala itself, rather than for rewriting the SQL code at the user level

 个人试了下 1就够用了。2 3 很详细 也很复杂,不多说了。 

直接来实战吧。

select count(1) from odserpjdata_kd.gl_code_combinations -- 11005694  1000w

select count(1) from odserpjdata_kd.gl_balances --107250223  1E

 我们开始explain,两个表inner join 没啥好说的

explain
select * 
from odserpjdata_kd.gl_balances  gb ,
odserpjdata_kd.gl_code_combinations  gcc 
where gb.code_combination_id =gcc.code_combination_id 

-- 这两个表join count(1) =3297320

Max Per-Host Resource Reservation: Memory=35.00MB Threads=5
Per-Host Resource Estimates: Memory=2.18GB
WARNING: The following tables are missing relevant table and/or column statistics.
odserpjdata_kd.gl_balances, odserpjdata_kd.gl_code_combinations

PLAN-ROOT SINK
|
04:EXCHANGE [UNPARTITIONED]
|
02:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: gb.code_combination_id = gcc.code_combination_id
|  runtime filters: RF000 <- gcc.code_combination_id, RF001 <- gcc.code_combination_id
|  row-size=2.42KB cardinality=unavailable
|
|--03:EXCHANGE [BROADCAST]
|  |
|  01:SCAN KUDU [odserpjdata_kd.gl_code_combinations gcc]
|     row-size=1.80KB cardinality=unavailable
|
00:SCAN KUDU [odserpjdata_kd.gl_balances gb]
   runtime filters: RF000 -> gb.code_combination_id, RF001 -> gb.code_combination_id
   row-size=640B cardinality=unavailable

 

前面我也解释过了,先是两个scan kudu表,然后 其中一个broadcast广播到各个节点,最后合并数据,问题来了。我这个表一个1千万 一个一个E,join后的结果有3000w,我就想问哪个表是大表,哪个表是小表??这也能broadcast?

但是聪明的我已经发现这个warning了。就是没有统计表的信息

compute  stats  odserpjdata_kd.gl_balances 
compute  stats  odserpjdata_kd.gl_code_combinations

继续explain

Max Per-Host Resource Reservation: Memory=42.00MB Threads=6
Per-Host Resource Estimates: Memory=2.47GB

PLAN-ROOT SINK
|
05:EXCHANGE [UNPARTITIONED]
|
02:HASH JOIN [INNER JOIN, PARTITIONED]
|  hash predicates: gb.code_combination_id = gcc.code_combination_id
|  runtime filters: RF000 <- gcc.code_combination_id, RF001 <- gcc.code_combination_id
|  row-size=2.67KB cardinality=107.25M
|
|--04:EXCHANGE [HASH(gcc.code_combination_id)]
|  |
|  01:SCAN KUDU [odserpjdata_kd.gl_code_combinations gcc]
|     row-size=1.95KB cardinality=11.01M
|
03:EXCHANGE [HASH(gb.code_combination_id)]
|
00:SCAN KUDU [odserpjdata_kd.gl_balances gb]
   runtime filters: RF000 -> gb.code_combination_id, RF001 -> gb.code_combination_id
   row-size=745B cardinality=107.25M

突然发现就是分区join了。 

其中cardinality(基数) 也有值了就是我compute

而且多了一步exchange。

突然想到一般广播都有大小限制的,那么impala的限制是多少呢? 

BROADCAST_BYTES_LIMIT    34359738368    ADVANCED

没有看错 是32G!所以前面没有统计的时候能够广播。

怎么说呢?两钟都能查出数据,但是谁更好一点呢? 我到cm的控制页面看到如下。

broadcast(drop stats)=3.9 GiB   partition(compute stats)=674.8M,哪个好一目了然。

 

 又看到了一个参数 

DEFAULT_JOIN_DISTRIBUTION_MODE    BROADCAST    ADVANCED

默认就是广播

DEFAULT_JOIN_DISTRIBUTION_MODE Query Option

DEFAULT_JOIN_DISTRIBUTION_MODE Query Option

This option determines the join distribution that Impala uses when any of the tables involved in a join query is missing statistics.

Impala optimizes join queries based on the presence of table statistics, which are produced by the Impala COMPUTE STATS statement. By default, when a table involved in the join query does not have statistics, Impala uses the "broadcast" technique that transmits the entire contents of the table to all executor nodes participating in the query. If one table involved in a join has statistics and the other does not, the table without statistics is broadcast. If both tables are missing statistics, the table that is referenced second in the join order is broadcast. This behavior is appropriate when the table involved is relatively small, but can lead to excessive network, memory, and CPU overhead if the table being broadcast is large.

Because Impala queries frequently involve very large tables, and suboptimal joins for such tables could result in spilling or out-of-memory errors, the setting DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE lets you override the default behavior. The shuffle join mechanism divides the corresponding rows of each table involved in a join query using a hashing algorithm, and transmits subsets of the rows to other nodes for processing. Typically, this kind of join is more efficient for joins between large tables of similar size.

The setting DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE is recommended when setting up and deploying new clusters, because it is less likely to result in serious consequences such as spilling or out-of-memory errors if the query plan is based on incomplete information. This setting is not the default, to avoid changing the performance characteristics of join queries for clusters that are already tuned for their existing workloads.

 

翻译下。 

当连接查询中有个表缺少了统计信息的时候就采用的默认join方式(和我们上面的例子一样,最开始没有compute 结果就是默认的broadcast)

如果联接中涉及的一个表具有统计信息,而另一个没有,则广播不具有统计信息的表。

如果两个表都缺少统计信息,则广播联接顺序中第二个引用的表。当涉及的表相对较小时,这种行为是合适的,但如果正在广播的表较大,则可能导致过多的网络、内存和CPU开销。

select 1 from A,B where A.id=B.id 广播右边的B

select 1 from B,A where A.id=B.id 广播右边的A

select 1 from B,A where A.id=B.id + compute stats A 广播B(因为B没有统计信息)

一般来说我没见过其他人compute,所以查询的时候注意了大表放前面小表放后面。

因为impala一般表都比较大 动则千万上亿,所以很容易内存溢出,此处你就要注意了。 

DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE  这个就是我们之前说的分区join,这样耗费的内存小点。但是怎么说呢 如果是大表join小表我觉得还是broadcast好点。官方也这么建议。

总结下我们学习到了什么?

1. compute stats 语法

2.explain 用法和查看

3.broadcast和shuffle的区别,

4.join的内存溢出如何解决?(shuffle)

5.set 参数的设置BROADCAST_BYTES_LIMIT DEFAULT_JOIN_DISTRIBUTION_MODE

  • 1
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值