标签
PostgreSQL , Greenplum , explain analyze , 解读 , 深度明细 , explain_memory_verbosity , gp_enable_explain_allstat
背景
解读Greenplum 的explain analyze输出。同时介绍两个深度明细开关explain_memory_verbosity , gp_enable_explain_allstat 。
例子
如果不打开详细开关,那么统计信息中,rows out后面的明细,实际上输出的是返回行数最多的SEGMENT,并不一定是最慢的segment(请务必注意)。因此我们看到上面的offset甚至有一些比下面的offset更小的,原因就在此。
解释请参考如下:
其中,start offset 表示从query dispatcher(QD)发起,到返回这个NODE的第一条有效tuple的时间。
可以说明几个问题:
1、query dispatch下发执行计划到segment,启动segment 的worker process。这之间有一定的时间差,体现在start offset层面。
开始时间是qd下发分布式执行计划,如果有一些offset比较大,可能是worker process fork较慢引起。
2、结束时间是获取到当前NODE的第一条有效记录是结束点,如果有一些offset比较大,也可能是当前worker处理较慢,获得第一条有效记录时间较久。
3、又或者,数据较倾斜,得到第一条有效记录,需要过滤很多无效记录。
postgres=# explain analyze select count(*),c1 from a group by c1 limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1561155.53..1561155.57 rows=1 width=12)
Rows out: 1 rows with 0.003 ms to first row, 0.031 ms to end, start offset by 522 ms.
-> Gather Motion 48:1 (slice2; segments: 48) (cost=1561155.53..1561155.57 rows=1 width=12)
Rows out: 1 rows at destination with 3994 ms to end, start offset by 522 ms.
-> Limit (cost=1561155.53..1561155.55 rows=1 width=12)
Rows out: Avg 1.0 rows x 41 workers. Max 1 rows (seg0) with 0.006 ms to end, start offset by 538 ms.
-> HashAggregate (cost=1561155.53..1561156.80 rows=3 width=12)
Group By: a.c1
Rows out: Avg 1.0 rows x 41 workers. Max 1 rows (seg0) with 0.003 ms to first row, 1252 ms to end, start offset by 538 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=1561152.00..1561154.02 rows=3 width=12)
Hash Key: a.c1
-- 平均返回多少行,返回最多行的节点,Rows out: Avg 118.2 rows x 41 workers at destination. Max 192 rows
-- 从这个worker第一次Entry to this plan node时间到返回这个NODE的第一条有效tuple,with xxx ms to first row
-- 从这个worker第一次Entry to this plan node时间到返回这个NODE的最后一条有效tuple,(seg9) with 2669 ms to end
-- 从query dispatcher(QD)发起,到返回这个NODE的第一条有效tuple,start offset by 595 ms.
Rows out: Avg 118.2 rows x 41 workers at destination. Max 192 rows (seg9) with 2669 ms to end, start offset by 595 ms.
-> HashAggregate (cost=1561152.00..1561152.00 rows=3 width=12)
Group By: a.c1
-- appendStringInfo(str,
-- "Rows in: Avg %.1f rows x %d workers."
-- " Max %.0f rows%s",
-- ntuples_avg,
-- ns->ntuples.vcnt,
-- ns->ntuples.vmax,
-- segbuf);
-- Rows out: {noRowRequested} rows{ns->ntuples.vmax}
-- with {Time from this worker's first InstrStartNode() to its first result row} to first row
-- {Time from this worker's first InstrStartNode() to end of its results} to end
-- start offset by {Time from start of query on qDisp to this worker's first result row}
Rows out: Avg 101.0 rows x 48 workers. Max 101 rows (seg0) with 0.005 ms to first row, 2706 ms to end, start offset by 572 ms.
-> Append-only Columnar Scan on a (cost=0.00..1061152.00 rows=2083334 width=4)
-- 不同的planstate->type,输出的内容可能不一样,参考 cdbexplain_showExecStats
-- Rows out: {noRowRequested} rows{ns->ntuples.vmax}
-- with {Time from this worker's first InstrStartNode() to end of its results} to end
-- start offset by {Time from start of query on qDisp to this worker's first result row}
Rows out: 0 rows (seg0) with 34 ms to end, start offset by 568 ms.
-- 每一个分片的内存使用情况, 一个分片指在整个执行计划中,在segment内可以独立运行的连续的步骤集,后面有术语介绍
Slice statistics:
(slice0) Executor memory: 327K bytes.
(slice1) Executor memory: 828K bytes avg x 48 workers, 828K bytes max (seg0).
(slice2) Executor memory: 353K bytes avg x 48 workers, 356K bytes max (seg0).
-- 当gp_resqueue_memory_policy != RESQUEUE_MEMORY_POLICY_NONE时,打印内存统计信息
Statement statistics:
-- /* What is the memory reserved for this query's execution? */
-- uint64 query_mem;
-- PlannedStmt @ pgsql/src/include/nodes/plannodes.h
Memory used: 128000K bytes
-- 设置了哪些影响执行计划的非默认参数
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
-- 总执行时长
Total runtime: 4518.864 ms
(26 rows)
以上解释,详见代码 cdbexplain_showExecStats