GBase 8a 日志之 trc 日志详解

最新推荐文章于 2024-02-28 15:50:02 发布

Linux有意思吗

最新推荐文章于 2024-02-28 15:50:02 发布

阅读量666

点赞数

分类专栏： GBASE 文章标签：数据仓库数据库架构数据库开发 dba sql

本文链接：https://blog.csdn.net/weixin_34421618/article/details/123061939

版权

GBASE 专栏收录该内容

135 篇文章 6 订阅

订阅专栏

gcluster 日志分为三类： trc 日志记录查询计划； express.log 记录 SQL执行过程以及执行过程中的警告和错误； system.log 主要记录 gcluster 的启停以及 crash 信息，日志存放路径为： /opt/gcluster/log/gcluster。

当需要分析 SQL 的查询计划时请查看 trc 日志， trc 日志详细记录了 SQL的查询和执行计划，通过分析这些计划来确定 SQL 的计划是否是最优的。

在 gcluster 配置文件中设置
gbase_sql_trace = 1
gbase_sql_trace_level = 3

后，查询计划可输出到.trc 文件中。

日志内容

gbase> SELECT b,SUM(a) FROM t1 GROUP BY b HAVING(SUM(a)>100);
//按照 b 进行动态 hash 的查询计划如下。
==============================================
QueryPlan:
++++++++++++++++++++++++++++++++++++++++++++++
Step: 0
Leaf type: REGULAR STEP
Need combiner: false
Is Broadcast: false
Target temp table:
_tmp_rht_n1_t12_1_356508_0x45a00ad0_0x12eebd68
Temp table definition: CREATE TABLE
`gctmpdb`._tmp_rht_n1_t12_1_356508_0x45a00ad0_0x12eebd68(`_col_1`
VARCHAR(100), `_col_2` DECIMAL(18,0)) ENGINE = express nolock
Optimization: {hash redist}
Hash Redist Indexes: 1
Query String: SELECT `zsy1.t1`.`b` AS `_col_1`,
SUM(`zsy1.t1`.`a`) AS `_col_2` FROM `zsy1`.t1 AS `zsy1.t1` GROUP BY
`_col_1`
++++++++++++++++++++++++++++++++++++++++++++++
Step: 1
Leaf type: REGULAR STEP
Need combiner: false
Is Broadcast: false
Target temp table: _tmp_n1_t12_2_356618_0x45a00b90_0x12eebd68
Temp table definition: CREATE TABLE
`gctmpdb`._tmp_n1_t12_2_356618_0x45a00b90_0x12eebd68(`b`
VARCHAR(100), `sum(a)` DECIMAL(18,0)) ENGINE = express nolock
Optimization:
Query String: SELECT _col_1 AS `b`, SUM(_col_2) AS `sum(a)`
FROM `gctmpdb`._tmp_rht_n1_t12_1_356508_0x45a00ad0_0x12eebd68 GROUP
BY `b` HAVING (SUM(_col_2) > 100)
==============================================
2012-07-18 19:06:09.356 [M: 0B, 0B,D: 0B] [DC: 0,
0]
CExecStep
--------------------------
isFinalStep = 0
m_nRequestId = 0
StepNo = 0
DestType = 1
aStepDetail
---------------
isProducer = 1
isConsumer = 1
isGroupHashRedist = 0
isAllTableAreHashTmpDist = 0
queryString = SELECT `zsy1.t1`.`b` AS `_col_1`, SUM(`zsy1.t1`.`a`)
AS `_col_2` FROM `zsy1`.t1 AS `zsy1.t1` GROUP BY `_col_1`
targetTable =
`gctmpdb`._tmp_rht_n1_t12_1_356508_0x45a00ad0_0x12eebd68
targetSchema = CREATE TABLE
`gctmpdb`._tmp_rht_n1_t12_1_356508_0x45a00ad0_0x12eebd68(`_col_1`
VARCHAR(100), `_col_2` DECIMAL(18,0)) ENGINE = express nolock
DropList =
CExecStep
--------------------------
isFinalStep = 0
m_nRequestId = 0
StepNo = 1
DestType = 0
aStepDetail
---------------
isProducer = 1
isConsumer = 0
isGroupHashRedist = 0
isAllTableAreHashTmpDist = 1
queryString = SELECT _col_1 AS `b`, SUM(_col_2) AS `sum(a)` FROM
`gctmpdb`._tmp_rht_n1_t12_1_356508_0x45a00ad0_0x12eebd68 GROUP BY `b`
HAVING (SUM(_col_2) > 100)
DropList =
2012-07-18 19:06:09.419 [M: 0B, 0B,D: 0B] [DC: 0, 0]
Executor success.

日志项目

( 一）中间表的命名方式：

_tmp_n1_t11_1_405747_0x45a002a0_0x12eebd68

_tmp_rht_n1_t12_1_356508_0x45a00ad0_0x12eebd68

对于一个集群临时表名， _tmp_rht 前缀表示这是一个动态重分布临时表，仅有_tmp 前缀时，表示这是一个普通临时表（复制表）。其余部分，动态重分布临时表和普通临时表一样，因此这里以动态重分布产生的临时表名_tmp_rht_n1_t12_1_356508_0x45a00ad0_0x12eebd68 来统一说明。

_tmp_rht:动态重分布临时表
n1:发起这条查询的节点后缀，这里表示此查询是从 n1 节点发起的
t12: 查询计数， gcluster server 启动后，执行的第 n 条查询， n 从 1 开始。
1：该查询产生的第 m 个临时表名， m 从 1 开始（因为查询会分解为多步执行，因此可能会产生多个临时表）
356508: 产生临时表名时，系统时间的微秒部分
0x45a00ad0：产生临时表名时，时间结构的内存地址
0x12eebd68：产生临时表名的 thd 对象的内存地址

（二） Optimization
A． {single hash node}
Select * from t where hash_col = '123';
B． {hash group by}
Select * from t1 group by a, hash_col, c;

C． {distinct group hash}
select count(distinct hash_col) from x1;
select sum(distinct hash_col) from x1 group by id3;
D． {distinct hash}
select distinct a, hash_col, c from x1;
select distinct id2, hash_col, id3 from x1 group by id3
E． {one step hash join complex}

在多步计划中，其中一步可以使用 hash join complex 优化（主查询与相关子查询，或非相关子查询存在 hash 关系）
测试用例如下：（id4 是 hash 列）

gbase> SELECT x3.* FROM x1, x3 WHERE x1.id3 = x3.id3 AND x1.id4 IN (SELECT id4
FROM x2);
==============================================
2012-07-19 18:31:14.243
JOIN ----test.x3
|
test.x1
2012-07-19 18:31:14.243
==============================================
2012-07-19 18:31:14.244
JOIN {(`test.x1`.`id3` =
`test.x3`.`id3`)}---------------------------------------------------
--test.x3{}
|
test.x1{`test.x1`.`id4` IN ( SELECT `test.x2`.`id4` as `id4` From
`test`.`x2` as `test.x2`)}
2012-07-19 18:31:14.244 >>>>> using one step hash join complex
optimization
2012-07-19 18:31:14.244
==============================================
==============================================
QueryPlan:
++++++++++++++++++++++++++++++++++++++++++++++
Step: 0
Leaf type: REGULAR STEP
Need combiner: false
Is Broadcast: false
Target temp table: _tmp_n1_t15_1_244477_0x45cfca40_0x13c53788
Temp table definition: CREATE TABLE
`gctmpdb`._tmp_n1_t15_1_244477_0x45cfca40_0x13c53788(`id3` INT)
ENGINE = express nolock
Optimization: {one step hash join complex}
Query String: SELECT `test.x1`.`id3` AS `id3` FROM
`test`.x1 AS `test.x1` WHERE `test.x1`.`id4` IN ( SELECT `test.x2`.`id4`
as `id4` From `test`.`x2` as `test.x2`)
++++++++++++++++++++++++++++++++++++++++++++++
Step: 1
Leaf type: REGULAR STEP
Need combiner: false
Is Broadcast: false
Target temp table: _tmp_n1_t15_2_244595_0x45cfcaf0_0x13c53788
Temp table definition: CREATE TABLE
`gctmpdb`._tmp_n1_t15_2_244595_0x45cfcaf0_0x13c53788(`entry_id` INT,
`id2` INT, `id3` INT, `id4` INT) ENGINE = express nolock
Optimization:
Query String: SELECT `test.x3`.`entry_id` AS `entry_id`,
`test.x3`.`id2` AS `id2`, `test.x3`.`id3` AS `id3`, `test.x3`.`id4` AS
`id4` FROM `gctmpdb`._tmp_n1_t15_1_244477_0x45cfca40_0x13c53788 INNER
JOIN `test`.x3 AS `test.x3` ON
(`_tmp_n1_t15_1_244477_0x45cfca40_0x13c53788`.`id3` =
`test.x3`.`id3`)
==============================================
2012-07-19 18:31:14.244
CExecStep
--------------------------
isFinalStep = 0
m_nRequestId = 0
StepNo = 0
DestType = 1
aStepDetail
---------------
isProducer = 1
isConsumer = 1
isGroupHashRedist = 0
isAllTableAreHashTmpDist = 0
queryString = SELECT `test.x1`.`id3` AS `id3` FROM `test`.x1 AS
`test.x1` WHERE `test.x1`.`id4` IN ( SELECT `test.x2`.`id4` as `id4`
From `test`.`x2` as `test.x2`)
targetTable =
`gctmpdb`._tmp_n1_t15_1_244477_0x45cfca40_0x13c53788
targetSchema = CREATE TABLE
`gctmpdb`._tmp_n1_t15_1_244477_0x45cfca40_0x13c53788(`id3` INT)
ENGINE = express nolock
DropList =
CExecStep
--------------------------
isFinalStep = 0
m_nRequestId = 0
StepNo = 1
DestType = 0
aStepDetail
---------------
isProducer = 1
isConsumer = 0
isGroupHashRedist = 0
isAllTableAreHashTmpDist = 0
queryString = SELECT `test.x3`.`entry_id` AS `entry_id`,
`test.x3`.`id2` AS `id2`, `test.x3`.`id3` AS `id3`, `test.x3`.`id4` AS
`id4` FROM `gctmpdb`._tmp_n1_t15_1_244477_0x45cfca40_0x13c53788 INNER
JOIN `test`.x3 AS `test.x3` ON
(`_tmp_n1_t15_1_244477_0x45cfca40_0x13c53788`.`id3` =
`test.x3`.`id3`)
DropList =
2012-07-19 18:31:15.171 Executor success.
F． {hash redist}
Hash Redist Indexes: 1

hash 重分布步骤， Hash Redist Indexes 表示按 projection 的第 n 列进行
hash 重分布（n 从 1 开始）
例如： select 执行结果按照 projection 的第 1 列（`zsy1.t1`.`b` AS`_col_1`）进行动态 hash 后，把结果发送到各个节点，日志如下：

Optimization: {hash redist}
Hash Redist Indexes: 1
Query String: SELECT `zsy1.t1`.`b` AS `_col_1`,
SUM(`zsy1.t1`.`a`) AS `_col_2`
FROM `zsy1`.t1 AS `zsy1.t1` GROUP BY `_col_1`

(三） DestType（中间结果的汇聚方式）
DEST_TYPE_NOSET = 0, ///< not set

DEST_TYPE_BROADCAST = 1,///< 广播到所有的 Node : 8a -> 8a 或者是 Cluster -> 8a
DEST_TYPE_COORD = 2, ///< 发送结果到 Cluster
DEST_TYPE_ONE = 3, ///< 发送结果到一个 Node 目前为当前Cluster 所在的 8a

（四） isProducer
生产者
（五） isConsumer
消费者
（六） JOIN 树
Inner join 评估把较小的表放在左面，较大的表放到右面，如果要拉复制表，则把左面的表拉成复制表, 如果是 left join 则不能改变顺序，只能把右边的表拉成复制表。