定位postgresql中疯狂执行DML的表&原理学习

Hehuyi_In

已于 2024-04-03 11:22:28 修改

阅读量1.9k

点赞数 2

分类专栏： PostgreSQL 源码学习统计信息文章标签： postgresql DML 变化量统计信息源码学习

于 2023-08-15 21:41:51 首次发布

本文链接：https://blog.csdn.net/Hehuyi_In/article/details/117171468

版权

PostgreSQL 同时被 3 个专栏收录

140 篇文章 113 订阅

订阅专栏

源码学习

67 篇文章 52 订阅

订阅专栏

统计信息

12 篇文章 0 订阅

订阅专栏

当数据库中WAL生成量过大，或者某段时间内WAL突然增长，原因之一很可能就是有了大量执行的DML语句。本文参考如何找到postgres中疯狂增长的wal日志的语句 – OracleBlog 中的方法定位一段时间内DML量最大的表，并稍微改动监控每天表变化量。另外，也通过源码学习学习相关字段究竟是从何获取的。

一、定位每天DML量最大的表

其核心是pg_stat_all_tables视图的n_tup_ins,n_tup_upd,n_tup_del，这些数据都是累计值，所以要在不同时间点获取数据快照并取delta值。定时任务设置是一天一取，时间可以自行调整。

vi get_table_dml_cnt.sh

#!/bin/bash

TODAY=`date  +"%Y%m%d"`
YTD=`date -d '-1 day' '+%Y%m%d'`

psql << EOF
create table table_dml_$TODAY as select date_trunc('second',now()) as sample_time,schemaname,relname,n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd from pg_catalog.pg_stat_all_tables;

insert into delta_dml_table_new
select t2.sample_time,t2.schemaname, (t2.n_tup_ins-t1.n_tup_ins) as delta_ins, (t2.n_tup_upd-t1.n_tup_upd) as delta_upd,(t2.n_tup_del-t1.n_tup_del) as delta_del,(t2.n_tup_ins+t2.n_tup_upd+t2.n_tup_del-t1.n_tup_ins-t1.n_tup_upd-t1.n_tup_del) as del_dml,(EXTRACT (EPOCH FROM  t2.sample_time::timestamp )::float-EXTRACT (EPOCH FROM  t1.sample_time::timestamp )::float) as delta_second,round(cast((t2.n_tup_ins+t2.n_tup_upd+t2.n_tup_del-t1.n_tup_ins-t1.n_tup_upd-t1.n_tup_del)/(EXTRACT (EPOCH FROM  t2.sample_time::timestamp )::float-EXTRACT (EPOCH FROM  t1.sample_time::timestamp )::float)as numeric),2) as delta_dml_per_sec
from table_dml_$YTD t1, table_dml_$TODAY t2
where t2.schemaname=t1.schemaname and t2.relname=t1.relname;
EOF

运行后可以根据各个子表自行查询

也可以查询汇总后的delta_dml_table_new

select * from delta_dml_table_new where sample_time>='2023-07-15' and sample_time<'2023-07-16' order by del_dml desc limit 5;

按照schema统计

select schemaname,sum(del_dml) from delta_dml_table_new where sample_time>='2023-07-15' and sample_time<'2023-07-16' group by schemaname order by 2 desc;

查询时去除unlog表

unlog表的特性使得即使dml量大，wal日志量也很小，可以忽略

select a.* from delta_dml_table_new a,pg_class b where a.relname=b.relname and b.relpersistence<>'u' and b.relkind = 'r' and b.relnamespace::regnamespace::text=schemaname::text and sample_time>='2024-04-01' and sample_time<'2024-04-03' order by del_dml desc;

二、源码学习

首先这些字段都来自pg_stat_all_tables视图，n_tup_ins,n_tup_upd,n_tup_del其实都比较类似，下面以n_tup_ins为例，看看这些字段值究竟从何而来。

1. pg_stat_all_tables视图

\d+ pg_stat_all_tables查看视图定义，可以看到，n_tup_ins来自pg_stat_get_tuples_inserted

\df+ pg_stat_get_tuples_inserted查看函数定义，发现是个内部函数，且与原函数同名，说明它对应源码中的同名函数。

2. pg_stat_get_tuples_inserted函数

这个函数在pgstatfuncs.c文件中，其中稍微值得看的就是tabentry->tuples_inserted，vscode中点击它会跳到tabentry的结构体定义。

Datum
pg_stat_get_tuples_inserted(PG_FUNCTION_ARGS)
{
    Oid         relid = PG_GETARG_OID(0);
    int64       result;
    PgStat_StatTabEntry *tabentry;

    if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
        result = 0;
    else
        result = (int64) (tabentry->tuples_inserted);

    PG_RETURN_INT64(result);
}

3. PgStat_StatTabEntry结构体

这个结构体存储每个表的统计信息数据，位于pgstat.h文件中。

/* ----------
 * PgStat_StatTabEntry          The collector's data per table (or index)
 * ----------
 */
typedef struct PgStat_StatTabEntry
{
    Oid         tableid;

    PgStat_Counter numscans;

    PgStat_Counter tuples_returned;
    PgStat_Counter tuples_fetched;

    PgStat_Counter tuples_inserted;
    PgStat_Counter tuples_updated;
    PgStat_Counter tuples_deleted;
    PgStat_Counter tuples_hot_updated;

    PgStat_Counter n_live_tuples;
    PgStat_Counter n_dead_tuples;
    PgStat_Counter changes_since_analyze;
    PgStat_Counter unused_counter;  /* kept for ABI compatibility */
    PgStat_Counter inserts_since_vacuum;

    PgStat_Counter blocks_fetched;
    PgStat_Counter blocks_hit;

    TimestampTz vacuum_timestamp;   /* user initiated vacuum */
    PgStat_Counter vacuum_count;
    TimestampTz autovac_vacuum_timestamp;   /* autovacuum initiated */
    PgStat_Counter autovac_vacuum_count;
    TimestampTz analyze_timestamp;  /* user initiated */
    PgStat_Counter analyze_count;
    TimestampTz autovac_analyze_timestamp;  /* autovacuum initiated */
    PgStat_Counter autovac_analyze_count;
} PgStat_StatTabEntry;

但tabentry->tuples_inserted的值是怎么来的呢？统计信息数据收集由pgstats进程进行（pg 15版本前），其入口函数为PgstatCollectorMain，会调用pgstat_recv_tabstat函数。事实上在源码中直接搜索 tabentry->tuples_inserted = ，出来的也是pgstat_recv_tabstat函数。

4. pgstat_recv_tabstat函数

可以看到对于新表，直接赋值为tabmsg->t_counts.t_tuples_inserted；如果是已存在的表，则为原值累加tabmsg->t_counts.t_tuples_inserted。

tabmsg->t_counts.t_tuples_inserted又是个什么东东？点一点t_tuples_inserted发现跳到了PgStat_TableCounts结构体。根据注释介绍，这是各进程真正为每个表保存的计数器。

/* ----------
 * PgStat_TableCounts           The actual per-table counts kept by a backend
 */
typedef struct PgStat_TableCounts
{
…
    PgStat_Counter t_tuples_inserted;
    PgStat_Counter t_tuples_updated;
    PgStat_Counter t_tuples_deleted;
…
} PgStat_TableCounts;

PgStat_Counter其实就是个整型

/* ----------
 * The data type used for counters.
 * ----------
 */
typedef int64 PgStat_Counter;

再点一点t_counts

/* ----------
 * PgStat_TableEntry            Per-table info in a MsgTabstat
 * ----------
 */
typedef struct PgStat_TableEntry
{
    Oid         t_id;
    PgStat_TableCounts t_counts;
} PgStat_TableEntry;

没发现什么有用的，后来根据网上搜索，tabmsg->t_counts.* 是通过如下函数定义的

extern void pgstat_count_heap_insert(Relation rel, PgStat_Counter n);
extern void pgstat_count_heap_update(Relation rel, bool hot);
extern void pgstat_count_heap_delete(Relation rel);
extern void pgstat_count_truncate(Relation rel);
extern void pgstat_update_heap_dead_tuples(Relation rel, int delta);

5. pgstat_count_heap_insert函数

这里可以看到它来自trans->tuples_inserted，并且是累加计算的。

/*
 * pgstat_count_heap_insert - count a tuple insertion of n tuples
 */
void
pgstat_count_heap_insert(Relation rel, PgStat_Counter n)
{
    PgStat_TableStatus *pgstat_info = rel->pgstat_info;

    if (pgstat_info != NULL)
    {
        /* We have to log the effect at the proper transactional level */
        int         nest_level = GetCurrentTransactionNestLevel();

        if (pgstat_info->trans == NULL ||
            pgstat_info->trans->nest_level != nest_level)
            add_tabstat_xact_level(pgstat_info, nest_level);

        pgstat_info->trans->tuples_inserted += n;
    }
}

函数调用栈如下

Breakpoint 1, pgstat_count_heap_insert (rel=rel@entry=0x7f72ddb38d50, n=n@entry=1) at pgstat.c:2221
2221 {
(gdb) bt
#0 pgstat_count_heap_insert (rel=rel@entry=0x7f72ddb38d50, n=n@entry=1) at pgstat.c:2221
#1 0x00000000004c69eb in heap_insert (relation=relation@entry=0x7f72ddb38d50, tup=tup@entry=0x1255f48,
cid=cid@entry=0, options=options@entry=0, bistate=bistate@entry=0x0) at heapam.c:2228
#2 0x00000000004d23e8 in heapam_tuple_insert (relation=0x7f72ddb38d50, slot=0x1255eb8, cid=0, options=0, bistate=0x0)
at heapam_handler.c:252
#3 0x000000000065a56c in table_tuple_insert (bistate=0x0, options=0, cid=<optimized out>, slot=0x1255eb8,
rel=0x7f72ddb38d50) at ../../../src/include/access/tableam.h:1374
#4 ExecInsert (mtstate=mtstate@entry=0x1254af8, resultRelInfo=resultRelInfo@entry=0x1254d08, slot=0x1255eb8,
planSlot=planSlot@entry=0x12552a8, estate=estate@entry=0x1254898, canSetTag=<optimized out>)
at nodeModifyTable.c:934
#5 0x000000000065b98b in ExecModifyTable (pstate=0x1254af8) at nodeModifyTable.c:2561
#6 0x0000000000630462 in ExecProcNode (node=0x1254af8) at ../../../src/include/executor/executor.h:257
#7 ExecutePlan (execute_once=<optimized out>, dest=0x1122670, direction=<optimized out>, numberTuples=0,
sendTuples=false, operation=CMD_INSERT, use_parallel_mode=<optimized out>, planstate=0x1254af8, estate=0x1254898)
at execMain.c:1551
#8 standard_ExecutorRun (queryDesc=0x12538c8, direction=<optimized out>, count=0, execute_once=<optimized out>)
at execMain.c:361
#9 0x00007f72d44074a5 in pgss_ExecutorRun (queryDesc=0x12538c8, direction=ForwardScanDirection, count=0,
execute_once=<optimized out>) at pg_stat_statements.c:1003
#10 0x00007f72d420165e in explain_ExecutorRun (queryDesc=0x12538c8, direction=ForwardScanDirection, count=0,
execute_once=<optimized out>) at auto_explain.c:334
#11 0x0000000000797fea in ProcessQuery (plan=<optimized out>, sourceText=0x10f7918 "insert into t values(6);",
params=0x0, queryEnv=0x0, dest=0x1122670, qc=0x7ffd480d2a70) at pquery.c:160
#12 0x0000000000798a05 in PortalRunMulti (portal=portal@entry=0x116ef68, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x1122670, altdest=altdest@entry=0x1122670,
qc=qc@entry=0x7ffd480d2a70) at pquery.c:1266
#13 0x0000000000798d3e in PortalRun (portal=portal@entry=0x116ef68, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x1122670,
altdest=altdest@entry=0x1122670, qc=qc@entry=0x7ffd480d2a70) at pquery.c:786
#14 0x0000000000794c8e in exec_simple_query (query_string=0x10f7918 "insert into t values(6);") at postgres.c:1214
#15 0x0000000000796103 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffd480d2f20, dbname=0x1127248 "postgres",
username=<optimized out>) at postgres.c:4486
#16 0x0000000000487b8a in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4506
#17 BackendStartup (port=0x111eb20) at postmaster.c:4228
#18 ServerLoop () at postmaster.c:1745

其中结构体定义如下

/* ----------
 * PgStat_TableXactStatus       Per-table, per-subtransaction status
 * ----------
 */
typedef struct PgStat_TableXactStatus
{
    PgStat_Counter tuples_inserted; /* tuples inserted in (sub)xact */
    PgStat_Counter tuples_updated;  /* tuples updated in (sub)xact */
    PgStat_Counter tuples_deleted;  /* tuples deleted in (sub)xact */
    bool        truncated;      /* relation truncated in this (sub)xact */
    PgStat_Counter inserted_pre_trunc;  /* tuples inserted prior to truncate */
    PgStat_Counter updated_pre_trunc;   /* tuples updated prior to truncate */
    PgStat_Counter deleted_pre_trunc;   /* tuples deleted prior to truncate */
    int         nest_level;     /* subtransaction nest level */
    /* links to other structs for same relation: */
    struct PgStat_TableXactStatus *upper;   /* next higher subxact if any */
    PgStat_TableStatus *parent; /* per-table status */
    /* structs of same subxact level are linked here: */
    struct PgStat_TableXactStatus *next;    /* next of same subxact */
} PgStat_TableXactStatus;

/* ----------
 * PgStat_TableStatus           Per-table status within a backend
 *
 * Many of the event counters are nontransactional, ie, we count events
 * in committed and aborted transactions alike.  For these, we just count
 * directly in the PgStat_TableStatus.  However, delta_live_tuples,
 * delta_dead_tuples, and changed_tuples must be derived from event counts
 * with awareness of whether the transaction or subtransaction committed or
 * aborted.  Hence, we also keep a stack of per-(sub)transaction status
 * records for every table modified in the current transaction.  At commit
 * or abort, we propagate tuples_inserted/updated/deleted up to the
 * parent subtransaction level, or out to the parent PgStat_TableStatus,
 * as appropriate.
 * ----------
 */
typedef struct PgStat_TableStatus
{
    Oid         t_id;           /* table's OID */
    bool        t_shared;       /* is it a shared catalog? */
    struct PgStat_TableXactStatus *trans;   /* lowest subxact's counts */
    PgStat_TableCounts t_counts;    /* event counts to be sent */
} PgStat_TableStatus;

PgStat_TableStatus结构体用于维护表的统计信息。在这些统计信息中，许多事件计数器是非事务性的，即无论是在已提交的事务还是已中止的事务中都计算事件数量。对于这些计数器，我们直接在PgStat_TableStatus中进行计数。但是，delta_live_tuples、delta_dead_tuples和changed_tuples必须从具有事务或子事务已提交或已中止标识的事件计数中派生出来。因此，我们还为当前事务中修改的每个表保留了一个堆栈，用于保存每个（子）事务的状态记录。在事务提交或中止时，我们将tuples_inserted/updated/deleted传播到父子事务级别，或者根据情况传播到父级PgStat_TableStatus中。

参考：

https://github.com/digoal/blog/blob/master/201610/20161018_03.md

PostgreSQL数据库WAL——RM_HEAP_ID日志记录动作_mb62de8abf75c00的技术博客_51CTO博客

PostgreSQL 源码解读（1）- 插入数据#1 - 简书

PostgreSQL 源码解读（4）- 插入数据#3（heap_insert） - 简书