WalMiner4.0日志挖掘工具的使用

WalMiner日志挖掘工具的使用

我们在日常运维PostgreSQL中经常会碰到开发人员误删数据,或者某个时间段突然产生大量WAL日志。那这些问题该如何快速高效的解决那?现在我们来分享一个wal日志挖掘工具。

walminer工具架构:
在这里插入图片描述

一 、WalMiner日志挖掘工具的介绍、部署安装

1.1、介绍

1、walminer是PostgreSQL的wal日志解析工具,4.0版本摒弃插件模式改为bin模式,现已脱离对目标数据库的编译依赖和安装依赖,一个walminer工具可以解析PG10~PG16的WAL日志。 现已实现的功能为wal2sql、fosync、pgto、waldump

2、walminer4.0改变使用机制,无需在目标数据库做任何操作,即可完成wal解析。另外walminer工具的编译安装不再依赖任何数据库版本,因此一份walminer工具可以支持多个PG版本的解析。(walminer4.0版本需要问作者购买license。网址:https://gitee.com/movead/XLogMiner)

3、walminer实现的功能:

  • 解析出执行的SQL语句,并能生成对应的undo SQL语句。与传统的logical decode插件相比,walminer不要求logical日志级别且解析方式较为灵活。
  • 当数据库被执行了TRUNCATE等不被wal记录的数据清除操作,或者发生磁盘页损坏,可以使用此功能从wal日志中搜索数据,以期尽量挽回数据。
  • wal2sql为walminer的基本功能,可以将wal日志解析为SQL语句,以及其undo语句,可解析部分DDL语句。
  • fosync为在wal2sql的基础上实现的,PG故障转移延迟数据同步功能。 在PG主备流复制中,如果主库发生故障,备库提升为新的主库,在异步主备的情况下,可能会有一些事务停留在了主库,没有同步到备库。这样会导致一些数据丢失。

1.2、环境准备

主机名IP地址操作系统数据库版本WalMiner版本
postgres192.168.6.107Red Hat Enterprise Linux Server release 7.7 (Maipo)PostgreSQL 16.3walminer4.7.0

查看Linux版本和防火墙状态

##查看发行版本
[root@postgres local]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
[root@postgres local]# 

##查看防火墙状态
[root@postgres local]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
[root@postgres local]# getenforce
Disabled
[root@postgres local]# 

1.3、walminer部署安装

##下载压缩包
$ wget https://gitee.com/movead/XLogMiner/releases/download/walminer_dev_4.7.0_20240707/walminer_x86_64_v4.7.0.tar.gz
##解压
$ tar -zxvf walminer_x86_64_v4.7.0.tar.gz
$ mv walminer_x86_64_v4.7.0.tar.gz walminer
$ cd walminer

##如果没有这个文件,只能本地测试,不能远程访问
$ cp walminer.license_greatfinish /usr/local/walminer/walminer.license

$ pwd
/home/fbase/walminer
$ ls
bin  lib  readme  share

## 设置环境变量
[postgres@pg ~]$ vi .bash_profile
export PATH=$PATH:/tmp/soft/walminer_x86_64_v4.4.2/bin/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/tmp/soft/walminer_x86_64_v4.4.2/lib/
[postgres@pg ~]$ source .bash_profile

#创建walminer运行目录,没有这个目录它将不能运行
[root@postgres local]# pwd
/usr/local
[root@postgres local]# mkdir walminer
[root@postgres local]# chown fbase:fbase

## 测试安装情况(执行walminer help,可以正常打印 help 信息,则证明安装部署成功)
[postgres@pg ~]$ walminer help
walminer [command] [options]
COMMANDS
---------
#wal2sql
  options
    -D dic file for miner
    -a out detail info for catalog change
    -w wal file path to miner
    -t dest of miner result(1 stdout, 2 file, 3 db)(stdout default)
    -k boundary kind(1 all, 2 lsn, 3 time, 4 xid)(all default)
    -m miner mode(0 nomal miner, 1 accurate miner)(nomal default) if k=2
    -r the relname for single table miner 
    -s start location if k=2 or k=3, or xid if k = 4 
          if k=2 default the min lsn of input wals   
          if k=3 or k=4 you need input this
    -e end wal location if k=2 or k=3
          if k=2 default the max lsn of input wals   
          if k=3 you need input this
    -f file to store miner result if t = 2
    -d target database name if t=3(default postgres)
    -h target database host if t=3(default localhost)
    -p target database port if t=3(default 5432)
    -u target database user if t=3(default postgres)
    -W target user password if t=3

二、案例演示:演示常用的wal2sql功能

2.1、模拟开发人员误操作数据,使用undo语句重新恢复过来

直接使用管理用户fbase创建数据库mytestdb,创建表t并插入记录,修改和删除记录

$ psql
psql (16.3)
Type "help" for help.

postgres=# 

postgres=# create database mytestdb;
CREATE DATABASE
postgres=# \c mytestdb 
You are now connected to database "mytestdb" as user "postgres".
mytestdb=# 
mytestdb=# create table t (id serial primary key, name varchar(20));
CREATE TABLE

--创建生成随机字符的函数
mytestdb=# CREATE OR REPLACE FUNCTION random_string(
  num INTEGER,
  chars TEXT default '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
) RETURNS TEXT
LANGUAGE plpgsql
AS $$
DECLARE
  res_str TEXT := '';
BEGIN
  IF num < 1 THEN
      RAISE EXCEPTION 'Invalid length';
  END IF;
  FOR __ IN 1..num LOOP
    res_str := res_str || substr(chars, floor(random() * length(chars))::int + 1, 1);
  END LOOP;
  RETURN res_str;
END $$;
CREATE FUNCTION

-- 插入数据
mytestdb=# insert into t(name) select random_string(8);
INSERT 0 1
mytestdb=# insert into t(name) select random_string(8);
INSERT 0 1
mytestdb=# insert into t(name) select random_string(8);
INSERT 0 1
mytestdb=# insert into t(name) select random_string(8);
INSERT 0 1
mytestdb=# insert into t(name) select random_string(8);
INSERT 0 1
mytestdb=# insert into t(name) select random_string(8);
INSERT 0 1
mytestdb=# insert into t(name) select random_string(8);
INSERT 0 1

mytestdb=# select * from t;
 id |   name
----+----------
  1 | YDyVJpdA
  2 | K1jdRRsB
  3 | sY6ee2dr
  4 | xiaW8MKa
  5 | JyI9deaj
  6 | 9CViaPkE
  7 | NFIsn97n
  8 | yTFpHkEu
(8 rows)

mytestdb=# update t set name='xiugai' where id=5;
UPDATE 1
mytestdb=# delete from t where id=4;
DELETE 1
mytestdb=# select * from t;
 id |   name
----+----------
  1 | YDyVJpdA
  2 | K1jdRRsB
  3 | sY6ee2dr
  6 | 9CViaPkE
  7 | NFIsn97n
  8 | yTFpHkEu
  5 | xiugai
(7 rows)

mytestdb=#

生成数据字典

注意-D代表创建数据字典的文件名,-d代表数据库,-u代表用户名,-W代表密码

[postgres@postgres ~]$ walminer builtdic -D ~/walminer.dic -f -h localhost -p 8432 -d mytestdb -u fbase -W fbase
#################################################
Walminer for PostgreSQL wal
Contact Author by mail 'lchch1990@sina.cn'
No License for walminer test, for get a license you can read:
https://gitee.com/movead/XLogMiner/wikis/walminer%20license
#################################################

DIC INFO#
sysid:7397266172285952557 timeline:1 dbversion:160003 walminer:4.7
[postgres@postgres ~]$ 
[postgres@postgres ~]$ ls
walminer  walminer.dic
[postgres@postgres ~]$ 

解析并输出到数据库临时表

使用walminer wal2sql解析wal日志,-D指定数据字典,-w指定wal目录,-d代表数据库,-u代表用户名,-W代表密码,-t 3代表输出到数据库临时表walminer_contents中

[postgres@postgres ~]$ psql
psql (16.3)
Type "help" for help.

postgres=# show data_directory;
   data_directory   
--------------------
 /data/postgres/fbdata
(1 row)

postgres=# \q
[postgres@postgres ~]$ walminer wal2sql -D ~/walminer.dic -w /data/fbase/fbdata/pg_wal -t 3 -h localhost -p 8432 -d mytestdb -u postgres
#################################################
Walminer for PostgreSQL wal
Contact Author by mail 'lchch1990@sina.cn'
Persional License for greatfinish(375887@qq.com)
#################################################
NOTICE:  table "walminer_contents" does not exist, skipping
[WARNING][filter_in_decode]Can not find relfilenode 24589 in dic
[WARNING][filter_in_decode]Can not find relfilenode 24589 in dic
Switch wal to /data/postgres/fbdata/pg_wal/000000010000000000000004 on time 2024-07-31 14:27:56.910683+08
WALMINER_ERROR:Unsupport pg version ##这里如果使用小于16版本的PG就没问题了,不过这也能正常使用
[postgres@postgres ~]$ 

这时可以在客户端看到在数据库的walminer_contents临时表里,有大量的DML语句:
在这里插入图片描述

2.2、模拟突然有一段时间有大量的wal日志产生,排查原因

插入数据:

[postgres@postgres ~]$ psql
psql (16.3)
Type "help" for help.

postgres=# \c mytestdb 
You are now connected to database "mytestdb" as user "fbase".
mytestdb=# \d
               List of relations
 Schema |       Name        |   Type   | Owner 
--------+-------------------+----------+-------
 public | t                 | table    | fbase
 public | t_id_seq          | sequence | fbase
 public | walminer_contents | table    | fbase
(3 rows)

mytestdb=# create table t1 (id serial, name varchar(20));
CREATE TABLE
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(2000);
ERROR:  value too long for type character varying(20)
mytestdb=# insert into t1 (name) select random_string(200);
ERROR:  value too long for type character varying(20)
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 (name) select random_string(20);
INSERT 0 1
mytestdb=# insert into t1 select * from t1;
INSERT 0 13
mytestdb=# insert into t1 select * from t1;
INSERT 0 26
mytestdb=# insert into t1 select * from t1;
INSERT 0 52
mytestdb=# insert into t1 select * from t1;
INSERT 0 104
mytestdb=# insert into t1 select * from t1;
INSERT 0 208
mytestdb=# insert into t1 select * from t1;
INSERT 0 416
mytestdb=# insert into t1 select * from t1;
INSERT 0 832
mytestdb=# insert into t1 select * from t1;
INSERT 0 1664
mytestdb=# insert into t1 select * from t1;
INSERT 0 3328
mytestdb=# insert into t1 select * from t1;
INSERT 0 6656
mytestdb=# insert into t1 select * from t1;
INSERT 0 13312
mytestdb=# insert into t1 select * from t1;
INSERT 0 26624
mytestdb=# insert into t1 select * from t1;
INSERT 0 53248
mytestdb=# insert into t1 select * from t1;
INSERT 0 106496
mytestdb=# insert into t1 select * from t1;
INSERT 0 212992
mytestdb=# insert into t1 select * from t1;
^[[A^[[AINSERT 0 425984
mytestdb=# insert into t1 select * from t1;
INSERT 0 851968
mytestdb=# 
mytestdb=# select count(*) from t1;
  count  
---------
 1703936
(1 row)

mytestdb=# 

分析各时间段WAL日志量

mytestdb=# select to_char(date_trunc('day',wal.modification),'yyyymmdd') as day,
        sum(case when date_part('hour',wal.modification) >=0 and date_part('hour',wal.modification) <24 then 1 else 0 end) as all,
        sum(case when date_part('hour',wal.modification) >=0 and date_part('hour',wal.modification) <1 then 1 else 0 end) as w0_1,
        sum(case when date_part('hour',wal.modification) >=1 and date_part('hour',wal.modification) <2 then 1 else 0 end) as w1_2,
        sum(case when date_part('hour',wal.modification) >=2 and date_part('hour',wal.modification) <3 then 1 else 0 end) as w2_3,
        sum(case when date_part('hour',wal.modification) >=3 and date_part('hour',wal.modification) <4 then 1 else 0 end) as w3_4,
        sum(case when date_part('hour',wal.modification) >=4 and date_part('hour',wal.modification) <5 then 1 else 0 end) as w4_5,
        sum(case when date_part('hour',wal.modification) >=5 and date_part('hour',wal.modification) <6 then 1 else 0 end) as w5_6,
        sum(case when date_part('hour',wal.modification) >=6 and date_part('hour',wal.modification) <7 then 1 else 0 end) as w6_7,
        sum(case when date_part('hour',wal.modification) >=7 and date_part('hour',wal.modification) <8 then 1 else 0 end) as w7_8,
        sum(case when date_part('hour',wal.modification) >=8 and date_part('hour',wal.modification) <9 then 1 else 0 end) as w8_9,
        sum(case when date_part('hour',wal.modification) >=9 and date_part('hour',wal.modification) <10 then 1 else 0 end) as w9_10,
        sum(case when date_part('hour',wal.modification) >=10 and date_part('hour',wal.modification) <11 then 1 else 0 end) as w10_11,
        sum(case when date_part('hour',wal.modification) >=11 and date_part('hour',wal.modification) <12 then 1 else 0 end) as w11_12,
        sum(case when date_part('hour',wal.modification) >=12 and date_part('hour',wal.modification) <13 then 1 else 0 end) as w12_13,
        sum(case when date_part('hour',wal.modification) >=13 and date_part('hour',wal.modification) <14 then 1 else 0 end) as w13_14,
        sum(case when date_part('hour',wal.modification) >=14 and date_part('hour',wal.modification) <15 then 1 else 0 end) as w14_15,
        sum(case when date_part('hour',wal.modification) >=15 and date_part('hour',wal.modification) <16 then 1 else 0 end) as w15_16,
        sum(case when date_part('hour',wal.modification) >=16 and date_part('hour',wal.modification) <17 then 1 else 0 end) as w16_17,
        sum(case when date_part('hour',wal.modification) >=17 and date_part('hour',wal.modification) <18 then 1 else 0 end) as w17_18,
        sum(case when date_part('hour',wal.modification) >=18 and date_part('hour',wal.modification) <19 then 1 else 0 end) as w18_19,
        sum(case when date_part('hour',wal.modification) >=19 and date_part('hour',wal.modification) <20 then 1 else 0 end) as w19_20,
        sum(case when date_part('hour',wal.modification) >=20 and date_part('hour',wal.modification) <21 then 1 else 0 end) as w20_21,
        sum(case when date_part('hour',wal.modification) >=21 and date_part('hour',wal.modification) <22 then 1 else 0 end) as w21_22,
        sum(case when date_part('hour',wal.modification) >=22 and date_part('hour',wal.modification) <23 then 1 else 0 end) as w22_23,
        sum(case when date_part('hour',wal.modification) >=23 and date_part('hour',wal.modification) <24 then 1 else 0 end) as w23_24
from (select * from  pg_ls_waldir()) wal
where wal.name not in ('archive_status')
  and wal.name not like '%.backup'
group by to_char(date_trunc('day',wal.modification),'yyyymmdd')
order by to_char(date_trunc('day',wal.modification),'yyyymmdd') desc;

-[ RECORD 1 ]----
day    | 20240731
all    | 10
w0_1   | 0
w1_2   | 0
w2_3   | 0
w3_4   | 0
w4_5   | 0
w5_6   | 0
w6_7   | 0
w7_8   | 0
w8_9   | 0
w9_10  | 0
w10_11 | 0
w11_12 | 0
w12_13 | 0
w13_14 | 0
w14_15 | 10
w15_16 | 0
w16_17 | 0
w17_18 | 0
w18_19 | 0
w19_20 | 0
w20_21 | 0
w21_22 | 0
w22_23 | 0
w23_24 | 0
-[ RECORD 2 ]----
day    | 20240730
all    | 2
w0_1   | 0
w1_2   | 0
w2_3   | 0
w3_4   | 0
w4_5   | 0
w5_6   | 0
w6_7   | 0
w7_8   | 0
w8_9   | 0
w9_10  | 0
w10_11 | 0
w11_12 | 2
w12_13 | 0
w13_14 | 0
w14_15 | 0
w15_16 | 0
w16_17 | 0
w17_18 | 0
w18_19 | 0
w19_20 | 0
w20_21 | 0
w21_22 | 0
w22_23 | 0
w23_24 | 0

然后就是生成数据字典,解析wal日志,根据数据库临时表walminer_contents查看分析结果
最终我们可以查询14点这一个小时的日志内容,发现有大量t1表的插入操作,所以产生大量的WAL日志,解决问题。

三、案例演示:恢复drop/truncate操作(DDL)数据库表的操作

布置测试环境

test=# create table t4 (id int);
CREATE TABLE
test=# insert into t4 values(1111111);
INSERT 0 1
test=# select * from t4;
   id    
---------
 1111111
(1 row)

test=# select oid,relfilenode,relname from pg_class where relname ='t4';
  oid  | relfilenode | relname 
-------+-------------+---------
 24624 |       24624 | t4
(1 row)

test=# drop table t4 ;
DROP TABLE

-- 创建替身表(表结构等字段需一致)
test=# create table t5 (id int);
CREATE TABLE
test=# select pg_switch_wal();
 pg_switch_wal 
---------------
 0/C8EFA50
(1 row)

test=# 

生成新的数据字典,执行解析,数据恢复操作


# drop操作测试
## 创建测试表
postgres=# create table t4 (id int);
CREATE TABLE
postgres=# insert into t4 values(1111111);
INSERT 0 1
postgres=# select * from t4;
   id    
---------
 1111111
(1 row)

postgres=# select oid,relfilenode,relname from pg_class where relname ='t4';
  oid  | relfilenode | relname 
-------+-------------+---------
 16522 |       16522 | t4
(1 row)

postgres=# drop table t4 ;
DROP TABLE

## 创建替身表(表结构等字段需一致)
postgres=# create table t5 (id int);
CREATE TABLE
postgres=# select pg_switch_wal();
 pg_switch_wal 
---------------
 0/11035678
(1 row)

## 生成新的数据字典
[postgres@pg ~]$ walminer builtdic -D ~/walminer/walminer.dic -f -h localhost -p 5432 -u postgres
#################################################
Walminer for PostgreSQL wal
Contact Author by mail 'lchch1990@sina.cn'
No License for walminer test, for get a license you can read:
https://gitee.com/movead/XLogMiner/wikis/walminer%20license
#################################################
DIC INFO#
sysid:7301135029075117750 dboid:5 timeline:1 dbversion:150004 walminer:4.4

## 执行替身命令
[postgres@pg ~]$ walminer avatardic -D ~/walminer/walminer.dic  -r 't5' -n 16522
#################################################
Walminer for PostgreSQL wal
Contact Author by mail 'lchch1990@sina.cn'
No License for walminer test, for get a license you can read:
https://gitee.com/movead/XLogMiner/wikis/walminer%20license
#################################################
Avatar rel t5 to relfilenode 16522

## 执行解析
[postgres@pg ~]$ walminer wal2sql -D ~/walminer/walminer.dic -w /pgsql15.4/pg_arch -t 3 -h localhost -u postgres -p 5432

postgres=# select * from walminer_contents ;
-[ RECORD 8 ]-----------------------------------------------------------------
sqlno      | 1
xid        | 868
topxid     | 0
sqlkind    | INSERT
minerd     | t
timestamp  | 2023-11-23 17:08:54.77116+08
op_text    | INSERT INTO public.t5(id) VALUES(1*****1)
undo_text  | DELETE FROM public.t5 WHERE id=1*****1
complete   | t
relation   | t5
start_lsn  | 0/110323D8
commit_lsn | 0/11032418


# truncate操作测试
## 创建测试表
postgres=# create table t4 (id int);
CREATE TABLE
postgres=# insert into t4 values (11111111);
INSERT 0 1
postgres=# select * from t4;
    id    
----------
 11111111
(1 row)
postgres=# truncate table t4;
TRUNCATE TABLE
postgres=# select * from t4;
 id 
----
(0 rows)

## 查看t4表信息,并创建替身表
postgres=# select oid,relfilenode from pg_class where relname='t4';
  oid  | relfilenode 
-------+-------------
 16561 |       16567
(1 row)
postgres=# create table t5 (id int);
CREATE TABLE

## 生成新的数据字典
[postgres@pg ~]$ walminer builtdic -D ~/walminer/walminer.dic -f -h localhost -p 5432 -u postgres
#################################################
Walminer for PostgreSQL wal
Contact Author by mail 'lchch1990@sina.cn'
No License for walminer test, for get a license you can read:
https://gitee.com/movead/XLogMiner/wikis/walminer%20license
#################################################
DIC INFO#
sysid:7301135029075117750 dboid:5 timeline:1 dbversion:150004 walminer:4.4

## 执行替身命令
[postgres@pg ~]$ walminer avatardic -D ~/walminer/walminer.dic  -r 't5' -n 16561(此为oid)
#################################################
Walminer for PostgreSQL wal
Contact Author by mail 'lchch1990@sina.cn'
No License for walminer test, for get a license you can read:
https://gitee.com/movead/XLogMiner/wikis/walminer%20license
#################################################
Avatar rel t5 to relfilenode 16561

## 执行解析
[postgres@pg ~]$ walminer wal2sql -D ~/walminer/walminer.dic -w /pgsql15.4/pg_arch -t 3 -h localhost -u postgres -p 5432

postgres=# select * from walminer_contents ;
-[ RECORD 1 ]------------------------------------------
sqlno      | 1
xid        | 922
topxid     | 0
sqlkind    | INSERT
minerd     | t
timestamp  | 2023-11-24 10:27:39.794824+08
op_text    | INSERT INTO public.t5(id) VALUES(11111111)
undo_text  | DELETE FROM public.t5 WHERE id=11111111
complete   | t
relation   | t5
start_lsn  | 0/170389F0
commit_lsn | 0/17038A30

## 通过op_text,进行恢复
postgres=# INSERT INTO public.t4(id) VALUES(11111111);
INSERT 0 1
postgres=# select * from t4;
    id    
----------
 11111111
(1 row)

参考文档:

https://blog.csdn.net/loveLAxin/article/details/134599626
https://www.modb.pro/db/1739559650075435008

  • 30
    点赞
  • 13
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值