postgres慢sql日志的过滤重复后统计导出

IT杂人

已于 2023-07-26 11:19:41 修改

阅读量409

点赞数

分类专栏： postgresql 文章标签： sql 数据库 postgresql

于 2023-04-10 20:05:51 首次发布

本文链接：https://blog.csdn.net/atzqtzq/article/details/130067899

版权

postgresql 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

postgres日志为csv文件，所以在导出后，其实是表格，但也存在很多一样的sql，可能只是参数同，需要导出所有sql进行统计分析

用于pg的一些函数

regexp_matches  字符串正则匹配，默认输出数组
regexp_replace  字符串正则替换
REGEXP_SPLIT_TO_ARRAY 正则输出数组
REGEXP_SPLIT_TO_TABLE 正则输出表
array_to_string 数据转字符串
cast(xxx as float) 字符串转数值

日志开启

进入文件postgresql.auto.conf

开启慢日志

log_min_duration_statement = 500
log_statement = ddl
log_duration = off

开启所有日志

log_min_duration_statement = 0
log_statement = all
log_duration = on
#按小时记录
log_filename = 'postgresql-%Y-%m-%d_%H.log' 
#默认单位分钟
log_rotation_age = 240

导出慢查询

从服务器导出慢查询sql的csv文件，可以导出一周的进行分析

本地开发环境创建库和表

创建日志库

# 切换用户
su - postgres

# 进入pg
psql -U postgres

# 创建库
create database pglog;

# 切换到pglog
\c pglog;

创建日志表

CREATE  TABLE postgres_log
(
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num  bigint,
  command_tag text,
  session_start_time timestamp with  time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text,
  PRIMARY KEY (session_id, session_line_num)
);

本地库导入数据

上传从远程服务器上导出csv文件，到本地服务器上
进入数据库命令行

# 切换用户
su - postgres

# 进入pg
psql -U postgres

# 切换数据库
\c pglog 

# 导入数据
\COPY postgres_log FROM '/home/pglog/postgresql-log.Tue.csv' with CSV;

# 导入数据2
# 方便集成到sh脚本中的操作
psql -U postgres -d pglog -c "COPY postgres_log FROM '/home/vcs/postgresql-log.Tue.csv' with CSV;"

注意：

在导入时，可能最后一行因为列不完成，会出错，可不用管，其他数据都已导入；
从Postgres 9.4开始，您现在可以使用FORCE NULL。这会导致将空字符串转换为NULL。非常方便，尤其是CSV文件。
语法如下：COPY table FROM stdin WITH DELIMITER’;’ CSV FORCE NULL integer fieldname;

分析一：过滤重复慢sql后导出

上面的操作已经把慢日志导入到本地postgres_log 表

正则替换掉一些影响分组的字符串：时间，S编号，自带的的where条件值

# duration超过1秒的分析值，min,max,avg
#where条件，根据自己的需求动态修改
select max(database_name) as database_name,count(*), min(duration) as min_duration,max(duration) as max_duration,avg(duration) as avg_duration,max(message) as message,max(detail) as detail from 
(
select database_name,cast(array_to_string(regexp_matches(substring(message,10,10),'[0-9]+\.[0-9]+','g'),'') as float) as duration,regexp_replace(regexp_replace(message,'(duration: .* ms )|(S_[0-9]{1,5})','_'),'([''].*[''])','_')  as message ,detail  
from postgres_log_ls0530 pl where database_name ='要分析的数据名称' and   message  like 'duration%'  
) as a where duration>1000 group by message order by max_duration desc

导出结果集到html，csv文件

使用dbeaver==>全选结果集==>导出结果集==>选择html或csv文件

拿到上面的结果集，用于分析哪些sql需要优化

👆👆👆

分析二：常用日志分析sql

# 慢sql时间开始发生和最后时间
select  min(log_time),max(log_time) from postgres_log;

# 哪些库有慢sql
select database_name  from postgres_log pl  group by database_name;

# 每个库的统计
select database_name,count(*) as count  from postgres_log pl  group by database_name order by count desc;

分析三：除了慢sql，其他类别的sql

如：ERROR,

select * from postgres_log pl  where message  not like 'duration%'  ;