# PostgreSQL Best Practices

## Prerequisites

- PostgreSQL version >= 9.0
- A monitoring account has been created:

```sql
-- PostgreSQL >= 10
create user datakit with password '<PASSWORD>';
grant pg_monitor to datakit;
grant SELECT ON pg_stat_database to datakit;

-- PostgreSQL < 10
create user datakit with password '<PASSWORD>';
grant SELECT ON pg_stat_database to datakit;
```
## Collector Configuration

### Host-based PostgreSQL collection

Go to the `conf.d/db` directory under the Datakit installation directory, copy `postgresql.conf.sample`, and rename it `postgresql.conf`.

Note: `address` defines the URI used to connect to the PostgreSQL server. Here, `datakit` is the username, `PASSWORD` should be replaced with the actual password, `localhost` is the database server address, and `sslmode=disable` means SSL is not used for the connection.
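The parts of the connection URI can be checked quickly with Python's standard `urllib.parse`; this is just an illustrative sketch using the sample URI from the configuration below:

```python
from urllib.parse import urlparse, parse_qs

# Sample connection URI as used in the collector config
uri = "postgres://datakit:PASSWORD@localhost?sslmode=disable"

parts = urlparse(uri)
print(parts.username)   # the user the collector authenticates as
print(parts.password)   # placeholder; replace with the real password
print(parts.hostname)   # the database server address
print(parse_qs(parts.query)["sslmode"])  # SSL mode for the connection
```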
The `postgresql_lock`, `postgresql_stat`, `postgresql_index`, `postgresql_size`, and `postgresql_statio` metric sets require the `relations` field in the configuration file to be enabled:
```toml
[[inputs.postgresql]]
  ## Server address
  # URI format
  # postgres://[datakit[:PASSWORD]]@localhost[/dbname]?sslmode=[disable|verify-ca|verify-full]
  # or simple string
  # host=localhost user=pqgotest password=... sslmode=... dbname=app_production
  address = "postgres://datakit:PASSWORD@localhost?sslmode=disable"

  ## Ignore databases which are gathered. Do not use with the 'databases' option.
  #
  # ignored_databases = ["db1"]

  ## Specify the list of the databases to be gathered. Do not use with the 'ignored_databases' option.
  #
  # databases = ["db1"]

  ## Specify the name used as the "server" tag.
  #
  # outputaddress = "db01"

  ## Collect interval
  # Time unit: "ns", "us" (or "µs"), "ms", "s", "m", "h"
  #
  interval = "10s"

  ## Relations config
  # A list of relations/tables can be specified to track per-relation metrics. To collect relation
  # metrics, either relation_name or relation_regex must be set.
  # relation_name refers to the name of a relation; relation_regex is a regex rule that only takes
  # effect when relation_name is not set.
  # schemas is used for filtering; it is ignored when empty.
  # relkind can be a list of the following options:
  #   r (ordinary table), i (index), S (sequence), t (TOAST table), p (partitioned table),
  #   m (materialized view), c (composite type), f (foreign table)
  #
  # [[inputs.postgresql.relations]]
  #   relation_name = "<TABLE_NAME>"
  #   relation_regex = "<TABLE_PATTERN>"
  #   schemas = ["public"]
  #   relkind = ["r", "p"]

  ## Set true to enable election
  election = true

  ## Run a custom SQL query and collect corresponding metrics.
  #
  # [[inputs.postgresql.custom_queries]]
  #   sql = '''
  #     select datname, numbackends, blks_read
  #     from pg_stat_database
  #     limit 10
  #   '''
  #   metric = "postgresql_custom_stat"
  #   tags = ["datname"]
  #   fields = ["numbackends", "blks_read"]

  ## Log collection
  #
  # [inputs.postgresql.log]
  #   files = []
  #   pipeline = "postgresql.p"

  ## Custom tags
  #
  [inputs.postgresql.tags]
    # some_tag = "some_value"
    # more_tag = "some_other_value"
    # ...
```
After completing the configuration, restart Datakit.

### Collecting PostgreSQL in a K8s cluster

Enable the PostgreSQL collector by mounting a ConfigMap. In `datakit.yaml`, add the following under `volumeMounts`:

```yaml
- mountPath: /usr/local/datakit/conf.d/db/postgresql.conf
  name: datakit-conf
  subPath: postgresql.conf
  readOnly: true
```

Then define the ConfigMap at the bottom of `datakit.yaml`:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: datakit-conf
  namespace: datakit
data:
  postgresql.conf: |-
    [[inputs.postgresql]]
      address = "postgres://username:xxxxx@postgresql.default.svc.cluster.local?sslmode=disable"
      interval = "10s"
      election = true
      [[inputs.postgresql.relations]]
        #relation_name = "<TABLE_NAME>"
        relation_regex = ".*"
        schemas = ["public"]
        relkind = ["r", "p"]
      [inputs.postgresql.log]
        files = []
        #pipeline = "postgresql.p"
---
```

Note that `election = true` is placed before `[[inputs.postgresql.relations]]`; in TOML, keys written after a table header belong to that table, so placing it after the relations entry would silently misconfigure it.

Apply the manifest:

```shell
kubectl apply -f datakit.yaml
```
## Verification

After Datakit is connected, run `datakit monitor` to check whether data is being reported normally.
## Logs

- By default, PostgreSQL logs are written to `stderr`. To enable file logging, add the following to the PostgreSQL configuration file (e.g. `/var/lib/pgsql/15/data/postgresql.conf`):

```ini
logging_collector = on   # enable writing logs to files
log_directory = 'pg_log' # log directory; an absolute path, or a path relative to PGDATA
log_filename = 'pg.log'  # log file name
log_statement = 'all'    # log all statements
#log_duration = on
log_line_prefix = '%m [%p] %d [%a] %u [%h] %c ' # log line prefix
log_file_mode = 0644

# For Windows
#log_destination = 'eventlog'
```
- The PostgreSQL collector does not collect logs by default. To enable log collection, uncomment `files` in `conf.d/db/postgresql.conf` and set it to the absolute path of the PostgreSQL log file. For example:

```toml
[[inputs.postgresql]]
  ...
  [inputs.postgresql.log]
    files = ["/tmp/pgsql/postgresql.log"]
```

Once log collection is enabled, logs with the log source (`source`) set to `postgresql` are generated.

Note: log collection only supports logs on the host where DataKit is installed.

To merge multi-line logs into single entries, configure the following in `/usr/local/datakit/conf.d/db/postgresql.conf`:

```toml
multiline_match = '''^\S'''
auto_multiline_detection = true
auto_multiline_extra_patterns = []
```
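The effect of `multiline_match = '''^\S'''` can be illustrated with a small sketch: a line that begins with a non-whitespace character starts a new log entry, and any other line is appended to the previous entry. The sample lines below are hypothetical:

```python
import re

# multiline_match = '''^\S''': a line starting with a non-whitespace
# character begins a new entry; indented lines continue the previous one.
first_line = re.compile(r"^\S")

raw_lines = [
    "2021-05-31 15:23:45.110 CST [74305] LOG: statement:",
    "    SELECT psd.*",
    "    FROM pg_stat_database psd",
    "2021-05-31 15:23:46.000 CST [74305] LOG: duration: 1.2 ms",
]

entries = []
for line in raw_lines:
    if first_line.match(line) or not entries:
        entries.append(line)          # start a new log entry
    else:
        entries[-1] += "\n" + line    # continuation line: merge into previous
print(len(entries))  # 2 merged entries
```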
## Log Pipeline Splitting

Pipeline rules:

```
add_pattern("application_name", "\\[\\w+ \\d+ - DB:\\w+\\]")
grok(_, '%{TIMESTAMP_ISO8601:time} %{NOTSPACE:timezone} \\[%{NOTSPACE:process_id}\\] %{NOTSPACE:db_name} %{application_name:application_name} %{NOTSPACE:user} \\[%{NOTSPACE:localhost}\\] %{NOTSPACE:session_id} ')
default_time(time) # use the time field as the timestamp of the output data
```
Raw log:

```
2021-05-31 15:23:45.110 CST [74305] test [pgAdmin 4 - DB:postgres] postgres [127.0.0.1] 60b48f01.12241 LOG: statement:
    SELECT psd.*, 2^31 - age(datfrozenxid) as wraparound, pg_database_size(psd.datname) as pg_database_size
    FROM pg_stat_database psd
    JOIN pg_database pd ON psd.datname = pd.datname
    WHERE psd.datname not ilike 'template%' AND psd.datname not ilike 'rdsadmin'
    AND psd.datname not ilike 'azure_maintenance' AND psd.datname not ilike 'postgres'
```

Splitting result:
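The fields extracted by the grok rule can be approximated with a plain Python regex run against the raw line above. This is only a sketch of the splitting behavior, not the actual Pipeline engine:

```python
import re

line = ("2021-05-31 15:23:45.110 CST [74305] test "
        "[pgAdmin 4 - DB:postgres] postgres [127.0.0.1] 60b48f01.12241 LOG: statement:")

# Rough regex equivalents of the grok patterns used in the rule above
pattern = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "   # TIMESTAMP_ISO8601
    r"(?P<timezone>\S+) "                                     # NOTSPACE
    r"\[(?P<process_id>\S+)\] "
    r"(?P<db_name>\S+) "
    r"(?P<application_name>\[\w+ \d+ - DB:\w+\]) "            # add_pattern above
    r"(?P<user>\S+) "
    r"\[(?P<localhost>\S+)\] "
    r"(?P<session_id>\S+) "
)

m = pattern.match(line)
print(m.groupdict())  # time, timezone, process_id, db_name, application_name, user, localhost, session_id
```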