Linux monitoring with WeChat notifications: sending alerts to WeChat with Prometheus + Alertmanager

Latest recommended article published on 2023-11-05 15:41:24

weixin_39722759

Prometheus alert rule configuration

Go to the Prometheus directory and define the alert rules. The base rule file is /usr/local/prometheus/alert-rules-base.yml:

groups:
- name: monitor_base
  rules:

  - alert: CpuUsageAlert_warning
    expr: sum(avg(irate(node_cpu_seconds_total{mode!='idle'}[5m])) without (cpu)) by (instance) > 0.60
    for: 2m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 60% (current value: {{ $value }})"

  - alert: CpuUsageAlert_serious
    #expr: sum(avg(irate(node_cpu_seconds_total{mode!='idle'}[5m])) without (cpu)) by (instance) > 0.85
    expr: (100 - (avg by (instance) (irate(node_cpu_seconds_total{job=~".*",mode="idle"}[5m])) * 100)) > 85
    for: 3m
    labels:
      level: serious
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"

  - alert: MemUsageAlert_warning
    expr: avg by(instance) ((1 - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100) > 70
    for: 2m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{$labels.instance}}: MEM usage is above 70% (current value is: {{ $value }})"

  - alert: MemUsageAlert_serious
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.90
    for: 3m
    labels:
      level: serious
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 90% (current value: {{ $value }})"

  - alert: DiskUsageAlert_warning
    expr: (1 - node_filesystem_free_bytes{fstype!="rootfs",mountpoint!="",mountpoint!~"/(run|var|sys|dev).*"} / node_filesystem_size_bytes) * 100 > 80
    for: 2m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Disk usage high"
      description: "{{$labels.instance}}: Disk usage is above 80% (current value is: {{ $value }})"

  - alert: DiskUsageAlert_serious
    expr: (1 - node_filesystem_free_bytes{fstype!="rootfs",mountpoint!="",mountpoint!~"/(run|var|sys|dev).*"} / node_filesystem_size_bytes) * 100 > 90
    for: 3m
    labels:
      level: serious
    annotations:
      summary: "Instance {{ $labels.instance }} Disk usage high"
      description: "{{$labels.instance}}: Disk usage is above 90% (current value is: {{ $value }})"

  - alert: NodeFileDescriptorUsage
    expr: avg by (instance) (node_filefd_allocated{} / node_filefd_maximum{}) * 100 > 60
    for: 2m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} File Descriptor usage high"
      description: "{{$labels.instance}}: File Descriptor usage is above 60% (current value is: {{ $value }})"

  - alert: NodeLoad15
    expr: avg by (instance) (node_load15{}) > 80
    for: 2m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Load15 usage high"
      description: "{{$labels.instance}}: Load15 is above 80 (current value is: {{ $value }})"

  - alert: NodeAgentStatus
    expr: avg by (instance) (up{}) == 0
    for: 2m
    labels:
      level: warning
    annotations:
      summary: "{{$labels.instance}}: has been down"
      description: "{{$labels.instance}}: Node_Exporter Agent is down (current value is: {{ $value }})"

  - alert: NodeProcsBlocked
    expr: avg by (instance) (node_procs_blocked{}) > 10
    for: 2m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Process Blocked usage high"
      description: "{{$labels.instance}}: Node Blocked Procs detected! above 10 (current value is: {{ $value }})"

  - alert: NetworkTransmitRate
    #expr: avg by (instance) (floor(irate(node_network_transmit_bytes_total{device="ens192"}[2m]) / 1024 / 1024)) > 50
    # bytes/s -> MiB/s -> Mbit/s (the "* 8" converts bytes to bits)
    expr: avg by (instance) (floor(irate(node_network_transmit_bytes_total{}[2m]) / 1024 / 1024 * 8 )) > 40
    for: 1m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Network Transmit Rate usage high"
      description: "{{$labels.instance}}: Node Transmit Rate (Upload) is above 40 Mbps (current value is: {{ $value }} Mbps)"

  - alert: NetworkReceiveRate
    #expr: avg by (instance) (floor(irate(node_network_receive_bytes_total{device="ens192"}[2m]) / 1024 / 1024)) > 50
    # bytes/s -> MiB/s -> Mbit/s (the "* 8" converts bytes to bits)
    expr: avg by (instance) (floor(irate(node_network_receive_bytes_total{}[2m]) / 1024 / 1024 * 8 )) > 40
    for: 1m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Network Receive Rate usage high"
      description: "{{$labels.instance}}: Node Receive Rate (Download) is above 40 Mbps (current value is: {{ $value }} Mbps)"

  - alert: DiskReadRate
    expr: avg by (instance) (floor(irate(node_disk_read_bytes_total{}[2m]) / 1024 )) > 200
    for: 2m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Disk Read Rate usage high"
      description: "{{$labels.instance}}: Node Disk Read Rate is above 200KB/s (current value is: {{ $value }}KB/s)"

  - alert: DiskWriteRate
    expr: avg by (instance) (floor(irate(node_disk_written_bytes_total{}[2m]) / 1024 / 1024 )) > 20
    for: 2m
    labels:
      level: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Disk Write Rate usage high"
      description: "{{$labels.instance}}: Node Disk Write Rate is above 20MB/s (current value is: {{ $value }}MB/s)"
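
Alert rules like these can be checked offline before they go live. The following is a minimal sketch of a promtool unit-test file for the NodeAgentStatus rule, assuming the rule file above is saved as valid, properly indented YAML at the path given. The file name base-rules-test.yml, the target node1:9100 and the job name node are made up for illustration.

```yaml
# base-rules-test.yml (hypothetical file name)
rule_files:
  - /usr/local/prometheus/alert-rules-base.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    # Simulated scrape results: the target reports up == 0 for five minutes.
    # The instance and job values are illustrative assumptions.
    input_series:
      - series: 'up{instance="node1:9100", job="node"}'
        values: '0 0 0 0 0'
    alert_rule_test:
      # The rule has "for: 2m", so by minute 3 the alert should be firing.
      - eval_time: 3m
        alertname: NodeAgentStatus
        exp_alerts:
          - exp_labels:
              level: warning
              instance: node1:9100   # "avg by (instance)" keeps only this label
            exp_annotations:
              summary: "node1:9100: has been down"
              description: "node1:9100: Node_Exporter Agent is down (current value is: 0)"
```

The test would be run with `promtool test rules base-rules-test.yml`. A syntax-only check of the rule file itself is `promtool check rules /usr/local/prometheus/alert-rules-base.yml`.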

The MySQL alert rule file is /usr/local/prometheus/alert-rules-mysql.yml:

groups:
- name: MySQLStatusAlert
  rules:

  - alert: MySQL is down
    expr: mysql_up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} MySQL is down"
      description: "MySQL database is down. This requires immediate action!"

  - alert: open files high
    expr: mysql_global_status_innodb_num_open_files > (mysql_global_variables_open_files_limit) * 0.25
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} open files high"
      description: "Open files is high. Please consider increasing open_files_limit."

  - alert: Read buffer size is bigger than max. allowed packet size
    expr: mysql_global_variables_read_buffer_size > mysql_global_variables_slave_max_allowed_packet
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Read buffer size is bigger than max. allowed packet size"
      description: "Read buffer size (read_buffer_size) is bigger than max. allowed packet size (max_allowed_packet). This can break your replication."

  - alert: Sort buffer possibly misconfigured
    expr: mysql_global_variables_innodb_sort_buffer_size < 256*1024 or mysql_global_variables_read_buffer_size > 4*1024*1024
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Sort buffer possibly misconfigured"
      description: "Sort buffer size is either too big or too small. A good value for sort_buffer_size is between 256k and 4M."

  - alert: Thread stack size is too small
    expr: mysql_global_variables_thread_stack < 196608
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Thread stack size is too small"
      description: "Thread stack size is too small. This can cause problems when you use Stored Language constructs, for example. A typical value for thread_stack is 256k."

  - alert: Used more than 70% of max connections limit
    expr: mysql_global_status_max_used_connections > mysql_global_variables_max_connections * 0.7
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Used more than 70% of max connections limit"
      description: "Used more than 70% of max connections limit"

  - alert: InnoDB Force Recovery is enabled
    expr: mysql_global_variables_innodb_force_recovery != 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} InnoDB Force Recovery is enabled"
      description: "InnoDB Force Recovery is enabled. This mode should be used for data recovery purposes only. It prohibits writing to the data."

  - alert: InnoDB Log File size is too small
    expr: mysql_global_variables_innodb_log_file_size < 16777216
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} InnoDB Log File size is too small"
      description: "The InnoDB Log File size is possibly too small. Choosing a small InnoDB Log File size can have significant performance impacts."

  - alert: InnoDB Flush Log at Transaction Commit
    expr: mysql_global_variables_innodb_flush_log_at_trx_commit != 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} InnoDB Flush Log at Transaction Commit"
      description: "InnoDB Flush Log at Transaction Commit is set to a value != 1. This can lead to a loss of committed transactions in case of a power failure."

  - alert: Table definition cache too small
    expr: mysql_global_status_open_table_definitions > mysql_global_variables_table_definition_cache
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Table definition cache too small"
      description: "Your Table Definition Cache is possibly too small. If it is much too small this can have significant performance impacts!"

  # - alert: Table open cache too small
  #   expr: mysql_global_status_open_tables > mysql_global_variables_table_open_cache * 99/100
  #   for: 1m
  #   labels:
  #     severity: page
  #   annotations:
  #     summary: "Instance {{ $labels.instance }} Table open cache too small"
  #     description: "Your Table Open Cache is possibly too small (old name Table Cache). If it is much too small this can have significant performance impacts!"

  - alert: Thread stack size is possibly too small
    expr: mysql_global_variables_thread_stack < 262144
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Thread stack size is possibly too small"
      description: "Thread stack size is possibly too small. This can cause problems when you use Stored Language constructs, for example. A typical value for thread_stack is 256k."

  # - alert: InnoDB Buffer Pool Instances is too small
  #   expr: mysql_global_variables_innodb_buffer_pool_instances == 1
  #   for: 1m
  #   labels:
  #     severity: page
  #   annotations:
  #     summary: "Instance {{ $labels.instance }} InnoDB Buffer Pool Instances is too small"
  #     description: "If you are using MySQL 5.5 and higher you should use several InnoDB Buffer Pool Instances for performance reasons. Some rules are: InnoDB Buffer Pool Instance should be at least 1 Gbyte in size. InnoDB Buffer Pool Instances you can set equal to the number of cores of your machine."

  - alert: InnoDB Plugin is enabled
    expr: mysql_global_variables_ignore_builtin_innodb == 1
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} InnoDB Plugin is enabled"
      description: "InnoDB Plugin is enabled"

  # - alert: Binary Log is disabled
  #   expr: mysql_global_variables_log_bin != 1
  #   for: 1m
  #   labels:
  #     severity: warning
  #   annotations:
  #     summary: "Instance {{ $labels.instance }} Binary Log is disabled"
  #     description: "Binary Log is disabled. This prevents you from doing Point in Time Recovery (PiTR)."

  - alert: Binlog Cache size too small
    expr: mysql_global_variables_binlog_cache_size < 1048576
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Binlog Cache size too small"
      description: "Binlog Cache size is possibly too small. A value of 1 Mbyte or higher is OK."

  - alert: Binlog Statement Cache size too small
    expr: mysql_global_variables_binlog_stmt_cache_size < 1048576 and mysql_global_variables_binlog_stmt_cache_size > 0
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Binlog Statement Cache size too small"
      description: "Binlog Statement Cache size is possibly too small. A value of 1 Mbyte or higher is typically OK."

  - alert: Binlog Transaction Cache size too small
    expr: mysql_global_variables_binlog_cache_size < 1048576
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Binlog Transaction Cache size too small"
      description: "Binlog Transaction Cache size is possibly too small. A value of 1 Mbyte or higher is typically OK."

  - alert: Sync Binlog is enabled
    expr: mysql_global_variables_sync_binlog == 1
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Sync Binlog is enabled"
      description: "Sync Binlog is enabled. This leads to higher data security but at the cost of write performance."

  # - alert: IO thread stopped
  #   expr: mysql_slave_status_slave_io_running != 1
  #   for: 1m
  #   labels:
  #     severity: critical
  #   annotations:
  #     summary: "Instance {{ $labels.instance }} IO thread stopped"
  #     description: "IO thread has stopped. This is usually because it cannot connect to the Master any more."

  - alert: SQL thread stopped
    expr: mysql_slave_status_slave_sql_running == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} SQL thread stopped"
      description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."

  - alert: SQL thread stopped
    expr: mysql_slave_status_slave_sql_running != 1
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} SQL thread stopped"
      description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."

  - alert: Slave lagging behind Master
    # seconds_behind_master is a point-in-time value (a gauge), so it is compared
    # directly; the original used rate(...[1m]) > 30, which measures how fast the lag grows.
    expr: mysql_slave_status_seconds_behind_master > 30
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} Slave lagging behind Master"
      description: "Slave is lagging behind Master. Please check if Slave threads are running and if there are some performance issues!"

  - alert: Slave is NOT read only (Please ignore this warning indicator.)
    # Note: as written, this expr matches when read_only is enabled (!= 0),
    # which is the opposite of what the alert text says.
    expr: mysql_global_variables_read_only != 0
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} Slave is NOT read only"
      description: "Slave is NOT set to read only. You can accidentally manipulate data on the slave and get inconsistencies..."
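
Prometheus only evaluates rule files that are listed in its main configuration. Below is a minimal sketch of the relevant part of prometheus.yml, assuming the two rule file paths used above. The evaluation interval and the Alertmanager address localhost:9093 are assumptions (9093 is Alertmanager's default port) and should be adjusted to the actual setup.

```yaml
# prometheus.yml (excerpt)
global:
  evaluation_interval: 1m        # assumed value: how often the rules are evaluated

# Rule files to load; both paths come from the sections above.
rule_files:
  - /usr/local/prometheus/alert-rules-base.yml
  - /usr/local/prometheus/alert-rules-mysql.yml

# Where firing alerts are sent. Alertmanager then routes them to a receiver.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093     # assumption: Alertmanager on the same host, default port
```

After editing, `promtool check config prometheus.yml` validates the configuration together with the referenced rule files, and Prometheus has to be reloaded or restarted for the change to take effect.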
