Translation: Reduce Noise From Disk Space Alerts

Original: Reduce Noise From Disk Space Alerts | Robust Perception

Reduce Noise From Disk Space Alerts
Brian Brazil August 7, 2015

How often have you been alerted about disk space going over some threshold, only to discover it'll be weeks or even months until the disk actually fills? Noisy alerts are bad alerts. The new predict_linear() function in Prometheus gives you a way to have a smarter, more useful alert.

A disk filling up is undesirable, as many applications and utilities don't deal well with being unable to make changes to files. A standard way to protect against this is to alert when a disk is filling up, so that a human can fix the problem before it's too late. Typically this is done with simple thresholds such as 80%, 90% or 10GB left. This works when there are moderate spikes in disk usage and uniform usage across all your servers, but not so well when growth is very gradual, or so fast that by the time you get the alert it's too late to do anything about it.
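To make the contrast concrete, here is a small Python sketch with made-up numbers. The disk size, usage, and growth rates are illustrative assumptions, not measurements from any real system. It computes how much time is left when a disk crosses the same 80% threshold at two different steady growth rates.

```python
def hours_until_full(size_gb, used_gb, growth_gb_per_hour):
    """Hours left before the disk is full, assuming steady linear growth."""
    free_gb = size_gb - used_gb
    return free_gb / growth_gb_per_hour

# Hypothetical 1000 GB disk that has just crossed an 80% usage threshold.
slow = hours_until_full(1000, 800, 0.25)   # logs growing slowly
fast = hours_until_full(1000, 800, 400.0)  # runaway job writing quickly

print(f"slow growth: {slow / 24:.1f} days left")
print(f"fast growth: {fast * 60:.0f} minutes left")
```

With these assumed rates, the same threshold alert fires with roughly 33 days of headroom in one case and 30 minutes in the other, which is why a fixed percentage is a poor proxy for urgency.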

What if, instead of a fixed threshold, you could alert if the disk was going to fill up in 4 hours' time? The predict_linear() function in Prometheus allows you to do just that. It uses a linear regression over a period of time to predict what the value of a timeseries will be in the future. Here's what it looks like:

ALERT DiskWillFillIn4Hours
  IF predict_linear(node_filesystem_free{job='node'}[1h], 4*3600) < 0
  FOR 5m
  LABELS {
    severity="page"
  }

A deeper look

Let's look at this alert definition piece by piece.

ALERT DiskWillFillIn4Hours

This is the start of the alert definition and where the name of the alert is set.

IF predict_linear(node_filesystem_free{job='node'}[1h], 4*3600) < 0

This is the meat of the alert, the expression that'll trigger a notification to the alertmanager. node_filesystem_free{job='node'}[1h] retrieves an hour's worth of history. This is passed to predict_linear, which uses it to predict 4 hours forwards; the second argument is in seconds, and there are 3600 seconds in an hour. < 0 is a filter that only returns values less than 0, that is, filesystems predicted to have no free space left.
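The idea behind the expression can be sketched in plain Python. This is a simplified stand-in written for illustration, not Prometheus's actual implementation: it fits a least-squares line through (timestamp, value) samples and extrapolates from the newest sample, and the sample data below is invented.

```python
def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) samples and
    return its value seconds_ahead past the newest sample.
    Simplified sketch; not the real Prometheus code."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    newest_t = samples[-1][0]
    return slope * (newest_t + seconds_ahead) + intercept

# Hypothetical hour of history, one sample per minute.
# Free space starts at 10 GB and shrinks by 1 MB per second.
fast_history = [(t, 10e9 - 1e6 * t) for t in range(0, 3601, 60)]
# Same disk, shrinking by only 1 KB per second.
slow_history = [(t, 10e9 - 1e3 * t) for t in range(0, 3601, 60)]

fast_prediction = predict_linear(fast_history, 4 * 3600)
slow_prediction = predict_linear(slow_history, 4 * 3600)

# The "< 0" filter: only the fast-filling disk would match.
print(fast_prediction < 0, slow_prediction < 0)
```

Both disks have the same free space at the start of the window, but only the one whose trend crosses zero within 4 hours would pass the filter.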

FOR 5m

This makes Prometheus wait for the alert condition to be true for 5 minutes before sending a notification. This helps avoid false positives from brief spikes and race conditions.
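The effect of the FOR clause can be modelled with a short Python sketch. The function name, the 60-second evaluation interval, and the sample data are assumptions made for illustration; this is not how Prometheus is implemented internally, only the behaviour described above.

```python
def first_firing_time(results, for_seconds):
    """results: (timestamp, condition_is_true) pairs, one per rule
    evaluation, in time order. Returns the timestamp at which the alert
    would start firing, or None if it never does."""
    pending_since = None
    for timestamp, is_true in results:
        if not is_true:
            pending_since = None        # condition cleared: reset the timer
        elif pending_since is None:
            pending_since = timestamp   # condition just became true
        if pending_since is not None and timestamp - pending_since >= for_seconds:
            return timestamp            # held long enough: notify
    return None

# Evaluations every 60 seconds (an assumed interval).
brief_spike = [(0, False), (60, True), (120, True), (180, False), (240, False)]
sustained = [(t, t >= 60) for t in range(0, 601, 60)]

print(first_firing_time(brief_spike, 300))
print(first_firing_time(sustained, 300))
```

The two-minute spike never produces a notification, while the sustained condition does once it has held for the full 300 seconds.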

LABELS {
    severity="page"
}

This sets an additional label on the alert called severity with the value page. This can be used to route the alert in the alertmanager to your paging system, rather than having to individually list what alert goes where.
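As a rough illustration of why a shared label is convenient, the following Python sketch routes alerts by matching labels rather than by alert name. Everything here is hypothetical: the function, the receiver names, and the "ticket" severity are invented for the example, and this is neither the alertmanager's configuration format nor its code.

```python
def pick_receiver(alert_labels, routes, default="email"):
    """Return the receiver of the first route whose labels all match
    the alert's labels, or the default receiver if none match."""
    for match, receiver in routes:
        if all(alert_labels.get(k) == v for k, v in match.items()):
            return receiver
    return default

# Hypothetical routing table keyed on the severity label.
routes = [
    ({"severity": "page"}, "pager"),
    ({"severity": "ticket"}, "ticket-queue"),
]

alert = {"alertname": "DiskWillFillIn4Hours", "job": "node", "severity": "page"}
print(pick_receiver(alert, routes))
```

Any new alert that carries severity="page" would reach the pager without the routing table having to change.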

Putting it all together

The above alert should be put in a file called node.rules.

Add the rules file to your Prometheus configuration in prometheus.yml:

global:

# Paths are relative to the configuration file.
rule_files:
  - path/to/node.rules

scrape_configs:
  .
  .
  .

If you haven't already done so, configure the alertmanager. Finally, either restart Prometheus or send it a SIGHUP to reload its configuration.

If you visit the /alerts endpoint on Prometheus, you will see your new alert. You can click on it for additional detail.

Reposted from: https://my.oschina.net/u/2419022/blog/1162893
