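The alert bot is built on Slack's Incoming WebHooks. For now Slack does not limit the number of notifications, though it is unclear whether it will start charging for them later.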
Installing Graphite
sudo docker run -d \
--name graphite \
--restart=always \
-p 80:80 \
-p 2003:2003 \
-p 8125:8125/udp \
hopsoft/graphite-statsd
Installed by default:
- Nginx - reverse proxies the graphite dashboard
- Graphite - front-end dashboard
- Carbon - back-end
- Statsd - UDP based back-end proxy
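The container exposes Carbon's plaintext listener on port 2003 and StatsD on UDP port 8125. A minimal Python sketch for pushing a test metric into each, assuming the container runs on localhost with the port mappings from the docker run command; the metric names are hypothetical:

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one metric in Carbon's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def statsd_packet(name, value, metric_type="c"):
    """Format one StatsD datagram, e.g. 'hits:1|c' (counter) or 'load:0.5|g' (gauge)."""
    return f"{name}:{value}|{metric_type}"


def send_to_carbon(line, host="localhost", port=2003):
    # Carbon plaintext listener: plain TCP, one metric per line
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


def send_to_statsd(packet, host="localhost", port=8125):
    # StatsD listens on UDP: fire-and-forget, no reply is expected
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet.encode("ascii"), (host, port))
```

For example, `send_to_carbon(carbon_line("test.demo.value", 42))` should make a `test.demo.value` series appear in the Graphite dashboard shortly afterwards.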
More configuration options are documented on Docker Hub: https://registry.hub.docker.com/u/hopsoft/graphite-statsd/
I also installed collectd to monitor resource usage on some machines and drive basic alerts.
Alerting component (graphite-beacon)
docker run -v /path/to/config.json:/srv/alerting/etc/config.json deliverous/graphite-beacon
Alert channel settings (which handlers each alert level is sent to):
{
"critical_handlers": [
"log",
"slack"
],
"normal_handlers": [
"log",
"smtp"
],
"warning_handlers": [
"log",
"slack"
]
}
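The handler lists map each alert level to the channels it is sent through: in this setup critical and warning alerts go to the log and Slack, while normal-level notifications go to the log and email. A simplified Python sketch of that routing idea; `handlers_for`, `dispatch` and the handler registry are hypothetical illustrations, not graphite-beacon's internals:

```python
import json

# The handler section from the config above
CONFIG = json.loads("""
{
  "critical_handlers": ["log", "slack"],
  "normal_handlers": ["log", "smtp"],
  "warning_handlers": ["log", "slack"]
}
""")


def handlers_for(level, config):
    """Return the handler names configured for a level ('critical', 'warning', 'normal')."""
    return config.get(f"{level}_handlers", [])


def dispatch(level, message, config, registry):
    """Call each registered handler for `level`; return the names that actually ran."""
    ran = []
    for name in handlers_for(level, config):
        handler = registry.get(name)
        if handler is None:
            continue  # no implementation registered under this name: skip it
        handler(level, message)
        ran.append(name)
    return ran
```

A registry such as `{"log": print_fn, "slack": slack_fn}` then decides what each name does, which is also where a custom handler (for example a VoIP one) would plug in.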
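If a convenient VoIP alerting service is available, it can be hooked in as well by writing your own notify plugin.
Here is how the Slack plugin is written (the `notify` method of graphite-beacon's Slack handler):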
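import json

from tornado import gen

# Method of the Slack handler class. LOGGER and the self.* attributes
# (client, webhook, channel, username, emoji) are defined elsewhere in the
# plugin module and its handler base class.
@gen.coroutine
def notify(self, level, *args, **kwargs):
    LOGGER.debug("Handler (%s) %s", self.name, level)
    # Render the alert text for this level
    message = self.get_message(level, *args, **kwargs)
    # Build the Incoming WebHook payload
    data = dict()
    data['username'] = self.username
    data['text'] = message
    data['icon_emoji'] = self.emoji.get(level, ':warning:')
    if self.channel:
        data['channel'] = self.channel
    body = json.dumps(data)
    # POST the JSON payload to the webhook URL with Tornado's async HTTP client
    yield self.client.fetch(self.webhook, method='POST', body=body)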
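The same Incoming WebHook call can be reproduced with only the Python standard library, which is handy for testing a webhook URL before putting it into config.json. A minimal sketch that sends the same payload fields as the handler; the per-level emoji choices and the default username are hypothetical, and the webhook URL must be your own:

```python
import json
import urllib.request

# Hypothetical emoji per level; unknown levels fall back to ':warning:' like the handler
EMOJI = {"critical": ":exclamation:", "warning": ":warning:", "normal": ":white_check_mark:"}


def build_payload(level, message, username="graphite-beacon", channel=None):
    """Build the JSON body for a Slack Incoming WebHook (same fields as notify())."""
    data = {
        "username": username,
        "text": message,
        "icon_emoji": EMOJI.get(level, ":warning:"),
    }
    if channel:
        data["channel"] = channel
    return json.dumps(data)


def post_to_slack(webhook_url, body):
    """POST the payload synchronously and return Slack's response body."""
    req = urllib.request.Request(
        webhook_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Slack normally answers a successful post with the body `ok`. Webhooks created through newer Slack apps may ignore the `username`, `icon_emoji` and `channel` overrides and post with the app's own settings.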
An example alert definition (alerting on CPU metrics collected by collectd):
{
"name": "CPU",
"format": "percent",
"rules": [
"critical: = 10%",
"warning: = 20%"
],
"interval": "5minute",
"no_data": "warning",
"source": "graphite",
"query": "aliasByNode(sumSeriesWithWildcards(viila.collectd.*.cpu-*.cpu-user, 2), 1)"
}
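Conceptually, each check runs the `query` against Graphite's render API, reduces every returned series to one value and compares it with the rule thresholds, falling back to the `no_data` level when a series is empty. The simplified sketch below mimics that flow with the standard library; the averaging, the `-5min` window and the `(level, op, threshold)` rule tuples are assumptions for illustration, not graphite-beacon's actual rule parser or defaults:

```python
import json
import operator
import urllib.parse
import urllib.request

# Comparison operators a rule might use (simplified stand-in for real rule parsing)
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def render_url(graphite_url, target, from_="-5min"):
    """Graphite render API URL returning JSON for `target` over the recent window."""
    query = urllib.parse.urlencode({"target": target, "from": from_, "format": "json"})
    return f"{graphite_url.rstrip('/')}/render?{query}"


def fetch(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def series_averages(render_json):
    """Average the non-null datapoints of each series: {target: average or None}."""
    result = {}
    for series in json.loads(render_json):
        values = [v for v, _ts in series["datapoints"] if v is not None]
        result[series["target"]] = sum(values) / len(values) if values else None
    return result


def evaluate(avg, rules, no_data="warning"):
    """Return the first matching rule level, `no_data` for empty series, else 'normal'."""
    if avg is None:
        return no_data
    for level, op, threshold in rules:
        if OPS[op](avg, threshold):
            return level
    return "normal"
```

Rules are checked in order, so the more severe level should be listed first; for example `evaluate(avg, [("critical", ">", 90), ("warning", ">", 70)])`.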
Alert display
Shortcomings of this framework:
- No dynamic config reloading
- No web interface (easy to add)
- No clustering support