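Graphite + Slack alerts in 5 minutes with Docker

The alert bot is built on Slack's Incoming WebHooks. Slack currently puts no limit on the number of notifications; no idea whether they will start charging for it later.

Installing Graphite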

sudo docker run -d \
--name graphite \
--restart=always \
-p 80:80 \
-p 2003:2003 \
-p 8125:8125/udp \
hopsoft/graphite-statsd

Installed by default:

  • Nginx - reverse proxies the graphite dashboard
  • Graphite - front-end dashboard
  • Carbon - back-end
  • Statsd - UDP based back-end proxy

For more configuration options, see the image's Docker Hub page: https://registry.hub.docker.com/u/hopsoft/graphite-statsd/
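To check that the container is actually accepting data, you can push a test metric at the two published ports. The following is a minimal sketch, assuming the container runs on localhost with the port mapping shown above; the metric names `test.demo.value` and `test.demo.hits` are made up for illustration. It uses Carbon's plaintext line format (`path value timestamp`) on TCP 2003 and a StatsD counter (`name:count|c`) on UDP 8125.

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one metric as a Carbon plaintext line, ending with a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def statsd_counter(name, count=1):
    """Format a StatsD counter increment, e.g. 'hits:1|c'."""
    return "%s:%d|c" % (name, count)


def send_carbon(line, host="localhost", port=2003):
    """Send one plaintext line to Carbon over TCP."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()


def send_statsd(payload, host="localhost", port=8125):
    """Send one StatsD datagram over UDP (fire and forget, no reply)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("ascii"), (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical metric names, used only to verify the pipeline end to end.
    send_carbon(carbon_line("test.demo.value", 42))
    send_statsd(statsd_counter("test.demo.hits"))
```

If everything is wired up, the test metrics should show up in the Graphite dashboard's metric tree shortly afterwards.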

I also installed collectd to monitor machine resources and drive some basic alerts.

Alerting component

docker run -v /path/to/config.json:/srv/alerting/etc/config.json deliverous/graphite-beacon

For a complete sample configuration, see the graphite-beacon project.

Configuring alert channels

{
    "critical_handlers": [
        "log",
        "slack"
    ],
    "normal_handlers": [
        "log",
        "smtp"
    ],
    "warning_handlers": [
        "log",
        "slack"
    ]
}
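The handler lists above only name the channels; each handler also needs its own settings in the same file. Here is a rough sketch of assembling such a config.json from Python. The `slack` section with `webhook` and `channel` mirrors the attributes the Slack plugin reads (`self.webhook`, `self.channel`), but the exact key names, as well as `graphite_url` and `alerts`, are assumptions based on my reading of graphite-beacon and should be checked against its sample config. The webhook URL is a placeholder.

```python
import json


def build_config(webhook, channel=None, alerts=None):
    """Assemble a graphite-beacon style config dict (key names are assumptions)."""
    config = {
        "graphite_url": "http://localhost",  # assumed key: where Graphite is served
        "critical_handlers": ["log", "slack"],
        "warning_handlers": ["log", "slack"],
        "normal_handlers": ["log", "smtp"],  # smtp would need its own section too
        "slack": {"webhook": webhook},       # assumed section for the Slack handler
        "alerts": alerts or [],              # assumed key: list of alert definitions
    }
    if channel:
        config["slack"]["channel"] = channel
    return config


if __name__ == "__main__":
    # Placeholder webhook URL; use the one Slack generates for your workspace.
    cfg = build_config("https://hooks.slack.com/services/XXX/YYY/ZZZ", "#alerts")
    with open("config.json", "w") as fh:
        json.dump(cfg, fh, indent=4)
```

The resulting file is what gets mounted into the container at /srv/alerting/etc/config.json.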

If you have a convenient VoIP alerting service, you can hook that in as well by writing your own notify plugin.

Here is how the Slack plugin is written:

    @gen.coroutine
    def notify(self, level, *args, **kwargs):
        LOGGER.debug("Handler (%s) %s", self.name, level)

        message = self.get_message(level, *args, **kwargs)
        data = dict()
        data['username'] = self.username
        data['text'] = message
        data['icon_emoji'] = self.emoji.get(level, ':warning:')
        if self.channel:
            data['channel'] = self.channel

        body = json.dumps(data)
        yield self.client.fetch(self.webhook, method='POST', body=body)
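To try the webhook outside of the framework, the same payload can be built and posted with nothing but the standard library. This is a standalone sketch, not the plugin itself: it swaps Tornado's async client for a blocking `urllib` call, the emoji table is an assumed mapping, and `SLACK_WEBHOOK_URL` is a hypothetical environment variable holding your Incoming WebHook URL.

```python
import json
import os
import urllib.request

# Assumed level-to-emoji mapping; the plugin falls back to ':warning:' as well.
EMOJI = {
    "critical": ":exclamation:",
    "warning": ":warning:",
    "normal": ":white_check_mark:",
}


def build_payload(level, message, username="graphite-beacon", channel=None):
    """Build the JSON-serialisable payload, same fields as the plugin sends."""
    data = {
        "username": username,
        "text": message,
        "icon_emoji": EMOJI.get(level, ":warning:"),
    }
    if channel:
        data["channel"] = channel
    return data


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical variable name
    if url:
        print(post_to_slack(url, build_payload("critical", "test alert")))
```

A voice or SMS plugin would follow the same shape: turn the level and message into whatever body the service expects, then send it.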

A sample alert definition (alerting on the CPU data collected by collectd):

{
    "name": "CPU",
    "format": "percent",
    "rules": [
        "critical:  = 10%",
        "warning: = 20%"
    ],
    "interval": "5minute",
    "no_data": "warning",
    "source": "graphite",
    "query": "aliasByNode(sumSeriesWithWildcards(viila.collectd.*.cpu-*.cpu-user, 2), 1)"
}
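Before wiring a query into an alert, it helps to see what it returns. A small sketch, assuming Graphite is reachable at http://localhost as in the docker command earlier: it builds a URL for Graphite's render API with `format=json` and fetches the series. The five-minute window is my choice to match the alert interval, not something the alert definition dictates.

```python
import json
import urllib.request
from urllib.parse import urlencode

QUERY = "aliasByNode(sumSeriesWithWildcards(viila.collectd.*.cpu-*.cpu-user, 2), 1)"


def render_url(base_url, target, window="-5min"):
    """Build a Graphite render API URL that returns the target as JSON."""
    params = urlencode({"target": target, "from": window, "format": "json"})
    return "%s/render?%s" % (base_url.rstrip("/"), params)


def fetch_series(url):
    """Fetch and decode the JSON response: a list of series with datapoints."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    for series in fetch_series(render_url("http://localhost", QUERY)):
        # Each datapoint is a [value, timestamp] pair; value may be None.
        values = [v for v, _ in series["datapoints"] if v is not None]
        print(series["target"], values[-1] if values else "no data")
```

An empty result here usually means the metric path does not match what collectd is writing, which would show up in the alert as the "no_data" case.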

Alert display

[Figure: screenshot of an alert received in Slack]

Drawbacks of this framework:
- No dynamic config reloading
- No web interface (easy to fix)
- No clustering support
