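Following the official documentation, I installed and configured everything successfully, then started a busybox container that keeps crashing and restarting in order to trigger alert emails and DingTalk messages. Neither the emails nor the DingTalk messages arrived.
The logs of kuboard / alertmanager-main show that sending both the email and the DingTalk message failed.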
level=warn ts=2022-05-27T15:32:50.287Z caller=notify.go:723 component=dispatcher receiver=Critical integration=email[0] msg="Notify attempt failed, will retry later" attempts=1 err="'require_tls' is true (default) but \"smtp.exmail.qq.com:465\" does not advertise the STARTTLS extension"
level=error ts=2022-05-27T15:35:07.518Z caller=dispatch.go:310 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="Default/webhook[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 404: http://dingtalk:8060/dingtalk/Default-shzdp/send; Default/email[0]: notify retry canceled after 17 attempts: 'require_tls' is true (default) but \"smtp.exmail.qq.com:465\" does not advertise the STARTTLS extension"
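Email alerts:
I use QQ enterprise mail (smtp.exmail.qq.com:465), and the alert emails were never sent. The error log shows that 'require_tls' is true by default while the server on port 465 does not advertise the STARTTLS extension, so the TLS option (require_tls) has to be turned off. Once it was turned off, the emails went out successfully.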
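For readers who edit the Alertmanager configuration directly rather than through the Kuboard UI, the fix above roughly corresponds to the fragment below. This is only a sketch: the sender address and credentials are placeholders I made up, and where exactly Kuboard stores this configuration is not shown in the original post.

```yaml
# Sketch of the relevant Alertmanager settings (placeholder values).
global:
  smtp_smarthost: 'smtp.exmail.qq.com:465'   # host and port taken from the log above
  smtp_from: 'alerts@example.com'            # placeholder sender address
  smtp_auth_username: 'alerts@example.com'   # placeholder account
  smtp_auth_password: 'your-password'        # placeholder password
  # Do not require STARTTLS; the server on port 465 does not advertise it.
  smtp_require_tls: false
```

The same switch can also be set per receiver with require_tls: false under email_configs, if only one receiver should be affected.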
DingTalk alerts:
level=error ts=2022-05-27T15:41:46.569Z caller=dispatch.go:310 component=dispatcher msg="Notify for alerts failed" num_alerts=2 err="Critical/webhook[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 404: http://dingtalk:8060/dingtalk/Critical-8p8nz/send"
The log shows that the DingTalk container's service endpoint returned 404. In Kuboard, find the DingTalk pod and look at its logs.
level=info ts=2022-05-27T16:31:50.549Z caller=entry.go:26 component=web http_scheme=http http_proto=HTTP/1.1 http_method=POST remote_addr=100.100.196.227:34048 user_agent=Alertmanager/0.22.2 uri=http://dingtalk:8060/dingtalk/Default-shzdp/send resp_status=404 resp_bytes_length=19 resp_elapsed_ms=0.068677 msg="request complete"
level=warn ts=2022-05-27T16:31:50.549Z caller=dingtalk.go:75 component=web target=Default-shzdp msg="target not found"
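Sure enough, there is an error log. Check which image this container runs, then look at its source code to see what is going on.
The image is on hub.docker.com: timonwong/prometheus-webhook-dingtalk (https://hub.docker.com/r/timonwong/prometheus-webhook-dingtalk/tags). Its source code is on GitHub, where the file and line that produce the error can be located.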
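Reading readme.md to find out what this target actually is: it turns out to come from the configuration file used by the container, which holds the parameters for sending DingTalk messages.
Since it is the container's configuration file, the next step is to enter the dingtalk container and check whether the file is correct.
The file is indeed there, so let's look at its contents.
The contents are correct. So the question is: if the configuration file is fine, why does the code say the target cannot be found? My first guess is that the file never took effect, because the container started first and the access_token was configured in Kuboard only afterwards.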
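For reference, a prometheus-webhook-dingtalk configuration file generally has the shape sketched below. The original post does not show the actual file, so this is an illustration only: the target name is copied from the log above and the access_token is a placeholder.

```yaml
# Sketch of a prometheus-webhook-dingtalk config file (placeholder token).
targets:
  # The key must match the name in the request path:
  #   http://dingtalk:8060/dingtalk/Default-shzdp/send
  Default-shzdp:
    url: https://oapi.dingtalk.com/robot/send?access_token=YOUR_ACCESS_TOKEN
```

If the running process loaded the file before this entry existed, a request for Default-shzdp would fail with "target not found" even though the file on disk now looks right, which fits the guess above.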
How to fix it?
Simply restart the dingtalk pod.