Troubleshooting k8s monitoring setup - 9093: connect: connection refused


Setting up k8s cluster monitoring - handling an Alertmanager problem

Pod startup error: CrashLoopBackOff

CrashLoopBackOff means the container starts, then exits abnormally, and the kubelet keeps restarting it with an increasing back-off delay.
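
The pod listing in the screenshot can be reproduced on the command line (namespace and pod name match the events shown below):

kubectl get pods -n monitoring
# the STATUS column for alertmanager-main-0 shows CrashLoopBackOff and RESTARTS keeps climbing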

Inspect the pod with kubectl describe

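The events below come from describing the failing pod:

kubectl -n monitoring describe pod alertmanager-main-0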

Events:
  Type     Reason     Age                    From                   Message
  ----     ------     ----                   ----                   -------
  Normal   Scheduled  <unknown>              default-scheduler      Successfully assigned monitoring/alertmanager-main-0 to 192.168.6.11
  Normal   Pulled     23m                    kubelet, 192.168.6.11  Container image "quay.mirrors.ustc.edu.cn/prometheus/alertmanager:v0.21.0" already present on machine
  Normal   Created    23m                    kubelet, 192.168.6.11  Created container alertmanager
  Normal   Started    23m                    kubelet, 192.168.6.11  Started container alertmanager
  Normal   Pulled     23m                    kubelet, 192.168.6.11  Container image "quay.mirrors.ustc.edu.cn/prometheus-operator/prometheus-config-reloader:v0.47.0" already present on machine
  Normal   Created    23m                    kubelet, 192.168.6.11  Created container config-reloader
  Normal   Started    23m                    kubelet, 192.168.6.11  Started container config-reloader
  Warning  Unhealthy  23m (x6 over 23m)      kubelet, 192.168.6.11  Liveness probe failed: Get http://172.17.25.5:9093/-/healthy: dial tcp 172.17.25.5:9093: connect: connection refused
  Warning  Unhealthy  8m53s (x148 over 23m)  kubelet, 192.168.6.11  Readiness probe failed: Get http://172.17.25.5:9093/-/ready: dial tcp 172.17.25.5:9093: connect: connection refused
  Warning  BackOff    3m51s (x34 over 12m)   kubelet, 192.168.6.11  Back-off restarting failed container

The liveness and readiness probes fail: nothing is listening on port 9093, so every connection attempt is refused.
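
For context, the probes the kubelet runs correspond roughly to the following container spec, reconstructed from the probe URLs in the events above (a sketch, not the exact operator-generated manifest):

livenessProbe:
  httpGet:
    path: /-/healthy
    port: 9093        # Alertmanager web port
readinessProbe:
  httpGet:
    path: /-/ready
    port: 9093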

Check the logs


[root@k8s-node1 ~]#  kubectl logs pod/alertmanager-main-0 alertmanager -n monitoring
level=info ts=2021-06-02T02:11:49.274Z caller=main.go:216 msg="Starting Alertmanager" version="(version=0.21.0, branch=HEAD, revision=4c6c03ebfe21009c546e4d1e9b92c371d67c021d)"
level=info ts=2021-06-02T02:11:49.274Z caller=main.go:217 build_context="(go=go1.14.4, user=root@dee35927357f, date=20200617-08:54:02)"
[root@k8s-node1 ~]# kubectl logs pod/alertmanager-main-0 config-reloader -n monitoring
level=info ts=2021-06-02T01:57:31.669430944Z caller=main.go:147 msg="Starting prometheus-config-reloader" version="(version=0.47.0, branch=refs/tags/pkg/client/v0.47.0, revision=539108b043e9ecc53c4e044083651e2ebfbd3492)"
level=info ts=2021-06-02T01:57:31.669531061Z caller=main.go:148 build_context="(go=go1.16.3, user=simonpasquier, date=20210413-15:46:43)"
level=info ts=2021-06-02T01:57:31.669664237Z caller=main.go:182 msg="Starting web server for metrics" listen=:8080
level=info ts=2021-06-02T01:57:31.67010267Z caller=reloader.go:214 msg="started watching config file and directories for changes" cfg= out= dirs=/etc/alertmanager/config
level=error ts=2021-06-02T01:57:32.81121586Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/-/reload\": dial tcp 127.0.0.1:9093: connect: connection refused"
level=error ts=2021-06-02T01:57:37.811710125Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/-/reload\": dial tcp 127.0.0.1:9093: connect: connection refused"
level=error ts=2021-06-02T01:57:42.811117367Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/-/reload\": dial tcp 127.0.0.1:9093: connect: connection refused"
level=error ts=2021-06-02T01:57:47.810889541Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/-/reload\": dial tcp 127.0.0.1:9093: connect: connection refused"
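
Both the kubelet (from the node) and the config-reloader sidecar (from inside the pod) get connection refused, so the Alertmanager process never binds port 9093. This can be double-checked from inside the alertmanager container, assuming the image ships busybox wget (the upstream prometheus/alertmanager image normally does):

kubectl -n monitoring exec alertmanager-main-0 -c alertmanager -- wget -qO- http://localhost:9093/-/healthy
# a "connection refused" here confirms nothing is listening on 9093 inside the pod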

Check the StatefulSet

The alertmanager-main StatefulSet never becomes ready.
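
What the screenshot showed can also be checked directly; the READY column stays below the desired replica count:

kubectl -n monitoring get statefulset alertmanager-main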

Export the StatefulSet

kubectl -n monitoring get statefulset.apps/alertmanager-main -o yaml > dump.yaml
# add hostNetwork: true under spec.template.spec
# delete the existing StatefulSet and recreate it from the edited dump
kubectl delete statefulsets.apps alertmanager-main -n monitoring
kubectl apply -f dump.yaml
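
The edit to dump.yaml is a single field; an excerpt of the relevant part after the change (all other fields left as exported):

spec:
  template:
    spec:
      hostNetwork: true   # run the Alertmanager pods on the node's network

After re-applying, the pods should become Running and Ready, which can be watched with:

kubectl get pods -n monitoring -w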


Reference:
https://github.com/prometheus-operator/kube-prometheus/issues/653
