kube-prometheus is a complete monitoring stack: it uses Prometheus to collect cluster metrics and Grafana to display them. It includes the following components:
- The Prometheus Operator
- Highly available Prometheus
- Highly available Alertmanager
- Prometheus node-exporter
- Prometheus Adapter for Kubernetes Metrics APIs (k8s-prometheus-adapter)
- kube-state-metrics
- Grafana
k8s-prometheus-adapter uses Prometheus to implement the metrics.k8s.io and custom.metrics.k8s.io APIs, so metrics-server does not need to be deployed as well. To deploy metrics-server on its own, see C.metrics-server插件.md.
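Once the stack from the sections below is installed, one way to confirm that the adapter is serving the resource metrics API is to query it directly. This is a minimal sketch, not part of the original procedure: the function name `check_metrics_api` and the `KUBECTL` override variable are made up for illustration, and it assumes the adapter registers the APIService under the name `v1beta1.metrics.k8s.io`.

```shell
#!/usr/bin/env bash
# Sketch: query the resource metrics API that k8s-prometheus-adapter serves.
# KUBECTL is a hypothetical override hook (KUBECTL=echo gives a dry run).
KUBECTL="${KUBECTL:-kubectl}"

check_metrics_api() {
  # The APIService object that registers metrics.k8s.io with the API server
  # (name assumed to be v1beta1.metrics.k8s.io).
  $KUBECTL get apiservice v1beta1.metrics.k8s.io || return 1
  # Raw node metrics served through the metrics.k8s.io API.
  $KUBECTL get --raw /apis/metrics.k8s.io/v1beta1/nodes
}
```

Run `check_metrics_api` on k8s-01; if the first command fails, the API is not registered and `kubectl top` will not work either.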
Unless stated otherwise, all operations in this document are run on the k8s-01 node.
1. Download and install
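The original plan was to replace the registry addresses with the Azure China (Microsoft) mirror, but that mirror appears to have stopped working around April 2020. Many users online report the same problem, and I have not yet found a replacement mirror registry.
Workaround:
On a server with unrestricted internet access, I pulled the required images and pushed them to an Aliyun container registry. At install time, pull those images from the Aliyun registry on all three hosts (k8s-01, k8s-02, k8s-03) and retag them to their original names. To keep the image versions fixed, I forked https://github.com/coreos/kube-prometheus.git into my own GitHub account: https://github.com/wangchaoforever/kube-prometheus.git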
$ cd /opt/k8s/work
$ git clone https://github.com/wangchaoforever/kube-prometheus.git
# Pull the images (on each of k8s-01, k8s-02, k8s-03)
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/alertmanager:v0.20.0
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/grafana:6.6.0
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/kube-state-metrics:v1.9.5
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/kube-rbac-proxy:v0.4.1
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/node-exporter:v0.18.1
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/k8s-prometheus-adapter-amd64:v0.5.0
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/prometheus:v2.15.2
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/prometheus-operator:v0.38.1
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/configmap-reload:v0.3.0
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/prometheus-config-reloader:v0.38.1
docker pull registry.cn-hangzhou.aliyuncs.com/wc181/pause:3.1
# Retag the images to their original upstream names
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/alertmanager:v0.20.0 quay.io/prometheus/alertmanager:v0.20.0
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/grafana:6.6.0 grafana/grafana:6.6.0
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/kube-state-metrics:v1.9.5 quay.io/coreos/kube-state-metrics:v1.9.5
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/kube-rbac-proxy:v0.4.1 quay.io/coreos/kube-rbac-proxy:v0.4.1
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/node-exporter:v0.18.1 quay.io/prometheus/node-exporter:v0.18.1
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/k8s-prometheus-adapter-amd64:v0.5.0 quay.io/coreos/k8s-prometheus-adapter-amd64:v0.5.0
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/prometheus:v2.15.2 quay.io/prometheus/prometheus:v2.15.2
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/prometheus-operator:v0.38.1 quay.io/coreos/prometheus-operator:v0.38.1
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/configmap-reload:v0.3.0 jimmidyson/configmap-reload:v0.3.0
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/prometheus-config-reloader:v0.38.1 quay.io/coreos/prometheus-config-reloader:v0.38.1
docker tag registry.cn-hangzhou.aliyuncs.com/wc181/pause:3.1 k8s.gcr.io/pause:3.1
# Remove the now-redundant Aliyun tags
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/alertmanager:v0.20.0
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/grafana:6.6.0
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/kube-state-metrics:v1.9.5
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/kube-rbac-proxy:v0.4.1
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/node-exporter:v0.18.1
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/k8s-prometheus-adapter-amd64:v0.5.0
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/prometheus:v2.15.2
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/prometheus-operator:v0.38.1
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/configmap-reload:v0.3.0
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/prometheus-config-reloader:v0.38.1
docker rmi -f registry.cn-hangzhou.aliyuncs.com/wc181/pause:3.1
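Since the same pull, retag, and remove steps have to be repeated on three hosts, they can also be written as a loop over a mirror-to-upstream mapping. The sketch below is one possible way to do that, using exactly the image names and tags listed above; the function name `sync_images` and the `MIRROR` and `DOCKER` variables are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: pull each image from the Aliyun mirror, retag it to its upstream
# name, then drop the mirror tag. DOCKER=echo gives a dry run.
MIRROR="${MIRROR:-registry.cn-hangzhou.aliyuncs.com/wc181}"
DOCKER="${DOCKER:-docker}"

# One entry per image: <name:tag on the mirror>=<upstream repository>
IMAGES="
alertmanager:v0.20.0=quay.io/prometheus/alertmanager
grafana:6.6.0=grafana/grafana
kube-state-metrics:v1.9.5=quay.io/coreos/kube-state-metrics
kube-rbac-proxy:v0.4.1=quay.io/coreos/kube-rbac-proxy
node-exporter:v0.18.1=quay.io/prometheus/node-exporter
k8s-prometheus-adapter-amd64:v0.5.0=quay.io/coreos/k8s-prometheus-adapter-amd64
prometheus:v2.15.2=quay.io/prometheus/prometheus
prometheus-operator:v0.38.1=quay.io/coreos/prometheus-operator
configmap-reload:v0.3.0=jimmidyson/configmap-reload
prometheus-config-reloader:v0.38.1=quay.io/coreos/prometheus-config-reloader
pause:3.1=k8s.gcr.io/pause
"

sync_images() {
  for entry in $IMAGES; do
    src="${entry%%=*}"    # name:tag on the mirror, e.g. alertmanager:v0.20.0
    repo="${entry#*=}"    # upstream repository, e.g. quay.io/prometheus/alertmanager
    tag="${src##*:}"      # tag only, e.g. v0.20.0
    $DOCKER pull "$MIRROR/$src" || return 1
    $DOCKER tag "$MIRROR/$src" "$repo:$tag" || return 1
    $DOCKER rmi -f "$MIRROR/$src" || return 1
  done
}
```

Calling `sync_images` on each host performs the three steps per image and stops at the first failure; running it with `DOCKER=echo` first prints the commands without touching the local image store.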
$ cd kube-prometheus/
# Install prometheus-operator
$ kubectl apply -f manifests/setup
# Install the remaining components, including the Prometheus metrics adapter
$ kubectl apply -f manifests/
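The custom resource definitions created by `manifests/setup` can take a moment to register, and applying `manifests/` too early may fail for resources such as ServiceMonitor. A small retry helper is one way to bridge the gap. This is a sketch under that assumption; `wait_until` is a hypothetical helper, not something shipped with kube-prometheus.

```shell
#!/usr/bin/env bash
# Sketch: retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Success: stop retrying immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Possible use between the two apply steps (not executed here):
#   kubectl apply -f manifests/setup
#   wait_until 60 1 kubectl get servicemonitors --all-namespaces \
#     && kubectl apply -f manifests/
```

With 60 attempts and a 1-second delay, the second apply runs as soon as the ServiceMonitor resource type is known to the API server, and is skipped if it never appears.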
2. Check the running status
$ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 2m18s
alertmanager-main-1 2/2 Running 0 2m18s
alertmanager-main-2 2/2 Running 0 2m18s
grafana-86b55cb79f-dg9wf 1/1 Running 0 2m12s
kube-state-metrics-dbb85dfd5-27p9q 3/3 Running 0 2m12s
node-exporter-fgjcj 2/2 Running 0 2m12s
node-exporter-rrhmv 2/2 Running 0 2m12s
node-exporter-wpz9p 2/2 Running 0 2m12s
prometheus-adapter-5cd5798d96-b42b7 1/1 Running 0 2m13s
prometheus-k8s-0 3/3 Running 1 2m10s
prometheus-k8s-1 3/3 Running 1 2m10s
prometheus-operator-5cfbdc9b67-hbrfh 2/2 Running 0 2m43s
$ kubectl top pods -n monitoring
NAME CPU(cores) MEMORY(bytes)
alertmanager-main-0 6m 23Mi
alertmanager-main-1 5m 19Mi
grafana-86b55cb79f-dg9wf 16m 27Mi
kube-state-metrics-dbb85dfd5-27p9q 1m 28Mi
node-exporter-fgjcj 2m 26Mi
node-exporter-wpz9p 3m 17Mi
prometheus-adapter-5cd5798d96-b42b7 0m 17Mi
prometheus-k8s-0 30m 323Mi
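Rather than reading the pod table by eye, the check can be scripted. The sketch below filters `kubectl get pods` output and prints any pod whose READY count is incomplete or whose STATUS is not Running; the function name `not_ready_pods` is made up for illustration, and it assumes the default five-column table layout shown above.

```shell
#!/usr/bin/env bash
# Sketch: read a `kubectl get pods` table on stdin and print the names of
# pods that are not fully ready or not in the Running state.
not_ready_pods() {
  awk 'NR > 1 {
    # $2 is READY in the form ready/total, $3 is STATUS
    split($2, r, "/")
    if (r[1] != r[2] || $3 != "Running") print $1
  }'
}

# Possible use (not executed here):
#   kubectl get pods -n monitoring | not_ready_pods
```

An empty result means every pod in the namespace is Running with all containers ready.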
3. Access the Prometheus UI
Start the port forward:
$ kubectl port-forward --address 0.0.0.0 pod/prometheus-k8s-0 -n monitoring 9090:9090
Forwarding from 0.0.0.0:9090 -> 9090
- port-forward depends on socat.
Open in a browser: http://192.168.0.71:9090/new/graph?g0.expr=&g0.tab=1&g0.stacked=0&g0.range_input=1h
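While the port forward is running, Prometheus can also be queried from the command line through its HTTP API (`/api/v1/query`), which is handy for a quick check without a browser. A minimal sketch, assuming the address used above; `prom_query`, `PROM_URL`, and `CURL` are names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run an instant PromQL query against the Prometheus HTTP API.
# PROM_URL defaults to the port-forwarded address used in this document.
PROM_URL="${PROM_URL:-http://192.168.0.71:9090}"
CURL="${CURL:-curl}"

prom_query() {
  # -G sends the URL-encoded data as a query string on a GET request.
  $CURL -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Possible use (not executed here):
#   prom_query 'up'
```

The response is JSON; a `"status":"success"` field indicates that Prometheus accepted and evaluated the expression.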
4. Access the Grafana UI
Start the port forward:
$ kubectl port-forward --address 0.0.0.0 svc/grafana -n monitoring 3000:3000
Forwarding from 0.0.0.0:3000 -> 3000
Open in a browser: http://192.168.0.71:3000/
Log in with admin/admin.
The predefined dashboards are then available.
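The predefined dashboards can also be listed from the command line through Grafana's HTTP search API. This is a sketch under the assumption that the admin/admin credentials are still in place (Grafana prompts for a new password on first login); `grafana_dashboards`, `GRAFANA_URL`, `GRAFANA_USER`, `GRAFANA_PASS`, and `CURL` are names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list dashboards through the Grafana HTTP API using basic auth.
# GRAFANA_URL defaults to the port-forwarded address used in this document.
GRAFANA_URL="${GRAFANA_URL:-http://192.168.0.71:3000}"
CURL="${CURL:-curl}"

grafana_dashboards() {
  # type=dash-db restricts the search results to dashboards (no folders).
  $CURL -s -u "${GRAFANA_USER:-admin}:${GRAFANA_PASS:-admin}" \
    "$GRAFANA_URL/api/search?type=dash-db"
}

# Possible use (not executed here):
#   grafana_dashboards
```

The result is a JSON array with one object per dashboard, including its title and URL path.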