Monitoring Kubernetes ingress-nginx with Prometheus
Prerequisite: ingress-nginx is already deployed, following the "kubernetes ingress nginx deployment" document.
Goal: monitor ingress-nginx performance with Prometheus.
Environment
[root@master-1 grafana]# kubectl get node
NAME STATUS ROLES AGE VERSION
master-1 Ready control-plane,master 7h17m v1.22.1
worker-1 Ready <none> 7h16m v1.22.1
[root@master-1 grafana]# kubectl get pod -n ingress-nginx
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create--1-654zx 0/1 Completed 0 7h
ingress-nginx-admission-patch--1-8xpsp 0/1 Completed 2 7h
ingress-nginx-controller-6d6c58c986-xwwfc 1/1 Running 0 169m
[root@master-1 grafana]# kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 10.96.194.255 <pending> 80:31361/TCP,443:32098/TCP 7h1m
ingress-nginx-controller-admission ClusterIP 10.103.100.208 <none> 443/TCP 7h1m
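The ingress-nginx-controller Deployment and Service must both expose the controller's metrics port (10254), and the Service needs the Prometheus scrape annotations.
# kubectl edit deployment ingress-nginx-controller -n ingress-nginx
Add the following entry to the controller container's ports list:
...
        ports:
        - containerPort: 10254
          name: prometheus
...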
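The same change can be made without opening an editor. The sketch below builds a JSON Patch that appends the port and passes it to kubectl patch. It assumes the controller is the first container in the pod template (index 0); the function names are made up for this example. Running it twice would try to add a duplicate port, so apply it only once.

```shell
#!/usr/bin/env bash
# Build a JSON Patch that appends a port to a container's ports list.
# Arguments: container index, port number, port name.
metrics_port_patch() {
  local idx="$1" port="$2" name="$3"
  printf '[{"op":"add","path":"/spec/template/spec/containers/%s/ports/-","value":{"containerPort":%s,"name":"%s"}}]' \
    "$idx" "$port" "$name"
}

# Apply the patch to the controller Deployment (needs cluster access).
# Assumption: the controller is container 0 in the pod template.
patch_controller_deployment() {
  kubectl patch deployment ingress-nginx-controller -n ingress-nginx \
    --type=json -p "$(metrics_port_patch 0 10254 prometheus)"
}
```

Call patch_controller_deployment after sourcing the functions; the Deployment then rolls out a new controller pod with the extra port.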
# kubectl edit service ingress-nginx-controller -n ingress-nginx
Add the following:
...
metadata:
  annotations:
    prometheus.io/port: "10254"
    prometheus.io/scrape: "true"
...
spec:
  ports:
  - name: prometheus
    port: 10254
    targetPort: prometheus
...
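As a non-interactive alternative for the Service, the annotations can be set with kubectl annotate and the port appended with a JSON Patch. This is a sketch only; the helper names are hypothetical, and the port name must match the container port name used in the Deployment.

```shell
#!/usr/bin/env bash
# Build a JSON Patch that appends a named port to a Service.
# Arguments: port name (also used as targetPort), port number.
service_port_patch() {
  local name="$1" port="$2"
  printf '[{"op":"add","path":"/spec/ports/-","value":{"name":"%s","port":%s,"targetPort":"%s"}}]' \
    "$name" "$port" "$name"
}

# Annotate the Service and add the metrics port (needs cluster access).
expose_metrics_on_service() {
  kubectl annotate service ingress-nginx-controller -n ingress-nginx --overwrite \
    prometheus.io/scrape=true prometheus.io/port=10254
  kubectl patch service ingress-nginx-controller -n ingress-nginx \
    --type=json -p "$(service_port_patch prometheus 10254)"
}
```

Call expose_metrics_on_service once, then check the result with kubectl get svc -n ingress-nginx.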
[root@master-1 grafana]# kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 10.96.194.255 <pending> 10254:30406/TCP,80:31361/TCP,443:32098/TCP 7h1m
ingress-nginx-controller-admission ClusterIP 10.103.100.208 <none> 443/TCP 7h1m
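Before deploying Prometheus, it can help to confirm that the controller really serves metrics on the new port. The sketch below fetches /metrics through the NodePort shown above (30406 on node 192.168.5.21 in this environment) and counts the sample lines whose names start with nginx_ingress_controller_. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Build the metrics URL from a node IP and the NodePort mapped to 10254.
metrics_url() {
  printf 'http://%s:%s/metrics' "$1" "$2"
}

# Count controller metric samples on stdin; "# HELP" and "# TYPE"
# comment lines do not match because they start with "#".
count_controller_metrics() {
  grep -c '^nginx_ingress_controller_'
}

# Fetch the endpoint and print the number of controller samples
# (needs network access to the node).
check_metrics() {
  curl -s "$(metrics_url "$1" "$2")" | count_controller_metrics
}
```

For example, check_metrics 192.168.5.21 30406 should print a number greater than zero when the endpoint is reachable.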
Deploy Prometheus
# kubectl apply --kustomize github.com/kubernetes/ingress-nginx/deploy/prometheus/
[root@master-1 grafana]# kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 10.96.194.255 <pending> 10254:30406/TCP,80:31361/TCP,443:32098/TCP 7h6m
ingress-nginx-controller-admission ClusterIP 10.103.100.208 <none> 443/TCP 7h6m
prometheus-server NodePort 10.99.70.216 <none> 9090:30221/TCP 172m
[root@master-1 grafana]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master-1 Ready control-plane,master 7h39m v1.22.1 192.168.5.11 <none> CentOS Linux 7 (Core) 3.10.0-1160.49.1.el7.x86_64 docker://20.10.12
worker-1 Ready <none> 7h38m v1.22.1 192.168.5.21 <none> CentOS Linux 7 (Core) 3.10.0-1160.49.1.el7.x86_64 docker://20.10.12
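Then open a browser at http://{node IP address}:{prometheus-svc-nodeport}; in my case the address is http://192.168.5.21:30221/
Normally a query here returns data, but after following the official procedure there was none.
Status -> Service Discovery shows that many targets were dropped. This points to the Prometheus configuration (prometheus.yaml in the ConfigMap). We remove all relabel_configs from the Prometheus ConfigMap and add a single rule that keeps only targets on port 10254, as follows.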
[root@master-1 grafana]# kubectl get cm -n ingress-nginx
NAME DATA AGE
ingress-controller-leader 0 7h27m
ingress-nginx-controller 1 7h28m
kube-root-ca.crt 1 7h28m
prometheus-configuration-8hk4m6bf76 1 3h13m
[root@master-1 grafana]# kubectl edit cm prometheus-configuration-8hk4m6bf76 -n ingress-nginx
apiVersion: v1
data:
  prometheus.yaml: |
    global:
      scrape_interval: 10s
    scrape_configs:
    - job_name: 'ingress-nginx-endpoints'
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
          - ingress-nginx
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        action: keep
        regex: "10254"
kind: ConfigMap
metadata:
  creationTimestamp: "2022-01-18T05:56:12Z"
  labels:
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: ingress-nginx
  name: prometheus-configuration-8hk4m6bf76
  namespace: ingress-nginx
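To see what the keep rule does, the following sketch imitates it in shell. It is only a simulation, not Prometheus code. Each discovered target is written as a "name port" line, and only lines whose port matches the regex in full are passed through. Prometheus anchors relabel regexes at both ends, so 10254 does not match 102540. The sample target names are made up.

```shell
#!/usr/bin/env bash
# Imitate relabel action "keep": print only the "name port" lines
# whose port (second field) fully matches the given regex.
keep_by_port() {
  local regex="$1"
  awk -v re="^(${regex})\$" '$2 ~ re'
}

# Hypothetical discovered targets: one line per container port.
sample_targets() {
  printf 'controller 80\ncontroller 443\ncontroller 10254\nother 102540\n'
}
```

Running sample_targets | keep_by_port 10254 leaves only the "controller 10254" line, which mirrors how every other pod port in the namespace is dropped from the scrape job.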
Delete the Prometheus pod so that the new configuration takes effect (its Deployment recreates it automatically):
# kubectl delete pod prometheus-server-779c8d44cf-fsgbd -n ingress-nginx
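Pod names change on every restart, so a rollout restart of the owning Deployment avoids looking up the name each time. The sketch below assumes the Deployment is called prometheus-server, which is what the pod name above suggests (Deployment name, ReplicaSet hash, pod suffix). The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Derive the Deployment name from a pod name by stripping the
# "-<replicaset-hash>-<pod-suffix>" tail.
deployment_from_pod() {
  printf '%s' "${1%-*-*}"
}

# Restart the Prometheus Deployment and wait for the new pod
# (needs cluster access). Assumption: the Deployment name follows
# from the pod name as described above.
restart_prometheus() {
  local deploy
  deploy="$(deployment_from_pod "$1")"
  kubectl rollout restart "deployment/${deploy}" -n ingress-nginx
  kubectl rollout status "deployment/${deploy}" -n ingress-nginx --timeout=120s
}
```

For example: restart_prometheus prometheus-server-779c8d44cf-fsgbd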
Open http://192.168.5.21:30221/ in the browser again; as the figure below shows, data is now available.
Next, we deploy Grafana.
Deploy Grafana
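The same check can be made from the command line through the Prometheus HTTP API. This sketch queries the up series for the ingress-nginx-endpoints job configured above and extracts the status field of the JSON reply. The address is the one used in this environment, and the function names are illustrative.

```shell
#!/usr/bin/env bash
# Run an instant query against the Prometheus HTTP API
# (needs network access). Arguments: base URL, PromQL expression.
prom_query() {
  local base="$1" expr="$2"
  curl -s -G "${base}/api/v1/query" --data-urlencode "query=${expr}"
}

# Extract the top-level "status" value from an API reply on stdin.
prom_status() {
  sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}
```

For example, prom_query http://192.168.5.21:30221 'up{job="ingress-nginx-endpoints"}' prints the raw JSON; piping it through prom_status prints success when the query was accepted. A non-empty result list in that JSON means the controller target is being scraped.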
# kubectl apply --kustomize github.com/kubernetes/ingress-nginx/deploy/grafana/
If the command above fails, use the following instead:
[root@master-1 ~]# git clone https://github.com/kubernetes/ingress-nginx.git
[root@master-1 ~]# cd ingress-nginx/deploy/grafana/
[root@master-1 grafana]# kubectl apply --kustomize .
[root@master-1 grafana]# kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana NodePort 10.108.92.58 <none> 3000:31414/TCP 150m
ingress-nginx-controller LoadBalancer 10.96.194.255 <pending> 10254:30406/TCP,80:31361/TCP,443:32098/TCP 7h36m
ingress-nginx-controller-admission ClusterIP 10.103.100.208 <none> 443/TCP 7h36m
prometheus-server NodePort 10.99.70.216 <none> 9090:30221/TCP 3h22m
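Open the Grafana dashboard in a browser at http://{node IP address}:{grafana-svc-nodeport}; in my case the address is http://192.168.5.21:31414/
Username: admin Password: admin
Add a data source
Click "Add data source".
Select "Prometheus".
For URL, enter http://CLUSTER_IP_PROMETHEUS_SVC:9090; here I used the Service name instead, http://prometheus-server:9090, which resolves because Grafana and Prometheus run in the same namespace. Then click Save and Test; a green message means the connection succeeded.
In the left navigation bar open Dashboard and click Import. In the "Import via panel json" box, paste the contents of the dashboards/nginx.json file, click Load, select prometheus [the data source created above] as the data source, and save. The monitoring dashboard shown below appears.
Note: if panels in the dashboard above show no values, the cause is a version difference: the metric names have changed. Edit the affected panel queries by hand so that they use the metric names that are currently exported.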
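The data source can also be created through Grafana's HTTP API instead of the web form. The sketch below posts to /api/datasources with the default admin:admin login mentioned above. The data source name prometheus and the helper names are choices made for this example.

```shell
#!/usr/bin/env bash
# Build the JSON body for a Prometheus data source.
# Arguments: data source name, Prometheus URL as seen from Grafana.
datasource_payload() {
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}' "$1" "$2"
}

# Create the data source through the Grafana HTTP API
# (needs network access). Argument: Grafana base URL.
add_datasource() {
  curl -s -u admin:admin -H 'Content-Type: application/json' \
    -X POST "$1/api/datasources" \
    -d "$(datasource_payload prometheus http://prometheus-server:9090)"
}
```

For example: add_datasource http://192.168.5.21:31414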
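To find out which metric names the controller currently exports, the Prometheus HTTP API can list all known metric names. This sketch filters that list down to names starting with nginx_ingress_controller_, which can then be compared with the names used in the dashboard panels. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Pull controller metric names out of a JSON reply on stdin,
# one per line, sorted and without duplicates.
extract_controller_metrics() {
  grep -o 'nginx_ingress_controller_[a-z0-9_]*' | sort -u
}

# List the controller metric names known to Prometheus
# (needs network access). Argument: Prometheus base URL.
list_controller_metrics() {
  curl -s "$1/api/v1/label/__name__/values" | extract_controller_metrics
}
```

For example: list_controller_metrics http://192.168.5.21:30221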
References:
https://kubernetes.github.io/ingress-nginx/user-guide/monitoring/
https://kubernetes.github.io/ingress-nginx/deploy/#quick-start