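Project scenario
I am following Bo Ge's (博哥) k8s course:
Level 28: K8s Monitoring in Practice with Prometheus (Part 5)
Using Prometheus to monitor the ingress-nginx service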
Resource configuration
My ingress-controller is deployed in a separate namespace, test-ingress-controller.
The ServiceMonitor I created is configured as follows:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: ingress-nginx
  name: nginx-ingress-scraping
  namespace: test-ingress-controller
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: metrics
  jobLabel: app
  namespaceSelector:
    matchNames:
    - test-ingress-controller
  selector:
    matchLabels:
      app: ingress-nginx
Check that the label selector matches the Service:
# kubectl -n test-ingress-controller get service -l app=ingress-nginx
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                    AGE
nginx-ingress-lb   ClusterIP   10.68.219.193   <none>        80/TCP,443/TCP,10254/TCP   12d
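The label match is only half of the check: the ServiceMonitor's `port: metrics` refers to a Service port by name, so the Service also needs a port named `metrics`. The following is a minimal sketch of that check. The helper name `service_has_port` is hypothetical, and it assumes `kubectl` is already pointed at the right cluster.

```shell
# Return 0 if the Service exposes a port with the given name, 1 otherwise.
# $1 = namespace, $2 = service name, $3 = port name to look for
service_has_port() {
  # jsonpath prints the port names separated by spaces
  names=$(kubectl -n "$1" get service "$2" -o jsonpath='{.spec.ports[*].name}')
  for n in $names; do
    [ "$n" = "$3" ] && return 0
  done
  return 1
}
```

For the Service above, it could be called as `service_has_port test-ingress-controller nginx-ingress-lb metrics && echo "port found"`.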
Problem description
The newly created ServiceMonitor does not show up in the Prometheus targets:
serviceMonitor/test-ingress-controller/nginx-ingress-scraping
Root cause analysis
Check the Prometheus logs:
# kubectl -n monitoring logs prometheus-k8s-0
ts=2024-06-28T16:42:40.014Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:monitoring:prometheus-k8s" cannot list resource "pods" in API group "" in the namespace "test-ingress-controller"
ts=2024-06-28T16:43:10.772Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:monitoring:prometheus-k8s" cannot list resource "endpoints" in API group "" in the namespace "test-ingress-controller"
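The logs show that the prometheus-k8s service account ran into a permission problem when it tried to access the pods and endpoints resources in the test-ingress-controller namespace.
By default, the prometheus-k8s ClusterRole grants only the get verb, so the list requests shown in the logs are rejected.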
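When the log is long, it can help to reduce the "forbidden" messages to the distinct verb, resource and namespace combinations that were denied. The following is a small sketch using `sed`. The function name `summarize_forbidden` is hypothetical, and the pattern assumes the message wording shown in the log lines above.

```shell
# Read Prometheus log lines on stdin and print one
# "verb resource namespace" triple per distinct RBAC denial.
summarize_forbidden() {
  sed -n 's/.*cannot \([a-z]*\) resource "\([^"]*\)" in API group "[^"]*" in the namespace "\([^"]*\)".*/\1 \2 \3/p' | sort -u
}
```

It could be used as `kubectl -n monitoring logs prometheus-k8s-0 | summarize_forbidden`. Each output line names a verb and resource that a Role in that namespace would need to allow.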
A similar error appears in Level 22 of Bo Ge's course:
Level 22: An In-Depth Look at RBAC Role-Based Access Control Policies in K8s
# kubectl --kubeconfig ./kube_config/test-kubeconfig-a.kube.conf get all
Error from server (Forbidden): horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:test-kubeconfig-a:test-kubeconfig-a-user" cannot list resource "horizontalpodautoscalers" in API group "autoscaling" in the namespace "test-kubeconfig-a"
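The message means that the HPA (HorizontalPodAutoscaler) objects cannot be viewed: the service account test-kubeconfig-a-user is not currently authorized to list the horizontalpodautoscalers resource type in the autoscaling API group within the test-kubeconfig-a namespace.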
Solution
Option 1: Role
Add a Role and a RoleBinding in the test-ingress-controller namespace, so that a Role with sufficient permissions is bound to the prometheus-k8s ServiceAccount in the monitoring namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: test-ingress-controller-prometheus-k8s
  namespace: test-ingress-controller # namespace of the ingress-controller
rules:
- apiGroups: [""] # "" means the core API group
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["extensions", "networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: test-ingress-controller-prometheus-k8s-binding
  namespace: test-ingress-controller # same namespace as the Role
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring # namespace of the Prometheus service account
roleRef:
  kind: Role
  name: test-ingress-controller-prometheus-k8s
  apiGroup: rbac.authorization.k8s.io
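Once the Role and RoleBinding are applied, the new permissions can be checked without waiting for Prometheus by impersonating the service account with `kubectl auth can-i`. The following is a sketch of that check. The function name `check_prometheus_rbac` is hypothetical, and it assumes the current kubectl user is allowed to impersonate service accounts (a cluster-admin normally is).

```shell
# Print, for each verb/resource pair that service discovery needs,
# whether the Prometheus service account is allowed to use it.
# $1 = namespace to check (defaults to test-ingress-controller)
check_prometheus_rbac() {
  ns="${1:-test-ingress-controller}"
  for resource in services endpoints pods; do
    for verb in get list watch; do
      # kubectl prints "yes" or "no" for the impersonated account
      answer=$(kubectl auth can-i "$verb" "$resource" -n "$ns" \
        --as "system:serviceaccount:monitoring:prometheus-k8s")
      printf '%s %s: %s\n' "$verb" "$resource" "$answer"
    done
  done
}
```

Running `check_prometheus_rbac` should then print `yes` on all nine lines. Any `no` points at a verb or resource missing from the Role.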
Restart the Prometheus pods:
# kubectl -n monitoring delete pod prometheus-k8s-0 && kubectl -n monitoring delete pod prometheus-k8s-1
After a short wait, the target appears:
serviceMonitor/test-ingress-controller/nginx-ingress-scraping/0 (2/2 up)
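The same check can be done from the command line by querying the Prometheus HTTP API endpoint `/api/v1/targets` and looking for the scrape pool name. The following is a rough sketch. The function name `scrape_pool_present` is hypothetical, and the usage in the comments assumes Prometheus is reachable on localhost:9090 through a port-forward to a Service named prometheus-k8s, which may differ in other installations.

```shell
# Read the JSON returned by /api/v1/targets on stdin and succeed
# if a target belonging to the given scrape pool is listed.
# $1 = scrape pool name
scrape_pool_present() {
  grep -Eq "\"scrapePool\":[[:space:]]*\"$1\""
}

# Possible usage (assumed Service name and local port):
#   kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &
#   curl -s http://localhost:9090/api/v1/targets |
#     scrape_pool_present "serviceMonitor/test-ingress-controller/nginx-ingress-scraping/0" &&
#     echo "target found"
```

This only confirms that the target was discovered. Whether it is healthy still has to be read from the `health` field or from the targets page.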
Option 2: ClusterRole
The default rules of the prometheus-k8s ClusterRole:
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
Edit the ClusterRole so that it reads:
# kubectl -n monitoring edit clusterrole prometheus-k8s
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
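I have not tried this approach myself.
References
Pitfalls and fixes when configuring monitoring with ServiceMonitor in Prometheus-Operator (Prometheus-Operator使用ServiceMonitor监控配置时遇坑与解决总结)
Troubleshooting Kubernetes HPA failing to get the http_requests metric through Prometheus (排查 Kubernetes HPA 通过 Prometheus 获取不到 http_requests 指标的问题)
Why does a configured ServiceMonitor or PodMonitor not take effect? (为什么配置的ServiceMonitor或PodMonitor未生效?)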