k8s, Prometheus: Grafana default dashboards show "No data"


 

Querying with PromQL in the Prometheus UI shows that the metric series have no values for the container, image, name, namespace, and pod labels.

The raw cAdvisor data served by the kubelet confirms that the container, image, name, namespace, and pod labels are empty:
curl -k -H "Authorization: Bearer $TOKEN" https://10.6.128.7:10250/metrics/cadvisor

container_cpu_load_average_10s{container="",id="/",image="",name="",namespace="",pod=""} 0 1666834382282
container_cpu_load_average_10s{container="",id="/docker/5678922ca0bd7afc30b75ffa4ae5fb96298170c3f58a47ae335940b20cd6fa7b",image="",name="",namespace="",pod=""} 0 1666834372644
container_cpu_load_average_10s{container="",id="/kubepods",image="",name="",namespace="",pod=""} 0 1666834372281
container_cpu_load_average_10s{container="",id="/kubepods/besteffort",image="",name="",namespace="",pod=""} 0 1666834378893
container_cpu_load_average_10s{container="",id="/kubepods/besteffort/pod25a7ff7b-7058-4015-8f35-62b2b2a07035",i
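
To put a number on the problem, the scrape output can be piped through a small filter that counts how many series carry an empty pod label. This is only a sketch: the function name count_empty_pod is hypothetical and not part of the original write-up, and it does a plain text match on pod="" rather than parsing the exposition format properly.

```shell
#!/bin/sh
# Hypothetical helper: count metric series whose pod label is empty.
# Reads Prometheus text-format metrics on stdin and prints
# "<series with pod=\"\"> <total series>". Comment and blank lines are skipped.
count_empty_pod() {
  awk '
    /^#/ || /^$/ { next }        # skip HELP/TYPE comments and blank lines
    { total++ }                  # every remaining line is one series sample
    /[{,]pod=""/ { empty++ }     # pod label is present but has no value
    END { printf "%d %d\n", empty, total }
  '
}

# Usage against the kubelet endpoint from the curl command above:
#   curl -sk -H "Authorization: Bearer $TOKEN" \
#     https://10.6.128.7:10250/metrics/cadvisor | count_empty_pod
```

If the two numbers are equal, every series is missing its pod label, which matches the symptom described here.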

Related issues and a related repository:
node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate · Issue #1056 · prometheus-operator/kube-prometheus · GitHub

[BUG, RKE1, Monitoring V2] RKE1 1.24 seems to be omitting relevant cadvisor container labels and metric series that break Monitoring V2 dashboards · Issue #38934 · rancher/rancher · GitHub

GitHub - fe-ax/cadvisor-k8s-fix: When using Rancher monitoring with Kubernetes 1.24 cAdvisor doesn't work properly due to the dockershim removal

Here is a summary of the fix:

1.

Delete the existing kubelet ServiceMonitor manifest file, or delete the kubelet ServiceMonitor object directly from the cluster.
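
Deleting the object from the cluster might look like the sketch below. The resource name kubelet and the namespace monitoring are assumptions based on kube-prometheus defaults, and the two function names are hypothetical; list the ServiceMonitors first and adjust as needed.

```shell
#!/bin/sh
# Sketch only: "kubelet" and "monitoring" are assumed defaults, verify them first.
# KUBECTL can be overridden, e.g. to point at a wrapper or a different binary.
KUBECTL="${KUBECTL:-kubectl}"

# List ServiceMonitors in a namespace (default: monitoring) to find the kubelet one.
list_servicemonitors() {
  "$KUBECTL" get servicemonitors.monitoring.coreos.com -n "${1:-monitoring}"
}

# Delete a ServiceMonitor. Arguments: name (default kubelet), namespace (default monitoring).
delete_kubelet_servicemonitor() {
  "$KUBECTL" delete servicemonitors.monitoring.coreos.com "${1:-kubelet}" -n "${2:-monitoring}"
}

# Usage:
#   list_servicemonitors monitoring
#   delete_kubelet_servicemonitor kubelet monitoring
```

If the manifest file is kept on disk and re-applied later, the ServiceMonitor comes back, so removing the file as well avoids that.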

2.

Add a PrometheusRule resource:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/name: kube-prometheus
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
  name: kubernetes-additional-rules
  namespace: monitoring
spec:
  groups:
  - name: kube_pod_container_resource_usage
    interval: 30s  # rule evaluation interval
    rules:
    - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
      expr: sum by (namespace, pod, container) (irate(container_cpu_usage_seconds_total{job="cadvisor"}[5m]))

  - name: k8s.rules
    rules:
    - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
      expr: |
        sum by (cluster, namespace, pod, container) (
          irate(container_cpu_usage_seconds_total{job="cadvisor", metrics_path="/metrics", image!=""}[5m])
        )
    - record: node_namespace_pod_container:container_memory_working_set_bytes
      expr: |
        container_memory_working_set_bytes{job="cadvisor", metrics_path="/metrics", image!=""}
    - record: node_namespace_pod_container:container_memory_rss
      expr: |
        container_memory_rss{job="cadvisor", metrics_path="/metrics", image!=""}
    - record: node_namespace_pod_container:container_memory_cache
      expr: |
        container_memory_cache{job="cadvisor", metrics_path="/metrics", image!=""}
    - record: node_namespace_pod_container:container_memory_swap
      expr: |
        container_memory_swap{job="cadvisor", metrics_path="/metrics", image!=""}
    - record: cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
      expr: |
        kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"}
        * on (namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
          kube_pod_status_phase{phase=~"Pending|Running"} == 1
        )
    - record: namespace_memory:kube_pod_container_resource_requests:sum
      expr: |
        sum by (namespace, cluster) (
          kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"}
          * on(namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
            kube_pod_status_phase{phase=~"Pending|Running"} == 1
          )
        )
      labels:
        workload_type: deployment  # Adjust this label as needed per workload type

3.

Deploy a standalone cAdvisor in the cluster:

apiVersion: v1
kind: Namespace
metadata:
  name: cadvisor
  labels:
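    # Set the Pod Security level to 'privileged' because cAdvisor needs elevated permissions.
    # Note: the resources below are created in the 'monitoring' namespace, so this label
    # only applies to workloads that are actually placed in the 'cadvisor' namespace.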
    pod-security.kubernetes.io/enforce: privileged
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app: cadvisor
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: docker/default
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: cadvisor
      name: cadvisor
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app: cadvisor
        name: cadvisor
    spec:
      automountServiceAccountToken: false
      containers:
      - args:
        - --housekeeping_interval=10s
        - --max_housekeeping_interval=15s
        - --event_storage_event_limit=default=0
        - --event_storage_age_limit=default=0
        - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
        - --docker_only
        - --store_container_labels=false
        - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
        image: zcube/cadvisor:v0.45.0
        name: cadvisor
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 1
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /dev
          name: dev
        - mountPath: /rootfs
          name: rootfs
          readOnly: true
        - mountPath: /var/run
          name: var-run
          readOnly: true
        - mountPath: /sys
          name: sys
          readOnly: true
        - mountPath: /var/lib/docker
          name: docker
          readOnly: true
        - mountPath: /dev/disk
          name: disk
          readOnly: true
        - mountPath: /run/containerd
          name: containerd
          readOnly: true
        - mountPath: /var/lib/containerd
          name: containerd-var
          readOnly: true
      priorityClassName: system-node-critical
      serviceAccountName: cadvisor
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
      volumes:
      - hostPath:
          path: /dev
        name: dev
      - hostPath:
          path: /
        name: rootfs
      - hostPath:
          path: /var/run
        name: var-run
      - hostPath:
          path: /sys
        name: sys
      - hostPath:
          path: /var/lib/docker
        name: docker
      - hostPath:
          path: /dev/disk
        name: disk
      - hostPath:
          path: /var/lib/containerd
          type: ""
        name: containerd-var
      - hostPath:
          path: /run/containerd
          type: ""
        name: containerd
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  endpoints:
  - honorLabels: true
    metricRelabelings:
    - sourceLabels:
      - container_label_io_kubernetes_pod_name
      targetLabel: pod
    - sourceLabels:
      - container_label_io_kubernetes_pod_namespace
      targetLabel: namespace
    - sourceLabels:
      - container_label_io_kubernetes_container_name
      targetLabel: container
    - replacement: ""
      targetLabel: cluster
    path: /metrics
    port: http
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      app: cadvisor

Then wait for Prometheus to reload its configuration and check the dashboards.
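
Before opening Grafana, the result can also be checked against the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at http://localhost:9090, for example through a port-forward to a service named prometheus-k8s in the monitoring namespace; that URL, the service name, and the prom_query function name are assumptions, not details from the original setup.

```shell
#!/bin/sh
# Sketch only: PROM_URL is an assumption, e.g. after running
#   kubectl -n monitoring port-forward svc/prometheus-k8s 9090
PROM_URL="${PROM_URL:-http://localhost:9090}"
CURL="${CURL:-curl}"

# Run an instant PromQL query through the /api/v1/query endpoint.
# -G sends the URL-encoded query as a GET parameter; -s hides the progress meter.
prom_query() {
  "$CURL" -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Usage: a non-empty "result" array suggests the labels are populated again.
#   prom_query 'count by (namespace) (container_cpu_usage_seconds_total{job="cadvisor", pod!=""})'
#   prom_query 'node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate'
```

The first query checks the raw series scraped from the standalone cAdvisor, and the second checks that the recording rule from step 2 is producing data.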
Done!
