25. Monitoring Kubernetes Cluster Nodes with Prometheus

I. node-exporter

node_exporter collects runtime metrics from server nodes, such as conntrack, cpu, diskstats, filesystem, loadavg, meminfo, and netstat.
For more information, see https://github.com/prometheus/node_exporter

1. Deploying node-exporter as a DaemonSet

Pull the image:
docker pull prom/node-exporter:v1.1.2

Create the manifest:
vi node-exporter-dm.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-mon
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true    # share the host's PID namespace
      hostIPC: true    # share the host's IPC namespace
      hostNetwork: true    # share the host's network namespace
      containers:
      - name: node-exporter
        image: harbor.hzwod.com/k8s/prom/node-exporter:v1.1.2
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 150m
#        securityContext:
#          privileged: true
        args:
        - --path.rootfs
        - /host
        volumeMounts:
        - name: rootfs
          mountPath: /host
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
        - name: rootfs
          hostPath:
            path: /
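  • hostPID: true, hostIPC: true, and hostNetwork: true make the node-exporter container share the host's PID, IPC, and network namespaces, so that it can use host-level resources.
  • Note that because the container shares the host's network namespace, containerPort: 9100 is exposed directly on port 9100 of the host, and that port serves as the metrics endpoint.
  • The host's / directory is mounted at /host in the container and the argument --path.rootfs=/host is set, so that the container can locate the host's files and read host information from them; for example, /proc/stat provides CPU information and /proc/meminfo provides memory information.
  • tolerations allows the pod to run on master nodes, because we want the masters to be monitored as well; if other nodes carry taints, handle them the same way.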

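Running kubectl apply -f node-exporter-dm.yaml with securityContext.privileged: true enabled fails with an error.
[Screenshot: error output of kubectl apply]
The output of kube-apiserver -h contains the relevant option.
[Screenshot: the corresponding entry in kube-apiserver -h]
Either add the startup flag --allow-privileged=true to kube-apiserver so that containers may request privileged mode,
or remove the securityContext.privileged: true setting from the manifest above (TODO: the impact of removing it is not yet known).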

Check the metrics:
curl http://172.10.10.100:9100/metrics
The response lists a large number of metrics.
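
To make the shape of that response concrete, here is a minimal Python sketch that parses a few lines in the Prometheus text format. The sample text is made up for illustration and was not captured from this cluster, and parse_metrics is a hypothetical helper, not part of any Prometheus library. It ignores optional timestamps and escaped characters in label values, which a real parser would need to handle.

```python
import re

# Hypothetical sample; real node-exporter output contains many more lines.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.8
"""

# metric_name, optional {labels}, then a single value.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = LINE_RE.match(line)
        if match is None:
            continue  # not covered by this simplified grammar
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

Each sample is a metric name, an optional set of labels, and a numeric value; the labels are what later queries and dashboards filter on.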

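Every node now exposes a metrics endpoint, and we could add each node to the Prometheus configuration by hand. But then every new node would require another change to the Prometheus configuration. Is there a simpler way to discover nodes automatically? This is where Prometheus service discovery comes in.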

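2. Service discovery

On Kubernetes, Prometheus integrates with the Kubernetes API and currently supports five main service discovery roles: Node, Service, Pod, Endpoints, and Ingress.

a. Node discovery

Add the following to the Prometheus configuration: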

- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
  - role: node
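  • kubernetes_sd_configs is the service discovery configuration that Prometheus provides for the Kubernetes API.
  • role can be node, service, pod, endpoints, or ingress; each role exposes a different set of meta labels.
    For more information, see the official documentation: kubernetes_sd_config

Besides kubernetes_sd_config, Prometheus offers many other options; see prometheus configuration.

After reloading Prometheus, the targets page shows that discovery works, but every endpoint returns HTTP 400.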

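b. Adjusting the discovered endpoint with relabel_configs

After node discovery, Prometheus scrapes port 10250 on every node, and the scrape fails. Why?
Port 10250 is the kubelet's HTTPS port, which expects TLS and authentication, while this job scrapes over plain HTTP; that mismatch is the likely cause of the 400 responses. The kubelet's read-only HTTP port is 10255 (kubelet version used here: v1.17.16).
What we want is for the discovered node targets to use port 9100, which node-exporter serves. (To scrape the kubelet's own metrics over plain HTTP, the port would have to be changed to 10255 instead; this is used below when configuring cAdvisor.)

When the kubelet's read-only port is enabled, the metrics can be viewed with curl http://[nodeIP]:10255/metrics

relabel_configs lets us rewrite the port or other parts of the discovered endpoint.
Update the kubernetes-nodes job in prometheus.yaml: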

- job_name: 'kubernetes-nodes'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
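  - action: replace    # replace action
    source_labels: [__address__]    # list; the values of these labels are joined and matched against regex
    target_label: __address__    # label that receives the result
    regex: '(.*):10250'    # regular expression matched against the joined source_labels value
    replacement: '${1}:9100'    # value written to target_label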
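  • action: replace rewrites the target label.
  • __address__ holds the <host>:<port> of the target; here it is both the source label and the target label.
  • replacement: '${1}:9100' where ${1} refers to the first capture group of regex.
    For more information, see relabel_configs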
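
The effect of this rule can be sketched in a few lines of Python. This is only an approximation under stated assumptions: relabel_replace is a hypothetical helper, Python's re module stands in for the RE2 syntax that Prometheus uses, and details such as dropping a label when the result is empty are left out. What it does reproduce is that the regex must match the whole joined value and that ${1} expands to the first capture group.

```python
import re


def relabel_replace(labels, source_labels, regex, replacement,
                    target_label, separator=";"):
    """Approximate a relabeling rule with action: replace (hypothetical helper)."""
    # Join the source label values, as Prometheus does before matching.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors the regex at both ends, hence fullmatch.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    # Translate ${1} / $1 into Python's \g<1> template syntax.
    template = re.sub(
        r"\$\{(\w+)\}|\$(\w+)",
        lambda m: r"\g<%s>" % (m.group(1) or m.group(2)),
        replacement,
    )
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


if __name__ == "__main__":
    discovered = {"__address__": "172.10.10.100:10250"}
    rewritten = relabel_replace(
        discovered,
        source_labels=["__address__"],
        regex=r"(.*):10250",
        replacement="${1}:9100",
        target_label="__address__",
    )
    print(rewritten["__address__"])  # 172.10.10.100:9100
```

A target whose address does not end in :10250 would pass through untouched, because the anchored regex does not match it.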

The official documentation describes __address__ as follows:
The __address__ label is set to the <host>:<port> address of the target. After relabeling, the instance label is set to the value of __address__ by default if it was not set during relabeling. The __scheme__ and __metrics_path__ labels are set to the scheme and metrics path of the target respectively. The __param_<name> label is set to the value of the first passed URL parameter called <name>

Next, add a labelmap rule that copies the Kubernetes node labels onto the Prometheus targets, which makes it easier to filter the monitoring data later.

  - action: labelmap
    regex: __meta_kubernetes_node_label_(.*)

After updating prometheus.yaml and reloading, check the targets in Prometheus again.

c. The complete prometheus.yaml

Here is the complete Prometheus ConfigMap (prometheus.yml is stored as a ConfigMap, and therefore in etcd).
prometheus-cm.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-mon
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'coredns'
      static_configs:
      - targets: ['kube-dns.kube-system:9153']
    - job_name: 'traefik'
      static_configs:
        - targets: ['traefiktcp.default:8180']
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
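      - action: replace    # replace action
        source_labels: [__address__]    # list; the values of these labels are joined and matched against regex
        target_label: __address__    # label that receives the result
        regex: '(.*):10250'    # regular expression matched against the joined source_labels value
        replacement: '${1}:9100'    # value written to target_label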
      - action: labelmap
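        regex: __meta_kubernetes_node_label_(.*)

3. Showing node metrics in Grafana

Grafana is already installed and has a Prometheus data source configured, so we now import a dashboard template to display the node-exporter data.
Download the template: https://grafana.com/api/dashboards/8919/revisions/24/download

II. kube-state-metrics + cAdvisor

1. Configuring Prometheus to scrape cAdvisor

cAdvisor is built into the kubelet and can be used directly.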

    - job_name: 'k8s-cadvisor'
      metrics_path: /metrics/cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:10255'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      metric_relabel_configs:
      - source_labels: [instance]
        separator: ;
        regex: (.+)
        target_label: node
        replacement: $1
        action: replace
      - source_labels: [pod_name]
        separator: ;
        regex: (.+)
        target_label: pod
        replacement: $1
        action: replace
      - source_labels: [container_name]
        separator: ;
        regex: (.+)
        target_label: container
        replacement: $1
        action: replace

2. Deploying kube-state-metrics

https://github.com/kubernetes/kube-state-metrics/tree/master/examples/standard

This section deploys kube-state-metrics into the kube-mon namespace.
The kube-state-metrics version is v1.9.8.

  • Pull the image
    docker pull quay.mirrors.ustc.edu.cn/coreos/kube-state-metrics:v1.9.8
  • cluster-role-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-mon
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
  namespace: kube-mon
  • cluster-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
  • deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
  namespace: kube-mon
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 1.9.8
    spec:
      containers:
      - image: harbor.hzwod.com/k8s/kube-state-metrics:v1.9.8
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
  • service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scraped: "true"
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 1.9.8
  name: kube-state-metrics
  namespace: kube-mon
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics

kubectl apply -f . applies these resources and starts the kube-state-metrics container and service.

3. Configuring Prometheus to scrape kube-state-metrics

Add the following job to prometheus.yaml:

    - job_name: kube-state-metrics
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - kube-mon
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
        regex: kube-state-metrics
        replacement: $1
        action: keep
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: k8s_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: k8s_sname
  • role: endpoints discovers targets from the endpoints behind each service.
  • keep retains only targets whose service carries the label app.kubernetes.io/name: kube-state-metrics.
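
The keep rule can be pictured as a filter over the discovered targets. The Python sketch below is an approximation with a hypothetical keep helper and made-up targets; it relies on the fact that Prometheus exposes the service label app.kubernetes.io/name as the meta label __meta_kubernetes_service_label_app_kubernetes_io_name, with unsupported characters replaced by underscores.

```python
import re


def keep(targets, source_labels, regex, separator=";"):
    """Approximate action: keep; drop targets whose joined value does not match."""
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        value = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(value):  # anchored match, as in Prometheus
            kept.append(labels)
    return kept


if __name__ == "__main__":
    label = "__meta_kubernetes_service_label_app_kubernetes_io_name"
    # Hypothetical endpoints discovered in the kube-mon namespace.
    discovered = [
        {"__address__": "10.244.1.5:8080", label: "kube-state-metrics"},
        {"__address__": "10.244.1.9:3000", label: "grafana"},
        {"__address__": "10.244.2.7:9090"},  # service without that label
    ]
    for target in keep(discovered, [label], r"kube-state-metrics"):
        print(target["__address__"])  # 10.244.1.5:8080
```

Targets without the label produce an empty value, which the regex does not match, so they are dropped as well.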

After changing the configuration and reloading Prometheus, check the targets page.

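4. Showing the metrics with a Grafana template

This template needs data from both cAdvisor and kube-state-metrics, which is why the steps above set up Prometheus to scrape both.

  • Download the template
    https://grafana.com/grafana/dashboards/13105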
