Prometheus installed via the default Helm chart (my version is 2.35.0) does not correctly distinguish the system disk from the data disks in the node-exporter disk metrics, as shown in the figure:
The data source configuration is shown in the figure:
The variable settings are shown in the figure below:
When Prometheus monitors disk information through Node Exporter, failing to distinguish the system disk from the data disks is usually caused by mount points that are not configured or named correctly.
By default, Node Exporter collects filesystem metrics for every mount point and exposes them all to Prometheus. To distinguish the system disk from the data disks correctly, you have to tell Node Exporter, via configuration or startup flags, which mount points belong to the system disk and which belong to the data disks.
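As a quick sanity check, you can list the mount points node_exporter is currently exposing before changing any flags. This is a minimal sketch that assumes node_exporter is reachable on its default port 9100:

# List every mount point node_exporter currently reports.
# Assumes node_exporter listens on the default port 9100.
curl -s http://localhost:9100/metrics | grep '^node_filesystem_size_bytes'
# Sample output, one series per device/mount point pair (values will differ):
# node_filesystem_size_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 5.366e+10
# node_filesystem_size_bytes{device="/dev/sdb1",fstype="xfs",mountpoint="/data"} 1.073e+11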
For example, Node Exporter's --collector.filesystem.ignored-mount-points startup flag can be used to ignore certain mount points, e.g. to ignore the system mount points:
node_exporter --collector.filesystem.ignored-mount-points "^/(sys|proc|dev|run|etc|var/lib/docker/containers|var/lib/kubelet|var/lib/rancher/)"
In node_exporter v1.3.1 this flag has been renamed to: --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
Alternatively, the --collector.filesystem.mount-points flag (which no longer exists in v1.3.1) can be used to monitor only specific mount points, e.g. only the data-disk mount points:
node_exporter --collector.filesystem.mount-points "^/(data|mnt)($|/)"
In the example above, /data and /mnt are the data-disk mount points; only filesystem metrics for these mount points are collected and exposed to Prometheus.
Note that mount point naming and layout differ between operating systems and storage setups, so adjust the regular expressions above to match your actual environment.
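Before writing these regular expressions, it helps to enumerate the mount points that actually exist on the node, for example:

# Show every mounted filesystem with its source device and type,
# so the include/exclude regexes can be written against real paths.
df -hT
# Or list block devices with their filesystems and mount points:
lsblk -f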
Besides the flags above, for node_exporter to report the host's disk usage, the host's / directory must also be mounted into the container at /host/root, and the startup flag '--path.rootfs=/host/root' must be set (applies to node_exporter v1.3.1).
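Outside Kubernetes, the same rootfs setup can be sketched with a plain Docker command (a minimal sketch; the image tag and paths mirror the manifest below):

# Bind-mount the host's / read-only at /host/root and tell
# node_exporter to resolve filesystem paths under that prefix.
docker run -d --net=host --pid=host \
  -v /:/host/root:ro,rslave \
  prom/node-exporter:v1.3.1 \
  --path.rootfs=/host/root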
Below is a complete node-exporter Pod YAML file with the correct configuration, for reference:
kind: Pod
apiVersion: v1
metadata:
  name: prometheus-node-exporter-zhrc7
  generateName: prometheus-node-exporter-
  namespace: devops
  labels:
    app: prometheus
    chart: prometheus-10.4.0
    component: node-exporter
    controller-revision-hash: 59b8ccf974
    heritage: Helm
    pod-template-generation: '1'
    release: prometheus
spec:
  volumes:
    - name: proc
      hostPath:
        path: /proc
        type: ''
    - name: sys
      hostPath:
        path: /sys
        type: ''
    - name: root
      hostPath:
        path: /
        type: ''
    - name: kube-api-access-86x56
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: prometheus-node-exporter
      image: 'prom/node-exporter:v1.3.1'
      args:
        - '--path.procfs=/host/proc'
        - '--path.sysfs=/host/sys'
        - '--path.rootfs=/host/root'
        - >-
          --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        - >-
          --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
        - '--collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15})$'
        - '--collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15})$'
      ports:
        - name: metrics
          hostPort: 9100
          containerPort: 9100
          protocol: TCP
      resources:
        limits:
          cpu: 300m
          memory: 250Mi
        requests:
          cpu: 100m
          memory: 200Mi
      volumeMounts:
        - name: proc
          readOnly: true
          mountPath: /host/proc
        - name: sys
          readOnly: true
          mountPath: /host/sys
        - name: root
          mountPath: /host/root
        - name: kube-api-access-86x56
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: prometheus-node-exporter
  serviceAccount: prometheus-node-exporter
  nodeName: kubernetesm02
  hostNetwork: true
  hostPID: true
  securityContext: {}
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchFields:
              - key: metadata.name
                operator: In
                values:
                  - kubernetesm02
  schedulerName: default-scheduler
  tolerations:
    - operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
    - key: node.kubernetes.io/disk-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/memory-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/pid-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/unschedulable
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/network-unavailable
      operator: Exists
      effect: NoSchedule
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
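Once the Pod is running, one way to verify the result is to query per-mount-point usage through the Prometheus HTTP API. This sketch assumes the server has been made reachable on localhost:9090, e.g. with kubectl port-forward; adapt the address and service name to your setup:

# Disk usage percent per mount point via the Prometheus HTTP API.
# Assumes something like: kubectl -n devops port-forward svc/prometheus-server 9090:80
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=100 * (1 - node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"})'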
After the fix, the available space for each partition is monitored correctly: