Node monitoring metrics
Total number of nodes:
sum(kube_node_info)
Number of unschedulable (cordoned) nodes:
sum(kube_node_spec_unschedulable)
CPU cores per node:
sum(kube_node_status_capacity{resource="cpu"}) by (node)
Memory capacity per node (bytes):
sum(kube_node_status_capacity{resource="memory"}) by (node)
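Nodes under disk pressure (the series is 1 when the condition holds, 0 otherwise):
kube_node_status_condition{condition="DiskPressure",status="true"} == 1
Nodes under memory pressure:
kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
Nodes under PID pressure:
kube_node_status_condition{condition="PIDPressure",status="true"} == 1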
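The node queries above can be sent to the Prometheus HTTP API as instant queries. The following is a minimal Python sketch, not a definitive client: the server address http://localhost:9090 is a placeholder, the helper names build_query_url and nodes_with_condition are hypothetical, and the sample response is made-up data that only follows the instant-query response shape. No network call is made.

```python
import json
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # assumption: replace with your own server


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def nodes_with_condition(payload: dict) -> list:
    """Return node names whose sample value is 1 in an instant-query result."""
    if payload.get("status") != "success":
        raise ValueError("query failed")
    return sorted(
        series["metric"].get("node", "")
        for series in payload["data"]["result"]
        if float(series["value"][1]) == 1.0
    )


# Illustrative response body (made-up data, not real cluster output).
sample = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"node": "node-a"}, "value": [1700000000, "1"]},
            {"metric": {"node": "node-b"}, "value": [1700000000, "0"]}
          ]}}
""")

url = build_query_url(
    PROMETHEUS_URL,
    'kube_node_status_condition{condition="DiskPressure",status="true"}',
)
pressured = nodes_with_condition(sample)
print(url)
print(pressured)
```

Filtering on the sample value in the client has the same effect as comparing the series with 1 in the query itself.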
Deployment monitoring metrics
Replica count of each deployment:
kube_deployment_status_replicas
Total replicas across all deployments:
sum(kube_deployment_status_replicas)
Updated replicas per deployment:
kube_deployment_status_replicas_updated
Unavailable replicas per deployment:
kube_deployment_status_replicas_unavailable
Pod monitoring metrics
Pod phase (each series is 1 if the pod is in that phase, 0 otherwise):
kube_pod_status_phase{phase="Running"}
kube_pod_status_phase{phase="Failed"}
kube_pod_status_phase{phase="Succeeded"}
kube_pod_status_phase{phase="Pending"}
kube_pod_status_phase{phase="Unknown"}
Pods with at least one container restart in the last 30 minutes:
sum by (namespace, pod) (changes(kube_pod_container_status_restarts_total[30m])) > 0
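The changes() function returns how many times a series' value changed inside the range window, so a non-zero result on the restart counter means at least one restart occurred. A simplified Python model of that counting is sketched below. It ignores timestamps and staleness handling, and the function name count_changes and the sample values are hypothetical.

```python
def count_changes(samples: list) -> int:
    """Count how many consecutive sample pairs differ in value."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


# Made-up restart-counter samples scraped within one 30m window.
restart_counter = [3, 3, 4, 4, 5]
restarts_seen = count_changes(restart_counter)
print(restarts_seen)
```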
Container monitoring metrics
Container state (each series is 1 if the container is in that state, 0 otherwise):
kube_pod_container_status_running
kube_pod_container_status_waiting
kube_pod_container_status_terminated
Containers restarted in the last 30 minutes:
changes(kube_pod_container_status_restarts_total[30m]) > 0
Requested CPU per container (cores):
kube_pod_container_resource_requests{resource="cpu"}
Requested memory per container (bytes):
kube_pod_container_resource_requests{resource="memory"}
PV/PVC monitoring metrics
PVC phase (each series is 1 if the claim is in that phase, 0 otherwise):
kube_persistentvolumeclaim_status_phase{phase="Bound"}
kube_persistentvolumeclaim_status_phase{phase="Pending"}
kube_persistentvolumeclaim_status_phase{phase="Lost"}
Requested storage per PVC (GiB):
sum(kube_persistentvolumeclaim_resource_requests_storage_bytes/1024/1024/1024) by (namespace, persistentvolumeclaim)
Capacity per PV (bytes):
sum(kube_persistentvolume_capacity_bytes) by (persistentvolume)
Volume utilization (used / capacity, as a ratio between 0 and 1):
kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes
Available space per PVC (bytes):
sum(kubelet_volume_stats_available_bytes) by (persistentvolumeclaim)
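The storage queries above rely on two pieces of arithmetic: dividing by 1024 three times converts bytes to GiB, and used bytes over capacity bytes gives a utilization ratio. A small Python sketch of both follows. The function names bytes_to_gib and utilization and the sample numbers are hypothetical, chosen only for illustration.

```python
def bytes_to_gib(num_bytes: float) -> float:
    """Convert bytes to GiB, matching the /1024/1024/1024 in the PVC query."""
    return num_bytes / 1024 / 1024 / 1024


def utilization(used_bytes: float, capacity_bytes: float):
    """Return used/capacity as a ratio, or None when capacity is zero."""
    if capacity_bytes == 0:
        return None
    return used_bytes / capacity_bytes


# Made-up values: a 20 GiB volume with 5 GiB used.
capacity = 20 * 1024 ** 3
used = 5 * 1024 ** 3
print(bytes_to_gib(capacity))
print(utilization(used, capacity))
```

Multiplying the ratio by 100 gives a percentage if that is easier to read on a dashboard.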