Kubernetes & Monitoring

  • Monitoring applications running in the cluster

  • Monitoring the cluster itself

    • Control-plane components (api-server, coredns, kube-scheduler)
    • Kubelet (cAdvisor): exposes container metrics
    • kube-state-metrics: cluster-level metrics (Deployment and Pod metrics)
    • node-exporter: host-level metrics (CPU, memory, network)

Deployment

  • Helm is the package manager for Kubernetes; it is used here to install the Prometheus stack.

prometheus-operator
Install Helm

$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh

Install Prometheus Chart

kube-prometheus-stack

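helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install the chart as release "prometheus-stack" in the default namespace (the release name that appears in the labels below)
helm install prometheus-stack prometheus-community/kube-prometheus-stack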

Service Monitors

controlplane ~ ➜  kubectl get crd
NAME                                        CREATED AT
addons.k3s.cattle.io                        2024-08-15T00:00:11Z
helmcharts.helm.cattle.io                   2024-08-15T00:00:11Z
helmchartconfigs.helm.cattle.io             2024-08-15T00:00:11Z
traefikservices.traefik.containo.us         2024-08-15T00:01:07Z
ingressroutes.traefik.containo.us           2024-08-15T00:01:07Z
middlewaretcps.traefik.containo.us          2024-08-15T00:01:07Z
ingressrouteudps.traefik.containo.us        2024-08-15T00:01:07Z
serverstransports.traefik.containo.us       2024-08-15T00:01:07Z
tlsoptions.traefik.containo.us              2024-08-15T00:01:07Z
tlsstores.traefik.containo.us               2024-08-15T00:01:07Z
middlewares.traefik.containo.us             2024-08-15T00:01:07Z
ingressroutetcps.traefik.containo.us        2024-08-15T00:01:07Z
alertmanagerconfigs.monitoring.coreos.com   2024-08-15T09:10:15Z
alertmanagers.monitoring.coreos.com         2024-08-15T09:10:16Z
podmonitors.monitoring.coreos.com           2024-08-15T09:10:16Z
probes.monitoring.coreos.com                2024-08-15T09:10:16Z
prometheuses.monitoring.coreos.com          2024-08-15T09:10:16Z # creates Prometheus instances
prometheusrules.monitoring.coreos.com       2024-08-15T09:10:17Z
servicemonitors.monitoring.coreos.com       2024-08-15T09:10:17Z # adds targets for Prometheus to scrape
thanosrulers.monitoring.coreos.com          2024-08-15T09:10:17Z

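  • A ServiceMonitor defines a set of targets for Prometheus to monitor and scrape: it selects Kubernetes Services by label and specifies which Service port and path to scrape.

Pod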
# Create a long-running Rocky Linux pod (the sleep loop keeps the container alive)
kubectl run my-host --image=rockylinux/rockylinux --command -- /bin/bash -c "while true; do sleep 3600; done"
pod/my-host created
# Open a shell inside the container
controlplane ~ ➜  kubectl exec my-host -it -- /bin/bash
[root@my-host /]# yum install wget -y
[root@my-host /]# wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
[root@my-host /]# tar -xzf node_exporter-1.8.2.linux-amd64.tar.gz 
[root@my-host /]# cd node_exporter-1.8.2.linux-amd64

# Start node_exporter in the background (it listens on port 9100 by default)
nohup ./node_exporter &

# Test
controlplane ~ ➜  kubectl get pod my-host -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
my-host   1/1     Running   0          13m   10.42.0.12   controlplane   <none>           <none>
# Try to reach the exporter via the pod IP
controlplane ~ ✖ curl 10.42.0.12:9100
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Node Exporter</title>
    <style>body {
  font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Roboto,Helvetica Neue,Arial,Noto Sans,Liberation Sans,sans-serif,Apple Color Emoji,Segoe UI Emoji,Segoe UI Symbol,Noto Color Emoji;
  margin: 0;
}
header {
  background-color: #e6522c;
  color: #fff;
  font-size: 1rem;
  padding: 1rem;
}
main {
  padding: 1rem;
}
label {
  display: inline-block;
  width: 0.5em;
}

</style>
  </head>
  <body>
    <header>
      <h1>Node Exporter</h1>
    </header>
    <main>
      <h2>Prometheus Node Exporter</h2>
      <div>Version: (version=1.8.2, branch=HEAD, revision=f1e0e8360aa60b6cb5e5cc1560bed348fc2c1895)</div>
      <div>
        <ul>
          
          <li><a href="/metrics">Metrics</a></li>
          
        </ul>
      </div>
      
      
    </main>
  </body>
</html>
controlplane ~ ➜  curl 10.42.0.12:9100/metrics 
···
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
···
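The /metrics output above uses the Prometheus text exposition format: `# HELP`/`# TYPE` comment lines followed by samples of the form `name{label="value"} value`. As a rough illustration of how a scraper reads it, here is a minimal Python parser. It is a sketch, not the official client-library parser: it assumes label values contain no escaped quotes, commas or braces, and it ignores the HELP/TYPE metadata. `parse_metrics` is a hypothetical helper.

```python
import re

# Simplified pattern for one sample line of the Prometheus text format:
#   metric_name{label="value",...} value [timestamp]
# Assumption: label values contain no escaped quotes, commas or braces.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # not a sample line this sketch understands
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        # float() also accepts "1.8446744073709552e+19", "NaN" and "+Inf".
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


if __name__ == "__main__":
    output = """# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
process_virtual_memory_max_bytes 1.8446744073709552e+19
"""
    for name, labels, value in parse_metrics(output):
        print(name, labels, value)
```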

Create a Service

# Inspect the Prometheus instance configuration
controlplane ~ ➜  kubectl get prometheuses.monitoring.coreos.com -o yaml 
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    annotations:
      meta.helm.sh/release-name: prometheus-stack
      meta.helm.sh/release-namespace: default
    creationTimestamp: "2024-08-16T02:52:40Z"
    generation: 1
    labels:
      app: kube-prometheus-stack-prometheus
      app.kubernetes.io/instance: prometheus-stack
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/part-of: kube-prometheus-stack
      app.kubernetes.io/version: 45.6.0
      chart: kube-prometheus-stack-45.6.0
      heritage: Helm
      release: prometheus-stack
    name: prometheus-stack-kube-prom-prometheus
    namespace: default
    resourceVersion: "4393"
    uid: b1e782a0-c7d6-4f05-b5b3-20502623a9df
  spec:
    alerting:
      alertmanagers:
      - apiVersion: v2
        name: prometheus-stack-kube-prom-alertmanager
        namespace: default
        pathPrefix: /
        port: http-web
    enableAdminAPI: false
    evaluationInterval: 30s
    externalUrl: http://prometheus-stack-kube-prom-prometheus.default:9090
    hostNetwork: false
    image: quay.io/prometheus/prometheus:v2.42.0
    listenLocal: false
    logFormat: logfmt
    logLevel: info
    paused: false
    podMonitorNamespaceSelector: {}
    podMonitorSelector:
      matchLabels:
        release: prometheus-stack
    portName: http-web
    probeNamespaceSelector: {}
    probeSelector:
      matchLabels:
        release: prometheus-stack
    replicas: 1
    retention: 10d
    routePrefix: /
    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        release: prometheus-stack # a PrometheusRule must carry this label to be discovered by this Prometheus instance
    scrapeInterval: 30s
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: prometheus-stack-kube-prom-prometheus
    serviceMonitorNamespaceSelector: {} # {} matches ServiceMonitors in all namespaces
    serviceMonitorSelector:
      matchLabels:
        release: prometheus-stack # a ServiceMonitor must also carry this label to be discovered by this Prometheus instance
    shards: 1
    version: v2.42.0
    walCompression: true
  status:
    availableReplicas: 1
    conditions:
    - lastTransitionTime: "2024-08-16T02:53:21Z"
      observedGeneration: 1
      status: "True"
      type: Available
    - lastTransitionTime: "2024-08-16T02:52:50Z"
      observedGeneration: 1
      status: "True"
      type: Reconciled
    paused: false
    replicas: 1
    shardStatuses:
    - availableReplicas: 1
      replicas: 1
      shardID: "0"
      unavailableReplicas: 0
      updatedReplicas: 1
    unavailableReplicas: 0
    updatedReplicas: 1
kind: List
metadata:
  resourceVersion: ""
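The selectors in this spec decide which ServiceMonitors (and, via `ruleSelector`/`ruleNamespaceSelector`, which PrometheusRules) the instance loads: `matchLabels` requires every listed label to be present with the same value, an empty namespace selector `{}` matches all namespaces, and an absent one limits discovery to the Prometheus object's own namespace. The Python sketch below models only those two rules, with values copied from the spec above; `matchExpressions` and the rest of the Operator's logic are left out, and the function names are hypothetical.

```python
# Simplified model of how the Prometheus CR above picks up ServiceMonitors.
# Only matchLabels is modelled; matchExpressions are ignored.

def matches_labels(match_labels, labels):
    """matchLabels: every key/value pair must appear in the object's labels."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def namespace_allowed(ns_selector, namespace, namespace_labels, prometheus_ns):
    """serviceMonitorNamespaceSelector semantics:
    None (field absent)    -> only the Prometheus object's own namespace
    {} (empty selector)    -> all namespaces
    {"matchLabels": {...}} -> namespaces whose labels match
    """
    if ns_selector is None:
        return namespace == prometheus_ns
    return matches_labels(ns_selector.get("matchLabels", {}), namespace_labels)


# Values copied from the Prometheus spec shown above.
PROM_NS = "default"
SM_SELECTOR = {"release": "prometheus-stack"}
SM_NS_SELECTOR = {}


def selects_service_monitor(sm_labels, sm_namespace, namespace_labels=None):
    """Would this Prometheus instance load a ServiceMonitor with these labels?"""
    ns_ok = namespace_allowed(SM_NS_SELECTOR, sm_namespace,
                              namespace_labels or {}, PROM_NS)
    return ns_ok and matches_labels(SM_SELECTOR, sm_labels)


if __name__ == "__main__":
    # Labelled like the ServiceMonitor template later in this post: picked up.
    print(selects_service_monitor({"release": "prometheus-stack"}, "default"))
    # Missing the release label: silently ignored by this instance.
    print(selects_service_monitor({"app": "my-host"}, "default"))
```

The practical consequence: a ServiceMonitor without `release: prometheus-stack` raises no error, it is simply never scraped.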
# Get the pod's labels
controlplane ~ ➜  kubectl get pod my-host --show-labels
NAME      READY   STATUS    RESTARTS   AGE   LABELS
my-host   1/1     Running   0          16m   run=my-host
#### svc.yml
apiVersion: v1
kind: Service
metadata:
  name: my-host-exporter-svc
  labels:
    job: my-host-exporter
    app: my-host-exporter-svc
spec:
  selector:
    run: my-host # matches the pod label run=my-host shown above
  ports:
    - name: exporter # the ServiceMonitor refers to this port by name
      protocol: TCP
      port: 9100
      targetPort: 9100
### Test
controlplane ~ ➜  kubectl get svc my-host-exporter-svc 
NAME                   TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
my-host-exporter-svc   ClusterIP   10.43.32.68   <none>        9100/TCP   9s
controlplane ~ ✖ curl 10.43.32.68:9100
··· (the same Node Exporter landing page as returned via the pod IP above) ···

ServiceMonitor Template

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-host-svc-mon
  labels:
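    release: prometheus-stack # must match the serviceMonitorSelector of the Prometheus instance
spec:
  jobLabel: job # use the value of the Service's "job" label as the job name
  endpoints:
  - port: exporter # the named port defined in the Service
    interval: 30s
    path: /metrics # metrics path
  selector:
    matchLabels:
      app: my-host-exporter-svc # selects the Service by its labels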
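Putting the pieces together, Prometheus reaches the exporter through a chain of label matches: the ServiceMonitor's `selector` picks the Service, the Service's `selector` picks the pod, `endpoints[].port` names a Service port, and `jobLabel: job` turns the Service's `job` label into the `job` label of the scraped series (falling back to the Service name if that label is missing). The Python sketch below walks that chain over plain dicts mirroring the manifests in this post. It is a simplified model: real discovery goes through Endpoints/EndpointSlice objects and also considers namespaces and pod readiness, and `resolve_targets` is a hypothetical helper.

```python
# Simplified model: ServiceMonitor -> Service -> Pod -> scrape target.
# Real discovery uses Endpoints/EndpointSlice objects, namespaces and pod
# readiness; those details are ignored here.

def matches_labels(match_labels, labels):
    return all(labels.get(k) == v for k, v in match_labels.items())


def resolve_targets(service_monitor, services, pods):
    """Return (job, url) pairs the ServiceMonitor would produce."""
    targets = []
    sm_spec = service_monitor["spec"]
    for svc in services:
        # 1. spec.selector.matchLabels picks Services by their labels.
        if not matches_labels(sm_spec["selector"]["matchLabels"], svc["labels"]):
            continue
        # 2. jobLabel names a Service label whose value becomes the job name;
        #    if it is unset or missing, the job defaults to the Service name.
        job = svc["labels"].get(sm_spec.get("jobLabel", ""), svc["name"])
        for ep in sm_spec["endpoints"]:
            # 3. endpoints[].port refers to a Service port by its name.
            ports = [p for p in svc["ports"] if p["name"] == ep["port"]]
            if not ports:
                continue
            for pod in pods:
                # 4. The Service's own selector picks the backing pods.
                if matches_labels(svc["selector"], pod["labels"]):
                    path = ep.get("path", "/metrics")
                    url = f"http://{pod['ip']}:{ports[0]['targetPort']}{path}"
                    targets.append((job, url))
    return targets


if __name__ == "__main__":
    sm = {"spec": {
        "jobLabel": "job",
        "endpoints": [{"port": "exporter", "interval": "30s", "path": "/metrics"}],
        "selector": {"matchLabels": {"app": "my-host-exporter-svc"}},
    }}
    svc = {"name": "my-host-exporter-svc",
           "labels": {"job": "my-host-exporter", "app": "my-host-exporter-svc"},
           "selector": {"run": "my-host"},
           "ports": [{"name": "exporter", "port": 9100, "targetPort": 9100}]}
    pod = {"name": "my-host", "labels": {"run": "my-host"}, "ip": "10.42.0.12"}
    # prints: [('my-host-exporter', 'http://10.42.0.12:9100/metrics')]
    print(resolve_targets(sm, [svc], [pod]))
```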

After creating it, the ServiceMonitor we defined is discovered by the Prometheus instance, and its target's metrics are scraped.
We can also run queries to observe some of the my-host pod's metrics.
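Queries can also be sent from a script through the Prometheus HTTP API (`/api/v1/query` for instant queries). The Python sketch below uses only the standard library. It assumes the Prometheus service has been port-forwarded to localhost:9090, for example with `kubectl port-forward svc/prometheus-stack-kube-prom-prometheus 9090:9090` (service name inferred from the `externalUrl` in the spec above); `PROM_URL` and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here via kubectl port-forward.
PROM_URL = "http://localhost:9090"  # hypothetical local address


def build_query_url(base, promql):
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body):
    """Turn an instant-query JSON response into {instance: float_value}."""
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    # Each vector sample looks like {"metric": {labels}, "value": [ts, "1"]}.
    return {r["metric"].get("instance", ""): float(r["value"][1])
            for r in body["data"]["result"]}


def instant_query(promql, base=PROM_URL):
    """Run one instant query and return {instance: value}."""
    with urllib.request.urlopen(build_query_url(base, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

With the port-forward running, `instant_query('up{job="my-host-exporter"}')` should return one entry per scraped pod, with value 1.0 while the exporter is reachable; the `job` value comes from the Service's `job` label via `jobLabel`.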

Rules

To add rules, the Operator provides a PrometheusRule CRD, which is used to register new rules with the Prometheus instance.

Template example

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    release: prometheus-stack  # ensures the Prometheus instance finds and registers this rule (matches ruleSelector)
  name: my-host-rules
spec:
  groups:
    - name: api
      rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{$labels.instance}} down"
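The `for: 5m` clause means `up == 0` must keep returning the series for five minutes before the alert fires; until then the alert is only pending, and a single evaluation in which the expression returns nothing resets it. The Python sketch below models that state machine for one series, evaluated every 30s (the `evaluationInterval` in the Prometheus spec above). It is a simplification of Prometheus's rule evaluation, not its actual code, and `alert_states` is a hypothetical helper.

```python
# Simplified model of Prometheus alert states for one series.
# Assumption: evaluations happen exactly every eval_interval seconds.

def alert_states(results, eval_interval=30, hold=5 * 60):
    """results[i] is True if `up == 0` returned the series at evaluation i.

    Returns the alert state after each evaluation: "inactive", "pending"
    or "firing". `hold` plays the role of the rule's `for:` duration.
    """
    states = []
    active_since = None
    for i, expr_true in enumerate(results):
        t = i * eval_interval
        if not expr_true:
            active_since = None  # condition cleared: the timer resets
            states.append("inactive")
            continue
        if active_since is None:
            active_since = t  # first hit: the alert becomes pending
        states.append("firing" if t - active_since >= hold else "pending")
    return states


if __name__ == "__main__":
    # Exporter down for 6 minutes: evaluations at t = 0s, 30s, ..., 360s.
    states = alert_states([True] * 13)
    # prints: 300 seconds until firing
    print(states.index("firing") * 30, "seconds until firing")
```

To see it for real, stopping the node_exporter process in the my-host pod should make `up{job="my-host-exporter"}` drop to 0, and InstanceDown should move from pending to firing after about five minutes.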
