K8s Study Notes 0612

Networking in K8s is divided into overlay networks (e.g. Calico and Flannel) and underlay networks.

Monitoring Pods with cAdvisor

cAdvisor (Container Advisor) gives container users insight into the resource usage and performance characteristics of their running containers. It collects, aggregates, processes, and exports information about running containers. Specifically, for each container it keeps the resource isolation parameters, historical resource usage, histograms of complete historical resource usage, and network statistics. This data is exported per container and machine-wide.

Metric name                               Type     Meaning
container_cpu_load_average_10s            gauge    Average CPU load of the container over the past 10 seconds
container_cpu_usage_seconds_total         counter  Cumulative CPU time consumed on each CPU core (seconds)
container_cpu_system_seconds_total        counter  Cumulative system CPU time consumed (seconds)
container_cpu_user_seconds_total          counter  Cumulative user CPU time consumed (seconds)
container_fs_usage_bytes                  gauge    Filesystem usage inside the container (bytes)
container_fs_reads_bytes_total            counter  Cumulative bytes read by the container
container_fs_writes_bytes_total           counter  Cumulative bytes written by the container
container_memory_max_usage_bytes          gauge    Maximum memory usage of the container (bytes)
container_memory_usage_bytes              gauge    Current memory usage of the container (bytes)
container_spec_memory_limit_bytes         gauge    Memory limit of the container (bytes)
machine_memory_bytes                      gauge    Total memory of the host (bytes)
container_network_receive_bytes_total     counter  Cumulative bytes received over the container network
container_network_transmit_bytes_total    counter  Cumulative bytes transmitted over the container network

Once cAdvisor sample data is being collected normally, container CPU usage can be computed with the following expression:

sum(irate(container_cpu_usage_seconds_total{image!=""}[1m])) without (cpu)

Query the container memory usage (bytes):

container_memory_usage_bytes{image!=""}

Query the container network receive rate (bytes/second):

sum(rate(container_network_receive_bytes_total{image!=""}[1m])) without (interface)

Container network transmit rate (bytes/second):

sum(rate(container_network_transmit_bytes_total{image!=""}[1m])) without (interface)

Container filesystem read rate (bytes/second):

sum(rate(container_fs_reads_bytes_total{image!=""}[1m])) without (device)

Container filesystem write rate (bytes/second):

sum(rate(container_fs_writes_bytes_total{image!=""}[1m])) without (device)

Bytes received on the container network (over 1 minute), filtered by name with name=~".+":

sum(rate(container_network_receive_bytes_total{name=~".+"}[1m])) by (name)

Bytes transmitted on the container network (over 1 minute), filtered by name with name=~".+":

sum(rate(container_network_transmit_bytes_total{name=~".+"}[1m])) by (name)

Cumulative system CPU time of all containers (over 1 minute):

sum(rate(container_cpu_system_seconds_total[1m]))

System CPU time of each container (over 1 minute):

sum(irate(container_cpu_system_seconds_total{image!=""}[1m])) without (cpu)

CPU usage of each container:

sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name) * 100

Total CPU usage across all containers:

sum(sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name) * 100)

Deploying cAdvisor with a DaemonSet

[root@k8s-master1 0612]# cat case1-daemonset-deploy-cadvisor.yaml 
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: cAdvisor
  template:
    metadata:
      labels:
        app: cAdvisor
    spec:
      tolerations:    # tolerate taints; ignore the NoSchedule taint on master
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      hostNetwork: true
      restartPolicy: Always   # restart policy
      containers:
      - name: cadvisor
        image: k8s-harbor.com/public/cadvisor:v0.39.3 
        imagePullPolicy: IfNotPresent  # image pull policy
        ports:
        - containerPort: 8080
        volumeMounts:
          - name: root
            mountPath: /rootfs
          - name: run
            mountPath: /var/run
          - name: sys
            mountPath: /sys
          - name: docker
            mountPath: /var/lib/docker
      volumes:
      - name: root
        hostPath:
          path: /
      - name: run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/docker
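
With hostNetwork enabled, each node now serves cAdvisor metrics on port 8080. A quick hedged check (the node IP below is the master from this environment):

kubectl get pods -n monitoring -o wide    # expect one cadvisor pod per node
curl -s http://192.168.226.144:8080/metrics | head    # raw cAdvisor metrics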

Deploying node-exporter in K8s

[root@k8s-master1 0612]# cat case2-daemonset-deploy-node-exporter.yaml 
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring 
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
      containers:
      - image: prom/node-exporter:v1.3.1 
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitoring 
spec:
  type: NodePort
  ports:
  - name: http
    port: 9100
    nodePort: 39100
    protocol: TCP
  selector:
    k8s-app: node-exporter
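
node-exporter also runs with hostNetwork, so every node answers on port 9100 (and on NodePort 39100 via the Service). A hedged check:

curl -s http://192.168.226.144:9100/metrics | grep node_load1    # system load metric from node-exporter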

Service Discovery in K8s

Deploying Prometheus with YAML

Create the Prometheus configuration file

[root@k8s-master1 prometheus]# cat case3-1-prometheus-cfg.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitoring 
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:    # dynamic service discovery
      - role: node # discover node objects
      relabel_configs: # relabel configuration
      - source_labels: [__address__]
        regex: '(.*):10250'    # the kubelet port on the node
        replacement: '${1}:9100'    # keep the address but replace the kubelet port with 9100, so node-exporter is scraped
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor-n66'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:8080'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name


    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
        namespaces: # optional; if not specified, pods in all namespaces are discovered
          names:
          - myserver
          - magedu
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

Preparation

[root@k8s-master1 prometheus]# mkdir -p /data/prometheusdata    # must match the hostPath used by the deployment below
[root@k8s-master1 prometheus]# chmod 777 /data/prometheusdata
[root@k8s-master1 prometheus]# kubectl create sa monitor -n monitoring    # create the service account
serviceaccount/monitor created
[root@k8s-master1 prometheus]# kubectl create clusterrolebinding monitor-clusterrolebinding -n monitoring --clusterrole=cluster-admin --serviceaccount=monitoring:monitor    # grant the account cluster-admin
clusterrolebinding.rbac.authorization.k8s.io/monitor-clusterrolebinding created

Deploy with YAML

[root@k8s-master1 prometheus]# cat case3-2-prometheus-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: 192.168.226.144
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.36.1
        imagePullPolicy: IfNotPresent
        command:
          - prometheus
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus
          - --storage.tsdb.retention=720h
          - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
            items:
              - key: prometheus.yml
                path: prometheus.yml
                mode: 0644
        - name: prometheus-storage-volume
          hostPath:
           path: /data/prometheusdata
           type: Directory

[root@k8s-master1 prometheus]# cat case3-3-prometheus-svc.yaml 
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 39090
      protocol: TCP
  selector:
    app: prometheus
    component: server

prometheus的服务发现机制

Prometheus pulls monitoring data by default, i.e. it periodically scrapes metrics from target hosts; every scraped target must expose an HTTP endpoint from which Prometheus retrieves the metric data. With this model the set of scrape targets is decided up front through the jobs configured in scrape_configs, so Prometheus cannot sense new services dynamically: whenever nodes or components are added later, the Prometheus configuration has to be edited by hand and Prometheus restarted, which is inconvenient. Dynamic service discovery was introduced to solve this: it automatically discovers new endpoints in the cluster and adds them to the configuration. Through service discovery, Prometheus obtains the list of targets to monitor and then polls those targets for metric data.

Prometheus supports several ways of obtaining targets, both static configuration and dynamic service discovery. The most commonly used discovery mechanisms are:

kubernetes_sd_configs: # service discovery based on the Kubernetes API; lets Prometheus dynamically discover targets to monitor inside Kubernetes

static_configs: # static service discovery; targets are specified directly in the Prometheus configuration file

dns_sd_configs: # DNS-based discovery of monitoring targets

consul_sd_configs: # Consul-based discovery; monitoring targets are discovered dynamically from services registered in Consul

file_sd_configs: # file-based service discovery; monitoring targets are read from specified files and reloaded automatically, no restart needed

Static discovery with static_configs: every time a new target instance needs monitoring, the configuration file must be edited by hand to add the target.

Consul discovery with consul_sd_configs: Prometheus keeps watching Consul; when the services registered in Consul change, Prometheus automatically picks up all target resources registered in Consul.

Kubernetes discovery with kubernetes_sd_configs: Prometheus talks to the Kubernetes API and dynamically discovers all monitorable target resources deployed in Kubernetes.

kubernetes_sd_configs 

Prometheus's relabeling feature is very powerful: before a target instance is scraped, its metadata labels can be dynamically rewritten, added, or overwritten.

After Prometheus dynamically discovers targets through the Kubernetes API, each discovered target instance carries some raw metadata labels. The default labels include:

__address__: the address of the target in <host>:<port> format

__scheme__: the scheme of the target service address, HTTP or HTTPS

__metrics_path__: the access path of the target service's metrics

Basics: relabeling targets

To make metrics easier to identify and to support later graphing and alerting on the data, Prometheus can modify the labels of discovered targets. Relabeling can happen at two stages:

relabel_configs: applied before data is scraped from a target (for example, redefining label information such as the target IP and port before collection). relabel_configs can add, modify, or delete labels, and can also restrict collection to specific targets or filter targets out.

metric_relabel_configs: applied after data has been scraped from a target, i.e. once the metric data is already in hand, metric_relabel_configs performs the final relabeling and filtering.
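
The examples in these notes all use relabel_configs, so here is a minimal hedged sketch of metric_relabel_configs (the job name and the dropped metric prefix are illustrative assumptions): it discards all go_* runtime series after the scrape, before storage:

  - job_name: 'example-drop-go-metrics'    # hypothetical job
    static_configs:
      - targets: ['192.168.226.144:9100']
    metric_relabel_configs:
      - source_labels: [__name__]    # __name__ holds the metric name
        regex: 'go_.*'               # match the Go runtime metrics
        action: drop                 # drop them after scraping, before storage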

- job_name: 'kubernetes-apiserver'    # job name
  kubernetes_sd_configs:    # service discovery via kubernetes_sd_configs
  - role: endpoints    # discover endpoints objects
  scheme: https    # scheme this job uses for scraping
  tls_config:    # certificate configuration
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt    # CA certificate path inside the container
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token    # token path inside the container
  relabel_configs:    # relabel configuration
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]    # source labels, i.e. which labels to operate on
    action: keep    # action defines the relabel behavior; several actions are supported
    regex: default;kubernetes;https    # keep the kubernetes service in the default namespace that uses https

Label fields explained

source_labels: the source labels, i.e. the label names before relabel processing
target_label: the new label name produced by the action
regex: a literal value or regular expression matched against the source labels
replacement: the value assigned to the target label (target_label), with capture-group references

Actions explained

replace: replace a label value; regex matches the value of the source labels, and replacement references the capture groups of the expression.

keep: scrape only instances that satisfy the regex; Target instances whose source_labels do not match the regex are removed, i.e. only matching instances are collected.

drop: do not scrape instances that satisfy the regex; Target instances whose source_labels match the regex are discarded, i.e. only non-matching instances are collected.

hashmod: compute a hash of the source_labels and take it modulo a custom modulus; this can be used to classify targets, reassign them, and so on:

scrape_configs:
- job_name: ip_job
  relabel_configs:
  - source_labels: [__address__]
    modulus: 4
    target_label: __ip_hash
    action: hashmod
  - source_labels: [__ip_hash]
    regex: ^1$
    action: keep

labelmap: match all label names against the regex, then copy the values of the matching labels to new label names built from the capture groups, referenced through replacement (${1}, ${2}, ...)

labelkeep: match all label names against the regex; labels that do not match are removed from the label set

labeldrop: match all label names against the regex; labels that do match are removed from the label set
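
A minimal hedged sketch of labeldrop (the job name and label pattern are illustrative assumptions); every label whose name matches the regex is stripped from the target's label set:

  - job_name: 'example-labeldrop'    # hypothetical job
    static_configs:
      - targets: ['192.168.226.144:9100']
    relabel_configs:
      - regex: 'kubernetes_io_(.+)'    # label names matching this pattern...
        action: labeldrop              # ...are removed from the label set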

Supported discovery target roles

node
service
pod
endpoint
ingress
endpointslice # slices of an endpoints object

Discovering and monitoring the apiserver

As the most central component of Kubernetes, the apiserver is well worth monitoring; it can be monitored directly through the kubernetes service:

    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

regex: default;kubernetes;https         # means: the namespace is default, the svc name is kubernetes, and the protocol is https; on a successful match the target is kept. The regex parts correspond to the source_labels one by one, i.e. the labels are the keys and the regex values are their values.

The labels are matched as follows:

__meta_kubernetes_namespace=default, __meta_kubernetes_service_name=kubernetes, __meta_kubernetes_endpoint_port_name=https

The result is the apiserver address:

[root@k8s-master1 prometheus]# kubectl get ep
NAME         ENDPOINTS              AGE
kubernetes   192.168.226.144:6443   61d

apiserver metric data

The apiserver is the entry point of the k8s cluster; all requests come in through the apiserver, so monitoring apiserver metrics can be used to judge the health of the cluster.

apiserver_request_total

The following PromQL statements count apiserver requests over a recent window; apiserver_request_total records detailed access statistics for requests to each service:

Both irate and rate are used to compute the rate of change of a metric over a time window, but they calculate it differently: irate uses only the two most recent data points within the window, whereas rate takes all data points in the window, computes a set of rates, and returns their average.

That is why the official documentation says irate suits rapidly changing counters, while rate suits slowly changing counters.

From the algorithm above it also follows that using rate on a rapidly changing counter easily flattens the peaks because of the averaging, unless the time window is made small enough to weaken that effect.

rate(apiserver_request_total{code=~"^(?:2..)$"}[5m])

irate(apiserver_request_total{code=~"^(?:2..)$"}[5m])

About annotation_prometheus_io_scrape

In k8s, under this Prometheus discovery rule, the discovered target must define the annotation matched as annotation_prometheus_io_scrape=true; only when this annotation matches is the target kept for monitoring, after which data is scraped and labels are rewritten, e.g. the annotation_prometheus_io_scheme label becomes http or https:

    - job_name: 'kubernetes-service-endpoints'    # job name
      kubernetes_sd_configs:    # sd_configs-based discovery
      - role: endpoints    # discover endpoints objects
      relabel_configs:    # relabel configuration
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true    # keep the target only when the value is true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__    # rewrite into the __scheme__ label
        regex: (https?)    # match http or https
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__    # rewrite into the __metrics_path__ label
        regex: (.+)    # match a path of one or more characters
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__    # rewrite into the __address__ label
        regex: ([^:]+)(?::\d+)?;(\d+)    # match address:port
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)    # map the service labels through
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace    # rewrite and expose the namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
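
For reference, a hedged sketch of a Service this job would discover (name, namespace, and port are illustrative assumptions); its annotations drive the relabel rules above:

apiVersion: v1
kind: Service
metadata:
  name: demo-app    # hypothetical service
  namespace: myserver
  annotations:
    prometheus.io/scrape: "true"    # required by the keep rule above
    prometheus.io/scheme: "http"    # becomes __scheme__
    prometheus.io/path: "/metrics"  # becomes __metrics_path__
    prometheus.io/port: "8080"      # rewritten into __address__
spec:
  selector:
    app: demo-app
  ports:
  - port: 8080
    targetPort: 8080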

kube-dns discovery

[root@k8s-master1 prometheus]# kubectl describe svc kube-dns -n kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            addonmanager.kubernetes.io/mode=Reconcile
                   k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=CoreDNS
Annotations:       prometheus.io/port: 9153    # annotation used to discover the metrics port
                   prometheus.io/scrape: true    # allow scraping
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.100.0.2
IPs:               10.100.0.2
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         10.200.36.113:53,10.200.36.122:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         10.200.36.113:53,10.200.36.122:53
Port:              metrics  9153/TCP
TargetPort:        9153/TCP
Endpoints:         10.200.36.113:9153,10.200.36.122:9153
Session Affinity:  None
Events:            <none>

Node discovery and metrics

    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'    # match port 10250 (the kubelet port)
        replacement: '${1}:9100'    # replace it with port 9100
        target_label: __address__
        action: replace    # assign the result to target_label
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)    # discover node labels and carry them over as new labels with the same values

Common node metrics

node_cpu_*: CPU-related metrics
node_load1: load average # system load metric
node_memory_*: memory-related metrics
node_network_*: network-related metrics
node_boot_time_seconds: system boot time
go_*: Go runtime metrics of the running node_exporter process
process_*: internal process metrics of the running node_exporter
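
A couple of hedged example queries over these node metrics (standard node_exporter series; the 1m window mirrors the earlier examples):

100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) * 100)    # CPU usage % per node
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100                 # available memory %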

Pod discovery

Configuration

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
        namespaces: # optional; if not specified, pods in all namespaces are discovered
          names:
          - myserver
          - magedu
          - monitoring
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt    # default certificate path
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token    # default token path
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443    # replacement sets the value of the target label (__address__) to kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__    # rewrite the metrics path
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

Note that the certificate path configured in tls_config is the path each Pod uses when connecting to the apiserver. Whether or not the certificate is actually used, kubelet automatically injects the CA public key into every pod at startup; that is, every pod starts with a CA public key injected so it can be used when accessing the apiserver.
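
The injected material can be inspected inside any running pod; a hedged check (the pod name is a placeholder):

kubectl exec -it <pod-name> -- ls /var/run/secrets/kubernetes.io/serviceaccount/
# typically lists: ca.crt  namespace  token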

Pod monitoring metrics

sum(rate(container_cpu_usage_seconds_total{image!=""}[1m])) without (instance)
sum(container_memory_usage_bytes{image!=""}) without (instance)
sum(rate(container_fs_io_current{image!=""}[1m])) without (device)
sum(rate(container_fs_writes_bytes_total{image!=""}[1m])) without (device)
sum(rate(container_fs_reads_bytes_total{image!=""}[1m])) without (device)
sum(rate(container_network_receive_bytes_total{image!=""}[1m])) without (interface)

Discovering k8s pods from a Prometheus deployed on a VM

Get the token

[root@k8s-master1 prometheus]# kubectl get sa -n monitoring 
NAME      SECRETS   AGE
default   1         45d
monitor   1         2d1h
[root@k8s-master1 prometheus]# kubectl get sa monitor -n monitoring -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-06-15T14:17:04Z"
  name: monitor
  namespace: monitoring
  resourceVersion: "625848"
  uid: 8e3b8beb-c6a8-4826-8d13-9a3df617ae2c
secrets:
- name: monitor-token-vnj95
[root@k8s-master1 prometheus]# kubectl describe secrets monitor-token-vnj95 -n monitoring
Name:         monitor-token-vnj95
Namespace:    monitoring
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: monitor
              kubernetes.io/service-account.uid: 8e3b8beb-c6a8-4826-8d13-9a3df617ae2c

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1302 bytes
namespace:  10 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IlJjVXRjX0FEemIxb0Y0OFIyOU03OFEyNkxNbUJNOV9JS25JNmtXbFBKeWsifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJtb25pdG9yaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1vbml0b3ItdG9rZW4tdm5qOTUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibW9uaXRvciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjhlM2I4YmViLWM2YTgtNDgyNi04ZDEzLTlhM2RmNjE3YWUyYyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDptb25pdG9yaW5nOm1vbml0b3IifQ.Dn_-RFFDT-SBpM5txQFQZv0XoKZWBOHo--eJWhLsDGAeMPFtqQ-aOJP5tZVuGOv2TzC8UrNFq6BtQtt5RddkH4tK9O0DCHRx8JWnh8Lvn347z7nU179zge7hg3OuhmFKyLw6AsE0DhP_3picDgawUnSoXD1FqW9SEcyY75IiLf0MgxhkU4JjXLxwLRqWdHqk4QZSUthSm9Vfbgeq1BhhKYPYc_D579bWg6hisGp107oxFTj-Q13Rf9vpiBLMx4OJcJwcrZNW9SOTjeobFUOSLZjCqSfiO3avN4QVVIP9HiDOmZXNtuvbd1fnawSEILc07hRSts5VEaZA9eA_VLKf0g
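
The VM-side job below reads the token from /apps/prometheus/k8s.token, so the token shown above has to be saved into that file on the Prometheus host; a hedged sketch:

kubectl -n monitoring get secret monitor-token-vnj95 -o jsonpath='{.data.token}' | base64 -d > k8s.token
# copy k8s.token to /apps/prometheus/k8s.token on the Prometheus VM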

Add a job to Prometheus

  - job_name: 'kubernetes-pods-in-specified-namespaces'    # discover all pods in the specified namespaces
    kubernetes_sd_configs:
    - role: pod
      api_server: https://192.168.226.144:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /apps/prometheus/k8s.token
      namespaces:
        names:
        - myserver
        - magedu
        - monitoring
        - kubernetes-dashboard
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name

Restart and verify
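
A hedged sketch of the restart/reload step (the systemd unit name is an assumption; the in-cluster instance was started with --web.enable-lifecycle, so it also accepts a reload POST on its NodePort):

systemctl restart prometheus    # VM-deployed Prometheus; unit name may differ
curl -X POST http://192.168.226.144:39090/-/reload    # reload the in-cluster instance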

Static configuration with static_configs

  - job_name: "prometheus-node"
    static_configs:
      - targets: ["192.168.226.145:9100","192.168.226.152:9100","192.168.226.146:9100","192.168.226.144:9100"]

consul_sd_configs

Consul is a distributed key/value store cluster, nowadays commonly used for service registration and discovery. Pods register their information in Consul, and Prometheus periodically fetches that information from Consul.

Deploy Consul and verify the cluster

wget https://releases.hashicorp.com/consul/1.12.2/consul_1.12.2_linux_amd64.zip

nohup consul agent -server -bootstrap -bind=192.168.226.152 -client=192.168.226.152 -data-dir=/data/consul -ui -node=192.168.226.152 &


nohup consul agent   -bind=192.168.226.144 -client=192.168.226.144 -data-dir=/data/consul  -node=192.168.226.144 -join=192.168.226.152 &

 

Test registering services

 curl -X PUT -d '{"id": "node-exporter144","name": "k8s-node-exporter144","address": "192.168.226.144","port":9100,"tags": ["node-exporter"],"checks": [{"http": "http://192.168.226.144:9100/","interval": "5s"}]}' http://192.168.226.144:8500/v1/agent/service/register


 curl -X PUT -d '{"id": "node-exporter152","name": "k8s-node-exporter152","address": "192.168.226.152","port":9100,"tags": ["node-exporter"],"checks": [{"http": "http://192.168.226.152:9100/","interval": "5s"}]}' http://192.168.226.152:8500/v1/agent/service/register
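
To check what got registered, the Consul HTTP API can list the services known to an agent:

curl http://192.168.226.152:8500/v1/agent/services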

Configure Consul discovery in Prometheus

  - job_name: consul
    honor_labels: true
    metrics_path: /metrics
    scheme: http
    consul_sd_configs:
      - server: 192.168.226.152:8500
        services: []  # names of the services to discover; empty means all services, otherwise list them, e.g. servicea,serviceb,servicec
      - server: 192.168.226.144:8500
        services: []
    relabel_configs:
    - source_labels: ['__meta_consul_tags']
      target_label: 'product'
    - source_labels: ['__meta_consul_dc']
      target_label: 'idc'
    - source_labels: ['__meta_consul_service']
      regex: "consul"
      action: drop

Deregister a service

curl --request PUT http://192.168.226.152:8500/v1/agent/service/deregister/node-exporter152

File-based service discovery with file_sd_configs

Create the JSON file

[root@lvs-backup prometheus]# vim file_sd/sd_myserver.json 

[
  {
    "targets": ["192.168.226.144:9100","192.168.226.145:9100","192.168.226.146:9100"]
  }
]

Configure Prometheus to use the JSON file

  - job_name: 'file_sd_my_server'
    file_sd_configs:
      - files:
        - /apps/prometheus/file_sd/sd_myserver.json
        refresh_interval: 10s    # how often the file is re-read

Verify
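
Because the file is re-read every refresh_interval, adding a target needs no restart; a hedged sketch (the added address is illustrative):

vim /apps/prometheus/file_sd/sd_myserver.json    # e.g. append "192.168.226.152:9100" to the targets array
# the new target shows up in the Prometheus UI within ~10s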

DNS service discovery

DNS-based service discovery allows configuring a set of DNS names that are queried periodically to discover a list of targets; the names must be resolvable to IPs by the configured DNS servers.

This discovery method supports only basic DNS A, AAAA, and SRV record queries. An A record resolves a domain name to an IP.

SRV: an SRV record states which host provides which specific service, in the format _service-name._protocol.domain (for example _example-server._tcp.www.mydns.com).

Configuration

  - job_name: 'dns-server-name-monitor'
    metrics_path: "/metrics"
    dns_sd_configs:
    - names: ["www.baidutest.com", "www.huaweitest.com"]
      type: A
      port: 6010
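
For SRV records the host and port come from the record itself, so a hedged SRV variant (the SRV name below is an illustrative assumption) drops the fixed port:

  - job_name: 'dns-srv-monitor'
    dns_sd_configs:
    - names: ["_example-server._tcp.www.mydns.com"]   # SRV query; targets are built from the records returned
      type: SRV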

Introduction to the kube-state-metrics component

kube-state-metrics watches the API Server and generates state metrics about resource objects such as Deployments, Nodes, and Pods. Note that kube-state-metrics only exposes metrics data and does not store it, so Prometheus is used to scrape and store the data. It focuses on business-level metadata such as Deployment, Pod, and replica state: how many replicas were scheduled? How many are currently available? How many Pods are running/stopped/terminated? How many times has a Pod restarted? How many jobs are currently running?
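
A few hedged example queries against standard kube-state-metrics series, illustrating the kind of state it exposes:

kube_deployment_status_replicas_available    # available replicas per deployment
kube_pod_status_phase{phase="Running"}       # pods per phase
kube_pod_container_status_restarts_total     # container restart counters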

Deploy kube-state-metrics

[root@k8s-master1 prometheus]# cat case5-kube-state-metrics-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: bitnami/kube-state-metrics:2.5.0 
        ports:
        - containerPort: 8080

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system

---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  type: NodePort
  ports:
  - name: kube-state-metrics
    port: 8080
    targetPort: 8080
    nodePort: 31666
    protocol: TCP
  selector:
    app: kube-state-metrics

Verify

Add to Prometheus

 - job_name: "prometheus-kube-state-metrics"
    static_configs:
      - targets: ["192.168.226.146:31666"]

Import the Grafana dashboard

Monitoring Tomcat

Monitor the target service through a third-party exporter, with Prometheus reading the data through the svc for display. Note that with multiple pods behind one svc this approach can miss data; statistics gathered through API-based service discovery remain accurate.

Build the image

[root@k8s-master1 tomcat-image]# cat Dockerfile 
#FROM tomcat:8.5.73-jdk11-corretto 
FROM tomcat:8.5.73

LABEL maintainer="jack 2973707860@qq.com"

ADD server.xml /usr/local/tomcat/conf/server.xml 

RUN mkdir /data/tomcat/webapps -p
ADD myapp /data/tomcat/webapps/myapp
ADD metrics.war /data/tomcat/webapps 
ADD simpleclient-0.8.0.jar  /usr/local/tomcat/lib/
ADD simpleclient_common-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_hotspot-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_servlet-0.8.0.jar /usr/local/tomcat/lib/
ADD tomcat_exporter_client-0.0.12.jar /usr/local/tomcat/lib/


#ADD run_tomcat.sh /apps/tomcat/bin/

EXPOSE 8080 8443 8009

CMD ["/usr/local/tomcat/bin/catalina.sh","run"]

#CMD ["/apps/tomcat/bin/run_tomcat.sh"]

Create the pod and verify

[root@k8s-master1 yaml]# cat tomcat-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-deployment
  namespace: default
spec:
  selector: 
    matchLabels: 
     app: tomcat
  replicas: 1 # tells deployment to run 1 pod matching the template
  template: # create pods using pod definition in this template
    metadata:
      labels:
        app: tomcat
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      containers:
      - name: tomcat
        image: k8s-harbor.com/public/tomcat-app1:v0618
        ports:
        - containerPort: 8080
        securityContext: 
          privileged: true
[root@k8s-master1 yaml]# cat tomcat-svc.yaml 
kind: Service  # resource type: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: tomcat-service
spec:
  selector:
    app: tomcat
  ports:
  - nodePort: 31080
    port: 80
    protocol: TCP
    targetPort: 8080
  type: NodePort

Configure Prometheus to collect the data

 - job_name: "prometheus-tomcat-metrics"
    static_configs:
      - targets: ["192.168.226.146:31080"]

Configure the Grafana dashboard

wget https://raw.githubusercontent.com/nlighten/tomcat_exporter/master/dashboard/example.json

Monitoring Redis

https://github.com/oliver006/redis_exporter

Monitor the Redis service status through redis_exporter.

Deploy Redis

[root@k8s-master1 yaml]# cat redis-deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: studylinux-net 
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:4.0.14 
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121

[root@k8s-master1 yaml]# cat redis-exporter-svc.yaml 
kind: Service  # resource type: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: "9121"
  name: redis-exporter-service
  namespace: studylinux-net 
spec:
  selector:
    app: redis
  ports:
  - nodePort: 31082
    name: prom
    port: 9121
    protocol: TCP
    targetPort: 9121
  type: NodePort
[root@k8s-master1 yaml]# cat redis-redis-svc.yaml 
kind: Service  # resource type: Service
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: redis-redis-service
  namespace: studylinux-net 
spec:
  selector:
    app: redis
  ports:
  - nodePort: 31081
    name: redis
    port: 6379
    protocol: TCP
    targetPort: 6379
  type: NodePort

Verify

Configure Prometheus to collect the data

 - job_name: "prometheus-redis-metrics"
    static_configs:
      - targets: ["192.168.226.146:31082"]

Import the Grafana dashboard

Monitoring MySQL

Create the user and verify

CREATE USER 'mysql_exporter'@'localhost' IDENTIFIED BY 'imnot007*';
GRANT PROCESS, REPLICATION CLIENT,SELECT ON *.* TO 'mysql_exporter'@'localhost';
[root@lvs-backup mysql-5.6.43-onekey-install]# mysql -umysql_exporter -pimnot007* -hlocalhost
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.6.43 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> Bye

Configure password-free login

[root@lvs-backup prometheus]# cat /root/.my.cnf 
[client]
user=mysql_exporter
password=imnot007*

Download mysqld_exporter and verify

https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.linux-amd64.tar.gz

/usr/local/bin/mysqld_exporter-0.14.0.linux-amd64/mysqld_exporter  --config.my-cnf=/root/.my.cnf
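
mysqld_exporter listens on port 9104 by default; a quick hedged check that it is up and can reach MySQL:

curl -s http://192.168.226.152:9104/metrics | grep mysql_up    # mysql_up 1 means the exporter can query the server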

Configure Prometheus to collect the data

 - job_name: "prometheus-mysql-metrics"
    static_configs:
      - targets: ["192.168.226.152:9104"]

Import the Grafana dashboard

Monitoring HAProxy

Deploy HAProxy

yum install haproxy
vim /etc/haproxy/haproxy.cfg

stats socket /run/haproxy/admin.sock mode 660 level admin

listen k8s_service_ng_6666
   bind 192.168.226.152:80
   mode tcp
   server node1 192.168.226.152:9090 check inter 2000 fall 3 rise 5

Deploy and start haproxy_exporter

https://github.com/prometheus/haproxy_exporter/releases
./haproxy_exporter --haproxy.scrape-uri=unix:/run/haproxy/admin.sock

Configure Prometheus to collect the data

 - job_name: "prometheus-haproxy-metrics"
    static_configs:
      - targets: ["192.168.226.152:9101"]

Import the Grafana dashboard

Monitoring Nginx

Use nginx-module-vts to turn nginx's status data into data that Prometheus can read.

Get nginx-module-vts

git clone https://github.com/vozlt/nginx-module-vts.git

Compile nginx and start it

./configure --prefix=/apps/nginx \
--with-pcre \
--with-http_ssl_module \
--with-http_v2_module \
--with-http_realip_module \
--with-http_gzip_static_module \
--with-http_stub_status_module \
--with-threads \
--with-file-aio \
--with-stream \
--with-stream_ssl_module \
--with-stream_realip_module \
--add-module=/usr/local/bin/nginx-module-vts/

make
make install

Edit the nginx configuration to enable the status page

vim /apps/nginx/conf/nginx.conf

    #gzip  on;
    vhost_traffic_status_zone;

       location /status {
          vhost_traffic_status_display;
          vhost_traffic_status_display_format html;
        }

Verify

The data is now in a format that can be converted for Prometheus; an exporter is needed to extract it and hand it to Prometheus.

Install the nginx exporter

https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.10.3/nginx-vts-exporter-0.10.3.linux-amd64.tar.gz


./nginx-vts-exporter -nginx.scrape_uri http://192.168.226.152/status/format/json

Configure Prometheus to collect the data

  - job_name: "prometheus-nginx-metrics"
    static_configs:
      - targets: ["192.168.226.152:9913"]

Import the Grafana dashboard

Monitoring with blackbox_exporter

blackbox_exporter is an official Prometheus exporter that probes monitored nodes and collects data over http (availability checks), https (availability checks), dns (name resolution), tcp (port listening checks), and icmp (host liveness checks).
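
The jobs below reference the probe modules http_2xx, icmp, and tcp_connect from blackbox.yml. A minimal hedged sketch of those module definitions (the file shipped with blackbox_exporter already contains similar defaults):

modules:
  http_2xx:          # HTTP GET, success on a 2xx response
    prober: http
  icmp:              # ICMP echo (ping)
    prober: icmp
  tcp_connect:       # plain TCP connect
    prober: tcp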

Deploy blackbox_exporter and start it

 ./blackbox_exporter --config.file=/usr/local/bin/blackbox_exporter-0.21.0.linux-amd64/blackbox.yml --web.listen-address=:9115

URL monitoring with blackbox_exporter

Add the job to Prometheus

Prometheus hands the monitoring targets to blackbox, and blackbox probes those targets to collect the data.

  - job_name: 'http_status'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['http://www.xiaomi.com', 'https://consumer.huawei.com']
        labels:
          instance: http_status
          group: web
    relabel_configs:
      - source_labels: [__address__] # relabeling writes __address__ (the current target address) into the __param_target label
        target_label: __param_target # the probe target, e.g. www.xiaomi.com, taken from the value of __address__
      - source_labels: [__param_target] # the probe target
        target_label: url # expose the probe target as a url label
      - target_label: __address__
        replacement: 192.168.226.152:9115

Verify
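
A hedged manual check that performs the same probe the job does:

curl 'http://192.168.226.152:9115/probe?module=http_2xx&target=https://consumer.huawei.com'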

ICMP monitoring with blackbox_exporter

  - job_name: 'ping_status'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['172.31.0.2', '223.6.6.6']
        labels:
          instance: 'ping_status'
          group: 'icmp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: ip # expose the probe target as an ip label
      - target_label: __address__
        replacement: 192.168.226.152:9115

Port monitoring with blackbox_exporter

  - job_name: 'port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['192.168.226.152:9090', 'www.xiaomi.com:80']
        labels:
          instance: 'port_status'
          group: 'port'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: ip
      - target_label: __address__
        replacement: 192.168.226.152:9115

Import the Grafana dashboard
