Prometheus Container Monitoring System (Part 2)


Lab environment

| Host | IP | Memory | OS | Notes |
| --- | --- | --- | --- | --- |
| Prometheus-server | 192.168.200.132 | 2G | CentOS 7.5 | |
| Prometheus-client1 | 192.168.200.147 | 1G | CentOS 7.5 | |
| Prometheus-client2 | 192.168.200.131 | 1G | CentOS 7.5 | |
| k8s-master | 192.168.200.132 | 4G | CentOS 7.5 | full clone; 40G disk, 2+ CPU cores |
| k8s-node | 192.168.200.137 | 4G | CentOS 7.5 | full clone; 40G disk, 2+ CPU cores |

Monitoring Kubernetes

Monitoring targets

Kubernetes itself
- node resource utilization
- number of nodes
- number of pods
- state of resource objects

Pod monitoring
- number of pods (per project)
- container resource utilization

- applications

| Monitoring target | Implementation | Examples |
| --- | --- | --- |
| Node resource utilization | node-exporter | node CPU and memory utilization |
| Pod resource utilization | cAdvisor | container CPU and memory utilization |
| K8s resource object state | kube-state-metrics | pod/deployment/service |

| Service discovery role | Description |
| --- | --- |
| node | discovers the cluster nodes; the default address is the kubelet's HTTP port |
| service | discovers every Service and its ports as targets |
| pod | discovers every Pod as a target |
| endpoints | discovers Pods as targets from the endpoints of each Service |
| ingress | discovers Ingress paths as targets |

Quick deployment of a k8s cluster

Set hostnames on the k8s nodes

 
### master
[root@k8s-master ~]# hostnamectl set-hostname k8s-master
### node
[root@k8s-node1 ~]# hostnamectl set-hostname k8s-node1

Sync time and temporarily disable swap

 
## Run the following on both servers
ntpdate ntp1.aliyun.com
swapoff -a

Preparation (identical on both nodes)

 
### Identical on both nodes
[root@k8s-master ~]# vim /etc/hosts
192.168.200.142 k8s-master
192.168.200.141 k8s-node1
[root@k8s-master ~]# vim /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
sysctl --system
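
A quick, optional sanity check (a minimal sketch; the key names match the k8s.conf above, and the Swap line should read 0 after swapoff -a):

sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
# expected: both keys print "= 1"
free -m | grep -i swap
# expected: Swap:  0  0  0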

Install Docker/kubeadm/kubelet (on all nodes)

 
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum -y install docker-ce
systemctl enable docker && systemctl start docker
vim /etc/docker/daemon.json
{
  "registry-mirrors": ["https://b9pmyelo.mirror.aliyuncs.com"]
}
systemctl restart docker
vim /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
yum install -y kubelet-1.18.0 kubeadm-1.18.0 kubectl-1.18.0
systemctl enable kubelet
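
Optionally confirm what was installed before initializing the cluster (a quick check, not part of the original steps):

docker --version
kubeadm version -o short        # should print v1.18.0
kubelet --version               # should print Kubernetes v1.18.0
docker info | grep -A1 "Registry Mirrors"   # confirms the mirror configured in daemon.json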

Deploy the k8s master node

 
kubeadm init \
  --apiserver-advertise-address=192.168.200.142 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.18.0 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16 \
  --ignore-preflight-errors=all
### When output like the following appears, the init succeeded; on a slow network this can take a long while
kubeadm join 192.168.200.142:6443 --token tvk2tg.8ml2hqzhvdfphh1t \
    --discovery-token-ca-cert-hash sha256:1f5447a7a9b050dda3e3c5692a523e1ac0c5146a5dd14aa47081e93ef05b055c
[root@k8s-master yum.repos.d]# echo "$?"
0
[root@k8s-master ~]# mkdir -p $HOME/.kube
[root@k8s-master ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@k8s-master ~]# chown $(id -u):$(id -g) $HOME/.kube/config
[root@k8s-master ~]# ll .kube/config
-rw------- 1 root root 5451 Jun  5 23:07 .kube/config
[root@k8s-master ~]# kubectl get nodes
NAME         STATUS     ROLES    AGE     VERSION
k8s-master   NotReady   master   3m23s   v1.18.0
####
--apiserver-advertise-address   the address the cluster advertises
--image-repository              the default registry k8s.gcr.io is unreachable from inside China, so the Aliyun mirror is used instead
--kubernetes-version            the K8s version, matching the packages installed above
--service-cidr                  cluster-internal virtual network, the unified entry point for accessing Pods through Services
--pod-network-cidr              the Pod network; must match the CNI network plugin YAML deployed below
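
If the join command above is lost, or the token expires (the default lifetime is 24 hours), a fresh one can be printed on the master at any time:

kubeadm token create --print-join-command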

Deploy the node

 
kubeadm join 192.168.200.142:6443 --token tvk2tg.8ml2hqzhvdfphh1t \
    --discovery-token-ca-cert-hash sha256:1f5447a7a9b050dda3e3c5692a523e1ac0c5146a5dd14aa47081e93ef05b055c    ## this token and hash are the ones printed during the master bootstrap above -- copy them exactly
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
### Check on the master
[root@k8s-master ~]# kubectl get nodes
NAME         STATUS     ROLES    AGE     VERSION
k8s-master   NotReady   master   6m40s   v1.18.0
k8s-node1    NotReady   <none>   76s     v1.18.0

Install the network plugin (needed on both master and node)

 
wget https://docs.projectcalico.org/v3.8/manifests/calico.yaml
[root@k8s-master ~]# sed -i -e "s?192.168.0.0/16?10.244.0.0/16?g" calico.yaml
[root@k8s-master ~]# kubectl apply -f calico.yaml
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
[root@k8s-master ~]# kubectl get nodes    ### it may take a while for the nodes to become Ready
NAME         STATUS   ROLES    AGE   VERSION
k8s-master   Ready    master   13m   v1.18.0
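
While waiting for the nodes to turn Ready, the Calico and CoreDNS pods can simply be watched until they reach Running (an optional check):

kubectl get pods -n kube-system -w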

Create a workload to monitor

 
[root@k8s-master ~]# kubectl get nodes
NAME         STATUS   ROLES    AGE    VERSION
k8s-master   Ready    master   3d9h   v1.18.0
[root@k8s-master ~]# kubectl create deployment web --image=nginx
deployment.apps/web created
[root@k8s-master ~]# kubectl get pods    ### the master node carries a taint, so the pod is not scheduled onto it; remove the taint
NAME                   READY   STATUS    RESTARTS   AGE
web-5dcb957ccc-kpvdr   0/1     Pending   0          28s
[root@k8s-master ~]# kubectl describe node | grep Taint    ### show the taint
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@k8s-master ~]# kubectl taint node k8s-master node-role.kubernetes.io/master-    ### remove the taint from the node
node/k8s-master untainted
[root@k8s-master ~]# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
web-5dcb957ccc-kpvdr   1/1     Running   0          4m59s
[root@k8s-master ~]# kubectl get pods -n kube-system    ### the components running in kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-75d555c48-vs97h   1/1     Running   0          3d8h
calico-node-52lr6                         1/1     Running   0          3d8h
coredns-7ff77c879f-7xfmt                  1/1     Running   0          3d9h
coredns-7ff77c879f-mcl9f                  1/1     Running   0          3d9h
etcd-k8s-master                           1/1     Running   0          3d9h
kube-apiserver-k8s-master                 1/1     Running   0          3d9h
kube-controller-manager-k8s-master        1/1     Running   0          3d9h
kube-proxy-xhgk5                          1/1     Running   0          3d9h
kube-scheduler-k8s-master                 1/1     Running   0          3d9h

Monitoring pod resources with Prometheus

 
[root@k8s-master ~]# kubectl get ep
NAME         ENDPOINTS              AGE
kubernetes   192.168.200.142:6443   3d9h    ### Prometheus needs to talk to this API endpoint, but 6443 is not opened up casually, so authentication is required
### Grant access with RBAC
[root@k8s-master ~]# vim rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
[root@k8s-master ~]# kubectl apply -f rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created    ### these three lines mean the objects were created successfully
### Next, fetch the token tied to the service account
[root@k8s-master ~]# kubectl get sa -n kube-system | grep promethe
prometheus   1   3m50s    ### the service account we just created
[root@k8s-master ~]# kubectl get sa prometheus -n kube-system -o yaml    ### dump this resource as YAML
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"prometheus","namespace":"kube-system"}}
  creationTimestamp: "2021-06-09T13:02:59Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:secrets:
        .: {}
        k:{"name":"prometheus-token-f8s8g"}:
          .: {}
          f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-06-09T13:02:59Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
    manager: kubectl
    operation: Update
    time: "2021-06-09T13:02:59Z"
  name: prometheus
  namespace: kube-system
  resourceVersion: "18462"
  selfLink: /api/v1/namespaces/kube-system/serviceaccounts/prometheus
  uid: 26fda85a-70d3-46d2-9a74-6869fdb1c9fc
secrets:
- name: prometheus-token-f8s8g    #### the value after "name" is the name of the secret that holds the token we were granted
[root@k8s-master ~]# kubectl describe secret prometheus-token-f8s8g -n kube-system    ### fetch the credential; the value after "token:" below is the bearer token
Name:         prometheus-token-f8s8g
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: prometheus
              kubernetes.io/service-account.uid: 26fda85a-70d3-46d2-9a74-6869fdb1c9fc
Type:  kubernetes.io/service-account-token
Data
====
ca.crt:     1025 bytes
namespace:  11 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6Inl2Y3JqOXBOUkNLajRFZ0FHR2dhYTFra1d4X2VjUzRYbGp5Q3VXUnY2V28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLWY4czhnIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIyNmZkYTg1YS03MGQzLTQ2ZDItOWE3NC02ODY5ZmRiMWM5ZmMiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.uh9IiJWbvwkOT9Jos9Jtzo3n4k2KGpEfm8vpO5PefxmHztlsGe2UIXKaJoF1E-HWjedv4fHVH2_ATMegqoZgRXdC2HUnPO-HmOooC5A561KYj-9Mru-MMYfQPA-2zshjBsbLU2ngM0GM1d9B-ec_Aq3tHTQT1LF_sdpciLNmcoTjQwE1zK76q-L1X97lPMwopWTZefQs93Y86Z-c3kufAFx3MOE2iwMI--EDorjjMENnjvu3I-zZmU-i_c4PeBVlwA_AZfUvuPRMYfkvY7e9cMbqLkkRvjaiPxDe_6Amr4l4kzIvJkC4946_L8Q9QD1KdC7W6YPShpW40ZpMbU10cg
[root@k8s-master ~]# kubectl describe secret prometheus-token-f8s8g -n kube-system > token.k8s    ### dump the output to a file and send it to the Prometheus server; this token will be used for access shortly
[root@k8s-master ~]# scp token.k8s 192.168.200.132:/root/
The authenticity of host '192.168.200.132 (192.168.200.132)' can't be established.
ECDSA key fingerprint is SHA256:DaZ1qd1UDlXKAYiTYk5ZBjwvWxkEwQJmHew7PIK78wA.
ECDSA key fingerprint is MD5:79:cb:6d:09:0e:03:e8:58:3b:e4:81:88:da:07:7d:9f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.200.132' (ECDSA) to the list of known hosts.
root@192.168.200.132's password:
token.k8s
### On the Prometheus server
[root@bogon ~]# mv token.k8s /usr/local/prometheus/
[root@bogon ~]# cd /usr/local/prometheus/
[root@bogon prometheus]# ls
console_libraries consoles data LICENSE NOTICE prometheus prometheus.yml promtool sd_config token.k8s
[root@bogon prometheus]# vim token.k8s    ## edit this file so that only the token value itself remains
eyJhbGciOiJSUzI1NiIsImtpZCI6Inl2Y3JqOXBOUkNLajRFZ0FHR2dhYTFra1d4X2VjUzRYbGp5Q3VXUnY2V28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLWY4czhnIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIyNmZkYTg1YS03MGQzLTQ2ZDItOWE3NC02ODY5ZmRiMWM5ZmMiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.uh9IiJWbvwkOT9Jos9Jtzo3n4k2KGpEfm8vpO5PefxmHztlsGe2UIXKaJoF1E-HWjedv4fHVH2_ATMegqoZgRXdC2HUnPO-HmOooC5A561KYj-9Mru-MMYfQPA-2zshjBsbLU2ngM0GM1d9B-ec_Aq3tHTQT1LF_sdpciLNmcoTjQwE1zK76q-L1X97lPMwopWTZefQs93Y86Z-c3kufAFx3MOE2iwMI--EDorjjMENnjvu3I-zZmU-i_c4PeBVlwA_AZfUvuPRMYfkvY7e9cMbqLkkRvjaiPxDe_6Amr4l4kzIvJkC4946_L8Q9QD1KdC7W6YPShpW40ZpMbU10cg
[root@bogon prometheus]# cd ~
[root@bogon ~]# rz -E
rz waiting to receive.
[root@bogon ~]# ls
anaconda-ks.cfg prometheus-2.25.2.linux-amd64.tar.gz
grafana-7.3.1.linux-amd64.tar.gz prometheus.yml
[root@bogon ~]# cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
-------------------------------------------------------------------- Part 1: copy the section below (strip any leading # characters when pasting)
  - job_name: kubernetes-nodes-cadvisor
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.200.142:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    # Turn every node label captured by (.*) into a target label of the same name, keeping its value
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.*)
    # Rewrite NodeIP:10250 to APIServerIP:6443
    - action: replace
      regex: (.*)
      source_labels: ["__address__"]
      target_label: __address__
      replacement: 192.168.200.142:6443
    # The real metrics endpoint is https://NodeIP:10250/metrics/cadvisor, which only the API server can reach,
    # so rewrite the metrics path and scrape it through the API server proxy
    - action: replace
      source_labels: [__meta_kubernetes_node_name]
      target_label: __metrics_path__
      regex: (.*)
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
------------------------------------------ End of Part 1
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
    - role: endpoints
      api_server: https://192.168.200.142:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    # Do not scrape Services that lack the prometheus.io/scrape annotation
    relabel_configs:
    - action: keep
      regex: true
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scrape
    # Rewrite the scrape scheme from the annotation
    - action: replace
      regex: (https?)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scheme
      target_label: __scheme__
    # Rewrite the metrics URL path from the annotation
    - action: replace
      regex: (.+)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_path
      target_label: __metrics_path__
    # Rewrite the scrape address from the annotation
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
      - __address__
      - __meta_kubernetes_service_annotation_prometheus_io_port
      target_label: __address__
    # Turn every K8s label (.*) into a target label of the same name, keeping its value
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    # Add a namespace label
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    # Add a Service name label
    - action: replace
      source_labels:
      - __meta_kubernetes_service_name
      target_label: kubernetes_service_name
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
    - role: pod
      api_server: https://192.168.200.142:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    # Do not scrape Pods that lack the prometheus.io/scrape annotation
    relabel_configs:
    - action: keep
      regex: true
      source_labels:
      - __meta_kubernetes_pod_annotation_prometheus_io_scrape
    # Rewrite the metrics URL path from the annotation
    - action: replace
      regex: (.+)
      source_labels:
      - __meta_kubernetes_pod_annotation_prometheus_io_path
      target_label: __metrics_path__
    # Rewrite the scrape address from the annotation
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
      - __address__
      - __meta_kubernetes_pod_annotation_prometheus_io_port
      target_label: __address__
    # Turn every K8s label (.*) into a target label of the same name, keeping its value
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    # Add a namespace label
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    # Add a Pod name label
    - action: replace
      source_labels:
      - __meta_kubernetes_pod_name
      target_label: kubernetes_pod_name
----------------------- End of Part 2
################ Finally, add the following to the Prometheus configuration file
[root@bogon prometheus]# pwd
/usr/local/prometheus
[root@bogon prometheus]# vim prometheus.yml
  - job_name: kubernetes-nodes-cadvisor
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.200.142:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.*)
    - action: replace
      regex: (.*)
      source_labels: ["__address__"]
      target_label: __address__
      replacement: 192.168.200.142:6443
    - action: replace
      source_labels: [__meta_kubernetes_node_name]
      target_label: __metrics_path__
      regex: (.*)
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
[root@bogon prometheus]# ./promtool check config ./prometheus.yml    ### check the configuration file for syntax errors
Checking ./prometheus.yml
SUCCESS: 0 rule files found
[root@bogon prometheus]# pgrep prome
622
[root@bogon prometheus]# kill -HUP 622    ## hot-reload the configuration
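
Once Prometheus has reloaded, whether the new cadvisor job actually discovered targets can be checked from the command line as well as in the web UI (a quick check; localhost:9090 is the default listen address used throughout):

curl -s http://localhost:9090/api/v1/targets | grep -c kubernetes-nodes-cadvisor
# a non-zero count means the job registered targets; details are on http://192.168.200.132:9090/targets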

Monitoring k8s object resources

 
### A service has to be deployed on the k8s side
[root@k8s-master ~]# rz -E
rz waiting to receive.
[root@k8s-master ~]# ls
anaconda-ks.cfg  calico.yaml  kube-state-metrics.yaml  rbac.yaml  token.k8s
[root@k8s-master ~]# kubectl apply -f kube-state-metrics.yaml
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics-resizer created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
configmap/kube-state-metrics-config created
service/kube-state-metrics created
[root@k8s-master ~]# kubectl get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-75d555c48-vs97h   1/1     Running   0          3d10h
calico-node-52lr6                         1/1     Running   0          3d10h
coredns-7ff77c879f-7xfmt                  1/1     Running   0          3d10h
coredns-7ff77c879f-mcl9f                  1/1     Running   0          3d10h
etcd-k8s-master                           1/1     Running   0          3d10h
kube-apiserver-k8s-master                 1/1     Running   0          3d10h
kube-controller-manager-k8s-master        1/1     Running   0          3d10h
kube-proxy-xhgk5                          1/1     Running   0          3d10h
kube-scheduler-k8s-master                 1/1     Running   0          3d10h
kube-state-metrics-866f97f7fb-nzl7f       2/2     Running   0          2m7s
### The last pod is the newly started service; if it sits in ContainerCreating, be patient -- it is still pulling images
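
Before touching the Prometheus side, it is worth confirming that kube-state-metrics exposes a Service the kubernetes-service-endpoints job can discover (a quick check; the Service name comes from the manifest output above, and the prometheus.io/scrape annotation is assumed to be set in that manifest):

kubectl get svc -n kube-system kube-state-metrics
kubectl describe svc -n kube-system kube-state-metrics | grep -i "prometheus.io/scrape"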

Modify the Prometheus configuration file

 
### On the Prometheus server
[root@bogon prometheus]# vim prometheus.yml
### Paste Part 2 from above into scrape_configs:
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
    - role: endpoints
      api_server: https://192.168.200.142:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - action: keep
      regex: true
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scrape
    - action: replace
      regex: (https?)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scheme
      target_label: __scheme__
    - action: replace
      regex: (.+)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_path
      target_label: __metrics_path__
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
      - __address__
      - __meta_kubernetes_service_annotation_prometheus_io_port
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    - action: replace
      source_labels:
      - __meta_kubernetes_service_name
      target_label: kubernetes_service_name
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
    - role: pod
      api_server: https://192.168.200.142:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - action: keep
      regex: true
      source_labels:
      - __meta_kubernetes_pod_annotation_prometheus_io_scrape
    - action: replace
      regex: (.+)
      source_labels:
      - __meta_kubernetes_pod_annotation_prometheus_io_path
      target_label: __metrics_path__
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
      - __address__
      - __meta_kubernetes_pod_annotation_prometheus_io_port
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    - action: replace
      source_labels:
      - __meta_kubernetes_pod_name
      target_label: kubernetes_pod_name
[root@bogon prometheus]# ./promtool check config ./prometheus.yml
Checking ./prometheus.yml
SUCCESS: 0 rule files found
[root@bogon prometheus]# kill -HUP 622

 
Workaround 1
### The Prometheus server sits outside the cluster and cannot reach the pod network (10.244.0.0/16) directly.
### Make sure IPv4 forwarding is enabled on both k8s servers (it will not work otherwise), then route the pod network through the k8s master:
[root@bogon prometheus]# ip route add 10.244.0.0/16 via 192.168.200.142 dev ens32
[root@bogon prometheus]# ip route
default via 192.168.200.2 dev ens32
10.244.0.0/16 via 192.168.200.142 dev ens32
169.254.0.0/16 dev ens32 scope link metric 1002
192.168.200.0/24 dev ens32 proto kernel scope link src 192.168.200.132
### Properly speaking, this is something the network admin should solve on the router; if the targets come up after this, fine -- if not, there is little else to do here
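
The route added with ip route add does not survive a reboot. A minimal sketch of making it persistent on CentOS 7, assuming the interface name ens32 from the output above:

echo "10.244.0.0/16 via 192.168.200.142 dev ens32" >> /etc/sysconfig/network-scripts/route-ens32
systemctl restart network   # or reboot; the network scripts re-apply the route file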

Using PromQL

PromQL is Prometheus's own SQL-like query language; it lets you slice and query the collected data along any dimension.

 
"up"    # run this expression in the Prometheus UI to query the state of every instance directly
### Query the latest sample of a metric (an instant vector):
node_cpu_seconds_total
node_cpu_seconds_total{job="<the job you want to look at>"}
### Time units are supported: s, m, h, d, w, y (adding a duration turns it into a range query)
node_cpu_seconds_total{job="<the job you want to look at>"}[5m]
node_cpu_seconds_total{job="<the job you want to look at>"}[1h]
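
The same expressions can also be evaluated outside the web UI through the HTTP API; a small sketch against the local server (localhost:9090 assumed):

curl -s -G http://localhost:9090/api/v1/query --data-urlencode 'query=up'
curl -s -G http://localhost:9090/api/v1/query --data-urlencode 'query=node_cpu_seconds_total[5m]'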

Common operators

 
Comparison operators
• =  equal
• != not equal
• >  greater than
• <  less than
• >= greater than or equal
• <= less than or equal
node_cpu_seconds_total{job="Linuxweb",mode="iowait"}
node_cpu_seconds_total{job="Linuxweb",mode=~"user|system"}
node_cpu_seconds_total{job="Linuxweb",mode=~"user|system",cpu!="0"}
Arithmetic operators
• + addition
• - subtraction
• * multiplication
• / division
CPU usage:
100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100)
Memory usage:
100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) /
node_memory_MemTotal_bytes * 100
Regex matching operators
• =~ regex match    !~ negated regex match
Disk usage:
100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} /
node_filesystem_size_bytes{mountpoint="/",fstype=~"ext4|xfs"} * 100)
Aggregation operators
• sum (sum over dimensions)
• max (maximum over dimensions)
• min (minimum over dimensions)
• avg (average over dimensions)
• count (count of samples)
• irate (per-second rate of change over a time window)
Total CPU system usage across all instances:
sum(node_cpu_seconds_total{job="Linuxweb",mode="system"})
Average CPU system rate of change across all instances:
avg(irate(node_cpu_seconds_total{job="Linuxweb",mode="system"}[5m]))
Count the CPUs:
count(node_cpu_seconds_total{job="Linuxweb",mode="system"})
Logical operators: • and    • or
Greater than 10 and less than 50:
prometheus_http_requests_total > 10 and prometheus_http_requests_total < 50
Greater than 10 or less than 50:
prometheus_http_requests_total > 10 or prometheus_http_requests_total < 50

Managing metric labels

Labels exist so that data can be grouped by label and viewed from different angles.

Every target instance in Prometheus carries a set of default metadata labels. They can be inspected
on the Targets page of the Prometheus UI:
• __address__: the scrape address of the target instance
• __scheme__: the HTTP scheme (HTTP or HTTPS) of the target's metrics endpoint
• __metrics_path__: the URL path of the target's metrics endpoint

 
## First back up the configuration file, then trim it down by removing the k8s jobs that are no longer needed.
[root@bogon prometheus]# cp prometheus.yml{,.bak}
[root@bogon prometheus]# ls
console_libraries data NOTICE prometheus.yml promtool token.k8s
consoles LICENSE prometheus prometheus.yml.bak sd_config
#### Add labels to the existing scrape jobs:
[root@bogon prometheus]# vim prometheus.yml
  - job_name: 'docker'
    static_configs:
    - targets: ['192.168.200.131:8080']
      labels:               ## this attaches extra labels to the targets
        idc: jiuxianqiao    ### first label, named "idc", with the value "jiuxianqiao"
        project: www        #### second label, named "project", with the value "www"
  - job_name: 'mysqldb'
    static_configs:
    - targets: ['192.168.200.131:9104']
      labels:
        idc: zhaowei
        project: wordpress
[root@bogon prometheus]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 0 rule files found
[root@bogon prometheus]# kill -HUP 622
Relabeling

The purpose of relabeling is to identify monitoring metrics more clearly.
Relabeling can happen at two stages:
• relabel_configs: before the scrape
• metric_relabel_configs: before storage
While a scrape is being prepared, relabel_configs can add labels, restrict collection to specific targets, or filter targets out.
Once the samples have been scraped, metric_relabel_configs performs a final round of relabeling and filtering before they are stored.

Typical uses of relabeling:
• dynamically generate new labels
• filter scrape targets
• drop unneeded or sensitive labels
• add new labels

action: the relabeling action to perform
• replace: the default; matches the value of source_labels against regex and writes replacement into
target_label, referencing the regex capture groups as $1, $2, ...
• keep: drop targets for which regex does not match the concatenated source_labels
• drop: drop targets for which regex matches the concatenated source_labels
• labeldrop: remove labels whose names match regex
• labelkeep: remove labels whose names do not match regex
• labelmap: match regex against all label names and use the first capture
group as the new label name (the value is kept)

Capture-group references

 
### Edit the configuration file and write a new relabeling rule
[root@bogon prometheus]# vim prometheus.yml
  - job_name: 'docker'
    static_configs:
    - targets: ['192.168.200.131:8080']
      labels:
        idc: jiuxianqiao
        project: www
    relabel_configs:                     ### enable relabeling for this job
    - action: replace                    ### the action to perform
      source_labels: ["__address__"]     ### which source label to read
      regex: (.*):([0-9]+)               #### regex that splits the value into capture groups
      replacement: $1                    #### which capture group to reuse as the new value
      target_label: "ip"                 ##### the name of the new label
[root@bogon prometheus]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 0 rule files found
[root@bogon prometheus]# kill -HUP 622

Filtering scrape targets

 
### Edit the configuration file
[root@bogon prometheus]# vim prometheus.yml
  - job_name: 'linuxweb'
    basic_auth:
      username: yunjisuan
      password: 123456
    static_configs:
    - targets: ['192.168.200.147:9100','192.168.200.131:9100']
    relabel_configs:
    - action: drop
      regex: "192.168.200.131.*"
      source_labels: ["__address__"]
[root@bogon prometheus]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 0 rule files found
Prometheus alerting

Deploy Alertmanager

 
### On the Prometheus server
[root@bogon ~]# ls    ## upload the tarball, then extract it
alertmanager-0.21.0.linux-amd64.tar.gz grafana-7.3.1.linux-amd64.tar.gz prometheus.yml
anaconda-ks.cfg prometheus-2.25.2.linux-amd64.tar.gz
[root@bogon ~]# tar xf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local/
[root@bogon ~]# mv /usr/local/alertmanager-0.21.0.linux-amd64/ /usr/local/alertmanager
[root@bogon ~]# cd /usr/local/alertmanager/
[root@bogon alertmanager]# ls
alertmanager alertmanager.yml amtool LICENSE NOTICE
[root@bogon alertmanager]# cp /usr/lib/systemd/system/grafana.service /usr/lib/systemd/system/alertmanager.service
[root@bogon alertmanager]# vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
[Service]
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@bogon alertmanager]# systemctl daemon-reload
[root@bogon alertmanager]# systemctl start alertmanager
[root@bogon alertmanager]# systemctl enable alertmanager
Created symlink from /etc/systemd/system/multi-user.target.wants/alertmanager.service to /usr/lib/systemd/system/alertmanager.service.
[root@bogon alertmanager]# ps -ef | grep alertmanager
root 5428 1 0 21:20 ? 00:00:00 /usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
root 5453 5357 0 21:20 pts/0 00:00:00 grep --color=auto alertmanager
[root@bogon alertmanager]# ss -antup | grep 9093
tcp LISTEN 0 128 :::9093 :::* users:(("alertmanager",pid=5428,fd=8))
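
Besides the process and port checks above, Alertmanager exposes health and status endpoints that can be probed directly (a quick check on the default port 9093):

curl -s http://127.0.0.1:9093/-/healthy
curl -s http://127.0.0.1:9093/api/v2/status | head -c 300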

 
### Tell Prometheus where Alertmanager lives
[root@bogon alertmanager]# vim /usr/local/prometheus/prometheus.yml    ### only the section below changes; leave the rest untouched
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093
[root@bogon alertmanager]# cd /usr/local/prometheus/
[root@bogon prometheus]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 0 rule files found
[root@bogon prometheus]# kill -HUP 622

Configure the Alertmanager configuration file

 
[root@bogon prometheus]# cd /usr/local/alertmanager/
[root@bogon alertmanager]# ls
alertmanager alertmanager.yml amtool LICENSE NOTICE
[root@bogon alertmanager]# vim alertmanager.yml
### A 163 mailbox is used here; other providers may reject these plain SMTP settings
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '18910670931@163.com'
  smtp_auth_username: '18910670931@163.com'
  smtp_auth_password: 'QVIKAAPBLVIHHWUS'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
receivers:
- name: 'default-receiver'
  email_configs:
  - to: '18910670931@163.com'
------------------------ Annotated version
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '18910670931@163.com'
  smtp_auth_username: '18910670931@163.com'
  smtp_auth_password: 'QVIKAAPBLVIHHWUS'    ### third-party authorization code, obtained from the mail provider (163 here)
  smtp_require_tls: false
# Routing tree
route:
  group_by: ['alertname']       # group alerts by the alert rule group name
  group_wait: 30s               # how long to wait for the first alert of a group; further alerts arriving within this window are merged into one notification
  group_interval: 5m            # interval before sending a notification about new alerts added to a group
  repeat_interval: 10m          # interval before re-sending a notification for an unresolved alert
  receiver: default-receiver    ### which receiver the notification goes to
# Receivers
receivers:
- name: 'default-receiver'
  email_configs:
  - to: '18910670931@163.com'
    html: ''
    headers: { Subject: "[WARN] 报警邮件 test" }
[root@bogon alertmanager]# ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml' SUCCESS
Found:
- global config
- route
- 0 inhibit rules
- 1 receivers
- 0 templates
[root@bogon alertmanager]# systemctl restart alertmanager

Create alerting rules in Prometheus

 
[root@bogon alertmanager]# cd ../prometheus/
[root@bogon prometheus]# vim prometheus.yml
rule_files:
  - "rules/*.yml"
[root@bogon prometheus]# mkdir rules
[root@bogon prometheus]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 0 rule files found
[root@bogon prometheus]# kill -HUP 622

 
[root@bogon prometheus]# vim rules/node.yml
groups:
- name: alert-rules.yml
  rules:
  - alert: InstanceStatus
    expr: up == 0
    for: 10s
    labels:
      severity: "critical"
    annotations:
      description: " {{ $labels.instance }} has stopped working"
      summary: "{{ $labels.instance }}: job {{ $labels.job }} has been down for more than 20s."
[root@bogon prometheus]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 1 rule files found
Checking rules/node.yml
SUCCESS: 1 rules found
------------------- Annotated version
groups:
- name: alert-rules.yml
  rules:
  - alert: InstanceStatus              # alert name
    expr: up == 0                      # trigger condition
    for: 10s                           # the condition must hold for 10s before the alert fires
    labels:                            # labels attached to the alert
      severity: "critical"
    annotations:                       # extra annotations; not used to identify the alert
      description: " {{ $labels.instance }} has stopped working"
      summary: "{{ $labels.instance }}: job {{ $labels.job }} has been down for more than 20s."
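
The rule file can also be validated on its own, and the easiest way to watch the Inactive -> Pending -> Firing transition described below is to take one target down and open the Alerts page:

./promtool check rules rules/node.yml
# e.g. stop node_exporter on a client (systemctl stop node_exporter, if it runs as a service),
# then watch http://192.168.200.132:9090/alerts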

Alert states

• Inactive: nothing has been triggered.
• Pending: the threshold has been crossed, but not yet for the required duration.
• Firing: the threshold has been crossed for the required duration; the alert is sent to the receivers.

Alert inhibition (too complex to be used often; an example is given here without a demonstration)

 
# Inhibition rules
inhibit_rules:
- source_match:             ### match the source alerts: those carrying the label below
    severity: 'high'
  target_match:
    severity: 'warning'     ### alerts with this label are the ones being inhibited
  equal: ['alertname', 'dev', 'instance']    ##### inhibit only when these labels match between source and target
