I. Download Address
kube-prometheus GitHub address: https://github.com/coreos/kube-prometheus
II. Components
1. MetricServer: the aggregator of resource-usage data for the Kubernetes cluster; it collects metrics for in-cluster consumers such as kubectl top, HPA, and the scheduler.
2. PrometheusOperator: deploys and manages the monitoring and alerting stack, creating and configuring Prometheus instances in the cluster through custom resources.
3. NodeExporter: exposes the key hardware and OS metrics of each node.
4. KubeStateMetrics: generates metrics about the state of Kubernetes resource objects (Deployments, Pods, and so on), which alerting rules can be built on.
5. Prometheus: scrapes metrics from the apiserver, scheduler, controller-manager, kubelet, and other components in pull mode over HTTP.
6. Grafana: a platform for visualizing statistics and monitoring data.
III. Deployment Steps
1. Synchronize time
# Synchronize clocks on all nodes before deploying; otherwise Prometheus reports "No datapoints found."
ntpdate ntp1.aliyun.com
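To run the same sync on every node in one pass, a small loop over the node hostnames works (a sketch; k8s-node01 and k8s-node02 are this cluster's workers, so adjust the list to your own):
# run ntpdate on each remaining node over ssh (assumes root ssh access is set up)
for node in k8s-node01 k8s-node02; do
  ssh root@${node} "ntpdate ntp1.aliyun.com"
done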
2. Download kube-prometheus
[root@k8s-master01 plugin]# mkdir promethues
[root@k8s-master01 plugin]# cd promethues/
# The version cloned here was too new and broke the deployment; you can instead upload the prepared tarball and extract it.
[root@k8s-master01 promethues]# git clone https://github.com/coreos/kube-prometheus.git
# Enter the manifests directory
[root@k8s-master01 promethues]# cd kube-prometheus/manifests/
3. Modify the resource manifests
In the kube-prometheus/manifests/ directory, edit the following files to change these Services to the NodePort type.
(1) grafana-service.yaml
Expose grafana through a NodePort:
vim grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort    # added
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30100 # added
  selector:
    app: grafana
(2) prometheus-service.yaml
Change it to NodePort:
vim prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort    # added
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30200 # added
  selector:
    app: prometheus
    prometheus: k8s
(3) alertmanager-service.yaml
Change it to NodePort:
vim alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort    # added
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30300 # added
  selector:
    alertmanager: main
    app: alertmanager
4. Import the images
First upload the image archive to all nodes, then load the images on every node, including the master; the master needs them too because node-exporter is deployed there as well.
# Change into the directory containing the images
[root@k8s-node01 ~]# cd Images/prometheus-operator/
# List the files in the directory
# kube-prometheus.git.tar.gz is the deployment manifests
# prometheus.tar.gz is the compressed image archive
# load-images.sh is the script that loads the images
[root@k8s-node01 prometheus-operator]# ls
kube-prometheus.git.tar.gz load-images.sh prometheus.tar.gz
# Extract the image archive
[root@k8s-node01 prometheus-operator]# tar -zxvf prometheus.tar.gz
# List the extracted files
[root@k8s-node01 prometheus-operator]# ls
kube-prometheus.git.tar.gz load-images.sh prometheus prometheus.tar.gz
# Show the directory the images live in
[root@k8s-node01 prometheus-operator]# pwd
/root/Images/prometheus-operator
# Edit the image directory path inside the script
[root@k8s-node01 prometheus-operator]# vim load-images.sh
# Make the script executable
[root@k8s-node01 prometheus-operator]# chmod a+x load-images.sh
# Load the images
[root@k8s-node01 prometheus-operator]# ./load-images.sh
Import feedback:
cd05ae2f58b4: Loading layer [==================================================>] 37.2MB/37.2MB
Loaded image: k8s.gcr.io/addon-resizer:1.8.4
a724badf61ce: Loading layer [==================================================>] 1.425MB/1.425MB
a135773ab4c8: Loading layer [==================================================>] 2.627MB/2.627MB
1660e4ef8e72: Loading layer [==================================================>] 22.31MB/22.31MB
d8a26634229b: Loading layer [==================================================>] 26.9MB/26.9MB
d56a80b83a4c: Loading layer [==================================================>] 3.072kB/3.072kB
96d25d0e9121: Loading layer [==================================================>] 3.584kB/3.584kB
Loaded image: quay.io/prometheus/alertmanager:v0.18.0
91bd48b9e0b0: Loading layer [==================================================>] 4.787MB/4.787MB
5f70bf18a086: Loading layer [==================================================>] 1.024kB/1.024kB
Loaded image: quay.io/coreos/configmap-reload:v0.0.1
6270adb5794c: Loading layer [==================================================>] 58.45MB/58.45MB
9871c21d3bdf: Loading layer [==================================================>] 3.072kB/3.072kB
3e36146153a9: Loading layer [==================================================>] 24.75MB/24.75MB
4155cf44b11c: Loading layer [==================================================>] 170.5MB/170.5MB
9ed438ff1909: Loading layer [==================================================>] 196.6kB/196.6kB
0cd0d98c3ece: Loading layer [==================================================>] 5.12kB/5.12kB
Loaded image: grafana/grafana:6.2.2
7c3fbc1a45e2: Loading layer [==================================================>] 59.59MB/59.59MB
Loaded image: quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1
e9f0f02bc156: Loading layer [==================================================>] 840.2kB/840.2kB
2ad89e029676: Loading layer [==================================================>] 36.35MB/36.35MB
Loaded image: quay.io/coreos/kube-rbac-proxy:v0.4.1
01092e5921c5: Loading layer [==================================================>] 3.062MB/3.062MB
6dc904f7f044: Loading layer [==================================================>] 31.31MB/31.31MB
f83fc93ec17d: Loading layer [==================================================>] 3.584kB/3.584kB
Loaded image: quay.io/coreos/kube-state-metrics:v1.7.1
975e03895fb7: Loading layer [==================================================>] 4.688MB/4.688MB
f9fe8137e4e3: Loading layer [==================================================>] 2.765MB/2.765MB
78f40987f0cd: Loading layer [==================================================>] 16.88MB/16.88MB
Loaded image: quay.io/prometheus/node-exporter:v0.18.1
5effb4064a9c: Loading layer [==================================================>] 7.477MB/7.477MB
0ccc317478d9: Loading layer [==================================================>] 7.477MB/7.477MB
Loaded image: quay.io/coreos/prometheus-config-reloader:v0.31.1
f02e8132e055: Loading layer [==================================================>] 37.83MB/37.83MB
Loaded image: quay.io/coreos/prometheus-operator:v0.31.1
5858aa1caa48: Loading layer [==================================================>] 76.32MB/76.32MB
495a19a962c5: Loading layer [==================================================>] 46.67MB/46.67MB
483b2ba761c7: Loading layer [==================================================>] 3.584kB/3.584kB
b1f92c6d4068: Loading layer [==================================================>] 13.31kB/13.31kB
97c86534d863: Loading layer [==================================================>] 28.16kB/28.16kB
d2cacb77d93d: Loading layer [==================================================>] 3.072kB/3.072kB
799a04338fc1: Loading layer [==================================================>] 5.12kB/5.12kB
Loaded image: quay.io/prometheus/prometheus:v2.11.0
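The script itself is not reproduced above. A minimal load script of this kind might look like the following sketch (an assumed reconstruction, not the original load-images.sh; IMAGE_DIR is the path edited in the vim step):
#!/bin/bash
# hypothetical sketch: load every image tarball found under IMAGE_DIR
IMAGE_DIR=/root/Images/prometheus-operator/prometheus   # edit to match your path
for tarball in "${IMAGE_DIR}"/*.tar*; do
  docker load -i "${tarball}"
done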
5. Deploy
If the first pass does not fully succeed, apply the manifests a few more times: the resources depend on one another, and the CRDs must be registered before the custom resources that use them can be created.
[root@k8s-master01 manifests]# kubectl apply -f ../manifests/
Deployment feedback:
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-cluster-rsrc-use created
configmap/grafana-dashboard-k8s-node-rsrc-use created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pods created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
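Rather than re-running the command by hand, a simple retry loop (a sketch) keeps applying until every resource is accepted:
# kubectl apply exits non-zero while any resource is still rejected, so loop until it succeeds
until kubectl apply -f ../manifests/; do
  echo "some resources were rejected, retrying in 5s..."
  sleep 5
done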
# List the deployed pods
[root@k8s-master01 promethues]# kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 7m28s
alertmanager-main-1 2/2 Running 0 7m21s
alertmanager-main-2 2/2 Running 0 7m13s
grafana-7dc5f8f9f6-v88pc 1/1 Running 0 7m32s
kube-state-metrics-5cbd67455c-9wvdz 4/4 Running 0 7m29s
node-exporter-6qvp6 2/2 Running 0 7m31s
node-exporter-g5mls 2/2 Running 0 11s
node-exporter-xvqvg 2/2 Running 0 7m31s
prometheus-adapter-668748ddbd-xmwfn 1/1 Running 0 7m31s
prometheus-k8s-0 3/3 Running 1 7m18s
prometheus-k8s-1 3/3 Running 1 7m18s
prometheus-operator-7447bf4dcb-247bv 1/1 Running 0 7m32s
# Check node resource usage
[root@k8s-master01 promethues]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master01 116m 2% 1428Mi 37%
k8s-node01 86m 2% 1384Mi 36%
k8s-node02 80m 2% 1175Mi 30%
# List the services
[root@k8s-master01 promethues]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main NodePort 10.97.21.171 <none> 9093:30300/TCP 10m
alertmanager-operated ClusterIP None <none> 9093/TCP,6783/TCP 10m
grafana NodePort 10.96.134.150 <none> 3000:30100/TCP 10m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 10m
node-exporter ClusterIP None <none> 9100/TCP 10m
prometheus-adapter ClusterIP 10.107.217.61 <none> 443/TCP 10m
prometheus-k8s NodePort 10.105.221.173 <none> 9090:30200/TCP 10m
prometheus-operated ClusterIP None <none> 9090/TCP 9m59s
prometheus-operator ClusterIP None <none> 8080/TCP 10m
IV. Access Prometheus
Prometheus is exposed on NodePort 30200, so open http://MasterIP:30200.
Open http://MasterIP:30200/targets to confirm that Prometheus has successfully connected to the Kubernetes apiserver.
The Status → Service Discovery page lists the scrape targets Prometheus has discovered, and Prometheus also collects its own metrics.
The Prometheus web UI supports basic queries; for example, the CPU usage of every pod in the cluster can be queried with:
sum by (pod_name)( rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m] ) )
If the query above returns data, metrics are flowing into Prometheus correctly (this particular series comes from the kubelet's cAdvisor endpoint rather than node-exporter). Next we can move on to the Grafana component for a friendlier web UI over the same data.
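A per-pod memory query follows the same pattern; a sketch, assuming the same cAdvisor label names as the CPU query above:
sum by (pod_name)( container_memory_working_set_bytes{image!="", pod_name!=""} )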
V. Access Grafana
Check the port the grafana service is exposed on:
[root@k8s-master01 ~]# kubectl get service -n monitoring | grep grafana
grafana NodePort 10.102.31.42 <none> 3000:30100/TCP 2d14h
As shown, Grafana is on port 30100; browse to http://MasterIP:30100 and log in with the default credentials admin/admin.
Change the password when prompted and log in.
Add a data source: the manifests already configure the Prometheus data source for Grafana. Grafana supports many time-series data sources, each with its own query editor.
Click Test to confirm the data source is usable.
Then import the bundled dashboards and browse them.
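As a quick sanity check from the shell (a sketch; substitute your master's IP for MasterIP), Grafana's health endpoint should answer before you even open the browser:
# expect a small JSON body containing "database": "ok"
curl -s http://MasterIP:30100/api/health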
VI. Horizontal Pod Autoscaling
Horizontal Pod Autoscaling (HPA) automatically scales the number of Pods in a ReplicationController, Deployment, or ReplicaSet based on CPU utilization.
Create an HPA controller; for details of the scaling algorithm, see this document:
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/horizontal-pod-autoscaler.md#autoscaling-algorithm
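The core of the algorithm is a single ratio; this is the form summarized in the Kubernetes documentation, with rounding and tolerance details spelled out in the linked proposal:
desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )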
1. Create an HPA example
# First load the hpa-example image on every node.
[root@k8s-node01 metrics]# docker load -i hpa-example.tar
# Create a deployment whose pod requests 200m of CPU (the m suffix means millicores; 1000m is one core)
kubectl run php-apache --image=gcr.io/google_containers/hpa-example --requests=cpu=200m --expose --port=80
# Check the php-apache pod; the image pull fails here
[root@k8s-master01 promethues]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
php-apache-69dd84889f-8r8pf 0/1 ImagePullBackOff 0 37s 10.244.2.198 k8s-node02 <none> <none>
# The image was already loaded locally, so change the image pull policy to imagePullPolicy: IfNotPresent
[root@k8s-master01 promethues]# kubectl edit deployment php-apache
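If you prefer a non-interactive change over kubectl edit, a strategic-merge patch makes the same modification (a sketch; kubectl run named the container php-apache):
kubectl patch deployment php-apache -p '{"spec":{"template":{"spec":{"containers":[{"name":"php-apache","imagePullPolicy":"IfNotPresent"}]}}}}'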
# Check the php-apache pod again; it is now running
[root@k8s-master01 promethues]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
php-apache-799f99c985-c2nfp 1/1 Running 0 15s 10.244.1.221 k8s-node01 <none> <none>
# Check the pod's resource usage
[root@k8s-master01 promethues]# kubectl top pod php-apache-799f99c985-c2nfp
NAME CPU(cores) MEMORY(bytes)
php-apache-799f99c985-c2nfp 0m 10Mi
2. Create an HPA controller
# When CPU usage exceeds 50%, scale out the pods, up to a maximum of 10
[root@k8s-master01 promethues]# kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
# Wait until the HPA can read the metric, i.e. until TARGETS shows 0%/50%
[root@k8s-master01 promethues]# kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 10 1 118s
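The autoscale command above is shorthand for roughly the following manifest (a sketch using the autoscaling/v1 API):
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50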
3. Add load and watch the number of pods
# First start a busybox container and attach to it
[root@k8s-master01 ~]# kubectl run -i --tty load-generator --image=busybox /bin/sh
# Inside the container, send requests in a loop
/ # while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
# Watch the HPA load climb
[root@k8s-master01 ~]# kubectl get hpa -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 53%/50% 1 10 9 34m
php-apache Deployment/php-apache 99%/50% 1 10 9 35m
php-apache Deployment/php-apache 87%/50% 1 10 9 35m
php-apache Deployment/php-apache 105%/50% 1 10 10 35m
php-apache Deployment/php-apache 104%/50% 1 10 10 36m
# Watch the pod count keep growing until it reaches 10.
[root@k8s-master01 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
load-generator-2-6d965f5998-shdkx 1/1 Running 0 2m11s
load-generator-7d549cd44-bl8qr 1/1 Running 0 32m
php-apache-799f99c985-5l2d9 1/1 Running 0 28m
php-apache-799f99c985-c2nfp 1/1 Running 0 48m
php-apache-799f99c985-c92tz 1/1 Running 0 30m
php-apache-799f99c985-fvp9t 1/1 Running 0 30m
php-apache-799f99c985-jfrjf 1/1 Running 0 30m
php-apache-799f99c985-mr6nj 1/1 Running 0 31m
php-apache-799f99c985-p9rb9 1/1 Running 0 54s
php-apache-799f99c985-qzqc8 1/1 Running 0 31m
php-apache-799f99c985-x5rsr 1/1 Running 0 30m
php-apache-799f99c985-x94w2 1/1 Running 0 31m
VII. Resource Limits - Pod
Kubernetes enforces resource limits through cgroups. A cgroup is a set of related properties that control how the kernel runs a group of processes; there are cgroups for memory, CPU, and various devices.
By default a Pod runs with no CPU or memory limits, which means any Pod in the system can consume as much CPU and memory as the node it runs on has available. Resource limits are usually applied to the pods of specific applications, via the requests and limits fields under resources.
spec:
  containers:
  - image: xxxx
    imagePullPolicy: Always
    name: auth
    ports:
    - containerPort: 8080
      protocol: TCP
    resources:
      limits:
        cpu: "4"
        memory: 2Gi
      requests:
        cpu: 250m
        memory: 250Mi
requests is the amount of resources guaranteed to the container; limits is the most it may use. Think of them roughly as the initial value and the maximum: a container that exceeds its memory limit is OOM-killed, while one that exceeds its CPU limit is throttled.
VIII. Resource Limits - Namespace
1. Compute resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
  namespace: spark-cluster
spec:
  hard:
    pods: "20"
    requests.cpu: "20"
    requests.memory: 100Gi
    limits.cpu: "40"
    limits.memory: 200Gi
2. Object count quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
  namespace: spark-cluster
spec:
  hard:
    configmaps: "10"
    persistentvolumeclaims: "4"
    replicationcontrollers: "20"
    secrets: "10"
    services: "10"
    services.loadbalancers: "2"
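To try the two quotas out, apply them and then describe them; describe shows usage against the hard limits (a sketch, assuming the two manifests above were saved as compute-resources.yaml and object-counts.yaml; the spark-cluster namespace must exist first):
kubectl create namespace spark-cluster
kubectl apply -f compute-resources.yaml -f object-counts.yaml
kubectl describe quota -n spark-cluster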
IX. Configure a CPU and Memory LimitRange
default is the limit value applied to containers that do not set one.
defaultRequest is the request value applied in the same way.
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
spec:
  limits:
  - default:
      memory: 50Gi
      cpu: 5
    defaultRequest:
      memory: 1Gi
      cpu: 1
    type: Container
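To watch the defaults being injected, create the LimitRange in a namespace, start a pod that declares no resources at all, and describe it (a sketch; the file name, namespace, and pod name are illustrative):
kubectl apply -f mem-limit-range.yaml -n default
kubectl run nginx-defaults --image=nginx --restart=Never -n default
# the container should show limits of cpu: 5 / memory: 50Gi and requests of cpu: 1 / memory: 1Gi
kubectl describe pod nginx-defaults -n default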