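Prometheus Operator[1] provides Kubernetes-native deployment and management of Prometheus Server and related monitoring components. By defining resources such as ServiceMonitor and Alertmanager, it simplifies the configuration of Prometheus monitoring targets and alerting rules.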
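Architecture
A ServiceMonitor resource declares the endpoints to monitor, and the Operator generates the corresponding job in the Prometheus configuration file. Once the configuration is generated, prometheus-config-reloader reloads it.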
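The translation from a ServiceMonitor to scrape jobs can be pictured with a minimal Python sketch. It is not the Operator's actual code: the output fields are simplified, the job-name pattern and the namespace fallback are assumptions made for illustration, and the `example-app` resource is hypothetical.

```python
def scrape_jobs(service_monitor):
    """Build one simplified scrape-job dict per ServiceMonitor endpoint."""
    meta = service_monitor["metadata"]
    spec = service_monitor["spec"]
    # Assumption: with no namespaceSelector, only the ServiceMonitor's own
    # namespace is searched for Services.
    namespaces = spec.get("namespaceSelector", {}).get(
        "matchNames", [meta["namespace"]]
    )
    jobs = []
    for index, endpoint in enumerate(spec.get("endpoints", [])):
        jobs.append({
            # Illustrative naming scheme: namespace, name and endpoint index.
            "job_name": f"serviceMonitor/{meta['namespace']}/{meta['name']}/{index}",
            "metrics_path": endpoint.get("path", "/metrics"),
            "scheme": endpoint.get("scheme", "http"),
            "port_name": endpoint.get("port"),
            "namespaces": namespaces,
        })
    return jobs


# Hypothetical ServiceMonitor, reduced to the fields the sketch reads.
example = {
    "metadata": {"name": "example-app", "namespace": "monitoring"},
    "spec": {
        "endpoints": [{"port": "http-metrics", "path": "/metrics"}],
        "namespaceSelector": {"matchNames": ["default"]},
    },
}

for job in scrape_jobs(example):
    print(job["job_name"], job["namespaces"])
```

Each endpoint becomes its own job, which is why a ServiceMonitor with several endpoints shows up as several jobs in the generated configuration.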
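A Prometheus Operator deployment involves the following resources in Kubernetes:
- Operator
  - Runs as a Deployment in Kubernetes, deploys and manages Prometheus Server, and dynamically updates the Prometheus Server's monitoring targets according to the ServiceMonitor definitions
- Prometheus
  - Defines the configuration (storage, resources, secrets, and so on) for deploying Prometheus Server as a StatefulSet, and selects the matching ServiceMonitor resources
- Alertmanager
  - Defines the configuration for deploying Alertmanager
- ServiceMonitor
  - Defines monitoring targets by matching the endpoints behind Service resources
- Service
  - Defines the object to monitor; each monitored object has a corresponding Service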
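How a ServiceMonitor picks its Services can be sketched in a few lines of Python. This is a simplified model that only handles `matchLabels` and `namespaceSelector.matchNames`; the Service names and labels below are hypothetical.

```python
def labels_match(match_labels, labels):
    """True when every selector key/value pair is present in the labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select_services(monitor_spec, services):
    """Return the names of Services selected by a ServiceMonitor spec."""
    namespaces = monitor_spec.get("namespaceSelector", {}).get("matchNames", [])
    match_labels = monitor_spec.get("selector", {}).get("matchLabels", {})
    return [
        svc["name"]
        for svc in services
        if svc["namespace"] in namespaces
        and labels_match(match_labels, svc["labels"])
    ]


# Hypothetical ServiceMonitor spec and Services, for illustration only.
monitor_spec = {
    "namespaceSelector": {"matchNames": ["default"]},
    "selector": {"matchLabels": {"type": "java_app_metrics"}},
}
services = [
    {"name": "orders", "namespace": "default",
     "labels": {"type": "java_app_metrics", "application": "orders"}},
    {"name": "frontend", "namespace": "default", "labels": {"type": "nginx"}},
    {"name": "billing", "namespace": "staging",
     "labels": {"type": "java_app_metrics"}},
]

print(select_services(monitor_spec, services))
```

A Service has to pass both filters: it must live in one of the listed namespaces and carry every label in the selector. Extra labels on the Service do not prevent a match.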
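Deployment
Install Prometheus Operator with helm, for example:
git clone https://github.com/helm/charts.git
cd charts/stable/prometheus-operator
helm dependency update
# Optionally edit values.yaml before installing
# to customize the configuration
kubectl create namespace monitor
# Helm 2 syntax; with Helm 3 use:
# helm install prometheus-operator ./ --namespace=monitor
helm install --name prometheus-operator \
  --namespace=monitor ./
If you use a cloud-managed Kubernetes service, you can search for prometheus-operator in its application catalog and install it from there, for example the open-source Prometheus monitoring offered by one cloud provider[2].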
Adding a monitoring target
For example, to have Prometheus scrape metrics from a Java application (which exposes performance data through Actuator and Micrometer):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  # metadata.name must be a valid DNS subdomain name, so no underscores
  name: java-app-metrics
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: http-metrics
    relabelings:
    - sourceLabels:
      - __meta_kubernetes_pod_host_ip
      targetLabel: node_ip
    - sourceLabels:
      - __meta_kubernetes_service_label_application
      targetLabel: application
    - replacement: java_app_metrics
      targetLabel: job
  jobLabel: java_app_metrics
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      # Labels defined on the Service to match
      type: java_app_metrics
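To see what the three relabelings in this ServiceMonitor do to a target, here is a minimal Python sketch of the default `replace` relabel action. It is a simplified model rather than Prometheus's implementation, and the discovered label values (IP address, application name, initial job name) are hypothetical.

```python
import re


def apply_replace(labels, rule):
    """Apply one 'replace' relabeling rule and return the new label set."""
    sources = rule.get("sourceLabels", [])
    value = rule.get("separator", ";").join(labels.get(n, "") for n in sources)
    match = re.fullmatch(rule.get("regex", "(.*)"), value)
    if match is None:
        return dict(labels)  # regex did not match: labels stay unchanged
    # Translate $1-style references into Python's \g<1> form before expanding.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", rule.get("replacement", "$1"))
    result = match.expand(template)
    updated = dict(labels)
    if result == "":
        updated.pop(rule["targetLabel"], None)  # empty result drops the label
    else:
        updated[rule["targetLabel"]] = result
    return updated


def relabel(labels, rules):
    """Apply the rules in order, as relabeling is sequential."""
    for rule in rules:
        labels = apply_replace(labels, rule)
    return labels


RULES = [
    {"sourceLabels": ["__meta_kubernetes_pod_host_ip"], "targetLabel": "node_ip"},
    {"sourceLabels": ["__meta_kubernetes_service_label_application"],
     "targetLabel": "application"},
    {"replacement": "java_app_metrics", "targetLabel": "job"},
]

# Hypothetical labels discovered for one target.
discovered = {
    "__meta_kubernetes_pod_host_ip": "10.0.0.5",
    "__meta_kubernetes_service_label_application": "orders",
    "job": "default/orders",
}

print(relabel(discovered, RULES))
```

The first two rules copy discovery metadata into ordinary labels, and the third overwrites `job` with a fixed value because it has no source labels and a literal replacement.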
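Monitoring ETCD
ETCD requires client certificates before it serves its metrics, so the certificates must be mounted into the Prometheus Server Pod.
Create the secret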
kubectl create secret generic \
ack-prometheus-operator-etcd \
--from-file=/etc/kubernetes/pki/etcd/ca.pem \
--from-file=/etc/kubernetes/pki/etcd/etcd-client.pem \
--from-file=/etc/kubernetes/pki/etcd/etcd-client-key.pem \
-n monitoring
Edit the Prometheus object to mount the Secret
# kubectl edit prometheus -n monitoring ack-prometheus-operator-prometheus
...
secrets:
- ack-prometheus-operator-etcd
...
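The file paths used in the ETCD ServiceMonitor's tlsConfig follow from the Secret name and its keys. The Python sketch below derives them, assuming (as the paths in this article suggest) that each listed Secret is mounted under /etc/prometheus/secrets/<secret-name>/ and that `--from-file` uses the file's base name as the key.

```python
import posixpath

# Assumed mount root, taken from the tlsConfig paths used in this article.
SECRETS_ROOT = "/etc/prometheus/secrets"


def mounted_path(secret_name, source_file):
    """Path inside the Prometheus Pod for a file added with --from-file."""
    key = posixpath.basename(source_file)  # --from-file keys default to the file name
    return posixpath.join(SECRETS_ROOT, secret_name, key)


SOURCE_FILES = [
    "/etc/kubernetes/pki/etcd/ca.pem",
    "/etc/kubernetes/pki/etcd/etcd-client.pem",
    "/etc/kubernetes/pki/etcd/etcd-client-key.pem",
]

for source in SOURCE_FILES:
    print(mounted_path("ack-prometheus-operator-etcd", source))
```

If a path in tlsConfig does not line up with one of these derived paths, Prometheus cannot read the certificate and the scrape fails.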
Point the ETCD ServiceMonitor at the key files mounted in the Pod
Run kubectl edit servicemonitors.monitoring.coreos.com -n monitoring ack-prometheus-operator-kube-etcd to edit the etcd ServiceMonitor resource:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: ack-prometheus-operator-kube-etcd
    chart: ack-prometheus-operator-11.0.0
    heritage: Helm
    release: ack-prometheus-operator
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    port: http-metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/ack-prometheus-operator-etcd/ca.pem
      certFile: /etc/prometheus/secrets/ack-prometheus-operator-etcd/etcd-client.pem
      insecureSkipVerify: true
      keyFile: /etc/prometheus/secrets/ack-prometheus-operator-etcd/etcd-client-key.pem
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: ack-prometheus-operator-kube-etcd
      release: ack-prometheus-operator
Setting resources for Prometheus Server
Run kubectl edit prometheus -n monitoring ack-prometheus-operator-prometheus to edit the Prometheus object:
spec:
  ...
  resources:
    limits:
      cpu: "2"
      memory: 4Gi
    requests:
      cpu: 200m
      memory: 1Gi
  ...
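When editing these values it helps to confirm that every request stays at or below its limit, since Kubernetes rejects a Pod whose request exceeds the limit. The Python sketch below parses a subset of the Kubernetes quantity notation (the `m` milli suffix plus common binary and decimal suffixes) and compares the two. The helper names are hypothetical and the parser is not a complete implementation of the quantity format.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes handled by this sketch.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(text):
    """Convert a quantity such as '200m' or '4Gi' to a Decimal base value."""
    if text.endswith("m"):
        return Decimal(text[:-1]) / 1000  # millicores and similar
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def requests_within_limits(resources):
    """True when no request is larger than the limit for the same resource."""
    limits = resources.get("limits", {})
    for name, request in resources.get("requests", {}).items():
        limit = limits.get(name)
        if limit is not None and parse_quantity(request) > parse_quantity(limit):
            return False
    return True


RESOURCES = {
    "limits": {"cpu": "2", "memory": "4Gi"},
    "requests": {"cpu": "200m", "memory": "1Gi"},
}

print(requests_within_limits(RESOURCES))
```

With the values from this article, the CPU request of 200m is 0.2 cores against a limit of 2 cores, and the memory request of 1Gi is a quarter of the 4Gi limit.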
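Configuring storage for Prometheus Server[3]
By default, Prometheus Server stores its data in an emptyDir volume, so the data is lost when the Pod is deleted or rescheduled. On a cloud service, you can keep the data on shared storage such as NAS.
Define a StorageClass: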
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-nas
mountOptions:
- nolock,tcp,noresvport
- vers=3
parameters:
  volumeAs: subpath
  server: "xxxxxxx.cn-hangzhou.nas.aliyuncs.com:/prometheus"
provisioner: nasplugin.csi.alibabacloud.com
reclaimPolicy: Retain
Edit the Prometheus object (kubectl edit prometheus -n monitoring ack-prometheus-operator-prometheus) and define the storage:
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: prometheus-nas
        resources:
          requests:
            storage: 100Gi
  retention: 30d
  ...
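Whether 100Gi is enough for 30 days of retention depends on the ingestion rate. A rough estimate can use the rule of thumb from the Prometheus storage documentation: disk space is roughly retention time in seconds times ingested samples per second times bytes per sample, at about 1 to 2 bytes per sample. The ingestion rate below is a hypothetical figure, not a measurement from any cluster.

```python
SECONDS_PER_DAY = 86_400


def needed_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rough disk estimate: retention seconds x ingestion rate x sample size."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


def fits(volume_bytes, retention_days, samples_per_second, bytes_per_sample=2.0):
    """True when the estimate is no larger than the requested volume size."""
    estimate = needed_disk_bytes(retention_days, samples_per_second, bytes_per_sample)
    return estimate <= volume_bytes


VOLUME_BYTES = 100 * 2**30   # the 100Gi request from the volumeClaimTemplate
RETENTION_DAYS = 30          # retention: 30d
SAMPLES_PER_SECOND = 10_000  # hypothetical ingestion rate

estimate = needed_disk_bytes(RETENTION_DAYS, SAMPLES_PER_SECOND)
print(f"{estimate / 2**30:.1f} GiB needed, fits: "
      f"{fits(VOLUME_BYTES, RETENTION_DAYS, SAMPLES_PER_SECOND)}")
```

The estimate ignores the write-ahead log and compaction overhead, so it is sensible to leave headroom rather than size the volume to the exact figure.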
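Other issues
On a managed Kubernetes cluster from one cloud provider, there may be two kubelet Services, which causes duplicated Pod monitoring data. Delete one of them with kubectl delete service kubelet -n kube-system.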
References
[1] Prometheus Operator: https://github.com/prometheus-operator/prometheus-operator
[2] Open-source Prometheus monitoring: https://help.aliyun.com/document_detail/94622.html?spm=a2c4g.11174283.6.881.63122ceef6QQ2S
[3] Prometheus Storage: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/storage.md