## Download
GitHub repository:
https://github.com/prometheus-operator/kube-prometheus
wget -O kube-prometheus-0.8.0.tar.gz https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.8.0.tar.gz
tar -zxf kube-prometheus-0.8.0.tar.gz
cd kube-prometheus-0.8.0/manifests
All of the configuration files live under the manifests directory.
## Configure persistent storage
Create the Ceph secret
Create the secret in the monitoring namespace that the PVCs will use to access Ceph. Note that the monitoring namespace must exist first (it is created by manifests/setup/, or run kubectl create ns monitoring):
kubectl create secret generic ceph-user-secret --type="kubernetes.io/rbd" \
--from-literal=key=AQDlGKZgG2xRNxAA4DYniPBpaV5SAyU1/QH/5w== \
--namespace=monitoring
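To confirm the secret was created correctly, you can check its type and decode the stored key (a sketch; `client.<user>` stands in for whichever Ceph user the key belongs to):

```shell
# The secret should report type kubernetes.io/rbd
kubectl get secret ceph-user-secret -n monitoring \
  -o jsonpath='{.type}{"\n"}'

# Decode the stored key and compare it against
# `ceph auth get-key client.<user>` on the Ceph side
kubectl get secret ceph-user-secret -n monitoring \
  -o jsonpath='{.data.key}' | base64 -d
```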
grafana-deployment.yaml
Change the storage from emptyDir to a PVC:
      volumes:
      #- emptyDir: {}
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-data
Append a PersistentVolumeClaim definition at the bottom of the file:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-data
  namespace: monitoring
spec:
  storageClassName: dynamic-ceph-rbd
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
prometheus-prometheus.yaml
Add a storage section below the existing spec fields:
  serviceMonitorSelector: {}
  version: 2.26.0
  ## add the following
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: dynamic-ceph-rbd
        resources:
          requests:
            storage: 50Gi
## Deploy kube-prometheus
kubectl apply -f manifests/setup/
kubectl apply -f manifests/
Check the result:
kubectl get pods -n monitoring
kubectl get svc -n monitoring
kubectl get ep -n monitoring
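The grafana-data PVC and the PVCs the operator generates from the volumeClaimTemplate should all bind through the dynamic-ceph-rbd StorageClass. A quick check (the prometheus-k8s-db-* PVC names assume the default prometheus-k8s object name shipped with this release):

```shell
# grafana-data and the operator-generated
# prometheus-k8s-db-prometheus-k8s-* PVCs should show STATUS=Bound
kubectl get pvc -n monitoring

# Watch until every pod in the stack is Running and Ready
kubectl get pods -n monitoring -w
```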
## Expose the stack through an Ingress
cat > ingress-prometheus.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prometheus
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    prometheus.io/http_probe: "true"
spec:
  rules:
  - host: alert.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
  - host: grafana.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: prom.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
EOF
kubectl apply -f ingress-prometheus.yaml
kubectl get ing -n monitoring
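The three hostnames need to resolve to the node where the ingress-nginx controller listens. A minimal sketch, assuming the controller is reachable on 192.168.2.101 (adjust to your environment):

```shell
# Point the test hostnames at the ingress controller node (needs root)
cat >> /etc/hosts << 'EOF'
192.168.2.101 alert.localprom.com grafana.localprom.com prom.localprom.com
EOF

# Each virtual host should answer; Prometheus and Alertmanager
# typically return 200, Grafana redirects to /login
for h in alert.localprom.com grafana.localprom.com prom.localprom.com; do
  curl -s -o /dev/null -w "$h %{http_code}\n" "http://$h/"
done
```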
## Troubleshooting
kube-state-metrics image pull failure
Change the image in kube-state-metrics-deployment.yaml to one that is pullable, e.g. the Bitnami mirror:
      containers:
      - args:
        - --host=127.0.0.1
        - --port=8081
        - --telemetry-host=127.0.0.1
        - --telemetry-port=8082
        image: bitnami/kube-state-metrics:2.0.0
        name: kube-state-metrics
Prometheus shows no data for kube-controller-manager and kube-scheduler
- Edit the cluster configuration files kube-controller-manager.conf and kube-scheduler.conf so both components listen on all interfaces, then restart the services:
--bind-address=0.0.0.0
- Create a Service and Endpoints for kube-controller-manager and kube-scheduler:
cat > kube-controller-manager-svc-ep.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
  - name: https-metrics
    port: 10257
    targetPort: 10257
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
subsets:
- addresses:
  - ip: 192.168.2.101
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
  - name: https-metrics
    port: 10257
    protocol: TCP
EOF
kubectl apply -f kube-controller-manager-svc-ep.yaml
kubectl get ep -n kube-system
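To confirm the --bind-address change took effect and the Endpoints point at a reachable scrape target, you can hit the metrics ports directly. A sketch; 192.168.2.101 is the master IP used in the Endpoints above, and note that the secure port may still reject anonymous requests depending on the component's authorization settings:

```shell
# Insecure HTTP metrics port (only present on releases that still
# expose --port=10252)
curl -s http://192.168.2.101:10252/metrics | head

# Secure metrics port; -k skips verification of the self-signed cert
curl -sk https://192.168.2.101:10257/metrics | head
```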
kube-scheduler
cat > kube-scheduler-svc-ep.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
subsets:
- addresses:
  - ip: 192.168.2.101
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
  - name: https-metrics
    port: 10259
    protocol: TCP
EOF
Note the labels: they must match what kubernetes-serviceMonitorKubeScheduler.yaml and kubernetes-serviceMonitorKubeControllerManager.yaml in kube-prometheus select on.
Edit kubernetes-serviceMonitorKubeControllerManager.yaml and kubernetes-serviceMonitorKubeScheduler.yaml to scrape over plain HTTP:
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: http-metrics
    scheme: http
    tlsConfig:
      insecureSkipVerify: true
Apply the changes:
kubectl delete -f kubernetes-serviceMonitorKubeControllerManager.yaml
kubectl apply -f kubernetes-serviceMonitorKubeControllerManager.yaml
kubectl delete -f kubernetes-serviceMonitorKubeScheduler.yaml
kubectl apply -f kubernetes-serviceMonitorKubeScheduler.yaml
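Whether the two ServiceMonitors now scrape successfully can be checked from the Prometheus targets API. A sketch using a temporary port-forward; jq is assumed to be installed:

```shell
# Forward the Prometheus service to localhost
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &
sleep 2

# The controller-manager and scheduler targets should report health "up"
curl -s 'http://127.0.0.1:9090/api/v1/targets' \
  | jq -r '.data.activeTargets[]
           | select(.labels.job | test("kube-(controller-manager|scheduler)"))
           | "\(.labels.job) \(.health)"'

kill %1
```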
CoreDNS shows no data
Check the labels CoreDNS currently carries:
kubectl get ep kube-dns -n kube-system -o yaml | grep -A 5 'labels'
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
Edit kubernetes-serviceMonitorCoreDNS.yaml so the selector matches the labels CoreDNS actually has:
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: metrics
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      kubernetes.io/name: CoreDNS
Apply the changes:
kubectl delete -f kubernetes-serviceMonitorCoreDNS.yaml
kubectl apply -f kubernetes-serviceMonitorCoreDNS.yaml
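CoreDNS serves its metrics on port 9153, so you can verify the endpoint answers before checking the Prometheus targets page. A sketch; the pod IP is taken from the kube-dns Endpoints:

```shell
# Grab one CoreDNS pod IP from the kube-dns Endpoints
POD_IP=$(kubectl get ep kube-dns -n kube-system \
  -o jsonpath='{.subsets[0].addresses[0].ip}')

# The metrics port should return coredns_* series
curl -s "http://$POD_IP:9153/metrics" | grep '^coredns_' | head
```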