1.1 Heapster
Heapster is a container-cluster monitoring and performance-analysis tool with built-in support for Kubernetes and CoreOS.
Kubernetes ships a well-known monitoring agent, cAdvisor. cAdvisor runs on every Kubernetes node and collects monitoring data (CPU, memory, filesystem, network, uptime) for the host and its containers.
In recent versions, Kubernetes has integrated the cAdvisor functionality into the kubelet component, and it can be accessed over the web directly on each node.
1.2 Weave Scope
Weave Scope can monitor the state and resource usage of resources across a Kubernetes cluster, display the application topology, scale workloads, and even open a debugging session inside a container straight from the browser. Its features include:
- Interactive topology view
- Graph and table modes
- Filtering
- Search
- Real-time metrics
- Container troubleshooting
- Plugin extensions
1.3 Prometheus
Prometheus is an open-source monitoring and alerting toolkit built around a time-series database. It was originally developed at SoundCloud and, as adoption spread to more companies, was spun out as an independent open-source project. Since then, many companies and organizations have adopted Prometheus as their monitoring and alerting tool.
2 Monitoring Kubernetes with Prometheus
2.1 Custom Configuration
2.1.1 Create the ConfigMap
Create prometheus-config.yml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
Create the ConfigMap:
kubectl create -f prometheus-config.yml
2.1.2 Deploy Prometheus
Create prometheus-deploy.yml:
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  labels:
    name: prometheus
spec:
  ports:
    - name: prometheus
      protocol: TCP
      port: 9090
      targetPort: 9090
  selector:
    app: prometheus
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: prometheus
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.2.1
          command:
            - "/bin/prometheus"
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
          ports:
            - containerPort: 9090
              protocol: TCP
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: prometheus-config
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
Create the Deployment and Service:
kubectl create -f prometheus-deploy.yml
Check that the Pod is running:
kubectl get pods -l app=prometheus
Get the Service details:
kubectl get svc -l name=prometheus
Access the UI at http://<node-ip>:<node-port>.
2.1.3 Configure Access Permissions
Create prometheus-rbac-setup.yml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: default
Create the RBAC objects:
kubectl create -f prometheus-rbac-setup.yml
Modify prometheus-deploy.yml to use the ServiceAccount:
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      serviceAccount: prometheus
Apply the updated deployment:
kubectl apply -f prometheus-deploy.yml
Check the Pod:
kubectl get pods -l app=prometheus
Verify that the ServiceAccount credentials are mounted:
kubectl exec -it <pod name> -- ls /var/run/secrets/kubernetes.io/serviceaccount/
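Prometheus uses the token file mounted here through the bearer_token_file setting in the scrape configuration: the file's contents are sent as an Authorization header on every scrape request. A minimal Python sketch of that mechanism, using a temporary file in place of the real mounted secret:

```python
import tempfile, os

# Simulate how a client configured with bearer_token_file builds the
# Authorization header from the mounted service-account token.
def bearer_header(token_path):
    with open(token_path) as f:
        token = f.read().strip()
    return {"Authorization": f"Bearer {token}"}

# Demo with a temporary file standing in for
# /var/run/secrets/kubernetes.io/serviceaccount/token
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("abc123\n")
header = bearer_header(path)
print(header)  # {'Authorization': 'Bearer abc123'}
os.remove(path)
```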
2.1.4 Service Discovery Configuration
# Add jobs that let Prometheus discover all cluster resources. Update prometheus-config.yml as follows:
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'kubernetes-nodes'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
      - job_name: 'kubernetes-service'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: service
      - job_name: 'kubernetes-endpoints'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
      - job_name: 'kubernetes-ingress'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: ingress
      - job_name: 'kubernetes-pods'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: pod
Apply the updated configuration:
kubectl apply -f prometheus-config.yml
Get the Prometheus Pod:
kubectl get pods -l app=prometheus
Delete the Pod so it restarts with the new configuration:
kubectl delete pods <pod name>
Check the Pod status:
kubectl get pods
Then reload the Prometheus UI.
2.1.5 Synchronize System Time
Check the system time:
date
Synchronize with a network time server:
ntpdate cn.pool.ntp.org
2.1.6 Monitoring the Kubernetes Cluster
# Append the following job to prometheus-config.yml:
- job_name: 'kubernetes-kubelet'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics
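The last two relabel rules rewrite each discovered node so it is scraped through the API server proxy: __address__ is pinned to kubernetes.default.svc:443 and __metrics_path__ is built from the node name. A rough Python simulation of those two rules (the node name and original address are made-up examples):

```python
import re

# Simulate the two relabel rules from the 'kubernetes-kubelet' job above.
def relabel_node(labels):
    labels = dict(labels)
    # Rule: hard-code __address__ to the API server's in-cluster address.
    labels["__address__"] = "kubernetes.default.svc:443"
    # Rule: build __metrics_path__ from the discovered node name.
    m = re.fullmatch(r"(.+)", labels["__meta_kubernetes_node_name"])
    labels["__metrics_path__"] = f"/api/v1/nodes/{m.group(1)}/proxy/metrics"
    return labels

# Example with a hypothetical node.
out = relabel_node({"__meta_kubernetes_node_name": "node-1",
                    "__address__": "10.0.0.5:10250"})
print(out["__address__"])       # kubernetes.default.svc:443
print(out["__metrics_path__"])  # /api/v1/nodes/node-1/proxy/metrics
```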
Apply the updated configuration:
kubectl apply -f prometheus-config.yml
Recreate the Pod to pick up the change:
kubectl delete pods <pod name>
Query the pod start latency on the current node with this metric:
kubelet_pod_start_latency_microseconds{quantile="0.99"}
Compute the average start latency:
kubelet_pod_start_latency_microseconds_sum / kubelet_pod_start_latency_microseconds_count
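The expression above is plain arithmetic: the cumulative sum divided by the cumulative count. With made-up sample values in Python:

```python
# Illustrative arithmetic behind the PromQL average above; the sample
# values are invented for the example.
latency_sum_us = 1_250_000.0  # kubelet_pod_start_latency_microseconds_sum
latency_count = 50.0          # kubelet_pod_start_latency_microseconds_count

avg_us = latency_sum_us / latency_count
avg_s = avg_us / 1e6  # convert microseconds to seconds
print(avg_us, avg_s)  # 25000.0 0.025
```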
2.1.6.1 Fetching Container Resource Usage from the kubelet
# Add the following job to the configuration file and apply the update:
- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
2.1.6.2 Monitoring Resource Usage with Exporters
# Create node-exporter-daemonset.yml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9100'
        prometheus.io/path: '/metrics'
      labels:
        app: node-exporter
      name: node-exporter
    spec:
      containers:
        - image: prom/node-exporter
          imagePullPolicy: IfNotPresent
          name: node-exporter
          ports:
            - containerPort: 9100
              hostPort: 9100
              name: scrape
      hostNetwork: true
      hostPID: true
Create the DaemonSet:
kubectl create -f node-exporter-daemonset.yml
Check the DaemonSet status:
kubectl get daemonsets -l app=node-exporter
Check the Pod status:
kubectl get pods -l app=node-exporter
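node-exporter serves its metrics on port 9100 in the Prometheus text exposition format. A minimal sketch of parsing a single sample line of that format (the sample line is illustrative, not captured from a real node):

```python
import re

# Parse one sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} sample_value
LINE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')

def parse_sample(line):
    name, raw_labels, value = LINE_RE.match(line).groups()
    labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels or ""))
    return name, labels, float(value)

sample = 'node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6'
print(parse_sample(sample))
# ('node_cpu_seconds_total', {'cpu': '0', 'mode': 'idle'}, 12345.6)
```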
# Add a scrape job for annotated Pods to the configuration file:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
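The third relabel rule joins __address__ and the prometheus.io/port annotation with ';', then rewrites the target address to use the annotated port. Simulated in Python (the pod IP and ports are made-up examples):

```python
import re

# Simulate the __address__ relabel rule: Prometheus joins the source
# labels with ';' before applying the regex.
def rewrite_address(address, port_annotation):
    joined = f"{address};{port_annotation}"
    m = re.fullmatch(r"([^:]+)(?::\d+)?;(\d+)", joined)
    return f"{m.group(1)}:{m.group(2)}"

print(rewrite_address("10.244.1.7:8080", "9100"))  # 10.244.1.7:9100
print(rewrite_address("10.244.1.7", "9100"))       # 10.244.1.7:9100
```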
# Monitor all incoming API requests by scraping the apiserver; add an apiserver job:
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
    - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
    - target_label: __address__
      replacement: kubernetes.default.svc:443
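The keep rule drops every discovered endpoint except the API server itself: the three source labels are joined with ';' and the target is kept only when the result matches default;kubernetes;https exactly. A Python sketch with hypothetical endpoints:

```python
import re

# Simulate the 'keep' relabel action from the apiserver job above.
def keep(labels):
    joined = ";".join([labels["__meta_kubernetes_namespace"],
                       labels["__meta_kubernetes_service_name"],
                       labels["__meta_kubernetes_endpoint_port_name"]])
    return re.fullmatch(r"default;kubernetes;https", joined) is not None

apiserver = {"__meta_kubernetes_namespace": "default",
             "__meta_kubernetes_service_name": "kubernetes",
             "__meta_kubernetes_endpoint_port_name": "https"}
other = {"__meta_kubernetes_namespace": "kube-system",
         "__meta_kubernetes_service_name": "kube-dns",
         "__meta_kubernetes_endpoint_port_name": "dns"}
print(keep(apiserver))  # True
print(keep(other))      # False
```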
2.1.6.3 Network Probing of Ingresses and Services
# Create blackbox-exporter.yaml for network probing:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
spec:
  ports:
    - name: blackbox
      port: 9115
      protocol: TCP
  selector:
    app: blackbox-exporter
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      containers:
        - image: prom/blackbox-exporter
          imagePullPolicy: IfNotPresent
          name: blackbox-exporter
Create the resources:
kubectl create -f blackbox-exporter.yaml
# Add scrape jobs that probe all annotated Services and Ingresses to the configuration file:
- job_name: 'kubernetes-services'
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
    - role: service
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
      action: keep
      regex: true
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: blackbox-exporter.default.svc.cluster.local:9115
    - source_labels: [__param_target]
      target_label: instance
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      target_label: kubernetes_name
- job_name: 'kubernetes-ingresses'
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
    - role: ingress
  relabel_configs:
    - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
      regex: (.+);(.+);(.+)
      replacement: ${1}://${2}${3}
      target_label: __param_target
    - target_label: __address__
      replacement: blackbox-exporter.default.svc.cluster.local:9115
    - source_labels: [__param_target]
      target_label: instance
    - action: labelmap
      regex: __meta_kubernetes_ingress_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_ingress_name]
      target_label: kubernetes_name
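For Ingresses, the scheme, address, and path labels are joined with ';' and rewritten into the full URL that blackbox-exporter probes. A Python sketch of that rule (the Ingress values are hypothetical):

```python
import re

# Simulate the __param_target relabel rule from the ingress job above.
def probe_target(scheme, address, path):
    joined = f"{scheme};{address};{path}"
    m = re.fullmatch(r"(.+);(.+);(.+)", joined)
    return f"{m.group(1)}://{m.group(2)}{m.group(3)}"

print(probe_target("https", "example.com", "/app"))
# https://example.com/app
```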
2.1.7 Visualization with Grafana
Grafana is a general-purpose visualization tool. 'General-purpose' means Grafana is not limited to displaying Prometheus monitoring data; it suits other data-visualization needs as well. Before getting started with Grafana, it helps to understand a few of its basic concepts.