Introduction
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project’s governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.
Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.
For more elaborate overviews of Prometheus, see the resources linked from the media section.
Features
Prometheus’s main features are:
- a multi-dimensional data model with time series data identified by metric name and key/value pairs
- PromQL, a flexible query language to leverage this dimensionality
- no reliance on distributed storage; single server nodes are autonomous
- time series collection happens via a pull model over HTTP
- pushing time series is supported via an intermediary gateway
- targets are discovered via service discovery or static configuration
- multiple modes of graphing and dashboarding support
Components
The Prometheus ecosystem consists of multiple components, many of which are optional:
- the main Prometheus server which scrapes and stores time series data
- client libraries for instrumenting application code
- a push gateway for supporting short-lived jobs
- special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
- an alertmanager to handle alerts
- various support tools
Architecture
Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to aggregate and record new time series or to generate alerts, which Alertmanager handles. Grafana or other API consumers can be used to visualize the collected data.
Choosing an installation method
Binary installation
Prometheus is written mainly in Go. You can download a precompiled binary from the official site (https://prometheus.io/download/) and start it directly:
./prometheus --config.file=prometheus.yml
A basic prometheus.yml configuration looks like this:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
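Before starting the server, the configuration can be validated with promtool, which ships alongside the Prometheus binary; once the server is running, its health endpoint should answer. A minimal sketch, assuming the default port 9090 and the binaries in the current directory:

```shell
# Validate the configuration file before starting the server.
./promtool check config prometheus.yml

# Start Prometheus in the background, then probe its health endpoint.
./prometheus --config.file=prometheus.yml &
sleep 3
curl -s http://localhost:9090/-/healthy
```

If promtool reports an error, fix the YAML before starting; a running server can otherwise silently keep its last good configuration.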
Installing via Docker
Log in to Docker Hub and search for Prometheus. Any suitable image will do, for example:
https://hub.docker.com/layers/bitnami/prometheus/2-debian-10/images/sha256-ad4ad5965bc993979299fa366b408bd07366404f3a1d3915dc6e3eab44c42a64?context=explore
What we need from this page are the image address and the container start command, since these values go into the container section of the k8s manifests below.
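Before moving to k8s, the image can be sanity-checked locally. A sketch, assuming the image's default entrypoint starts Prometheus on port 9090 (the container name prom-test is arbitrary):

```shell
# Start the Bitnami image locally and expose the web port.
docker run --rm -d --name prom-test -p 9090:9090 docker.io/bitnami/prometheus:2-debian-10

# The readiness endpoint should answer once the server is up.
sleep 5
curl -s http://localhost:9090/-/ready

docker stop prom-test
```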
Writing the Prometheus k8s YAML files
Deployment
apiVersion: apps/v1
kind: Deployment # Deployed as a Deployment, so the backing storage must survive a pod being deleted; hence a PV is used
metadata:
  name: prometheus-deploy
  namespace: prometheus-ns # A dedicated namespace is used, so a Namespace declaration is also required
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        runAsUser: 0 # The Prometheus image starts as user id 1001, which lacks permission on the NFS share, so run as root
      serviceAccountName: prometheus-sa # Prometheus needs to read cluster state, so an account with RBAC rules is required
      containers:
        - name: prometheus-container
          image: docker.io/bitnami/prometheus:2-debian-10
          imagePullPolicy: IfNotPresent
          args:
            - "--config.file=/prometheus/conf/prometheus.yml" # Stored in a ConfigMap resource
            - "--web.console.libraries=/opt/bitnami/prometheus/conf/console_libraries" # Left unchanged; use the files shipped in the image
            - "--web.console.templates=/opt/bitnami/prometheus/conf/consoles" # Left unchanged; use the files shipped in the image
            - "--storage.tsdb.path=/prometheus/data/" # Persisted through the PVC declared below
            - "--storage.tsdb.retention=24h" # How long to retain time series data (newer 2.x releases prefer --storage.tsdb.retention.time)
            - "--web.enable-admin-api" # Expose the admin HTTP API for direct management operations
            - "--web.enable-lifecycle" # Enable hot reloading of the configuration
          resources:
            limits:
              memory: "128Mi" # Very tight for Prometheus; raise this for real workloads
              cpu: "500m"
          ports:
            - containerPort: 9090
              name: app-http-port
          volumeMounts:
            - mountPath: "/prometheus/data/" # Persisted directory
              subPath: sub1
              name: data
            - mountPath: "/prometheus/conf/"
              name: config
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-pvc
        - name: config
          configMap:
            name: prometheus-cm # The ConfigMap that holds the prometheus.yml file
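Because --web.enable-lifecycle is set, the running server can pick up an edited prometheus.yml without a restart. A sketch of the workflow after changing the ConfigMap (the file name prometheus-configmap.yaml is illustrative; the kubelet may take up to a minute to sync the updated ConfigMap into the pod):

```shell
# Push the edited ConfigMap into the cluster.
kubectl apply -f prometheus-configmap.yaml

# Forward the web port and trigger a configuration reload.
kubectl -n prometheus-ns port-forward deploy/prometheus-deploy 9090:9090 &
sleep 2
curl -X POST http://localhost:9090/-/reload
```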
Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus-ns
PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv # Backs the PVC below
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfsData/prometheus
    server: 192.168.56.203 # The deployed NFS storage server
PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc # Must match the claim name used in the Deployment
  namespace: prometheus-ns
spec: # Fields must be compatible with the PV
  resources:
    requests:
      storage: 10Gi # Must not exceed the PV's capacity
  accessModes:
    - ReadWriteOnce
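Once the PV and PVC are applied, the claim should bind to the volume; this is worth checking before deploying. A sketch:

```shell
# Both should report STATUS "Bound" once the claim matches the volume.
kubectl get pv prometheus-pv
kubectl -n prometheus-ns get pvc prometheus-pvc
```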
ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-cm # The ConfigMap that holds the prometheus.yml file
  namespace: prometheus-ns
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              # - alertmanager:9093
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "prometheus"
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ["localhost:9090"]
ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-sa
  namespace: prometheus-ns
ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-cr # Cluster-scoped, because content in other namespaces must be readable
rules: # Adjust as needed for your use case
  - apiGroups: [""]
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
      - configmaps
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"] # Access to a non-resource URL
    verbs: ["get"]
ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-crb # Binds the service account to the cluster role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-cr
subjects:
  - kind: ServiceAccount
    name: prometheus-sa
    namespace: prometheus-ns
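After these RBAC objects are applied, the grants can be verified with kubectl's impersonation check, which prints "yes" or "no" per permission:

```shell
# Should print "yes": granted by the ClusterRole.
kubectl auth can-i list pods --as=system:serviceaccount:prometheus-ns:prometheus-sa
kubectl auth can-i watch endpoints --as=system:serviceaccount:prometheus-ns:prometheus-sa

# Should print "no": the role only grants read-only verbs.
kubectl auth can-i delete pods --as=system:serviceaccount:prometheus-ns:prometheus-sa
```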
Service
apiVersion: v1
kind: Service # An Ingress could also be configured for access
metadata:
  name: prometheus-svc
  namespace: prometheus-ns # Must be in the same namespace, otherwise the Service cannot reach the pods
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      port: 9090
      targetPort: app-http-port
Deploying to the environment
Preparing the manifest file
You can apply each of the prepared YAML files separately, or combine them all into a single file, e.g. prometheus-app.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus-ns
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /nfsData/prometheus
    server: 192.168.56.203
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: prometheus-ns
spec:
  resources:
    requests:
      storage: 10Gi
  accessModes:
    - ReadWriteOnce
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deploy
  namespace: prometheus-ns
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        runAsUser: 0
      serviceAccountName: prometheus-sa
      containers:
        - name: prometheus-container
          image: docker.io/bitnami/prometheus:2-debian-10
          imagePullPolicy: IfNotPresent
          args:
            - "--config.file=/prometheus/conf/prometheus.yml"
            - "--web.console.libraries=/opt/bitnami/prometheus/conf/console_libraries"
            - "--web.console.templates=/opt/bitnami/prometheus/conf/consoles"
            - "--storage.tsdb.path=/prometheus/data/"
            - "--storage.tsdb.retention=24h"
            - "--web.enable-admin-api"
            - "--web.enable-lifecycle"
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            - containerPort: 9090
              name: app-http-port
          volumeMounts:
            - mountPath: "/prometheus/data/"
              subPath: sub1
              name: data
            - mountPath: "/prometheus/conf/"
              name: config
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-pvc
        - name: config
          configMap:
            name: prometheus-cm
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: prometheus-ns
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      port: 9090
      targetPort: app-http-port
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-cm
  namespace: prometheus-ns
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              # - alertmanager:9093
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: "prometheus"
        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        static_configs:
          - targets: ["localhost:9090"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-sa
  namespace: prometheus-ns
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-cr
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
      - configmaps
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-cr
subjects:
  - kind: ServiceAccount
    name: prometheus-sa
    namespace: prometheus-ns
Running the deployment
kubectl create -f prometheus-app.yaml
You can see that all the resources were created successfully.
Testing access
Access the web UI via node IP + the Service's NodePort; the page loads successfully.
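For example, the assigned NodePort can be read from the Service and probed with curl (192.168.56.203 is the example host used for the NFS server earlier; any reachable node IP works):

```shell
# Read the NodePort that k8s assigned to the Service.
NODE_PORT=$(kubectl -n prometheus-ns get svc prometheus-svc \
  -o jsonpath='{.spec.ports[0].nodePort}')

# Probe the readiness endpoint through a cluster node.
curl -s "http://192.168.56.203:${NODE_PORT}/-/ready"
```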