Introduction
Along with tracing and logging, monitoring and alerting are essential components of a Kubernetes observability stack. Setting up monitoring for your DigitalOcean Kubernetes cluster allows you to track your resource usage and analyze and debug application errors.
A monitoring system usually consists of a time-series database that houses metric data and a visualization layer. In addition, an alerting layer creates and manages alerts, handing them off to integrations and external services as necessary. Finally, one or more components generate or expose the metric data that will be stored, visualized, and processed for alerts by the stack.
One popular monitoring solution is the open-source Prometheus, Grafana, and Alertmanager stack, deployed alongside kube-state-metrics and node_exporter to expose cluster-level Kubernetes object metrics as well as machine-level metrics like CPU and memory usage.
Rolling out this monitoring stack on a Kubernetes cluster requires configuring individual components, manifests, Prometheus metrics, and Grafana dashboards, which can take some time. The DigitalOcean Kubernetes Cluster Monitoring Quickstart, released by the DigitalOcean Community Developer Education team, contains fully defined manifests for a Prometheus-Grafana-Alertmanager cluster monitoring stack, as well as a set of preconfigured alerts and Grafana dashboards. It can help you get up and running quickly, and forms a solid foundation from which to build your observability stack.
In this tutorial, we’ll deploy this preconfigured stack on DigitalOcean Kubernetes, access the Prometheus, Grafana, and Alertmanager interfaces, and describe how to customize it.
Prerequisites
Before you begin, you’ll need a DigitalOcean Kubernetes cluster available to you, and the following tools installed in your local development environment:
The kubectl command-line interface installed on your local machine and configured to connect to your cluster. You can read more about installing and configuring kubectl in its official documentation.
更多信息。The git version control system installed on your local machine. To learn how to install git on Ubuntu 18.04, consult How To Install Git on Ubuntu 18.04.
The Coreutils base64 tool installed on your local machine. If you’re using a Linux machine, this will most likely already be installed. If you’re using OS X, you can use openssl base64, which comes installed by default.
Note: The Cluster Monitoring Quickstart has only been tested on DigitalOcean Kubernetes clusters. To use the Quickstart with other Kubernetes clusters, some modification to the manifest files may be necessary.
Step 1 — Cloning the GitHub Repository and Configuring Environment Variables
To start, clone the DigitalOcean Kubernetes Cluster Monitoring GitHub repository onto your local machine using git:
git clone git@github.com:do-community/doks-monitoring.git
Then, navigate into the repo:
cd doks-monitoring
You should see the following directory structure:
ls
Output
LICENSE
README.md
changes.txt
manifest
The manifest directory contains Kubernetes manifests for all of the monitoring stack components, including Service Accounts, Deployments, StatefulSets, ConfigMaps, etc. To learn more about these manifest files and how to configure them, skip ahead to Configuring the Monitoring Stack.
If you just want to get things up and running, begin by setting the APP_INSTANCE_NAME and NAMESPACE environment variables, which will be used to configure a unique name for the stack’s components and to configure the Namespace into which the stack will be deployed:
export APP_INSTANCE_NAME=sammy-cluster-monitoring
export NAMESPACE=default
In this tutorial, we set APP_INSTANCE_NAME to sammy-cluster-monitoring, which will prepend all of the monitoring stack’s Kubernetes object names. You should substitute in a unique descriptive prefix for your monitoring stack. We also set the Namespace to default. If you’d like to deploy the monitoring stack to a Namespace other than default, ensure that you first create it in your cluster:
kubectl create namespace "$NAMESPACE"
You should see the following output:
Output
namespace/sammy created
In this case, the NAMESPACE environment variable was set to sammy. Throughout the rest of the tutorial we’ll assume that NAMESPACE has been set to default.
Now, use the base64 command to base64-encode a secure Grafana password. Be sure to substitute a password of your choosing for your_grafana_password:
export GRAFANA_GENERATED_PASSWORD="$(echo -n 'your_grafana_password' | base64)"
If you’re using macOS, you can substitute the openssl base64 command, which comes installed by default.
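For example, the macOS equivalent of the command above would be:

export GRAFANA_GENERATED_PASSWORD="$(echo -n 'your_grafana_password' | openssl base64)"

Note that openssl base64 wraps long output across multiple lines; for a typical password its output is identical to that of Coreutils base64.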
At this point, you’ve grabbed the stack’s Kubernetes manifests and configured the required environment variables, so you’re now ready to substitute the configured variables into the Kubernetes manifest files and create the stack in your Kubernetes cluster.
Step 2 — Creating the Monitoring Stack
The DigitalOcean Kubernetes Monitoring Quickstart repo contains manifests for the following monitoring, scraping, and visualization components:
Prometheus is a time series database and monitoring tool that works by polling metrics endpoints and scraping and processing the data exposed by these endpoints. It allows you to query this data using PromQL, a time series data query language. Prometheus will be deployed into the cluster as a StatefulSet with 2 replicas that uses Persistent Volumes with DigitalOcean Block Storage. In addition, a preconfigured set of Prometheus Alerts, Rules, and Jobs will be stored as a ConfigMap. To learn more about these, skip ahead to the Prometheus section of Configuring the Monitoring Stack.
Alertmanager, usually deployed alongside Prometheus, forms the alerting layer of the stack, handling alerts generated by Prometheus and deduplicating, grouping, and routing them to integrations like email or PagerDuty. Alertmanager will be installed as a StatefulSet with 2 replicas. To learn more about Alertmanager, consult Alerting from the Prometheus docs.
Grafana is a data visualization and analytics tool that allows you to build dashboards and graphs for your metrics data. Grafana will be installed as a StatefulSet with one replica. In addition, a preconfigured set of Dashboards generated by kubernetes-mixin will be stored as a ConfigMap.
kube-state-metrics is an add-on agent that listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects like Deployments and Pods. These metrics are served as plaintext on HTTP endpoints and consumed by Prometheus. kube-state-metrics will be installed as an auto-scalable Deployment with one replica.
node-exporter is a Prometheus exporter that runs on cluster nodes and provides OS and hardware metrics like CPU and memory usage to Prometheus. These metrics are also served as plaintext on HTTP endpoints and consumed by Prometheus. node-exporter will be installed as a DaemonSet.
By default, along with scraping metrics generated by node-exporter, kube-state-metrics, and the other components listed above, Prometheus will be configured to scrape metrics from the following components:
kube-apiserver, the Kubernetes API server.
kubelet, the primary node agent that interacts with kube-apiserver to manage Pods and containers on a node.
cAdvisor, a node agent that discovers running containers and collects their CPU, memory, filesystem, and network usage metrics.
To learn more about configuring these components and Prometheus scraping jobs, skip ahead to Configuring the Monitoring Stack. We’ll now substitute the environment variables defined in the previous step into the repo’s manifest files, and concatenate the individual manifests into a single master file.
Begin by using awk and envsubst to fill in the APP_INSTANCE_NAME, NAMESPACE, and GRAFANA_GENERATED_PASSWORD variables in the repo’s manifest files. After substituting in the variable values, the files will be combined and saved into a master manifest file called sammy-cluster-monitoring_manifest.yaml.
awk 'FNR==1 {print "---"}{print}' manifest/* \
  | envsubst '$APP_INSTANCE_NAME $NAMESPACE $GRAFANA_GENERATED_PASSWORD' \
  > "${APP_INSTANCE_NAME}_manifest.yaml"
You should consider storing this file in version control so that you can track changes to the monitoring stack and roll back to previous versions. If you do this, be sure to scrub the admin-password variable from the file so that you don’t check your Grafana password into version control.
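One way to scrub it, sketched here with sed, is shown below. This assumes the password appears under an admin-password: key in the generated file; run it on the copy you commit, not on the file you apply to the cluster:

# Redact the base64-encoded Grafana password in place (keeps a .bak backup)
sed -i.bak 's/admin-password:.*/admin-password: REDACTED/' "${APP_INSTANCE_NAME}_manifest.yaml"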
Now that you’ve generated the master manifest file, use kubectl apply -f to apply the manifest and create the stack in the Namespace you configured:
kubectl apply -f "${APP_INSTANCE_NAME}_manifest.yaml" --namespace "${NAMESPACE}"
You should see output similar to the following:
Output
serviceaccount/alertmanager created
configmap/sammy-cluster-monitoring-alertmanager-config created
service/sammy-cluster-monitoring-alertmanager-operated created
service/sammy-cluster-monitoring-alertmanager created
. . .
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
configmap/sammy-cluster-monitoring-prometheus-config created
service/sammy-cluster-monitoring-prometheus created
statefulset.apps/sammy-cluster-monitoring-prometheus created
You can track the stack’s deployment progress using kubectl get all. Once all of the stack components are RUNNING, you can access the preconfigured Grafana dashboards through the Grafana web interface.
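For example, to check on all of the stack’s objects in the Namespace you configured:

kubectl get all --namespace "${NAMESPACE}"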
Step 3 — Accessing Grafana and Exploring Metrics Data
The Grafana Service manifest exposes Grafana as a ClusterIP Service, which means that it’s only accessible via a cluster-internal IP address. To access Grafana outside of your Kubernetes cluster, you can either use kubectl patch to update the Service in-place to a public-facing type like NodePort or LoadBalancer, or use kubectl port-forward to forward a local port to a Grafana Pod port. In this tutorial we’ll forward ports, so you can skip ahead to Forwarding a Local Port to Access the Grafana Service. The following section on exposing Grafana externally is included for reference purposes.
Exposing the Grafana Service using a Load Balancer (optional)
If you’d like to create a DigitalOcean Load Balancer for Grafana with an external public IP, use kubectl patch to update the existing Grafana Service in-place to the LoadBalancer Service type:
kubectl patch svc "$APP_INSTANCE_NAME-grafana" \
  --namespace "$NAMESPACE" \
  -p '{"spec": {"type": "LoadBalancer"}}'
The kubectl patch command allows you to update Kubernetes objects in-place to make changes without having to re-deploy the objects. You can also modify the master manifest file directly, adding a type: LoadBalancer parameter to the Grafana Service spec. To learn more about kubectl patch and Kubernetes Service types, you can consult the Update API Objects in Place Using kubectl patch and Services resources in the official Kubernetes docs.
After running the above command, you should see the following:
Output
service/sammy-cluster-monitoring-grafana patched
It may take several minutes to create the Load Balancer and assign it a public IP. You can track its progress using the following command with the -w flag to watch for changes:
kubectl get service "$APP_INSTANCE_NAME-grafana" -w
Once the DigitalOcean Load Balancer has been created and assigned an external IP address, you can fetch its external IP using the following commands:
SERVICE_IP=$(kubectl get svc $APP_INSTANCE_NAME-grafana \
  --namespace $NAMESPACE \
  --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "http://${SERVICE_IP}/"
You can now access the Grafana UI by navigating to http://SERVICE_IP/.
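If you later decide that you don’t want Grafana exposed through a public Load Balancer, you can use the same patching technique to switch the Service back to an internal type. For example, to revert it to ClusterIP:

kubectl patch svc "$APP_INSTANCE_NAME-grafana" \
  --namespace "$NAMESPACE" \
  -p '{"spec": {"type": "ClusterIP"}}'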
Forwarding a Local Port to Access the Grafana Service
If you don’t want to expose the Grafana Service externally, you can also forward local port 3000 into the cluster directly to a Grafana Pod using kubectl port-forward:
kubectl port-forward --namespace ${NAMESPACE} ${APP_INSTANCE_NAME}-grafana-0 3000
You should see the following output:
Output
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000
This will forward local port 3000 to containerPort 3000 of the Grafana Pod sammy-cluster-monitoring-grafana-0. To learn more about forwarding ports into a Kubernetes cluster, consult Use Port Forwarding to Access Applications in a Cluster.
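If local port 3000 is already in use on your machine, note that kubectl port-forward also accepts a LOCAL:REMOTE port pair. For example, to bind local port 8080 instead:

kubectl port-forward --namespace ${NAMESPACE} ${APP_INSTANCE_NAME}-grafana-0 8080:3000

You would then visit http://localhost:8080 rather than port 3000.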
Visit http://localhost:3000 in your web browser. You should see the following Grafana login page:
To log in, use the default username admin (if you haven’t modified the admin-user parameter) and the password you configured in Step 1.
You’ll be brought to the following Home Dashboard:
In the left-hand navigation bar, select the Dashboards button, then click on Manage:
You’ll be brought to the following dashboard management interface, which lists the dashboards configured in the dashboards-configmap.yaml manifest:
These dashboards are generated by kubernetes-mixin, an open-source project that allows you to create a standardized set of cluster monitoring Grafana dashboards and Prometheus alerts. To learn more, consult the kubernetes-mixin GitHub repo.
Click in to the Kubernetes / Nodes dashboard, which visualizes CPU, memory, disk, and network usage for a given node:
Describing how to use these dashboards is outside of this tutorial’s scope, but you can consult the following resources to learn more:
To learn more about the USE method for analyzing a system’s performance, you can consult Brendan Gregg’s The Utilization Saturation and Errors (USE) Method page.
Google’s SRE Book is another helpful resource, in particular Chapter 6: Monitoring Distributed Systems.
To learn how to build your own Grafana dashboards, check out Grafana’s Getting Started page.
In the next step, we’ll follow a similar process to connect to and explore the Prometheus monitoring system.
Step 4 — Accessing Prometheus and Alertmanager
To connect to the Prometheus Pods, we can use kubectl port-forward to forward a local port. If you’re done exploring Grafana, you can close the port-forward tunnel by hitting CTRL-C. Alternatively, you can open a new shell and create a new port-forward connection.
Begin by listing running Pods in the default namespace:
kubectl get pod -n default
You should see the following Pods:
Output
sammy-cluster-monitoring-alertmanager-0 1/1 Running 0 17m
sammy-cluster-monitoring-alertmanager-1 1/1 Running 0 15m
sammy-cluster-monitoring-grafana-0 1/1 Running 0 16m
sammy-cluster-monitoring-kube-state-metrics-d68bb884-gmgxt 2/2 Running 0 16m
sammy-cluster-monitoring-node-exporter-7hvb7 1/1 Running 0 16m
sammy-cluster-monitoring-node-exporter-c2rvj 1/1 Running 0 16m
sammy-cluster-monitoring-node-exporter-w8j74 1/1 Running 0 16m
sammy-cluster-monitoring-prometheus-0 1/1 Running 0 16m
sammy-cluster-monitoring-prometheus-1 1/1 Running 0 16m
We are going to forward local port 9090 to port 9090 of the sammy-cluster-monitoring-prometheus-0 Pod:
kubectl port-forward --namespace ${NAMESPACE} sammy-cluster-monitoring-prometheus-0 9090
You should see the following output:
Output
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
This indicates that local port 9090 is being forwarded successfully to the Prometheus Pod.
Visit http://localhost:9090 in your web browser. You should see the following Prometheus Graph page:
From here you can use PromQL, the Prometheus query language, to select and aggregate time series metrics stored in its database. To learn more about PromQL, consult Querying Prometheus from the official Prometheus docs.
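With the port-forward still running, you can also issue PromQL queries programmatically through Prometheus’s HTTP API. For example, the following queries the built-in up metric, which reports a 1 for every scrape target that is currently healthy:

curl 'http://localhost:9090/api/v1/query?query=up'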
In the Expression field, type kubelet_node_name and hit Execute. You should see a list of time series with the metric kubelet_node_name that reports the Nodes in your Kubernetes cluster. You can see which node generated the metric and which job scraped the metric in the metric labels:
Finally, in the top navigation bar, click on Status and then Targets to see the list of targets Prometheus has been configured to scrape. You should see a list of targets corresponding to the list of monitoring endpoints described at the beginning of Step 2.
To learn more about Prometheus and how to query your cluster metrics, consult the official Prometheus docs.
To connect to Alertmanager, which manages Alerts generated by Prometheus, we’ll follow a similar process to what we used to connect to Prometheus. In general, you can explore Alertmanager Alerts by clicking into Alerts in the Prometheus top navigation bar.
To connect to the Alertmanager Pods, we will once again use kubectl port-forward to forward a local port. If you’re done exploring Prometheus, you can close the port-forward tunnel by hitting CTRL-C or open a new shell to create a new connection.
We are going to forward local port 9093 to port 9093 of the sammy-cluster-monitoring-alertmanager-0 Pod:
kubectl port-forward --namespace ${NAMESPACE} sammy-cluster-monitoring-alertmanager-0 9093
You should see the following output:
Output
Forwarding from 127.0.0.1:9093 -> 9093
Forwarding from [::1]:9093 -> 9093
This indicates that local port 9093 is being forwarded successfully to an Alertmanager Pod.
Visit http://localhost:9093 in your web browser. You should see the following Alertmanager Alerts page:
From here, you can explore firing alerts and optionally silencing them. To learn more about Alertmanager, consult the official Alertmanager documentation.
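With the port-forward still running, you can also list the currently firing alerts over Alertmanager’s HTTP API. For example (this assumes a recent Alertmanager release serving the v2 API; older versions expose /api/v1/alerts instead):

curl http://localhost:9093/api/v2/alerts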
In the next step, you’ll learn how to optionally configure and scale some of the monitoring stack components.
Step 5 — Configuring the Monitoring Stack (optional)
The manifests included in the DigitalOcean Kubernetes Cluster Monitoring Quickstart repository can be modified to use different container images, different numbers of Pod replicas, different ports, and customized configuration files.
In this step, we’ll provide a high-level overview of each manifest’s purpose, and then demonstrate how to scale Prometheus up to 3 replicas by modifying the master manifest file.
To begin, navigate into the manifest subdirectory in the repo and list the directory’s contents:
cd manifest
ls
Output
alertmanager-0serviceaccount.yaml
alertmanager-configmap.yaml
alertmanager-operated-service.yaml
alertmanager-service.yaml
. . .
node-exporter-ds.yaml
prometheus-0serviceaccount.yaml
prometheus-configmap.yaml
prometheus-service.yaml
prometheus-statefulset.yaml
Here you’ll find manifests for the different monitoring stack components. To learn more about specific parameters in the manifests, click into the links and consult the comments included throughout the YAML files:
Alertmanager
alertmanager-0serviceaccount.yaml: The Alertmanager Service Account, used to give the Alertmanager Pods a Kubernetes identity. To learn more about Service Accounts, consult Configure Service Accounts for Pods.
alertmanager-configmap.yaml: A ConfigMap containing a minimal Alertmanager configuration file, called alertmanager.yml. Configuring Alertmanager is beyond the scope of this tutorial, but you can learn more by consulting the Configuration section of the Alertmanager documentation.
alertmanager-operated-service.yaml: The Alertmanager mesh Service, which is used for routing requests between Alertmanager Pods in the current 2-replica high-availability configuration.
alertmanager-service.yaml: The Alertmanager web Service, which is used to access the Alertmanager web interface, which you may have done in the previous step.
alertmanager-statefulset.yaml: The Alertmanager StatefulSet, configured with 2 replicas.
Grafana
dashboards-configmap.yaml: A ConfigMap containing the preconfigured JSON Grafana monitoring dashboards. Generating a new set of dashboards and alerts from scratch goes beyond the scope of this tutorial, but to learn more you can consult the kubernetes-mixin GitHub repo.
grafana-0serviceaccount.yaml: The Grafana Service Account.
grafana-configmap.yaml: A ConfigMap containing a default set of minimal Grafana configuration files.
grafana-secret.yaml: A Kubernetes Secret containing the Grafana admin user and password. To learn more about Kubernetes Secrets, consult Secrets. (A sketch for reading the password back out of the cluster follows this list.)
grafana-service.yaml: The manifest defining the Grafana Service.
grafana-statefulset.yaml: The Grafana StatefulSet, configured with 1 replica, which is not scalable. Scaling Grafana is beyond the scope of this tutorial. To learn how to create a highly available Grafana setup, you can consult How to setup Grafana for High Availability from the official Grafana docs.
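If you ever need to recover the Grafana admin password from a running cluster, you can read it back from this Secret and base64-decode it. The Secret name below is an assumption based on the stack’s naming convention; check grafana-secret.yaml for the exact name:

# Hypothetical Secret name; adjust to match grafana-secret.yaml
kubectl get secret "${APP_INSTANCE_NAME}-grafana" \
  --namespace "${NAMESPACE}" \
  --output jsonpath='{.data.admin-password}' | base64 --decode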
kube-state-metrics
kube-state-metrics-0serviceaccount.yaml: The kube-state-metrics Service Account and ClusterRole. To learn more about ClusterRoles, consult Role and ClusterRole from the Kubernetes docs.
kube-state-metrics-deployment.yaml: The main kube-state-metrics Deployment manifest, configured with 1 dynamically scalable replica using addon-resizer.
kube-state-metrics-service.yaml: The Service exposing the kube-state-metrics Deployment.
node-exporter
node-exporter-0serviceaccount.yaml: The node-exporter Service Account.
node-exporter-ds.yaml: The node-exporter DaemonSet manifest. Since node-exporter is a DaemonSet, a node-exporter Pod runs on each Node in the cluster.
Prometheus
prometheus-0serviceaccount.yaml: The Prometheus Service Account, ClusterRole, and ClusterRoleBinding.
prometheus-configmap.yaml: A ConfigMap that contains three configuration files:
  alerts.yaml: Contains a preconfigured set of alerts generated by kubernetes-mixin (which was also used to generate the Grafana dashboards). To learn more about configuring alerting rules, consult Alerting Rules from the Prometheus docs.
  prometheus.yaml: Prometheus’s main configuration file. Prometheus has been preconfigured to scrape all the components listed at the beginning of Step 2. Configuring Prometheus goes beyond the scope of this article, but to learn more, you can consult Configuration from the official Prometheus docs.
  rules.yaml: A set of Prometheus recording rules that enable Prometheus to compute frequently needed or computationally expensive expressions, and save their results as a new set of time series. These are also generated by kubernetes-mixin, and configuring them goes beyond the scope of this article. To learn more, you can consult Recording Rules from the official Prometheus documentation.
prometheus-service.yaml: The Service that exposes the Prometheus StatefulSet.
prometheus-statefulset.yaml: The Prometheus StatefulSet, configured with 2 replicas. This parameter can be scaled depending on your needs.
Example: Scaling Prometheus
To demonstrate how to modify the monitoring stack, we’ll scale the number of Prometheus replicas from 2 to 3.
Open the sammy-cluster-monitoring_manifest.yaml master manifest file using your editor of choice:
nano sammy-cluster-monitoring_manifest.yaml
Scroll down to the Prometheus StatefulSet section of the manifest:
Output
. . .
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
name: sammy-cluster-monitoring-prometheus
labels: &Labels
k8s-app: prometheus
app.kubernetes.io/name: sammy-cluster-monitoring
app.kubernetes.io/component: prometheus
spec:
serviceName: "sammy-cluster-monitoring-prometheus"
replicas: 2
podManagementPolicy: "Parallel"
updateStrategy:
type: "RollingUpdate"
selector:
matchLabels: *Labels
template:
metadata:
labels: *Labels
spec:
. . .
Change the number of replicas from 2 to 3:
Output
. . .
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
name: sammy-cluster-monitoring-prometheus
labels: &Labels
k8s-app: prometheus
app.kubernetes.io/name: sammy-cluster-monitoring
app.kubernetes.io/component: prometheus
spec:
serviceName: "sammy-cluster-monitoring-prometheus"
replicas: 3
podManagementPolicy: "Parallel"
updateStrategy:
type: "RollingUpdate"
selector:
matchLabels: *Labels
template:
metadata:
labels: *Labels
spec:
. . .
When you’re done, save and close the file.
Apply the changes using kubectl apply -f:
kubectl apply -f sammy-cluster-monitoring_manifest.yaml --namespace default
You can track progress using kubectl get pods. Using this same technique, you can update many of the Kubernetes parameters and much of the configuration for this observability stack.
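For example, to watch the third Prometheus replica come up, you can filter on the k8s-app label used in the StatefulSet above:

kubectl get pods --namespace default -l k8s-app=prometheus -w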
Conclusion
In this tutorial, you installed a Prometheus, Grafana, and Alertmanager monitoring stack into your DigitalOcean Kubernetes cluster with a standard set of dashboards, Prometheus rules, and alerts.
You may also choose to deploy this monitoring stack using the Helm Kubernetes package manager. To learn more, consult How to Set Up DigitalOcean Kubernetes Cluster Monitoring with Helm and Prometheus. An alternative way to get a similar stack up and running is to use the DigitalOcean Marketplace Kubernetes Monitoring Stack solution, currently in beta.
The DigitalOcean Kubernetes Cluster Monitoring Quickstart repository is heavily based on and modified from Google Cloud Platform’s click-to-deploy Prometheus solution. A full manifest of modifications and changes from the original repository can be found in the Quickstart repo’s changes.txt file.