How To Set Up DigitalOcean Kubernetes Cluster Monitoring with Helm and Prometheus Operator

Introduction

Along with tracing and logging, monitoring and alerting are essential components of a Kubernetes observability stack. Setting up monitoring for your Kubernetes cluster allows you to track your resource usage and analyze and debug application errors.

A monitoring system usually consists of a time-series database that houses metric data and a visualization layer. In addition, an alerting layer creates and manages alerts, handing them off to integrations and external services as necessary. Finally, one or more components generate or expose the metric data that will be stored, visualized, and processed for alerts by this monitoring stack.

One popular monitoring solution is the open-source Prometheus, Grafana, and Alertmanager stack:

  • Prometheus is a time series database and monitoring tool that works by polling metrics endpoints and scraping and processing the data exposed by these endpoints. It allows you to query this data using PromQL, a time series data query language.

  • Grafana is a data visualization and analytics tool that allows you to build dashboards and graphs for your metrics data.

  • Alertmanager, usually deployed alongside Prometheus, forms the alerting layer of the stack, handling alerts generated by Prometheus and deduplicating, grouping, and routing them to integrations like email or PagerDuty.

In addition, tools like kube-state-metrics and node_exporter expose cluster-level Kubernetes object metrics as well as machine-level metrics like CPU and memory usage.

Implementing this monitoring stack on a Kubernetes cluster can be complicated, but luckily some of this complexity can be managed with the Helm package manager and CoreOS’s Prometheus Operator and kube-prometheus projects. These projects bake in standard configurations and dashboards for Prometheus and Grafana, and abstract away some of the lower-level Kubernetes object definitions. The Helm prometheus-operator chart allows you to get a full cluster monitoring solution up and running by installing Prometheus Operator and the rest of the components listed above, along with a default set of dashboards, rules, and alerts useful for monitoring Kubernetes clusters.

In this tutorial, we will demonstrate how to install the prometheus-operator Helm chart on a DigitalOcean Kubernetes cluster. By the end of the tutorial, you will have installed a full monitoring stack into your cluster.

Prerequisites

To follow this tutorial, you will need:

  • A DigitalOcean Kubernetes cluster.

  • The kubectl command-line tool installed on your local machine and configured to connect to your cluster.

  • The Helm 2 package manager installed on your local machine, with Tiller installed on your cluster.

Step 1 — Creating a Custom Values File

Before we install the prometheus-operator Helm chart, we’ll create a custom values file that will override some of the chart’s defaults with DigitalOcean-specific configuration parameters. To learn more about overriding default chart values, consult the Helm Install section of the Helm docs.

To begin, create and open a file called custom-values.yaml on your local machine using nano or your favorite editor:

  • nano custom-values.yaml

Copy and paste in the following custom values, which enable persistent storage for the Prometheus, Grafana, and Alertmanager components, change the default node-exporter Service port, and disable monitoring for Kubernetes control plane components that are not exposed on DigitalOcean Kubernetes:

custom-values.yaml
# Define persistent storage for Prometheus (PVC)
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: do-block-storage
          resources:
            requests:
              storage: 5Gi

# Define persistent storage for Grafana (PVC)
grafana:
  # Set password for Grafana admin user
  adminPassword: your_admin_password
  persistence:
    enabled: true
    storageClassName: do-block-storage
    accessModes: ["ReadWriteOnce"]
    size: 5Gi

# Define persistent storage for Alertmanager (PVC)
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: do-block-storage
          resources:
            requests:
              storage: 5Gi

# Change default node-exporter port
prometheus-node-exporter:
  service:
    port: 30206
    targetPort: 30206

# Disable Etcd metrics
kubeEtcd:
  enabled: false

# Disable Controller metrics
kubeControllerManager:
  enabled: false

# Disable Scheduler metrics
kubeScheduler:
  enabled: false

In this file, we override some of the default values packaged with the chart in its values.yaml file.

We first enable persistent storage for Prometheus, Grafana, and Alertmanager so that their data persists across Pod restarts. Behind the scenes, this defines a 5 Gi Persistent Volume Claim (PVC) for each component, using the DigitalOcean Block Storage storage class. You should modify the size of these PVCs to suit your monitoring storage needs. To learn more about PVCs, consult Persistent Volumes from the official Kubernetes docs.
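
If you'd like to verify these claims after the chart is installed in Step 2, you can list the PersistentVolumeClaims in the monitoring namespace created during that install (an optional sanity check, not a required step):

    # List the PVCs created for Prometheus, Grafana, and Alertmanager
    kubectl get pvc --namespace monitoring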

Next, replace your_admin_password with a secure password that you’ll use to log in to the Grafana metrics dashboard with the admin user.
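
If you don't have a password manager handy, one quick way to generate a strong password is with openssl (any method of producing a random string works here):

    # Generate a 24-byte random password, base64-encoded
    openssl rand -base64 24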

We’ll then configure a different port for node-exporter. Node-exporter runs on each Kubernetes node and provides OS and hardware metrics to Prometheus. We must change its default port to get around the DigitalOcean Kubernetes firewall defaults, which will block port 9100 but allow ports in the range 30000-32767. Alternatively, you can configure a custom firewall rule for node-exporter. To learn how, consult How to Configure Firewall Rules from the official DigitalOcean Cloud Firewalls docs.
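
Once the chart is installed in Step 2, you can confirm that the node-exporter Service picked up the new port (the Service name below assumes the doks-cluster-monitoring release name used in that step):

    # The PORT(S) column should show 30206/TCP
    kubectl get svc -n monitoring doks-cluster-monitoring-prometheus-node-exporter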

Finally, we’ll disable metrics collection for three Kubernetes control plane components that do not expose metrics on DigitalOcean Kubernetes: the Kubernetes Scheduler, the Controller Manager, and the etcd cluster data store.

To see the full list of configurable parameters for the prometheus-operator chart, consult the Configuration section from the chart repo README or the default values file.
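
To inspect the chart's full set of default values locally before deciding what else to override, you can dump them to a file with Helm (this uses the Helm 2 command-line syntax assumed throughout this tutorial):

    # Write the chart's default values.yaml to a local file for reference
    helm inspect values stable/prometheus-operator > default-values.yaml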

When you’re done editing, save and close the file. We can now install the chart using Helm.
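
If you'd like to preview the manifests Helm will generate before creating anything in the cluster, Helm 2 supports a dry run of the install command you'll run in Step 2, with your overrides applied (nothing is installed; this step is optional):

    # Render the chart with custom-values.yaml without installing it
    helm install --dry-run --debug --namespace monitoring --name doks-cluster-monitoring -f custom-values.yaml stable/prometheus-operator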

Step 2 — Installing the prometheus-operator Chart

The prometheus-operator Helm chart will install the following monitoring components into your DigitalOcean Kubernetes cluster:

  • Prometheus Operator, a Kubernetes Operator that allows you to configure and manage Prometheus clusters. Kubernetes Operators integrate domain-specific logic into the process of packaging, deploying, and managing applications with Kubernetes. To learn more about Kubernetes Operators, consult the CoreOS Operators Overview. To learn more about Prometheus Operator, consult this introductory post on the Prometheus Operator and the Prometheus Operator GitHub repo. Prometheus Operator will be installed as a Deployment.

  • Prometheus, installed as a StatefulSet.

  • Alertmanager, a service that handles alerts sent by the Prometheus server and routes them to integrations like PagerDuty or email. To learn more about Alertmanager, consult Alerting from the Prometheus docs. Alertmanager will be installed as a StatefulSet.

  • Grafana, a time series data visualization tool that allows you to visualize and create dashboards for your Prometheus metrics. Grafana will be installed as a Deployment.

  • node-exporter, a Prometheus exporter that runs on cluster nodes and provides OS and hardware metrics to Prometheus. Consult the node-exporter GitHub repo to learn more. node-exporter will be installed as a DaemonSet.

  • kube-state-metrics, an add-on agent that listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects like Deployments and Pods. You can learn more by consulting the kube-state-metrics GitHub repo. kube-state-metrics will be installed as a Deployment.

By default, in addition to the metrics generated by node-exporter, kube-state-metrics, and the other components listed above, Prometheus will be configured to scrape metrics from the following components (a quick way to inspect these scrape targets follows the list):

  • kube-apiserver, the Kubernetes API server.

  • CoreDNS, the Kubernetes cluster DNS server.

  • kubelet, the primary node agent that interacts with kube-apiserver to manage Pods and containers on a node.

  • cAdvisor, a node agent that discovers running containers and collects their CPU, memory, filesystem, and network usage metrics.
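
Under the hood, Prometheus Operator represents these scrape targets as ServiceMonitor custom resources. Once the chart is installed, you can list them if you're curious (an optional check; the exact resource names depend on the chart version):

    # List the ServiceMonitor objects the chart created
    kubectl get servicemonitors -n monitoring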

On your local machine, let’s begin by installing the prometheus-operator Helm chart and passing in the custom values file we created above:

  • helm install --namespace monitoring --name doks-cluster-monitoring -f custom-values.yaml stable/prometheus-operator

Here we run helm install and install all components into the monitoring namespace, which we create at the same time. This allows us to cleanly separate the monitoring stack from the rest of the Kubernetes cluster. We name the Helm release doks-cluster-monitoring and pass in the custom values file we created in Step 1. Finally, we specify that we’d like to install the prometheus-operator chart from the Helm stable directory.
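
At any point after the install, you can review the release Helm created, which is handy when you come back to upgrade or debug it later (Helm 2 syntax):

    # Show the resources and status recorded for the release
    helm status doks-cluster-monitoring
    # List all releases Tiller is tracking
    helm ls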

You should see the following output:

Output
NAME:   doks-cluster-monitoring
LAST DEPLOYED: Mon Apr 22 10:30:42 2019
NAMESPACE: monitoring
STATUS: DEPLOYED

RESOURCES:
==> v1/PersistentVolumeClaim
NAME                             STATUS   VOLUME  CAPACITY  ACCESS MODES  STORAGECLASS  AGE
doks-cluster-monitoring-grafana  Pending  do-block-storage  10s

==> v1/ServiceAccount
NAME                                        SECRETS  AGE
doks-cluster-monitoring-grafana             1        10s
doks-cluster-monitoring-kube-state-metrics  1        10s

. . .

==> v1beta1/ClusterRoleBinding
NAME                                                   AGE
doks-cluster-monitoring-kube-state-metrics             9s
psp-doks-cluster-monitoring-prometheus-node-exporter   9s

NOTES:
The Prometheus Operator has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=doks-cluster-monitoring"

Visit https://github.com/coreos/prometheus-operator for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.

This indicates that Prometheus Operator, Prometheus, Grafana, and the other components listed above have successfully been installed into your DigitalOcean Kubernetes cluster.

Following the note in the helm install output, check the status of the release’s Pods using kubectl get pods:

  • kubectl --namespace monitoring get pods -l "release=doks-cluster-monitoring"

You should see the following:

Output
NAME                                                         READY   STATUS    RESTARTS   AGE
doks-cluster-monitoring-grafana-9d7f984c5-hxnw6              2/2     Running   0          3m36s
doks-cluster-monitoring-kube-state-metrics-dd8557f6b-9rl7j   1/1     Running   0          3m36s
doks-cluster-monitoring-pr-operator-9c5b76d78-9kj85          1/1     Running   0          3m36s
doks-cluster-monitoring-prometheus-node-exporter-2qvxw       1/1     Running   0          3m36s
doks-cluster-monitoring-prometheus-node-exporter-7brwv       1/1     Running   0          3m36s
doks-cluster-monitoring-prometheus-node-exporter-jhdgz       1/1     Running   0          3m36s

This indicates that all the monitoring components are up and running, and you can begin exploring Prometheus metrics using Grafana and its preconfigured dashboards.
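
Since the chart installs a mix of workload types (Deployments, StatefulSets, and a DaemonSet, as listed above), you can also review the release by controller type if anything looks off (optional):

    # Review the monitoring workloads by controller type
    kubectl get deployments,statefulsets,daemonsets -n monitoring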

Step 3 — Accessing Grafana and Exploring Metrics Data

The prometheus-operator Helm chart exposes Grafana as a ClusterIP Service, which means that it’s only accessible via a cluster-internal IP address. To access Grafana outside of your Kubernetes cluster, you can either use kubectl patch to update the Service in place to a public-facing type like NodePort or LoadBalancer, or kubectl port-forward to forward a local port to a Grafana Pod port.

In this tutorial we’ll forward ports, but to learn more about kubectl patch and Kubernetes Service types, you can consult Update API Objects in Place Using kubectl patch and Services from the official Kubernetes docs.
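
If you do later decide to expose Grafana through a load balancer instead of a port-forward, a minimal sketch of the kubectl patch approach looks like the following. Note that a LoadBalancer Service on DigitalOcean Kubernetes provisions a billed DigitalOcean Load Balancer and makes Grafana publicly reachable, so make sure the admin password is strong:

    # Switch the Grafana Service from ClusterIP to LoadBalancer (optional alternative to port-forwarding)
    kubectl patch svc doks-cluster-monitoring-grafana -n monitoring -p '{"spec": {"type": "LoadBalancer"}}'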

Begin by listing running Services in the monitoring namespace:

  • kubectl get svc -n monitoring

You should see the following Services:

Output
NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-operated                              ClusterIP   None             <none>        9093/TCP,6783/TCP   34m
doks-cluster-monitoring-grafana                    ClusterIP   10.245.105.130   <none>        80/TCP              34m
doks-cluster-monitoring-kube-state-metrics         ClusterIP   10.245.140.151   <none>        8080/TCP            34m
doks-cluster-monitoring-pr-alertmanager            ClusterIP   10.245.197.254   <none>        9093/TCP            34m
doks-cluster-monitoring-pr-operator                ClusterIP   10.245.14.163    <none>        8080/TCP            34m
doks-cluster-monitoring-pr-prometheus              ClusterIP   10.245.201.173   <none>        9090/TCP            34m
doks-cluster-monitoring-prometheus-node-exporter   ClusterIP   10.245.72.218    <none>        30206/TCP           34m
prometheus-operated                                ClusterIP   None             <none>        9090/TCP            34m

We are going to forward local port 8000 to port 80 of the doks-cluster-monitoring-grafana Service, which will in turn forward to port 3000 of a running Grafana Pod. These Service and Pod ports are configured in the stable/grafana Helm chart values file:

  • kubectl port-forward -n monitoring svc/doks-cluster-monitoring-grafana 8000:80

You should see the following output:

Output
Forwarding from 127.0.0.1:8000 -> 3000
Forwarding from [::1]:8000 -> 3000

This indicates that local port 8000 is being forwarded successfully to a Grafana Pod.

Visit http://localhost:8000 in your web browser. You should see the following Grafana login page:

Enter admin as the username and the password you configured in custom-values.yaml. Then, hit Log In.

You’ll be brought to the following Home Dashboard:

In the left-hand navigation bar, select the Dashboards button, then click on Manage:

You’ll be brought to the following dashboard management interface, which lists the dashboards installed by the prometheus-operator Helm chart:

These dashboards are generated by kubernetes-mixin, an open-source project that allows you to create a standardized set of cluster monitoring Grafana dashboards and Prometheus alerts. To learn more, consult the Kubernetes Mixin GitHub repo.

Click in to the Kubernetes / Nodes dashboard, which visualizes CPU, memory, disk, and network usage for a given node:

Describing each dashboard and how to use it to visualize your cluster’s metrics data goes beyond the scope of this tutorial. To learn more about the USE method for analyzing a system’s performance, you can consult Brendan Gregg’s The Utilization Saturation and Errors (USE) Method page. Google’s SRE Book is another helpful resource, in particular Chapter 6: Monitoring Distributed Systems. To learn how to build your own Grafana dashboards, check out Grafana’s Getting Started page.

In the next step, we’ll follow a similar process to connect to and explore the Prometheus monitoring system.

Step 4 — Accessing Prometheus and Alertmanager

To connect to the Prometheus Pods, we once again have to use kubectl port-forward to forward a local port. If you’re done exploring Grafana, you can close the port-forward tunnel by hitting CTRL-C. Alternatively you can open a new shell and port-forward connection.

Begin by listing running Services in the monitoring namespace:

  • kubectl get svc -n monitoring

You should see the following Services:

Output
NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-operated                              ClusterIP   None             <none>        9093/TCP,6783/TCP   34m
doks-cluster-monitoring-grafana                    ClusterIP   10.245.105.130   <none>        80/TCP              34m
doks-cluster-monitoring-kube-state-metrics         ClusterIP   10.245.140.151   <none>        8080/TCP            34m
doks-cluster-monitoring-pr-alertmanager            ClusterIP   10.245.197.254   <none>        9093/TCP            34m
doks-cluster-monitoring-pr-operator                ClusterIP   10.245.14.163    <none>        8080/TCP            34m
doks-cluster-monitoring-pr-prometheus              ClusterIP   10.245.201.173   <none>        9090/TCP            34m
doks-cluster-monitoring-prometheus-node-exporter   ClusterIP   10.245.72.218    <none>        30206/TCP           34m
prometheus-operated                                ClusterIP   None             <none>        9090/TCP            34m

We are going to forward local port 9090 to port 9090 of the doks-cluster-monitoring-pr-prometheus Service:

  • kubectl port-forward -n monitoring svc/doks-cluster-monitoring-pr-prometheus 9090:9090

You should see the following output:

Output
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090

This indicates that local port 9090 is being forwarded successfully to a Prometheus Pod.

Visit http://localhost:9090 in your web browser. You should see the following Prometheus Graph page:

From here you can use PromQL, the Prometheus query language, to select and aggregate time series metrics stored in its database. To learn more about PromQL, consult Querying Prometheus from the official Prometheus docs.

In the Expression field, type machine_cpu_cores and hit Execute. You should see a list of time series with the metric machine_cpu_cores that reports the number of CPU cores on a given node. The metric’s labels show which node generated it and which job scraped it.
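
Building on this, here are a couple of other expressions you could try in the same field; both use only metrics and components already discussed in this tutorial, though exact metric and label names can vary by chart and exporter version:

    # Total CPU cores across all nodes in the cluster
    sum(machine_cpu_cores)
    # Number of healthy scrape targets per job
    count by (job) (up == 1)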

表达式字段中,输入machine_cpu_cores并点击Execute 。 您应该看到带有度量machine_cpu_cores的时间序列列表,该序列报告给定节点上的CPU核心数量。 您可以在度量标准标签中查看哪个节点生成了度量标准,以及哪个作业将度量标准刮取了。

Finally, in the top navigation bar, click on Status and then Targets to see the list of targets Prometheus has been configured to scrape. You should see a list of targets corresponding to the list of monitoring endpoints described at the beginning of Step 2.

To learn more about Prometheus and how to query your cluster metrics, consult the official Prometheus docs.

We’ll follow a similar process to connect to AlertManager, which manages Alerts generated by Prometheus. You can explore these Alerts by clicking into Alerts in the Prometheus top navigation bar.

To connect to the Alertmanager Pods, we will once again use kubectl port-forward to forward a local port. If you’re done exploring Prometheus, you can close the port-forward tunnel by hitting CTRL-C. Alternatively you can open a new shell and port-forward connection.

Begin by listing running Services in the monitoring namespace:

  • kubectl get svc -n monitoring

You should see the following Services:

Output
NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-operated                              ClusterIP   None             <none>        9093/TCP,6783/TCP   34m
doks-cluster-monitoring-grafana                    ClusterIP   10.245.105.130   <none>        80/TCP              34m
doks-cluster-monitoring-kube-state-metrics         ClusterIP   10.245.140.151   <none>        8080/TCP            34m
doks-cluster-monitoring-pr-alertmanager            ClusterIP   10.245.197.254   <none>        9093/TCP            34m
doks-cluster-monitoring-pr-operator                ClusterIP   10.245.14.163    <none>        8080/TCP            34m
doks-cluster-monitoring-pr-prometheus              ClusterIP   10.245.201.173   <none>        9090/TCP            34m
doks-cluster-monitoring-prometheus-node-exporter   ClusterIP   10.245.72.218    <none>        30206/TCP           34m
prometheus-operated                                ClusterIP   None             <none>        9090/TCP            34m

We are going to forward local port 9093 to port 9093 of the doks-cluster-monitoring-pr-alertmanager Service.

  • kubectl port-forward -n monitoring svc/doks-cluster-monitoring-pr-alertmanager 9093:9093

You should see the following output:

Output
Forwarding from 127.0.0.1:9093 -> 9093
Forwarding from [::1]:9093 -> 9093

This indicates that local port 9093 is being forwarded successfully to an Alertmanager Pod.

Visit http://localhost:9093 in your web browser. You should see the following Alertmanager Alerts page:

From here, you can explore firing alerts and optionally silencing them. To learn more about Alertmanager, consult the official Alertmanager documentation.
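
If you prefer the command line, and the port-forward from above is still running, you can also pull the currently firing alerts as JSON from Alertmanager's HTTP API (the v1 path shown here matches Alertmanager releases from this era; newer releases expose /api/v2 instead):

    # Fetch the current alerts through the local port-forward
    curl -s http://localhost:9093/api/v1/alerts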

Conclusion

In this tutorial, you installed a Prometheus, Grafana, and Alertmanager monitoring stack into your DigitalOcean Kubernetes cluster with a standard set of dashboards, Prometheus rules, and alerts. Since this was done using Helm, you can use helm upgrade, helm rollback, and helm delete to upgrade, roll back, or delete the monitoring stack. To learn more about these functions, consult How To Install Software on Kubernetes Clusters with the Helm Package Manager.
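
As a reference, the Helm 2 forms of those commands for this release look roughly like the following (helm delete --purge removes the release's resources, though CRDs created by the chart may need to be cleaned up separately):

    # Apply changes from an edited custom-values.yaml
    helm upgrade -f custom-values.yaml doks-cluster-monitoring stable/prometheus-operator
    # Roll back to a previous revision (revision 1 here)
    helm rollback doks-cluster-monitoring 1
    # Remove the release and free its name
    helm delete doks-cluster-monitoring --purge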

The prometheus-operator chart helps you get cluster monitoring up and running quickly using Helm. You may wish to build, deploy, and configure Prometheus Operator manually. To do so, consult the Prometheus Operator and kube-prometheus GitHub repos.

Translated from: https://www.digitalocean.com/community/tutorials/how-to-set-up-digitalocean-kubernetes-cluster-monitoring-with-helm-and-prometheus-operator
