API monitoring: Some time ago, I looked at using the Service Monitoring API to create…

weixin_26705651

Original link: https://medium.com/google-cloud/some-time-ago-i-looked-at-using-the-service-monitoring-api-to-create-basic-slos-against-out-of-b3555b08036f

Some time ago, I looked at using the Service Monitoring API to create basic SLOs against “out of the box” services like App Engine. This functionality has seen a lot of updates since then, and there’s now Terraform support for creating custom services and SLOs. I wanted to have a go at this myself to see how it works.

Creating the service

SLO Monitoring does a great job of identifying services for you if you’re using things like Istio, App Engine, or Cloud Endpoints. But what if your service is on GCE, for example? In this case, you need to define it as a custom service, which will then allow you to define SLOs against it.

Here’s how to define a “monitoring service” in Terraform:
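
A minimal sketch of such a definition follows; it is not the author's original snippet. To keep it runnable, it uses Python to build the resource in Terraform's JSON configuration syntax, which Terraform reads from ".tf.json" files alongside ordinary ".tf" files. The resource name customsrv, the service ID my-gce-service and the display name are hypothetical placeholders.

```python
import json


def custom_service_config(resource_name: str, service_id: str, display_name: str) -> dict:
    """Return a google_monitoring_custom_service resource in Terraform JSON syntax."""
    return {
        "resource": {
            "google_monitoring_custom_service": {
                # Terraform's local name for the resource (used in references).
                resource_name: {
                    # Must be unique within the project.
                    "service_id": service_id,
                    # Human-readable name shown in the Cloud Console.
                    "display_name": display_name,
                }
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical names; the printed JSON could be saved as service.tf.json.
    config = custom_service_config("customsrv", "my-gce-service", "My GCE Service")
    print(json.dumps(config, indent=2))
```

Terraform loads .tf and .tf.json files from the same directory together, so a generated file like this can sit next to hand-written HCL.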

The service definition is actually very simple: you just provide a service ID that is unique within your project, plus a display name. Once you run “terraform apply”, the service is visible in the Console.

From there, you can use the UI to create an SLO against it.

Note that you have to choose “Other” as the metric, because custom services have no “out of the box” understanding of availability and latency. So you need a good SLI for your service. You could use a log-based metric, a metric emitted by the Google Cloud Load Balancer if your service sits behind one, or a custom metric written by the service itself. Let’s take a look at defining an SLO using the latter.
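
The three SLI sources above differ mostly in which metric type the SLO's filters select. The Python below is a hypothetical helper, not part of any Google Cloud library, that builds a single-metric filter for each source. It assumes the standard metric-type prefixes for user-defined log-based metrics (logging.googleapis.com/user/) and custom metrics (custom.googleapis.com/). It also assumes the load balancer's request counter, loadbalancing.googleapis.com/https/request_count. The metric names passed in are placeholders.

```python
# Metric-type prefixes for two of the SLI sources mentioned above.
PREFIXES = {
    "log_based": "logging.googleapis.com/user/",  # user-defined log-based metrics
    "custom": "custom.googleapis.com/",           # metrics written by the service
}

# Request counter published by the external HTTP(S) load balancer.
LB_REQUEST_COUNT = "loadbalancing.googleapis.com/https/request_count"


def metric_filter(source: str, name: str = "") -> str:
    """Build a Monitoring filter that selects a single metric type.

    source is "log_based", "custom" or "load_balancer"; name is the
    (hypothetical) metric name and is ignored for the load balancer.
    """
    if source == "load_balancer":
        metric_type = LB_REQUEST_COUNT
    elif source in PREFIXES:
        if not name:
            raise ValueError("a metric name is required for this source")
        metric_type = PREFIXES[source] + name
    else:
        raise ValueError(f"unknown SLI source: {source!r}")
    return f'metric.type="{metric_type}"'


if __name__ == "__main__":
    print(metric_filter("custom", "myapp/request_count"))
    print(metric_filter("log_based", "error_count"))
    print(metric_filter("load_balancer"))
```

Real SLI filters usually also constrain resource.type and metric labels (for example, counting only 5xx load balancer responses as bad). This sketch leaves those out.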

Defining the SLO

Here’s how to define an SLO using Terraform:
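
Below is a minimal sketch of the availability SLO described in the rest of this section, again built in Terraform's JSON syntax from Python; it is not the author's original file. It assumes the google_monitoring_slo resource's request_based_sli.good_total_ratio block, which takes exactly two of the good, bad and total filters; here the total and bad filters are used. The metric names, the SLO ID and the service reference (which points at a custom service hypothetically named customsrv) are all placeholders.

```python
import json

# Hypothetical custom metrics: one counts every request, the other counts errors.
TOTAL_FILTER = 'metric.type="custom.googleapis.com/myapp/request_count"'
BAD_FILTER = 'metric.type="custom.googleapis.com/myapp/error_count"'


def availability_slo_config(service_ref: str, slo_id: str, goal: float, days: int) -> dict:
    """Return a request-based google_monitoring_slo in Terraform JSON syntax."""
    if not 0.0 < goal < 1.0:
        raise ValueError("goal must be a fraction between 0 and 1")
    ratio = {
        # Two of good/bad/total are set; good = total - bad is implied.
        "total_service_filter": TOTAL_FILTER,
        "bad_service_filter": BAD_FILTER,
    }
    return {
        "resource": {
            "google_monitoring_slo": {
                "availability": {  # Terraform resource name (hypothetical)
                    "service": service_ref,
                    "slo_id": slo_id,
                    "display_name": f"{goal:.0%} of requests succeed over {days} day(s)",
                    "goal": goal,
                    "rolling_period_days": days,
                    "request_based_sli": {"good_total_ratio": ratio},
                }
            }
        }
    }


if __name__ == "__main__":
    # Terraform interpolation pointing at the (hypothetical) custom service resource.
    service = "${google_monitoring_custom_service.customsrv.service_id}"
    cfg = availability_slo_config(service, "availability-slo", goal=0.99, days=1)
    print(json.dumps(cfg, indent=2))
```

The three parts listed below map onto this structure. The basics are the resource name, slo_id, service and display_name. The SLI is request_based_sli. The goal is goal plus rolling_period_days.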

There are 3 main things to consider here:

The basics — the resource ID, the SLO ID, the service you’re defining the SLO against, and the SLO display name.
The SLI — are you going to be using a request- or windows-based SLI? If request-based — how will you count total requests and differentiate between good and bad requests?

In my example, I’m using a service that’s been instrumented to emit two separate metrics — one to count all requests and another to count errors. This makes things quite simple.

The goal — what’s the actual target for your SLO?

In this example, the goal is that 99% of requests are successful over a rolling 1-day period.
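
To make the goal concrete: with one metric counting all requests and another counting errors, the SLI is (total - errors) / total. A 99% goal then leaves an error budget of 1% of the period's requests. The sketch below works this through for a hypothetical day of 200,000 requests with 1,500 errors. The counts are illustrative only, and the goal is passed as a decimal string so the arithmetic stays exact.

```python
from fractions import Fraction


def request_sli(total: int, errors: int) -> Fraction:
    """Fraction of good requests, derived from a total count and an error count."""
    if total <= 0:
        raise ValueError("no requests in the period")
    return Fraction(total - errors, total)


def error_budget(total: int, goal: str) -> int:
    """Failed requests the SLO tolerates over the period (goal as a decimal string)."""
    # Fraction("0.99") is exact, so 1 - goal is exactly 1/100 with no float rounding.
    return int((1 - Fraction(goal)) * total)  # int() truncates toward zero


if __name__ == "__main__":
    total, errors = 200_000, 1_500  # hypothetical counts for one rolling day
    budget = error_budget(total, "0.99")
    sli = request_sli(total, errors)
    print(f"SLI = {float(sli):.4f}, budget = {budget}, remaining = {budget - errors}")
    print("SLO met" if sli >= Fraction("0.99") else "SLO missed")
```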

Creating and validating

At this point, run “terraform plan” to make sure everything is correct.

If everything looks correct, run “terraform apply” to create the service and the SLO(s).

Note that my file has two SLOs, a request-based one for availability and a windows-based one for latency, which is why 3 resources are being created (the custom service plus the two SLOs).
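
A windows-based SLI judges fixed-length windows rather than individual requests. Each window is counted as good or bad, and the goal applies to the fraction of good windows. The sketch below uses the google_monitoring_slo windows_based_sli block with metric_mean_in_range, where a window is good when the metric's mean falls inside the range. It assumes a hypothetical latency gauge metric. The 5-minute window, 500 ms threshold, 95% goal and sample values are all hypothetical. It also counts good windows locally to show what the SLI measures.

```python
import json
from typing import Sequence

# Hypothetical GAUGE metric holding mean request latency in milliseconds.
LATENCY_FILTER = 'metric.type="custom.googleapis.com/myapp/latency_ms"'


def latency_slo_config(service_ref: str, goal: float, days: int,
                       window_seconds: int, max_latency_ms: float) -> dict:
    """Return a windows-based google_monitoring_slo in Terraform JSON syntax."""
    return {
        "resource": {
            "google_monitoring_slo": {
                "latency": {  # Terraform resource name (hypothetical)
                    "service": service_ref,
                    "slo_id": "latency-slo",
                    "goal": goal,
                    "rolling_period_days": days,
                    "windows_based_sli": {
                        # Each window of this length is judged good or bad.
                        "window_period": f"{window_seconds}s",
                        "metric_mean_in_range": {
                            "time_series": LATENCY_FILTER,
                            "range": {"min": 0, "max": max_latency_ms},
                        },
                    },
                }
            }
        }
    }


def good_window_fraction(window_means_ms: Sequence[float], max_latency_ms: float) -> float:
    """Local illustration: share of windows whose mean latency is within range."""
    if not window_means_ms:
        raise ValueError("no windows to evaluate")
    good = sum(1 for m in window_means_ms if 0 <= m <= max_latency_ms)
    return good / len(window_means_ms)


if __name__ == "__main__":
    service = "${google_monitoring_custom_service.customsrv.service_id}"
    print(json.dumps(latency_slo_config(service, 0.95, 1, 300, 500), indent=2))
    # Hypothetical per-window means: 3 of 4 windows are within 500 ms.
    print(good_window_fraction([120.0, 480.0, 730.0, 250.0], 500))
```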

At this point, you can go back to the console and check your new service.

I’ve clearly not set my availability target correctly (or my service is having some serious issues). I should definitely revisit this before taking the next step: setting up an error budget burn alert on this SLO.
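
An error budget burn alert compares how fast errors are spending the budget with the rate that would spend it exactly over the SLO period. The sketch below computes that ratio, the burn rate, for a hypothetical lookback window. With a 99% goal, a window in which 5% of requests fail burns the budget five times too fast, so a full one-day budget would last under five hours at that pace. The counts are illustrative only.

```python
from fractions import Fraction
from typing import Optional


def burn_rate(bad: int, total: int, goal: str) -> Fraction:
    """Observed error ratio divided by the error ratio the SLO allows.

    1 means the budget would be used up exactly at the end of the SLO period;
    anything above 1 means it runs out early.
    """
    if total <= 0:
        raise ValueError("no requests in the lookback window")
    allowed = 1 - Fraction(goal)  # e.g. "0.99" -> exactly 1/100
    if allowed <= 0:
        raise ValueError("goal must be below 1 to leave any error budget")
    return Fraction(bad, total) / allowed


def hours_to_exhaustion(rate: Fraction, period_hours: int = 24) -> Optional[Fraction]:
    """Hours to spend a full, untouched budget at a constant burn rate."""
    if rate == 0:
        return None  # no errors, so the budget is never spent
    return Fraction(period_hours) / rate


if __name__ == "__main__":
    rate = burn_rate(bad=500, total=10_000, goal="0.99")  # hypothetical hour of traffic
    hours = hours_to_exhaustion(rate)
    print(f"burn rate = {float(rate):g}, budget gone in {float(hours):g} h")
```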

Summary

I’m really excited to see service and SLO support come to Terraform, and I hope lots of folks will take advantage of this to extend their automation capabilities. At this point, all of the major monitoring primitives can be created automatically once a project is up — this is great news! Thanks for reading, and let me know what you think!