An Introduction to Metrics-Driven Development: What Are Metrics, and Why Should You Use Them?

One of the coolest things I have learned in the last year is how to constantly deliver value into production without causing too much chaos.

In this post, I’ll explain the metrics-driven development approach and how it helped me to achieve that. By the end of the post, you’ll be able to answer the following questions:

  • What are metrics and why should I use them
  • What are the different types of metrics
  • What tools could I use to store and display metrics
  • What is a real-world example of metrics-driven development

What are metrics and why should I use them?

Metrics give you the ability to collect information on an actively running system without changing its code.

They allow you to gain valuable data on the behavior of your application while it runs, so you can make data-driven decisions based on real customer feedback and usage in production.

What are the types of metrics available to me?

These are the most common metrics used today:

  • Counter — Represents a monotonically increasing value.

In this example, a counter metric is used to calculate the rate of events over time, by counting events per second.

  • Gauge — Represents a single value that can go up or down.

In this example, a gauge metric is used to monitor the user CPU in percentages.

  • Histogram — A count of observations (like request durations or sizes) in configurable buckets.

In this example, a histogram metric is used to calculate the 75th and 90th percentiles of an HTTP request duration.

The bits and bytes of the types counter, histogram, and gauge can be quite confusing. Try reading about them further here.
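To make these three types concrete, here is a minimal sketch, not taken from the original post: it assumes a StatsD agent listening on localhost (such as the Telegraf setup described below) and uses the statsd-client package that appears later in this post; the metric names are illustrative.

import SDC = require("statsd-client");

const sdc = new SDC({ host: "localhost" });

// Counter: monotonically increasing; count each handled event, and let the
// query engine derive events-per-second from it.
sdc.increment("events_handled_count");

// Gauge: a single value that can go up or down, e.g. current queue depth.
sdc.gauge("queue_depth", 42);

// Histogram: observations collected into buckets, e.g. request duration in
// milliseconds, from which percentiles (75th, 90th) can be computed.
sdc.histogram("request_duration_ms", 137);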

What tools can I use to store and display metrics?

Most monitoring systems consist of a few parts:

  1. Time-series database — Database software that optimizes storing and serving time-series data. Two examples of this kind of database are Whisper and Prometheus.

  2. Querying engine (with a querying language) — Two examples of common query engines are Graphite and PromQL.

  3. Alerting system — The mechanism that allows you to configure alerts based on graphs created by the querying language. The system can send these alerts to email, Slack, or PagerDuty. Two examples of common alerting systems are Grafana and Prometheus.

  4. UI — Allows you to view the graphs generated by the incoming data and to configure queries and alerts. Two examples of common UI systems are Graphite and Grafana.
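To make the querying-engine part concrete, here is a hedged sketch of pulling data out of Prometheus through its HTTP query API (GET /api/v1/query); the server address and the metric name are assumptions for illustration.

// Assumes a Prometheus server on localhost:9090 and Node 18+ (global fetch).
async function queryPrometheus(promql: string): Promise<unknown[]> {
    const url = `http://localhost:9090/api/v1/query?query=${encodeURIComponent(promql)}`;
    const response = await fetch(url);
    const body = await response.json(); // { status: "success", data: { result: [...] } }
    return body.data.result;
}

// Example: per-second rate of an (illustrative) counter over the last 5 minutes.
queryPrometheus("rate(incoming_requests_count[5m])").then(console.log);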

The setup we are using today in BigPanda Engineering is:

  • Telegraf — used as a StatsD server.
  • Prometheus — used as our scraping engine, time-series database, and querying engine.
  • Grafana — used for alerting and UI.

And the constraints we had in mind while choosing this stack were:

  • We want scalable and elastic metrics scraping
  • We want a performant query engine
  • We want the ability to query our metrics using custom tags (such as service names, hosts, etc.)

A real-world example of Metrics-driven development of a Sentiment Analysis service

Let’s develop a new pipeline service that calculates sentiments based on textual inputs and does it in a Metrics-Driven Development way!

Let’s say I need to develop this pipeline service:

And this is my usual development process:

So I write the following implementation:

let senService: SentimentAnalysisService = new SentimentAnalysisService();
while (true) {
    // Pull the next raw tweet from Kafka and deserialize it
    let tweetInformation = kafkaConsumer.consume()
    let deserializedTweet: { msg: string } = deSerialize(tweetInformation)
    // Run the business logic: classify the tweet's sentiment
    let sentimentResult = senService.calculateSentiment(deserializedTweet.msg)
    let serializedSentimentResult = serialize(sentimentResult)
    // Persist the result and publish it to the downstream topic
    sentimentStore.store(sentimentResult);
    kafkaProducer.produce(serializedSentimentResult, 'sentiment_topic', 0);
}

The full gist can be found here.

And this method works perfectly fine.

But what happens when it doesn’t?

The reality is that while working (in an agile development process) we make mistakes. That’s a fact of life.

I believe that the real challenge with making mistakes is not avoiding them, but rather optimizing how fast we detect and repair them. So, we need to gain the ability to quickly discover our mistakes.

It's time for the MDD way.

The Metrics-Driven Development (MDD) way

The MDD approach is heavily inspired by the Three Commandments of Production (which I learned about the hard way).

The Three Commandments of Production are:

  1. There are mistakes and bugs in the code you write and deploy.
  2. The data flowing in production is unpredictable and unique!
  3. Perfect your code from real customer feedback and usage in production.

And since we now know the Commandments, it's time to go over the 4-step plan of the metrics-driven development process.

The 4-step plan for a successful MDD

Develop code

I write the code and, whenever possible, wrap it with a feature flag that allows me to gradually open it for users.
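The post does not show the flag code itself, so here is a hedged sketch of what a gradual rollout check could look like; the isFeatureEnabled helper and its hashing scheme are hypothetical stand-ins for whatever feature-flag service you actually use.

// Hypothetical helper: decide per user whether the new code path is enabled.
// Hashing the user id keeps each user's decision stable across requests.
function isFeatureEnabled(flag: string, userId: string, rolloutPercent: number): boolean {
    let hash = 0;
    for (const ch of flag + userId) {
        hash = (hash * 31 + ch.charCodeAt(0)) % 100;
    }
    return hash < rolloutPercent;
}

const userId = "user-42"; // illustrative
if (isFeatureEnabled("new-sentiment-pipeline", userId, 10)) {
    // Roughly 10% of users hit the new pipeline; the rest keep the old path.
}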

Metrics

This consists of two parts:

Add metrics on relevant parts

In this part, I ask myself: what success or failure metrics can I define to make sure my feature works? In this case: does my new pipeline application perform its logic correctly?

Add alerts on top of them so that I’ll be alerted when a bug occurs

In this part, I ask myself: what metric could alert me if I forgot something or did not implement it correctly?

Deployment

I deploy the code and immediately monitor it to verify that it’s behaving as I have anticipated.

Iterate this process to perfection

And that's it! Now that we have learned the process, let's tackle an important task inside it.

Metrics to Report — what should we monitor?

One of the toughest questions for me, when I’m doing MDD, is: “what should I monitor?”

In order to answer the question, let's try to zoom out and look at the big picture. All the possible information available to monitor can be divided into two parts:

  1. Applicative information — Information that has an applicative context and meaning. An example of this would be: “How many tweets did we classify as positive in the last hour?”

  2. Operational information — Information related to the infrastructure that surrounds our application: cloud data, CPU and disk utilization, network usage, etc.

Now, since we cannot monitor everything, we need to choose what applicative and operational information we want to monitor.

  • The operational part really depends on your ops stack, which has built-in solutions for (almost) all your monitoring needs.
  • The applicative part is more unique to your needs, and I'll try to explain how I think about it later in this post.

After we do that, we can ask ourselves the question: what alerts do we want to set up on top of the metrics we just defined?

The diagram (of information, metrics, alerts) can be drawn like this:

Applicative metrics

I usually add applicative metrics out of two needs:

To answer questions

A question is something like, “When my service misbehaves, what information would be helpful to know about?”

Some answers to that question can be: latencies of all IO calls, processing rate, throughput, etc.

Most of these questions will be helpful while you are searching for the answer. But once you've found it, chances are you will not look at it again (since you already know the answer).

These questions are usually driven by R&D and are (usually) used to gather information internally.

To add alerts

This may sound backward, but I usually add applicative metrics in order to define alerts on top of them. Meaning, we define the list of alerts and then deduce from it which applicative metrics to report.

These alerts are derived from the SLA of the product and are usually treated with mission-critical importance.

Common types of alerts

Alerts can be broken down into three types:

SLA Alerts

SLA alerts surround the places in our system where an SLA is specified to meet explicit customer or internal requirements (i.e. availability, throughput, latency, etc.). SLA breaches involve paging R&D and waking people up, so try to keep the alerts in this list to a minimum.

Also, we can define Degradation Alerts in addition to SLA Alerts. Degradation alerts are defined with lower thresholds than SLA alerts, and are therefore useful in reducing the number of SLA breaches by giving you a proper heads-up before they happen.

An example of an SLA alert would be: “All sentiment requests must finish in under 500ms.”

An example of a Degradation Alert would be: “All sentiment requests must finish in under 400ms.”

These are the alerts I defined:

  1. Latency — I expect the 90th percentile of a single request duration not to exceed 300ms.
  2. Success/failure ratio of requests — I expect the ratio of failures per second to successes per second to remain under 0.01.
  3. Throughput — I expect the number of operations per second (ops) that the application handles to be > 200.
  4. Data size — I expect the amount of data that we store in a single day not to exceed 2GB.

200 ops * 60 bytes (size of a sentiment result) * 86,400 seconds in a day ≈ 1GB < 2GB
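As a hedged sketch (again, not code from the original post), the latency and success/failure alerts above could be built on metrics reported like this; the metric names and the handleRequest wrapper are illustrative.

import SDC = require("statsd-client");

const sdc = new SDC({ host: "localhost" });

// Wrap the business logic with the metrics the SLA alerts are defined over.
function handleRequest(msg: string, calculateSentiment: (m: string) => unknown): void {
    const start = Date.now();
    try {
        calculateSentiment(msg);
        sdc.increment("success_count"); // feeds the success/failure ratio alert
    } catch (e) {
        sdc.increment("failure_count");
    } finally {
        // The "90th percentile under 300ms" alert is defined over this timing.
        sdc.timing("request_duration_ms", Date.now() - start);
    }
}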

Baseline Breaching Alerts

These alerts usually involve measuring and defining a baseline, then using alerts to make sure it doesn’t (dramatically) change over time.

For example, the 99th-percentile processing latency for an event must stay relatively the same across time, unless we have made dramatic changes to the logic.
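To illustrate what a baseline check could look like, here is a hedged sketch comparing the current 99th-percentile latency to the same window one week earlier using PromQL's offset modifier; the metric name and the thresholds are illustrative assumptions.

// Ratio of the current p99 latency to last week's p99 over the same 1h window.
// Alert when the ratio drifts far from 1 (say, above 1.5 or below 0.5).
const baselineQuery =
    "histogram_quantile(0.99, sum(rate(request_duration_ms_bucket[1h])) by (le))" +
    " / histogram_quantile(0.99, sum(rate(request_duration_ms_bucket[1h] offset 1w)) by (le))";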

These are the alerts I defined:

  1. Amount of positive, neutral, or negative sentiment tweets — If, for whatever reason, the sum of positive tweets has increased or decreased dramatically, I might have a bug somewhere in my application.
  2. All latency / success ratio of requests / throughput / data size must not increase or decrease dramatically over time.

Runtime Properties Alerts

I’ve given a talk about Property-Based Tests and their insane strength. As it turns out, collecting metrics allows us to run property-based tests on our system in production!

Some properties of our system:

  1. Since we consume messages from a Kafka topic, the handled offset must monotonically increase over time.
  2. 0 ≤ sentiment score ≤ 1
  3. A tweet should be classified as either negative, positive, or neutral.
  4. A tweet classification must be unique.

These alerts helped me validate that:

  1. We are reading with the same group-id. Changing consumer group ids by mistake during deployment is a common error when using Kafka, and it causes a lot of mayhem in production.
  2. The sentiment score is consistently between 0 and 1.
  3. The tweet category length is always 1.

In order to define these alerts, you need to submit metrics from your application. Go here for the complete metrics list.
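As an illustration, here is a hedged sketch of reporting the raw values those property alerts are defined over; the metric names are my own and not the post's actual metrics list.

import SDC = require("statsd-client");

const sdc = new SDC({ host: "localhost" });

function reportProperties(offset: number, sentimentScore: number, categories: string[]): void {
    // Property 1: the handled Kafka offset must increase monotonically;
    // alert if this gauge ever decreases or stalls.
    sdc.gauge("handled_kafka_offset", offset);

    // Property 2: 0 <= sentiment score <= 1; alert when the observed
    // min/max over a window leaves that range.
    sdc.histogram("sentiment_score", sentimentScore);

    // Properties 3 and 4: exactly one classification per tweet;
    // alert whenever this value differs from 1.
    sdc.gauge("tweet_category_length", categories.length);
}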

Using these metrics, I can create alerts that will “page” me whenever one of these properties no longer holds in production.

Let’s take a look at a possible implementation of all these metrics:

import SDC = require("statsd-client");
let sdc = new SDC({ host: 'localhost' });
let senService: SentimentAnalysisService; //...
while (true) {
    let tweetInformation = kafkaConsumer.consume()
    sdc.increment('incoming_requests_count')   // throughput: consumed events per second
    let deserializedTweet: { msg: string } = deSerialize(tweetInformation)
    sdc.histogram('request_size_chars', deserializedTweet.msg.length);   // incoming payload size
    let sentimentResult = senService.calculateSentiment(deserializedTweet.msg)
    if (sentimentResult !== undefined) {
        let serializedSentimentResult = serialize(sentimentResult)
        sdc.histogram('outgoing_event_size_chars', serializedSentimentResult.length);   // outgoing payload size
        sentimentStore.store(sentimentResult)
        kafkaProducer.produce(serializedSentimentResult, 'sentiment_topic', 0);
    }
}

The full code can be found here.

A few thoughts on the code example above:

  1. A staggering amount of metrics has been added to this codebase.
  2. Metrics add complexity to the codebase, so, like all good things, add them responsibly and in moderation.
  3. Choosing correct metric names is hard. Take your time selecting proper names. Here’s an excellent post about this.
  4. You still need to collect these metrics and display them in a monitoring system (like Grafana), plus add alerts on top of them, but that’s a topic for a different post.

Did we reach the initial goal of identifying issues and resolving them faster?

We can now make sure the application latency and throughput do not degrade over time. Also, adding alerts on these metrics allows for much faster issue discovery and resolution.

Conclusion

Metrics-driven development goes hand in hand with CI/CD, DevOps, and agile development processes. If you are using any of the above keywords, then you are in the right place.

When done right, metrics make you feel more confident in your deployment, in the same way that seeing passing unit tests in your build makes you feel confident in the code you write.

Adding metrics allows you to deploy code and feel confident that your production environment is stable and that your application is behaving as expected over time. So I encourage you to try it out!

Some references
  1. Here is a link to the code shown in this post, and here is the full metrics list described.
  2. If you are eager to try writing some metrics and connecting them to a monitoring system, check out Prometheus, Grafana, and possibly this post.
  3. This guy wrote a delightful post about metrics-driven development. Go read it.

Translated from: https://www.freecodecamp.org/news/metrics-driven-development/
