Reliable load testing with regard to unexpected nuances

The irony is that simultaneously with the launch of the test, we reached the limits of the production server, resulting in a two-hour service downtime. This further encouraged us to move from making occasional tests to establishing an effective load testing infrastructure. By infrastructure, I mean all the tools for working with load testing: tools for launching the test (manually and automatically), the cluster that creates the load, a production-like cluster, metrics and reporting services, scaling services, and the code that manages it all.

image

Simplified, this is what our structure looks like: a collection of different servers that somehow interact with each other, each server performing specific tasks. It seemed that to build the load testing infrastructure, it was enough for us to make this diagram, take account of all interactions, and start creating test cases for each block one by one.

This approach is right, but it would have taken many months, which was not suitable for us because of our rapid growth — over the past twelve months, we have grown from 12,000 to 100,000 simultaneously active online users. Also, we didn’t know how our service infrastructure would respond to the increased load: which blocks would become the bottleneck, and which would scale linearly?

In the end, we decided to test the service with virtual users simulating real activity — that is, to build a clone of the production environment and make a big test that will:

  • create a load on a cluster that is structurally identical to the production cluster but which surpasses it in terms of performance;

  • give us all the data for making decisions;

  • show that the entire infrastructure is capable of withstanding the necessary load;

  • become the basis for load tests that we may need in the future.

The only disadvantage of such a test is its cost, because it will require an environment more extensive than the production environment.

In this article, I will talk about creating a realistic test scenario, plugins (WS, Stress-client, Taurus), a load-generating cluster, and a production cluster, and I will show examples of using tests. In the next article, I will describe how we manage hundreds of load-generating servers.

Creating a realistic test scenario

To create a realistic test scenario, we need to:

  • analyze users’ activity in the production environment, and to do this, identify essential metrics, start collecting them, and analyze the peaks;

  • create convenient, customizable blocks that we can use to effectively create a load on the right part of the business logic;

  • verify the realism of the test scenario by using server metrics.

Now, more details about each item.

Analyzing users’ activity in the production environment

In our service, users can create whiteboards and work on them with different content: photos, texts, mockups, stickers, diagrams, etc. The first metric we need to collect is the number of whiteboards and the distribution of content on them.

image

On the same whiteboard at the same time, some users can be actively doing something — creating, editing, deleting the content — and some can simply be viewing the content. The ratio of users changing the content on the whiteboard to the total number of users of that whiteboard is also an important metric. We can derive this data from database usage statistics.

In our backend, we use a component approach. We call the components “models.” We break our code into models so that each model is responsible for a specific part of the business logic. We can count the number of database calls that occur through each model and identify the part of the logic that creates the heaviest load.

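To make the idea concrete, here is a minimal sketch (not our actual backend code) of counting database calls per model; the model names are hypothetical, and in practice the counter would be a metric exported to a monitoring system rather than an in-memory dict:

```python
from collections import Counter
from functools import wraps

# Hypothetical in-memory counter; in a real backend this would be a metric
# exported to the monitoring system.
db_calls_per_model = Counter()

def count_db_calls(model_name):
    """Count every database call that goes through the given model."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            db_calls_per_model[model_name] += 1
            return func(*args, **kwargs)
        return wrapper
    return decorator

@count_db_calls("whiteboard")
def load_whiteboard(board_id):
    # Placeholder for the real query issued through the "whiteboard" model.
    return {"id": board_id}

@count_db_calls("account")
def load_account(account_id):
    return {"id": account_id}

if __name__ == "__main__":
    for board_id in range(100):
        load_whiteboard(board_id)
    load_account(1)
    # The heaviest parts of the business logic are simply the most frequent keys.
    print(db_calls_per_model.most_common())
```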

image

Convenient, customizable blocks

For example, we need to add a block to the test scenario that will create a load on our service identical to the load that happens when you open a dashboard page with a list of user whiteboards. When this page loads, HTTP requests containing a large amount of data are sent: the number of whiteboards, accounts to which the user has access, all users of the account, and so on.

image

How to create an efficient load on a dashboard? When analyzing the behavior of the production environment, we saw database usage spikes during the opening of the dashboard of a large account. We can create an identical account and change the intensity of its data usage in the test case, effectively loading the dashboard with a small number of calls. We can also create an uneven load for better realism.

At the same time, it is crucial for us that the number of virtual users and the load generated by them are as similar as possible to the users and the load in the production environment. For this, we also recreate the background load on an average dashboard in the test. Thus, most virtual users work on small average dashboards, and only a few users create a destructive load, as happens in the production environment.

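As a rough illustration of that skew, the mix of virtual users could be picked like this; the profile names and the 95/5 split are assumptions made for the sketch, not our real numbers:

```python
import random

# Hypothetical split: most virtual users create background load on average
# dashboards, while a few open the large account that causes the spike.
DASHBOARD_PROFILES = [
    ("average_dashboard", 0.95),
    ("heavy_dashboard", 0.05),
]

def pick_dashboard_profile(rng: random.Random) -> str:
    profiles, weights = zip(*DASHBOARD_PROFILES)
    return rng.choices(profiles, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(42)
    picks = [pick_dashboard_profile(rng) for _ in range(10_000)]
    print(picks.count("heavy_dashboard"), "of", len(picks), "virtual users open the heavy dashboard")
```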

From the start, we did not want to cover each server role and each interaction with a separate test case. This can be seen in the dashboard example: during the test, we simply repeat what happens when a user opens a dashboard in the production environment, rather than covering everything it affects with synthetic test cases. By default, this lets the test cover nuances that we didn’t even anticipate. Thus, we approach the creation of an infrastructure test from the side of business logic.

This is the logic we used to efficiently load all other blocks of the service. At the same time, each block may not be realistic from functional logic; the important part is that it provides a realistic load according to server metrics. Then, using these blocks, we can create a test scenario that simulates real users’ activity.

Data is part of the test case

It is important to keep in mind that data is also part of the test case, and the code logic itself heavily depends on it. When building an extensive database for the test — and it obviously should be large for a large infrastructure test — we need to learn how to create data that will not distort the test during execution. If we put bad data into the database, the test scenario can become unrealistic, and in an extensive database, that would be hard to fix. Therefore, we started creating data the same way our users do: through the REST API.

For example, to create whiteboards with existing data, we make API requests that load the whiteboard from a backup. As a result, we get genuine data: different whiteboards of different sizes. At the same time, the database fills up relatively quickly because our script makes requests in multiple threads. In terms of speed, this is comparable to generating garbage data.

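A minimal sketch of this kind of multi-threaded data creation through a REST API; the endpoint, payload, and token below are hypothetical and only illustrate the approach:

```python
import concurrent.futures

import requests

API_URL = "https://test-env.example.com/api/v1/boards"  # hypothetical endpoint
TOKEN = "test-token"                                     # hypothetical auth token

def create_board_from_backup(backup_id: int) -> int:
    """Ask the backend to create a whiteboard from a backed-up one, so the test
    database gets filled with genuine data rather than garbage."""
    resp = requests.post(
        API_URL,
        json={"source_backup_id": backup_id},  # hypothetical payload
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.status_code

if __name__ == "__main__":
    backup_ids = range(1, 1001)
    # Multiple threads make the filling speed comparable to generating garbage data.
    with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(create_board_from_backup, backup_ids))
    print(f"created {len(results)} whiteboards")
```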

Summary of this part

  • Use realistic cases if you want to test everything at once;

  • Analyze real user behavior to design the structure of the test cases;

  • From the very start, create convenient blocks to customize the testing process;

  • Configure the tests according to the real server metrics, not the usage analytics;

  • Do not forget that data is part of the test case.

Load-generating cluster

image
Our load-generating tooling

In JMeter, we create a test that we launch using Taurus to create a load on various servers: web, API, and board servers. We perform database tests separately using PostgreSQL, not JMeter, so the diagram shows a dashed line.

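For reference, such a launch can be scripted; the sketch below uses the standard Taurus CLI (`bzt`) and config keys, while the file names and load figures are made up for illustration:

```python
import subprocess
import textwrap

# Standard Taurus config keys; the scenario name, .jmx file, and load figures
# are placeholders for illustration only.
TAURUS_CONFIG = textwrap.dedent("""\
    execution:
    - scenario: big-test
      concurrency: 1000
      ramp-up: 10m
      hold-for: 1h
    scenarios:
      big-test:
        script: big-test.jmx
""")

if __name__ == "__main__":
    with open("load-test.yml", "w") as config_file:
        config_file.write(TAURUS_CONFIG)
    # Taurus (the bzt CLI) wraps JMeter, applies the config, and reports progress.
    subprocess.run(["bzt", "load-test.yml"], check=True)
```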

Custom work inside a WebSocket

Work on the whiteboard takes place inside a WebSocket connection, and it is on the whiteboard that multiuser work is possible. Nowadays, the JMeter Plugins Manager has several plugins for working with WebSocket connections. The logic is the same everywhere: the plugin simply opens a WebSocket connection, and everything that happens inside it you have to write yourself. Why is that? Because it is impossible to work with WebSocket connections the way you work with HTTP requests; that is, we cannot simply create a test case, extract dynamic values, and pass them along.

The work inside a WebSocket connection is usually very customized: you invoke specific custom methods, passing specific custom data, and therefore you need your own means of understanding whether the request was executed correctly and how long it took. You also have to write the Listeners inside that plugin yourself; we haven’t found a good ready-made solution.

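Outside of JMeter, the same idea looks roughly like this Python sketch based on the `websocket-client` package: open the connection, invoke a custom method, then time and validate the reply yourself. The URL and the message format here are hypothetical:

```python
import json
import time

from websocket import create_connection  # pip install websocket-client

WS_URL = "wss://test-env.example.com/board"  # hypothetical endpoint

def call_custom_method(ws, method: str, payload: dict) -> float:
    """Send a custom method over the WebSocket and return the response time in
    seconds; correctness checks are also custom, because there is no generic
    request/response contract."""
    started = time.monotonic()
    ws.send(json.dumps({"method": method, "payload": payload}))  # hypothetical format
    reply = json.loads(ws.recv())
    elapsed = time.monotonic() - started
    assert reply.get("status") == "ok", f"{method} failed: {reply}"  # hypothetical reply
    return elapsed

if __name__ == "__main__":
    ws = create_connection(WS_URL)
    try:
        took = call_custom_method(ws, "createSticker", {"boardId": 1, "text": "load test"})
        print(f"createSticker took {took:.3f}s")
    finally:
        ws.close()
```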

Stress-client

We want to make it as easy as possible to replicate what real users do. But we do not yet know how to record and replay everything that is happening in WebSocket inside the browser. If we recreate everything inside WebSocket from scratch, then we would get a new client, not the one that is used by real users. There is no incentive to write a new client if we already have one working.

So we decided to put our client inside JMeter. And we faced several difficulties. For example, JavaScript execution inside JMeter is a whole other topic, because it uses a very specific version of the language. And if you want to use your existing client code, you probably won’t be able to do that, because modern JavaScript features are not supported, and you’ll have to rewrite parts of your code.

The second difficulty is that we don’t want to support the entire client code for load tests. Therefore, we removed everything from the client but the client-server interaction. This allowed us to use client-server methods and do everything that our client can do. The advantage of this is that client-server interaction very rarely changes, which means that the code support inside the test case is seldom required. For instance, over the last six months, I haven’t made any changes to the code because it works perfectly.

The third difficulty is that the introduction of large scripts significantly complicates the test case. First, it can become a bottleneck in the test. Second, we most likely will not be able to run a large number of threads from one machine. Right now, we are only able to run 730 threads.

Our example on an Amazon instance

JMeter server type in AWS: m5.large ($0.06 per hour)
vCPU: 2
Mem (GiB): 8
Dedicated EBS Bandwidth (Mbps): Up to 3,500
Network Performance (Gbps): Up to 10
→ ~730 threads

Where to get hundreds of servers and how to save money

Next, the question arises: 730 threads from one machine, but we want 50,000 — where can we get so many servers? We are creating a cloud-based solution, so buying servers to test a cloud solution seems odd. Plus, there is always a certain slowness in buying new hardware. Therefore, we need to deploy them in the cloud as well, so we ended up choosing between cloud providers and cloud-based load testing tools.

We decided not to use cloud-based load testing tools like Blazemeter and RedLine13 because their usage restrictions did not suit us. We have different test sites, so we wanted to find a universal solution that would allow us to reuse 90 percent of the work in local testing as well.

Hence, in the end, we were choosing between cloud service providers.

Our platform is AWS-based, and almost all testing is done there, so we want the test bench to be as similar as possible to the production environment. Amazon has a lot of paid features, some of which, like load balancers, we use in production. If you don’t need these features in AWS, you can get them 17 times cheaper in Hetzner. Or you can get servers at Hetzner, use OpenStack, and write the balancers and other features yourself since with OpenStack, you can replicate the entire infrastructure. We managed to do just that.

Testing of 50,000 users using 69 AWS instances costs us about $3,000 per month. How do we save money? One way is to use temporary AWS instances — Spot instances. Their main benefit is that instead of keeping instances running all the time, we only launch them for tests, and they cost much less. One important detail, however, is that somebody can outbid your offer right at the time of testing. Fortunately, this has never happened to us, and we already save at least 60 percent of the cost thanks to them.

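For illustration, a minimal boto3 sketch of launching load generators as Spot instances; the region, AMI, and key pair are placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")  # region is an assumption

def launch_spot_load_generators(count: int) -> dict:
    """Launch JMeter load generators as Spot instances, so they only run (and
    only cost money) for the duration of the test."""
    return ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI with JMeter preinstalled
        InstanceType="m5.large",
        KeyName="load-testing",            # placeholder key pair
        MinCount=count,
        MaxCount=count,
        InstanceMarketOptions={
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        },
    )

if __name__ == "__main__":
    response = launch_spot_load_generators(count=69)
    print(len(response["Instances"]), "Spot instances requested")
```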

Load-generating cluster

We use the default JMeter cluster. It works perfectly; it does not have to be modified in any way. It has several launch options. We use the simplest option, where one master launches N number of instances, and there can be hundreds of them.

image

The master runs a test scenario on the JMeter servers, keeps communicating with them, collects general statistics from all instances in real-time, and displays it in the console. Everything looks the same as running the test scenario on a single server, even though we see the results of running it on a hundred servers.

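A minimal sketch of such a master-side launch; `-n`, `-t`, `-R`, and `-l` are standard JMeter CLI options, while the host list and file names are placeholders:

```python
import subprocess

# Placeholder list of JMeter server instances; in practice there can be hundreds.
JMETER_SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

if __name__ == "__main__":
    subprocess.run(
        [
            "jmeter",
            "-n",                            # non-GUI mode
            "-t", "big-test.jmx",            # the test plan to distribute
            "-R", ",".join(JMETER_SERVERS),  # remote servers that generate the load
            "-l", "results.jtl",             # aggregated results collected by the master
        ],
        check=True,
    )
```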

For a detailed analysis of the results of test scenario execution in all instances, we use Kibana. We parse the logs with Filebeat.

image

Prometheus Listener for Apache JMeter

JMeter has a plugin for working with Prometheus, which out of the box provides all JVM and thread usage statistics inside the test. This allows you to see how often users log in, log out, and so on. The plugin can be customized to send the test scenario run data to Prometheus and display it in real-time in Grafana.

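Once the data is in Prometheus, it can also be pulled programmatically, for example to check a run from a script. Below is a minimal sketch using the standard Prometheus HTTP API; the server address and the metric name (whatever you configured in the listener) are placeholders:

```python
import requests

PROMETHEUS_URL = "http://prometheus.example.com:9090"  # placeholder address
QUERY = "rate(jmeter_login_total[1m])"                 # placeholder metric from the listener config

if __name__ == "__main__":
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": QUERY},
        timeout=10,
    )
    resp.raise_for_status()
    for series in resp.json()["data"]["result"]:
        print(series["metric"], series["value"])
```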

image

Taurus

We want to solve some current problems with Taurus, but we haven’t started that yet:

  • Configurations instead of test scenario clones. If you have worked with JMeter, you have probably faced the need to run test scenarios with different sets of source parameters, for which you had to create clones of the test scenarios. In Taurus, you can have one test scenario and manage its launch parameters using configurations;

  • Configs for controlling JMeter servers when working with a cluster;

  • An online results analyzer that allows you to collect the results separately from the JMeter threads and not to complicate the test scenario itself;

  • Easy integration with CI;

  • Capability to test a distributed system.

Summary of this part

  • If we use custom code inside JMeter, it is better to think about its performance right away, because otherwise, we will end up testing JMeter and not our product;

  • The JMeter cluster is a beautiful thing: it is easy to set up, and it’s easy to add monitoring to;

  • A large cluster can be maintained on AWS Spot instances; it will be much cheaper;

  • Be careful with Listeners in JMeter so that the test scenario does not slow down with a large number of servers.

Examples of using infrastructure tests

The whole story above is largely about creating a realistic case to test the limits of the service. The examples below show how you can reuse the work you have done on the load testing infrastructure to solve local problems. I will talk in detail about two out of about ten types of load tests that we conduct periodically.

Database testing

What can we load test in the database? Big queries are an unlikely target because we can test them in a single-threaded mode if we just look at the query plans.

More interesting is the situation where we run the test and see the load on the disk. The graph shows an increase in iowait.

image

Next, we see that this affects users.

image

Then we understand the reason: VACUUM did not run and did not remove the garbage data from the database. If you’re not familiar with PostgreSQL, VACUUM is similar to Garbage Collector in Java.

image

The next thing we see is that CHECKPOINT started to trigger out of schedule. For us, this is a signal that the PostgreSQL configs do not adequately match the intensity of the database usage.

image

Our task is to adjust the database configuration so that such situations do not happen again. PostgreSQL, for instance, has many settings. For fine-tuning, you need to work in short iterations: change the settings, launch the server, evaluate, repeat. For that, of course, you need to provide a good load on the database, and this requires extensive infrastructure tests.

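A minimal sketch of applying one such iteration programmatically; the parameters below are real PostgreSQL settings related to the VACUUM and CHECKPOINT behavior described above, but the values are only an illustration, not recommendations:

```python
import psycopg2

# One tuning iteration; the values are illustrative, not recommended numbers.
SETTINGS = {
    "checkpoint_timeout": "15min",
    "max_wal_size": "8GB",
    "autovacuum_vacuum_scale_factor": "0.05",
}

def apply_settings(dsn: str) -> None:
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
    with conn.cursor() as cur:
        for name, value in SETTINGS.items():
            cur.execute(f"ALTER SYSTEM SET {name} = %s", (value,))
        cur.execute("SELECT pg_reload_conf()")
    conn.close()

if __name__ == "__main__":
    apply_settings("dbname=test_db user=postgres host=localhost")  # placeholder DSN
```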

The catch is that for the test to run normally, without any unnecessary crashes, it must take a long time. In our case, the test takes about three hours, which no longer looks like a short iteration.

We looked for a solution and found a tool for PostgreSQL called pgreplay. It can reproduce, using multiple threads, exactly what is written in the log files, exactly the way it happened when the log was written. How can we use it effectively? We make a dump of the database, then log everything that happens to the database after the dump, and then we can deploy the dump and replay everything that happened to the database using multiple threads.

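A rough sketch of that dump-and-replay sequence; the `pg_dump`/`pg_restore` calls are standard, while the pgreplay invocation is deliberately simplified (its logging prerequisites and connection options are described in the pgreplay documentation):

```python
import subprocess

DB_NAME = "service_db"        # placeholder database name
DUMP_FILE = "service_db.dump"
LOG_FILE = "postgresql.log"   # statement log captured after the dump was taken

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # 1. Take a dump of the test database in the custom format.
    run(["pg_dump", "--format=custom", f"--file={DUMP_FILE}", DB_NAME])

    # ... run the sizeable realistic test here with statement logging enabled,
    # so that LOG_FILE contains everything that happened after the dump ...

    # 2. On a fresh instance, recreate the database from the dump.
    run(["createdb", DB_NAME])
    run(["pg_restore", f"--dbname={DB_NAME}", DUMP_FILE])

    # 3. Replay the captured log against it with pgreplay (simplified invocation;
    #    speed and connection options are passed per the pgreplay documentation).
    run(["pgreplay", LOG_FILE])
```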

Where to write logs? A popular solution for logging is to collect logs in the production environment, as this gives the most realistic and reproducible test case. But there are some problems with that:

  • You have to use production data for testing, which is not always possible;

  • This process uses the syslog feature, which is expensive;

  • Disk usage is increased.

Our approach to large-scale testing helps us here. We make a database dump on the test environment, run a sizeable realistic test, and log everything that happens during the execution of that test. Then we use our tool called Marucy to run the database test:

  1. An AWS instance is created;

  2. The dump we need is deployed;

  3. pgreplay is launched to replay the logs that we need;

  4. We use our Prometheus/Grafana monitoring to evaluate the results. There are also dashboard examples in the repository.

When launching Marucy, we can pass a few parameters that can change, for example, the intensity of the test.

In the end, we use our realistic test scenario to create a database test, and then run this test without using a large cluster. It is important to note that to test any SQL database, the test case must be uneven, otherwise, the database will behave differently than it will in the production environment.

Degradation monitoring

For degradation tests, we use our realistic test scenario. The idea is that we need to ensure that the service has not become slower after another release. If our developers change something in the code that leads to increased response times, we can compare the new values with the reference values and signal if there is an error in the build. For the reference values, we use the current values that suit us.

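A minimal sketch of that comparison as a CI gate; the operation names, reference values, and the 10 percent tolerance are assumptions:

```python
import sys

# Reference response times (ms) that currently suit us; the values are illustrative.
REFERENCE_MS = {"open_dashboard": 850, "open_board": 1200, "create_sticker": 90}
TOLERANCE = 0.10  # fail the build if an operation regresses by more than 10%

def find_regressions(current_ms: dict) -> list:
    """Return the operations whose response time regressed beyond the tolerance."""
    failed = []
    for name, reference in REFERENCE_MS.items():
        current = current_ms.get(name)
        if current is not None and current > reference * (1 + TOLERANCE):
            failed.append((name, reference, current))
    return failed

if __name__ == "__main__":
    # In a real pipeline these numbers would come from the load test report.
    current = {"open_dashboard": 900, "open_board": 1450, "create_sticker": 95}
    regressions = find_regressions(current)
    for name, ref, cur in regressions:
        print(f"{name}: {cur} ms vs reference {ref} ms")
    sys.exit(1 if regressions else 0)
```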

Controlling the response times is useful, but we went further. We wanted to check that the response times during real users’ activity have not increased after the release. We thought that we could probably test something manually during the load testing, but that would only be dozens of cases. It’s more efficient to run the existing functional tests and check a thousand cases at the same time.

How did we set it up? After a build, the master is deployed to the test bench. Then, the functional tests are automatically run in parallel with the load tests. After that, we get a report in Allure on how the functional tests performed under load.

In this report, for example, we see that the comparison with the reference value has failed.

image

We can also measure browser performance with functional tests. Or, a functional test will simply fail due to an increase in operation execution time under load, because a timeout on the client-side will be triggered.

Summary of this part

  • A realistic test allows you to test the database cheaply and easily configure it;

  • Functional testing under load is possible.

P.S.: This article was first published on Medium.

Translated from: https://habr.com/en/company/miro/blog/499782/
