How to run GraphQL directive-driven capacity tests at scale


EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Every year, vacationers flock to the Vrbo™ site (part of Expedia Group™) around the holidays. It’s a time when families come together to figure out their upcoming family vacations. It’s a busy time for us and we spend a lot of energy ensuring we can handle all of the traffic spikes that come with the season. This year, however, we had an even bigger challenge: a bowl game sponsorship and many high profile television advertisements that would drive large spikes to our system — particularly to our mobile apps.


At Vrbo, we ensure our system can handle these sorts of spikes by running capacity tests. Prior to adopting GraphQL and Apollo Server, our capacity tests relied on replaying unauthenticated/anonymized HTTP GET traffic. This is a safe way to ensure that we are not inadvertently replaying traffic that we should not. HTTP POST requests can change state, while GET requests do not. This style of test can provide a lot of confidence that our applications and routes with the highest call rates and scaling needs are ready for action.


But our GraphQL traffic is always sent as a POST request! These requests were not getting replayed during our load tests and we were not stressing the systems we expected to stress. Replaying all POST requests was not an option. We needed a smarter, more targeted approach that provided the coverage we needed to know our systems could handle the strain, without changing the state of our system or writing invalid content to data stores.


Diagram showing the edge logging HTTP GET requests to S3 for consumption by the load testing tool, but not POST requests
POST requests were not captured for load tests

At first we built custom log scrapers for high-load queries. We extracted request data plumbed through application logs and processed it with bash scripts to fit the load framework input format. This allowed us to use real request parameters for all of our mobile searches in the peak capacity tests, but we used a predefined query format, and it only worked for search queries. This script successfully exercised the most critical service dependencies for our mobile app during traffic spikes by replaying searches with the query arguments mined from application logs. We mirrored this approach for search and details pages.


This approach gave us some coverage, but it didn’t truly represent all production traffic, and the scope problem continued to grow as our website was shifting most of its traffic over to GraphQL. Creating a new custom test every time we made a new GraphQL request was not an option.


Evolution of the prior diagram, showing POST requests stored in S3 via a log data scraper run on application logs.
Ad hoc log scraping is a labor intensive and error-prone manual process

Further complicating the situation, a wide variety of GraphQL traffic can come from a single endpoint — especially from our mobile apps — which means simply replaying traffic from certain routes would not completely solve the problem for us either.


At this point, we stepped back and defined requirements for what we really needed.


GraphQL capacity test requirements

1. Application-level logging enabled with a single system-wide switch

  • Logic can’t live at the edge as our GET logging does, because we would need to read and evaluate every POST body, which would be expensive and add latency.

  • Deciding what to log is not trivial and requires knowledge of the API; only the GraphQL server can really know which API calls are safe to log.

  • Our performance engineering team manages and runs our capacity tests. We need a way for them to turn log collection on and off for all applications at once.

  • Since we cannot rely on our edge to log POST traffic for replay, we need to do it at the application level, but the cost/complexity of handling this one application at a time is prohibitive.

2. All replay traffic anonymized, without any authentication or personally identifying information

We didn’t want to expose a security hole by logging data we shouldn’t, like valid authentication headers or passwords. And since that sensitive data would be stripped, some queries might become invalid: even if they were not changing state, they relied on valid authentication headers or other secure information we did not want to store for replay.


3. Opt in to replay of GraphQL operations — not opt out

Requests need to be opted into the peak capacity test (PCT). We don’t want to rerun some requests during PCT, like booking requests and inquiry submissions, because they may change the state of the system. And no matter where the request is executed, we can’t risk accidentally logging something we’re not supposed to.


Explicitly selecting which GraphQL operations to replay is the safest approach to deciding what to log. A wide variety of requests can go through a single GraphQL endpoint. An opt-out or blocklist approach is error-prone, and can get out of date with what is actually being requested.


When a client makes a GraphQL request, we examine the request and mark it as safe to log or not based on whether the operation has been labeled as replayable.


4. The GraphQL schema is the single source of truth for what can be replayed

Making the schema the source of truth for what can be replayed has the following benefits:


  • Makes the decision discoverable

  • Decision owned by the schema owner/domain expert

  • Universally decided for all consumers of the schema

  • Traceable to a single code change/ticket, making security reviews easier to manage

Due to how we compose our GraphQL APIs, schemas are reused across many applications. This makes it difficult to accurately represent the load of our entire system during a PCT without a centralized decision about what to log. Relying on each client to configure their load-testing queries would put the decision in the wrong hands and would likely lead to drift from reality.


Final architecture

An evolution of the first diagram, showing plugins on each app communicating to the A/B platform and to Kafka, thence to S3
A hapi plugin gates logic for which operations can be replayed

Implementation

After breaking down the problem, we identified four discrete pieces of functionality we needed to implement:


  1. Managing which routes to record and providing hooks to inject business logic for how and what to log

  2. Configuring and controlling PCT logging across many applications

  3. Deciding which GraphQL requests to log

  4. Collecting the logs and running the load tests

Managing which routes to record and providing the hooks

We needed a framework to hook our GraphQL-specific logic into our existing webapps. To do that, we built a generic internal plugin, hapi-request-logging, that provides those hooks.


Because hapi allows you to apply plugins to routes, we effectively solved the “which routes to log” problem by encapsulating the logic in a plugin — we just configure which routes to record by including the plugin in those route configurations. To make things simple for our webapp developers, we include this plugin whenever we configure the GraphQL server plugin for a route.


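As a sketch, wiring the plugin into a route might look like the following (the plugin key and handler are illustrative stand-ins, not our actual internal module names):

```javascript
// Build a hapi route config that opts the route into request logging
// by including the logging plugin in its `options.plugins` section.
// `pluginName` stands in for the internal hapi-request-logging plugin.
function buildGraphqlRoute(pluginName, graphqlHandler) {
  return {
    method: 'POST',
    path: '/graphql',
    options: {
      plugins: {
        // The presence of this key is what marks the route as "record me"
        [pluginName]: { enabled: true },
      },
    },
    handler: graphqlHandler,
  };
}

const route = buildGraphqlRoute('hapi-request-logging', () => 'ok');
console.log(route.options.plugins['hapi-request-logging'].enabled); // true
```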
hapi-request-logging provides three hooks for injecting business logic. These are lists of functions that are applied during the hapi request lifecycle:


  • transformers: how to map the request object to your log format

  • loggers: how and where to log — if needed, you could log to many places.

  • skippers: determine which requests to log — we can apply multiple skipper functions to refine what is being logged with different logical checks. In our implementation, we have two skippers: an on/off switch for log collection, and another for injecting our GraphQL-specific business logic.

This provides all the generic tools we need to log GraphQL traffic to a location that the performance engineers can collect inputs from. Now we just need to provide the business logic to inject into the transformers, loggers, and skippers hooks.


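To make the mechanics concrete, here is a minimal sketch of the three hook types and how a plugin like this might apply them during a request (the request shape, log-record shape, and pipeline function are assumptions for illustration, not the real plugin’s API):

```javascript
// transformers: map the request object to the log format
const toLogRecord = (request) => ({
  path: request.path,
  payload: request.payload,
});

// loggers: how and where to log (console here; real code might use Kafka)
const consoleLogger = (record) => console.log(JSON.stringify(record));

// skippers: return true to skip logging this request
const skipNonPost = (request) => request.method !== 'post';

// Apply the hooks in order: skippers first, then transformers, then loggers.
function logRequest(request, { transformers, loggers, skippers }) {
  if (skippers.some((skip) => skip(request))) return false;
  let record = request;
  for (const transform of transformers) record = transform(record);
  for (const log of loggers) log(record);
  return true;
}

const req = { method: 'post', path: '/graphql', payload: { query: '{ ok }' } };
logRequest(req, {
  transformers: [toLogRecord],
  loggers: [consoleLogger],
  skippers: [skipNonPost],
}); // returns true; consoleLogger prints the transformed record
```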
Configuring and controlling capacity test logging across many applications

The second piece is to provide the logic for where the requests are being logged and when logging is enabled. This logic is handled by another internal plugin, hapi-pct-logging, which handles all the implementation specific logic about where and how to log, as well as providing the universal switch to turn on log capture.


In other words, hapi-pct-logging provides a transformer, a skipper, and a logger to hapi-request-logging.


The transformer function it provides handles the mapping from the hapi request object to an Avro definition that defines the Kafka topic schema. This is where we filter out headers and other information in the request that we do not want to include in our load test.


The skipper function acts as our universal switch to turn on logging. It checks whether the global feature gate that enables logging is turned on.


The logger function handles logging the transformed request to Kafka.


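A rough sketch of the three pieces hapi-pct-logging might contribute (the feature-gate client, producer interface, topic name, and header list below are invented for illustration):

```javascript
// Headers that must never be captured for replay (illustrative list)
const SENSITIVE_HEADERS = ['authorization', 'cookie', 'x-csrf-token'];

// transformer: map the hapi request to the record shipped to Kafka,
// dropping anything that must not be stored for replay
const toPctRecord = (request) => ({
  path: request.path,
  payload: request.payload,
  headers: Object.fromEntries(
    Object.entries(request.headers).filter(
      ([name]) => !SENSITIVE_HEADERS.includes(name.toLowerCase())
    )
  ),
});

// skipper: the universal on/off switch backed by the A/B platform;
// returning true means "skip logging this request"
const makePctSkipper = (featureGate) => () => !featureGate.isEnabled('pct-logging');

// logger: in production this would produce to a Kafka topic matching the
// Avro schema; `producer` is any object with a send() method
const makePctLogger = (producer) => (record) =>
  producer.send({ topic: 'pct-requests', value: JSON.stringify(record) });
```

With the gate off, the skipper short-circuits every request; flipping the gate in the A/B tool turns collection on everywhere at once.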
Now we had a generic framework for logging POST requests to a well-defined location, one that could be turned on and off across applications with a single switch. That left us with building out support for determining which GraphQL requests we could log.


Deciding which GraphQL requests to log

With these two plugins, we had all we needed to log POST requests to a Kafka topic whenever the performance engineering team enabled the feature gate throttle in our A/B testing tool. But we still needed to filter out traffic we shouldn’t be replaying. This was a big problem for our mobile GraphQL API, where all our traffic goes through a single endpoint.



We decouple schema ownership from application deployment ownership, and many different applications can host the same schema. It’s not feasible to ask every service owner to determine which GraphQL requests they feel are safe to log. The schema owners are the proper owners for that decision.


Additionally, the schema really is the interface of a GraphQL server, so by annotating the Schema with a directive that determines what is replayable, we can make that information more discoverable.


We decided to define a GraphQL directive called @replayable. The replayable directive can be added to queries and mutations. We wrap the field resolve function with code that decorates the context. At runtime, when that query is executed, we allow requests with the @replayable directive.


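The runtime effect can be sketched roughly like this (schema-directive implementations in graphql-tools wrap field resolvers in a similar way; the helper below is a simplified illustration, not our actual code):

```javascript
// Wrap a field's resolve function so that executing an operation marked
// @replayable flags the shared per-request context.
function markReplayable(field) {
  const originalResolve = field.resolve || ((source) => source[field.name]);
  field.resolve = (source, args, context, info) => {
    context.isReplayable = true; // read later by the logging skipper
    return originalResolve(source, args, context, info);
  };
  return field;
}

// Exercise the wrapper with a fake field and context:
const searchField = { name: 'search', resolve: () => ['listing-1'] };
markReplayable(searchField);
const context = {};
const result = searchField.resolve({}, {}, context, {});
console.log(context.isReplayable); // true
```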
When we register the GraphQL server on a route in our hapi server, we register the hapi-request-logging plugin and add another skipper function that evaluates the state of the decorated request context.


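That extra skipper might look something like this (where the decorated GraphQL context hangs off the hapi request is an assumption for illustration):

```javascript
// Skip logging unless the directive marked this request's context as
// replayable. The request.plugins.graphql.context path is hypothetical.
const replayableSkipper = (request) => {
  const ctx =
    request.plugins && request.plugins.graphql && request.plugins.graphql.context;
  return !(ctx && ctx.isReplayable === true);
};

console.log(replayableSkipper({ plugins: { graphql: { context: { isReplayable: true } } } })); // false
console.log(replayableSkipper({ plugins: {} })); // true
```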
We also ensure that the directive is defined for all instances of our internal graphql-server library (which wraps apollo-server).


All that’s left is to decorate the operations we want to expose to our peak capacity test. After adding the directive, the schema definition looks like this:


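A representative sketch of a schema with the directive applied (the operation and type names here are invented for illustration, not the actual Vrbo schema):

```graphql
directive @replayable on FIELD_DEFINITION

type Query {
  # Read-only search traffic: safe to capture and replay in a PCT
  search(term: String!): SearchResults @replayable
}

type Mutation {
  # Changes system state, so it is never annotated and never replayed
  createBooking(input: BookingInput!): Booking
}
```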
Collecting and running the requests with our load testing framework

A key win for this change was to reduce the amount of overhead per PCT. At the time we finished this work, we had three ad hoc tests that generated load, and were likely going to need more and more if we wanted to continue to improve our API coverage.


Now, our reliability engineers have a single, simple process:


  1. Enable a feature gate throttle in our A/B testing framework

  2. Wait

  3. Run a quick job to extract the data from S3

And as we scale out our GraphQL schema to different applications and teams, we can still get coverage for PCT tests for those calls. The performance engineering team’s process never changes — they just get that traffic for free.


Case study: Prepping mobile for peak capacity during the Citrus Bowl

With all of the infrastructure in place, we were able to set up a load test prior to peak traffic that pushed all of our systems to the max. We effectively tested mobile traffic at spikes of up to five times our daily max traffic with real representative queries.


Graphs showing stable response times while load is ramping up
During the capacity test, our response times stay relatively stable (left) while traffic ramps up (right)

That load test did strain our infrastructure and made it clear that we needed to scale it up before January 1.



When that day arrived, we saw big spikes, but at significantly lower load than what we had tested at — only around three times our peak traffic — and we sailed through them without much difficulty. You can see the system had minor hiccups initially as we scaled up to support the new traffic rate, but we saw only small increases in our p95 and p99 response times.


Graphs showing mostly graceful degradation with minor spikes
For the Citrus Bowl surge, our response times stayed stable (left) despite traffic spikes (right)

Learn more about technology at Expedia Group


Translated from: https://medium.com/expedia-group-tech/how-to-run-graphql-directive-driven-capacity-tests-at-scale-57f7fc566e63
