Understanding Software Quality at Scale

A common challenge when building good software is: how do you measure your product quality at scale? What feedback would you expect to get when you’re building good software?

With hundreds of Instacart Enterprise deployments per day impacting dozens of retailers, it is impossible to verify everything by hand. To measure and monitor Enterprise software quality, the Enterprise Test Automation team has built Mango, which is an API test automation solution embedded across all of our development, UAT, and production environments.

In this post, we’ll dive into the metrics collection and reporting workflows we built.

Development Workflow

An important part of measuring product quality is embedding runs of the automation suite throughout the entire Software Development Life Cycle (SDLC). Test runs are triggered each time engineers commit product code, but they are also triggered by every deployment, including the final deployment to production. At Instacart, we also run tests on an hourly basis, since many test scenarios are driven by the product catalog, which changes over time. These frequent test runs help us capture, collect, and report on software quality metrics.

[Figure: GitHub code changes and deployments trigger the test automation workflow]

By incorporating test runs into all of these workflows, we can also capture the frequency of test failures by environment. This allows us to set goals around test failures and bugs: we expect the number of bugs and test failures to decrease from the highest count to a very low number (or none!) by the time our code reaches production canary services. If for some reason this metric is reversed, it's likely that not enough bugs are being caught early during Pull Requests, which tells us we should focus on improving the local development workflow.

Metrics We Collect

We aim to instrument every part of the test process so that we can publish these metrics and build team and organization goals around them. Below is a list of a few key metrics that we capture:

  • Test Status: How many test cases passed, failed, xfailed, xpassed, or were skipped. This is our most fundamental metric, in particular how many tests have failed or xfailed. When tests are xfailed, there is either a bug causing the failure or the test code itself is outdated. This is a great way to measure tech debt (see the sketch after this list).

  • Test Duration: We aggregate by environment, team, and feature. The goal is to continually reduce the average duration per test case to maintain a highly efficient test suite. Using this metric, we are also able to identify which tests are the slowest, which gives us ideas on what code we can focus on to speed up our systems.

  • Bugs in JIRA: How many bugs are reported in JIRA. When we investigate test failures, we create bugs in JIRA and attach them to the automation test failure. This is also a good indicator of technical debt and overall quality, and we want to keep this metric low.

  • Test Case Counts: We aggregate this number by team, feature, and contributor. As discussed below, this allows us to give recognition to top teams and people who contribute tests, while also allowing us to identify and encourage change on teams with no tests.

  • Code Coverage: Line coverage, aggregated by platform (API, UI, etc.). By deploying an instrumented version of the product codebase, we can collect and track code coverage metrics. This allows us to see which product areas have no tests at all, and also which code is completely unreachable! Across many companies, it's common for code changes to get merged that are never actually executed in the deployed application.

  • Static Analysis Reports: Automatically generated metrics, which include cyclomatic complexity, average lines per class, average lines per function, etc. We do not strictly enforce limits here — we use this to provide visibility to our engineering teams, which helps to proactively reinforce good development habits.

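To make the Test Status and Test Duration metrics above concrete, here is a minimal sketch of how per-test-case results could be published from a pytest plugin using the datadogpy statsd client. The metric names, tag names, and environment variable are hypothetical illustrations, not Mango's actual implementation:

```python
# conftest.py -- a sketch of publishing per-test-case metrics to Datadog.
import os

from datadog import statsd  # datadogpy; assumes a local Datadog agent


def pytest_runtest_logreport(report):
    # Only record the test body itself, not setup/teardown.
    # (Skipped tests surface in the setup phase; omitted here for brevity.)
    if report.when != "call":
        return
    if hasattr(report, "wasxfail"):
        # xfailed: the expected failure occurred; xpassed: it unexpectedly passed.
        status = "xfailed" if report.outcome == "skipped" else "xpassed"
    else:
        status = report.outcome  # "passed" or "failed"
    tags = [
        f"status:{status}",
        f"test:{report.nodeid}",
        f"environment:{os.environ.get('TEST_ENV', 'dev')}",  # hypothetical tag
    ]
    statsd.increment("mango.test.status", tags=tags)  # hypothetical metric name
    statsd.histogram("mango.test.duration", report.duration, tags=tags)
```

Because every data point carries status, test, and environment tags, dashboards can aggregate up to an org-wide view or drill down to a single test case in a single environment.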

Quality Dashboards

All these metrics serve one primary purpose: dashboards! Dashboards allow us to add alerts, notify the right stakeholders, and, most importantly, create measurable goals and hold ourselves accountable for maintaining a high-quality product. Collecting and publishing test automation metrics allows our Enterprise Engineering teams to build comprehensive dashboards that give us a good snapshot of our product quality.

[Figure: Aggregate test results across multiple retailer environments and versions. Retailer names and release versions replaced with fruits! :)]

Since metrics are published to Datadog at test case granularity, we can drill down from the high-level overview shown above into test results by feature and even by test case. These metrics are valuable because when a test case fails in just one environment, the root cause is typically a specific integration issue, while a failure across the board is a good indicator that bad code made it through to a production deployment.

[Figure: Automation test results, aggregated by feature]

Automation Test Status and Lifecycle

Instacart’s Mango framework leverages pytest’s test case states and attaches a meaning to each state. The diagram below illustrates each of these states, along with the actions required to move between them:

[Figure: Automation test case lifecycle]

We find that these states reflect a very typical process in our SDLC. Capturing them also gives us a way to measure how many tests need to be worked on (those in FAIL status) and a way to measure technical debt (a count of tests in XFAIL status).

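As an illustration of moving a test into the XFAIL state, pytest's xfail marker lets the tracking ticket ride along with the test; the ticket ID and test below are hypothetical:

```python
import pytest


# A known failure under investigation: the JIRA ticket in the reason makes
# the XFAIL count double as a tech-debt measure.
@pytest.mark.xfail(reason="ENT-1234: discount not applied to weighted items")
def test_weighted_item_discount():
    ...
```

Once the underlying bug is fixed, the test xpasses, which signals that the marker can be removed and the test returned to the PASS state.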

Acting on Automation Failures

When building our quality dashboards, the most important question we ask is “how is the information being displayed actionable?” If a dashboard exists but no actions can be taken to investigate failures and improve the metric, then it isn’t really helpful. With Instacart Enterprise, Mango attaches a request ID header to every API request sent, which is then published to Datadog for any test case that failed:

[Figure: For each failed test case, measure how many retailers are affected, and the API request ID causing the test failure, for easy tracing and investigation]
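
As a sketch of how such a header could be attached, a requests session can generate a fresh ID per call and remember the last one for failure reporting. The "X-Request-Id" header name is an assumption here; the post doesn't specify the header Mango actually uses:

```python
import uuid

import requests


class TracedSession(requests.Session):
    """Attaches a fresh request ID to every API call the tests make."""

    def __init__(self):
        super().__init__()
        self.last_request_id = None  # published to Datadog when a test fails

    def request(self, method, url, **kwargs):
        request_id = str(uuid.uuid4())
        headers = kwargs.setdefault("headers", {})
        headers["X-Request-Id"] = request_id  # hypothetical header name
        self.last_request_id = request_id
        return super().request(method, url, **kwargs)
```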

To further improve the feedback cycle, test owners are notified via Slack each time a test run fails. The appropriate on-call team member is pinged based on the test failure, and we strive to provide enough reporting information that a failure can be diagnosed without needing to re-run the tests locally:

[Figure: Slack messages to notify stakeholders that their tests have failed against their …]
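
A minimal sketch of such a notification, assuming a standard Slack incoming webhook; the webhook URL, message format, and on-call routing below are illustrative, not Mango's actual ones:

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # placeholder


def notify_failure(test_name, environment, request_id, oncall_handle):
    # Ping the on-call owner with enough context to start debugging:
    # the failing test, the environment, and the traceable request ID.
    requests.post(SLACK_WEBHOOK_URL, json={
        "text": (f"<@{oncall_handle}> `{test_name}` failed in {environment} "
                 f"(request ID: {request_id})")
    })
```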

Test Endpoints in Product Code

We quickly realized that black-box testing our API was an unmanageable approach. For example, what if we need to automate a test case that verifies a “$2 for 1” coupon offer is correctly applied to your cart? In an extreme example, a retailer may have 10 stores and only 1 product discount, so how could we quickly find such a product?

A common solution is to run a query against a Product Catalog DB or Elasticsearch instance, but this doesn’t scale well. The biggest drawback of this approach is that test code becomes tied to your infrastructure and schemas: the more systems you have tests for, the more integrations you need to manage. Sensitive credentials, database connections, and the like are dependencies too heavy to introduce into our test framework.

[Figure: Example automation workflow to verify a product discount is applied to a user’s cart item]

We came up with a system which allows our tests to interact directly with our data sources using an endpoint like GET /api/fixtures/products. The test setup methods call the endpoint with a few parameters based on a particular scenario, receive products for their scenario, and carry on with the test. This is convenient because the product code already has all dependencies in place to serve such a request, and the endpoint is also reusable for UI automation tests.

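Here is a sketch of what test setup against this endpoint might look like. The query parameters and the api_client and cart fixtures are illustrative assumptions, since the endpoint's actual contract isn't documented here:

```python
import pytest


@pytest.fixture
def discounted_product(api_client):
    # Ask the product code to find a suitable product for this scenario,
    # instead of querying the catalog database directly from test code.
    response = api_client.get(
        "/api/fixtures/products",
        params={"has_discount": True, "in_stock": True},  # hypothetical params
    )
    response.raise_for_status()
    return response.json()["products"][0]


def test_discount_applied_to_cart_item(cart, discounted_product):
    cart.add(discounted_product)
    assert cart.items[0].discount_applied
```

Because the lookup happens behind a product API, the test framework needs no catalog credentials or schema knowledge, and the same endpoint serves UI automation.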

Rewarding Automation Contributors

Driving adoption among product teams is a critical part of any test automation framework. It is very important to us to embed team and product ownership into automation tests and also to reward our top contributors. A little positive reinforcement goes a long way, and we want to make our engineers feel good about contributing to automation tests by rewarding them, rather than making it feel like a chore with no upside.

[Figure: A dashboard around the office is a great way to raise awareness, drive adoption, and recognize team members!]

Before each merge into master, we capture the GitHub usernames of engineers who added tests, along with their corresponding product teams. This allows us to give “kudos” to teams that are staying on top of automation efforts, while focusing framework support and adoption efforts on lower-contributing teams. A great way to create an environment of ongoing recognition is to show a monthly snapshot of the dashboard at an all-hands meeting, as well as on a mounted monitor at the office.

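One way to capture such contributions, sketched below, is to walk git history for newly added test files; the tests/ path convention and the time window are assumptions for illustration:

```python
import subprocess
from collections import Counter


def test_contributions(since="1 month ago"):
    # List files added under tests/, with the author of each commit.
    log = subprocess.run(
        ["git", "log", f"--since={since}", "--diff-filter=A",
         "--name-only", "--pretty=format:AUTHOR:%an", "--", "tests/"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts, author = Counter(), None
    for line in log.splitlines():
        if line.startswith("AUTHOR:"):
            author = line[len("AUTHOR:"):]
        elif line.strip():
            counts[author] += 1  # one newly added test file
    return counts
```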

As Mango matures, driving adoption across the organization is critical. Everyone is a contributor to quality and that’s the attitude and culture we have been encouraging at Instacart. As a new Software Engineer joining the company, you can expect an onboarding workshop scheduled with a Test Automation team member in which you will cover our best quality practices and how to contribute to our automation efforts. The Test Automation team has made it easy for engineers to see how they contribute to quality, and we are now driving Mango’s usage across multiple product areas within Instacart!

Final Thoughts

As Instacart continues to improve our automated test coverage, we need to remind ourselves that it’s not just about the number of test cases — we need to measure our progress and set goals against our observed metrics. A medium to large unorganized test suite adds only short-term value to the organization; it eventually gets scrapped or rewritten because no one knows how much is really being tested and the test code base (as well as the code base being tested!) is too large to easily comprehend. Automation becomes a powerful tool to drive good quality and delivers long term value only after we start capturing meaningful metrics and delivering actionable reporting from our test suites.

Want to build tools and processes like these? Our Enterprise engineering team is hiring! Visit our careers page to explore our current openings.

Translated from: https://tech.instacart.com/understanding-software-quality-at-scale-79ebb25cb3ac
