A Look Inside: The Quest for Green Automation

culiao6493

Original article: https://blogs.unity3d.com/2016/09/08/a-look-inside-the-quest-for-green-automation/

One of the hardest things to achieve in any software development process is having an almost always green set of automated tests. Just as it is almost impossible to find and fix every bug in a piece of software, it is impossible to keep tests green all the time. That doesn’t mean we shouldn’t try! In this blog post, we go over some of the challenges we face daily with our automated test suites.

If you are developing a simple library and you design it from scratch using a methodology like Test Driven Development, you are likely to end up with something that works well and has a nice set of fast and stable unit tests to accompany it. If you are developing anything more complex than that, unit tests will not be enough and you will need to add at least one integration test suite. Unity is very complex because of the number of features that need to integrate with each other and because we support building games and applications on more than 20 different platforms. Testing software of this complexity meant we ended up creating a lot of high-level internal testing frameworks.

Platforms currently supported by Unity


Let’s take a look at some numbers! Our code base is 12 years old, which means there is a lot of legacy code intermixed with a lot of new code. Here are some detailed stats collected using cloc:

Engine (Runtimes, Modules, Shaders):

663 kloc code + 55 kloc comments in 3.8k files
87% C++, 4% C#, 5% bindings, 2% shaders

Editor (Editor, Extensions, some Tools):

749 kloc code + 57 kloc comments in 4.6k files
51% C++, 44% C#, 4% bindings

Platform Dependent:

464 kloc code + 52 kloc comments in 2.8k files
59% C++, 37% C#, 2% bindings

Tests (does not include all tests; some are inside editor/runtime, especially C++ unit tests):

301 kloc code + 21 kloc comments in 4.6k files
1% C++ :), 87% C#, 8% shaders

Total:

2.2 million lines of code + 185 thousand lines of comments
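
As a quick sanity check, the totals can be recomputed from the per-area figures quoted above. The dictionary keys are just labels chosen for this sketch; the numbers are the ones listed.

```python
# Per-area figures quoted above, in thousands of lines (kloc).
code_kloc = {"engine": 663, "editor": 749, "platform": 464, "tests": 301}
comment_kloc = {"engine": 55, "editor": 57, "platform": 52, "tests": 21}

total_code = sum(code_kloc.values())          # 2177 kloc, about 2.2 million lines
total_comments = sum(comment_kloc.values())   # 185 kloc, i.e. 185 thousand lines

print(f"code: {total_code} kloc, comments: {total_comments} kloc")
```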

These numbers don’t include any of the external libraries we integrate into Unity. The tests are split into low-level C++ unit tests and high-level C# tests. The C# tests come in many kinds depending on what they test: runtime, integration, asset import, graphics, performance, etc. In total we have about 60,000 automated tests that are executed tens of millions of times every month, both locally through manual runs and on our build farm.

Because we rely so much on high-level automated tests, we have to deal with test failures and instabilities constantly. To keep these to a minimum, we started doing a number of things:

Make sure the head revision of trunk (main development branch) is always green
Implement monitoring and reporting of tests
Optimize the time spent executing tests

We use a development method that relies heavily on multiple code branches that are kept in sync with trunk and eventually merged back into it when ready. We want to always be able to have a build ready for release from the head revision of trunk, which means we want to make absolutely sure that it is always green. Everyone branching from trunk also wants to start their work on code that passes all automation.

The current way we keep trunk always green is by using a staging branch. Every day, multiple people submit code that should be merged to trunk. Requests like these get bundled together and merged onto the staging branch, and all test automation is executed. If anything fails, we have a tool that reruns the failed tests on the same revision to check whether it is just an instability or an actual failure. If it is an instability, a notification is posted to an internal chat, where we always have one or more developers investigating any issue that gets posted there. If it is an actual failure, we run a bisection process to quickly figure out which of the code merge requests introduced it. The person responsible is notified and the code is removed from the staging branch. If everything passes as expected, the staging branch gets merged to trunk. We call this the Trunk Queue Verification process.

An automatic notification about an instability that happened in the staging branch

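
The rerun-then-bisect flow of the Trunk Queue Verification process can be sketched roughly as follows. This is a minimal illustration, not Unity's actual tooling: `run_tests`, `find_culprit` and `verify_batch` are hypothetical names, and the sketch assumes that trunk itself is green, that merge requests are applied in a fixed order, and that a real failure reproduces every time once the offending request is applied.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

# Hypothetical hook: apply the given merge requests on top of trunk, run the
# given tests, and return the names of the tests that failed.
RunTests = Callable[[Sequence[str], Set[str]], Set[str]]


def find_culprit(run_tests: RunTests, requests: Sequence[str], test: str) -> Optional[str]:
    """Bisect the ordered merge requests to find the one that makes `test` fail."""
    if not requests or test not in run_tests(requests, {test}):
        return None
    # Invariant: the first `lo` requests pass, the first `hi` requests fail.
    lo, hi = 0, len(requests)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test in run_tests(requests[:mid], {test}):
            hi = mid
        else:
            lo = mid
    return requests[hi - 1]


def verify_batch(
    run_tests: RunTests, requests: Sequence[str], all_tests: Set[str]
) -> Tuple[Set[str], Dict[str, Optional[str]]]:
    """Return (unstable tests, {really failing test: offending merge request})."""
    failed = run_tests(requests, all_tests)
    if not failed:
        return set(), {}
    # Rerun only the failed tests on the same revision.
    failed_again = run_tests(requests, failed)
    unstable = failed - failed_again   # passed on rerun: report as an instability
    culprits = {t: find_culprit(run_tests, requests, t) for t in failed_again}
    return unstable, culprits
```

With a batch of n merge requests, the bisection needs roughly log2(n) extra test runs per failing test instead of n, which matters when a full run takes hours.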

This process does help keep the main development branch always green, but it is far from ideal. It is costly to maintain because running all our test suites takes hours, and finding the source of some failures requires human intervention in a lot of cases. The ideal scenario would be for us to run tests on all branches after every new set of changes is pushed, achieving something close to continuous integration. Right now, we are running tests in the most naive way possible, which usually means that for most new batches of changes we run all the tests. We are taking the first steps towards changing this, and towards improving everyone’s iteration time on test automation runs here at Unity, by introducing a smart test selection service.

We have previously blogged about our Unified Test Runner, which also stores a lot of information about every test run in a database. We now have tens of millions of test data points where we can see when a test was executed, by whom, on which machine, whether it failed or passed, how long it took to execute, etc. We are starting to leverage all this data to build a rule-based system for selecting which tests should run based on which code was changed on a specific branch. Here are a few examples:

Executing all integration tests takes about 90 minutes. We can see from historical data that more than half of these tests have passed in every one of the last 100 runs. We introduce a rule that skips these always-green tests for 9 consecutive runs and only runs them every 10th run. That saves us 60 minutes for each of those 9 runs.
The code for the AI feature is nicely isolated from the rest of the code base in its own module. Someone makes changes only to the AI code. There is a rule that determines that only AI tests should be executed.
A branch only introduces a few new tests. There is a rule that determines that running only those new tests (eventually the full test suites of which they are a part) should be enough.
If we are running tests on the trunk staging branch, we always run all of them to make sure trunk is always green.
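
A rule-based selector along the lines of the examples above might look like the following sketch. It is an illustration only, not Unity's actual service: the `AutomatedTest` record, the `ISOLATED_MODULES` set, the `"trunk-staging"` branch name, and the order in which the rules are applied are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Set

ISOLATED_MODULES = {"ai"}          # assumed: modules cleanly separated from the rest
STAGING_BRANCH = "trunk-staging"   # assumed branch name


@dataclass(frozen=True)
class AutomatedTest:
    name: str
    module: str   # hypothetical: the code module this test covers


def always_green(history: Dict[str, Sequence[bool]], window: int = 100) -> Set[str]:
    """Tests with at least `window` recorded runs whose last `window` runs all passed."""
    return {name for name, runs in history.items()
            if len(runs) >= window and all(runs[-window:])}


def select_tests(tests: Sequence[AutomatedTest], branch: str,
                 changed_modules: Set[str], new_tests: Set[str],
                 history: Dict[str, Sequence[bool]], run_number: int) -> Set[str]:
    all_names = {t.name for t in tests}
    # Rule: the trunk staging branch always runs everything.
    if branch == STAGING_BRANCH:
        return all_names
    # Rule: a branch that only adds new tests runs just those tests.
    if new_tests and not changed_modules:
        return set(new_tests) & all_names
    selected = set(all_names)
    # Rule: changes confined to isolated modules run only those modules' tests.
    if changed_modules and changed_modules <= ISOLATED_MODULES:
        selected = {t.name for t in tests if t.module in changed_modules}
    # Rule: skip always-green tests for 9 runs, include them on every 10th run.
    if run_number % 10 != 0:
        selected -= always_green(history)
    return selected
```

Under the figures quoted above, and assuming the skipped tests account for exactly 60 of the 90 minutes, ten consecutive runs would take 9 × 30 + 90 = 360 minutes instead of 900.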

Using this rule-based system will save everyone a lot of time, but it will not remove instabilities. Instabilities make working with tests unreliable and slow, which is why we need to fix the source of an instability as fast as possible. Instabilities can be caused by tests (in which case the test is disabled and a bug with the highest priority is opened for someone to fix it) or by infrastructure (mobile devices in the build farm get disconnected, or crash or freeze and need to be restarted, etc.). For infrastructure issues, all we can do is have good management and monitoring tools.

We are not the only ones struggling to keep test automation green and stable. Google has written about this quite extensively in their Flaky Tests blog post, and they also offer great advice on how to avoid it in their Hackable Projects blog post. Facebook also uses a system of bots to make sure automation runs fast and stays stable. You can see more in one of their presentations from GTAC and another from the F8 2015 conference.