The story of Crash Analyzer: the quest of automating crash handling at Unity

My name is Igor and I am a Toolsmith at Unity, which means I am part of the team that builds tools to increase the productivity of developers and QA at Unity, with the aim of improving the overall quality of the product and the experience of our users.

What was there before…

Let’s start with a little bit of history of how bugs are handled at Unity. A tool called the Bug Reporter is installed with the Editor; it can be launched manually, or it starts automatically in case of a crash (see more at Reporting-a-bug). After a user submits a bug report, QA has to try to reproduce the issue and turn this report (which is initially called an incident) into a bug, which is then passed to the development teams for triage and a fix or other solution. A good report should contain a descriptive title and steps to reproduce, and have a project attached to it (ideally a small one focused on the problem), so it is easy for us to verify and reproduce the issue. That’s what we at Unity always hope to get in a report (more at Attaching-your-project-to-a-bug-report).

The problem

But there is another side to the equation. With more than a million registered users, we receive A LOT of reports of the same bug, which is good until it becomes bad: someone has to look at every report that gets sent (around 6000 per month), verify it, reply to the user and so on. And once we have verified the bug (or even have a fix for it ready), all the additional reports no longer help to solve the problem; they only take valuable time from the QA people going through them. Automation to the rescue!

The solution

It turns out that crashes are the perfect example of a problem which in most cases is identified by one characteristic common to all of them: the callstack of the crash, i.e. the sequence of function calls in the program code of the Editor which eventually leads to the crash, with the name of the crashed function on top. This means that, unlike many other bugs, these problems are much easier for a machine to group together without any user intervention (there are exceptions to that rule, but more about that later).
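To make the grouping idea concrete, here is a minimal sketch. It assumes each report has already been reduced to an ordered list of frame names, crashed function first. The function names, report ids and frame names are invented for illustration and are not taken from Crash Analyzer itself.

```python
import hashlib
from collections import defaultdict


def callstack_signature(frames):
    """Hash the ordered frame names into a short, stable bucket key."""
    joined = "\n".join(frames)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]


def group_by_callstack(reports):
    """Map each signature to the ids of the reports that share it."""
    buckets = defaultdict(list)
    for report_id, frames in reports:
        buckets[callstack_signature(frames)].append(report_id)
    return dict(buckets)


# Hypothetical reports: (report id, frames with the crashed function first).
reports = [
    (101, ["Texture::Upload", "Renderer::Draw", "Main"]),
    (102, ["Texture::Upload", "Renderer::Draw", "Main"]),
    (103, ["Mesh::Bake", "Importer::Run", "Main"]),
]
buckets = group_by_callstack(reports)
```

Reports 101 and 102 share a callstack and land in the same bucket, while 103 gets a bucket of its own. No human has to read any of the three reports for that to happen.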

Benefits

When we started this project a few years ago, we had no idea how many additional insights it would give us: from historical data on all the crashes across different versions of the Editor (quantity, dynamics, etc.) to the ability to immediately tell whether a crash that happened on a user’s machine already has a fix in some Unity version.

We built a tool that analyzes all the reports sent by users, parses the Editor logs attached to them to find the callstack of a crash, and then maps identical or similar crashes together (Figure 1). This brought tremendous value and a boost in productivity for both developers and testers. Developers fixing an issue now have all the similar reports at their fingertips and can quickly look for more information or another repro project. Testers can immediately see whether a reported issue falls into a known category and whether there is already a solution for it, or at least a verified bug with a public Issue Tracker item where users can follow its progress, so they can help the user in a timely fashion. Release managers can now also assess the stability and production readiness of builds well before they make their way into alpha or beta testing, let alone stable releases (Unity Roadmap), and look for possible regressions and the User Pain they cause.

Figure 1. Bucket page with similar crashes and reports statistics

Handling duplicate reports in Crash Analyzer

If, for some type of crash, we were able to turn one of the reports into a bug (i.e. the user provided steps to reproduce or a project), we might want to close all other similar reports as duplicates, while giving users a link to track the progress of the fix in Issue Tracker. To do that, we mark that report as the repro for the crash and then resolve all the others as duplicates (Figure 2).
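The workflow can be sketched roughly as follows. This is only an illustration: the `Report` class, the status names, the message text and the tracker URL are all assumptions made for the example, not the tool's real data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    report_id: int
    status: str = "open"  # hypothetical states: "open", "repro", "duplicate"
    duplicate_of: Optional[int] = None


def resolve_duplicates(bucket, repro_id, tracker_url):
    """Mark one report as the repro and close the other open ones as duplicates.

    Returns {report_id: message} for the users whose reports were closed.
    """
    if all(r.report_id != repro_id for r in bucket):
        raise ValueError(f"report {repro_id} is not in this bucket")
    messages = {}
    for report in bucket:
        if report.report_id == repro_id:
            report.status = "repro"
        elif report.status == "open":
            report.status = "duplicate"
            report.duplicate_of = repro_id
            messages[report.report_id] = (
                f"This crash is already known. Track the fix at {tracker_url}"
            )
    return messages


# Hypothetical bucket of three similar reports; report 2 has a repro project.
bucket = [Report(1), Report(2), Report(3)]
messages = resolve_duplicates(bucket, 2, "https://example.com/issue/1")
```

Keeping `duplicate_of` on each closed report preserves the link back to the repro, so nothing is lost if the bucket is regrouped later.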

Figure 2. Resolve & close duplicates if we already have a repro to work with

For everyone’s convenience, Slack integrations were also added. If you want to receive notifications about newly reported crashes (along with info on whether the crash is already known, which Unity version it was reported against, and so on), all you need to do is subscribe to a few channels (Figure 3).

Figure 3. Slack notifications when new crashes are reported

Known issues or things to address

We are not out of the woods yet! We keep working to improve our algorithms: some of the called functions on the stack are meaningless and should be filtered out, some platforms provide us with better callstack collecting mechanisms than others, and so on. Some crashes are fully identified by their callstack and, as a result, can be processed completely automatically. Sometimes callstacks must be 100% identical to be the same bug. Sometimes it is enough for them to be ‘similar’ (for example, to have the same top frame, i.e. the same crashed function name, but vary a little further down the stack). But in some cases even identical callstacks can have different root causes. This often happens with external calls into 3rd party libraries or drivers, where the exact place of the crash is not enough, and different parameters of the call or a varying setup can result in different problems. For those crashes full automation is not possible yet, and it takes human investigation to tell the difference. The goal is to at least semi-automate such cases, which means someone has to take a look and resolve possible issues manually before we can advance to automatic handling.
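The two matching rules described above, plus the filtering of meaningless frames, could be sketched like this. The `NOISE_FRAMES` set, the policy names and the frame names are assumptions chosen for illustration; the real filters and rules are not given in this post.

```python
# Hypothetical frames that carry no information about the crash.
NOISE_FRAMES = {"<unknown>", "RaiseException", "abort"}


def clean(frames):
    """Drop frames that say nothing about where the crash happened."""
    return [frame for frame in frames if frame not in NOISE_FRAMES]


def same_crash(a, b, policy="exact"):
    """Compare two callstacks (crashed function first) under a policy.

    "exact":     every remaining frame must match.
    "top_frame": only the crashed function must match.
    """
    a, b = clean(a), clean(b)
    if not a or not b:
        return False  # nothing meaningful left to compare
    if policy == "exact":
        return a == b
    if policy == "top_frame":
        return a[0] == b[0]
    raise ValueError(f"unknown policy: {policy}")


stack_a = ["abort", "Texture::Upload", "Renderer::Draw", "Main"]
stack_b = ["Texture::Upload", "Renderer::Draw", "Main"]
stack_c = ["Texture::Upload", "Preview::Refresh", "Main"]

exact_ab = same_crash(stack_a, stack_b)                   # True after filtering
exact_bc = same_crash(stack_b, stack_c)                   # False
top_bc = same_crash(stack_b, stack_c, policy="top_frame")  # True
```

Neither policy helps with crashes inside 3rd party libraries or drivers, where identical stacks can hide different root causes. In a sketch like this, such buckets would need a flag that routes them to a person instead of to automatic handling.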

More to come…

The plan is to integrate further with existing tools like Issue Tracker and Bug Reporter. Today we collect a user’s crash report, store it on our servers, analyze it, and only then provide a reference to an existing bug or a solution for it. Instead, Bug Reporter and Crash Analyzer’s backend will exchange data right away. That will prevent reports being sent for issues that are already fixed, and it will give users immediate feedback or a solution instead of having them submit a report and wait for a response from QA.
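As a rough sketch of what such an exchange might look like on the client side, the function below looks up a crash signature before a report is submitted. Everything here is hypothetical: the `KNOWN_CRASHES` table stands in for a backend query, and the signatures, version string, statuses and URL are invented for the example.

```python
from typing import Optional

# Hypothetical stand-in for a backend lookup, keyed by callstack signature.
KNOWN_CRASHES = {
    "3f9a1c": {"status": "fixed", "fixed_in": "X.Y.Z"},
    "b07e44": {"status": "open", "issue_url": "https://example.com/issue/2"},
}


def feedback_for(signature, known=KNOWN_CRASHES) -> Optional[str]:
    """Return immediate feedback for a known crash, or None if it is new."""
    entry = known.get(signature)
    if entry is None:
        return None  # unknown crash: submit the report as usual
    if entry["status"] == "fixed":
        return f"This crash is fixed in Unity {entry['fixed_in']}."
    return f"This crash is a known issue. Track it at {entry['issue_url']}"


fixed_msg = feedback_for("3f9a1c")
open_msg = feedback_for("b07e44")
new_msg = feedback_for("ffffff")
```

Returning `None` for an unknown signature keeps the existing flow intact: only crashes that are already known short-circuit the report.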

Stay tuned for more…

Translated from: https://blogs.unity3d.com/2017/03/07/the-story-of-crash-analyzer-the-quest-of-automating-crash-handling-at-unity/
