Delta Debugging: Simplifying Failing Test Cases

weixin_26752759

Tags: python, debug

Original article: https://medium.com/@d_dchris/delta-debugging-simplifying-failing-test-cases-e5a1d7613b4b

Simplify the Failing Test Case

In software engineering, we write bugs, and we want to catch them. In production-level systems in particular, inputs can be fairly large, and once one of them triggers a failure, we are left scratching our heads trying to identify which sections of the input are responsible. Once those input segments are isolated, it becomes much easier to trace the program points where the bugs are hiding. In essence, we want a systematic approach that simplifies a large failing input, reducing it to a minimal form that still reproduces the same error.

Suppose a large HTML file (896 lines) causes a failure in a web browser. The figure below (a red cross indicates a failing test, a green tick a passing one) illustrates how we partition the failing HTML file and gradually shrink the segment size until we isolate the bug-inducing segment, “<SELECT>”. Instead of the original failing HTML file, we can reproduce the same error with a new HTML file that contains just the “<SELECT>” tag.

Figure: Illustration of simplifying failing inputs

If you pay close attention, the partitioning and searching process above resembles a binary search: in each iteration, we halve the search space by keeping the segment that induces the error. So does binary search fit this use case, letting us call it a day? Not necessarily. The last few steps are unlikely to be reached by binary search; the algorithm would probably stop at an input size of 40 characters, because a further split makes the test pass on both halves (for example, the 20-character substring <SELECT NAME="priori does not induce the error). The 40-character reduced input is a satisfactory solution, but it is not the minimal test case that induces the same error. So how minimal is minimal?
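
The point where plain halving gets stuck can be reproduced with a small sketch. Everything here is an illustrative assumption rather than part of the original example: `fails` is a hypothetical stand-in for "the browser crashes on this input" (it simply looks for a complete SELECT tag), the 40-character line is an assumed completion of the truncated substring quoted above, and the padding is chosen so that the halves line up.

```python
import re


def fails(html: str) -> bool:
    # Hypothetical failure oracle: pretend the browser crashes
    # whenever the input contains a complete <SELECT ...> tag.
    return re.search(r"<SELECT[^>]*>", html) is not None


def halve_until_stuck(data: str) -> str:
    # Binary-search style reduction: keep whichever half still fails.
    while len(data) > 1:
        mid = len(data) // 2
        left, right = data[:mid], data[mid:]
        if fails(left):
            data = left
        elif fails(right):
            data = right
        else:
            # Both halves pass, so halving cannot make progress.
            break
    return data


# Assumed 40-character failing line, padded so the halves align with it.
page = " " * 120 + '<SELECT NAME="priority" MULTIPLE SIZE=7>'
reduced = halve_until_stuck(page)
print(len(reduced))  # 40
```

The reduction goes 160 → 80 → 40 characters and then stops, because each 20-character half is missing either the opening `<SELECT` or the closing `>`.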

Defining Minimality

In the browser failure example above, the 8-character “<SELECT>” is the global minimum, since it is the smallest input that induces the failure. Ideally, we want our algorithm to reach such a global minimum, but in practice this is too expensive: it requires an exponential number of tests. We therefore strike a balance between computational cost and minimality (how close the size of the generated test case is to the global minimum).

The slide above illustrates the relationship between different levels of minimality. Moving from 1-minimal to the global minimum, the set of qualifying inputs shrinks, the conditions tighten, and more computation is required. A failing input is 1-minimal if removing any single element from it makes the failure disappear. 1-minimality is the weakest of these forms, and in practice it is the form that our algorithm aims to identify systematically and automatically.
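
The gap between 1-minimal and the global minimum can be seen on a toy input. The sketch below is only an illustration under assumptions: the oracle `fails` is hypothetical (a made-up parser that crashes on an "X" only when the parentheses are balanced), and `global_minimum` is a brute-force search over all subsequences, which is exactly the exponential cost mentioned earlier.

```python
from itertools import combinations
from typing import Callable


def fails(s: str) -> bool:
    # Hypothetical oracle: crash on "X", but only if parentheses balance.
    return "X" in s and s.count("(") == s.count(")")


def is_one_minimal(data: str, fails: Callable[[str], bool]) -> bool:
    # 1-minimal: data fails, and removing any ONE element makes it pass.
    # This needs only len(data) extra tests.
    return fails(data) and all(
        not fails(data[:i] + data[i + 1:]) for i in range(len(data))
    )


def global_minimum(data: str, fails: Callable[[str], bool]) -> str:
    # Brute force: try every subsequence, smallest first (up to 2^n tests).
    for size in range(1, len(data) + 1):
        for keep in combinations(range(len(data)), size):
            candidate = "".join(data[i] for i in keep)
            if fails(candidate):
                return candidate
    raise ValueError("no failing subsequence found")


print(is_one_minimal("(X)", fails))   # True
print(global_minimum("(X)", fails))   # X
```

Here "(X)" is 1-minimal because dropping any single character either removes the "X" or unbalances the parentheses, yet the smaller input "X" also fails. Finding it requires removing two characters at once, which a 1-minimal search does not guarantee.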

Delta Debugging

The earlier section described the situation where binary search cannot divide the input any further while still inducing the failure (both halves pass the test). To tackle this and increase the chance of eventually getting a smaller failing subset, we can either 1) test larger subsets or 2) test smaller subsets. The delta debugging algorithm alternates between these two strategies to reach a 1-minimal solution.

// C_f is the failing input
input = C_f
n = 2   // number of partitions

while (input.size >= 2) {
     partition input into n roughly equal subsets: input = Δ_1 ∪ Δ_2 ∪ Δ_3... ∪ Δ_n (delta list)
     get the n corresponding complements: ∇_1, ∇_2, ... ∇_n,
     where ∇_i = input - Δ_i (nabla list)

     // reduce to the failing subset in the next iteration
     if (any Δ_i induces failure) {
         input = Δ_i
         n = 2
     }
     // reduce to the failing complement in the next iteration
     elif (any ∇_i induces failure) {
         input = ∇_i
         n = max(n - 1, 2)
     }
     // increase granularity, search finer space
     elif (n < input.size) {
         n = min(n * 2, input.size)
     }
     // already at single-element granularity: input is 1-minimal
     else {
         break
     }
}

This algorithm is called the “Minimizing Delta Debugging Algorithm” in the paper linked from the original article. It systematically carries out the two strategies above by introducing the concepts of the delta list and the nabla list.
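
As a small illustration of the two lists, the sketch below builds them for a toy string. The helper names `split` and `deltas_and_nablas` are made up for this example, and treating the input as a string of characters is an assumption; lines or tokens would work the same way.

```python
def split(data: str, n: int) -> list:
    # Cut data into n contiguous chunks of near-equal length.
    chunks, start = [], 0
    for i in range(n):
        end = start + (len(data) - start) // (n - i)
        chunks.append(data[start:end])
        start = end
    return chunks


def deltas_and_nablas(data: str, n: int):
    # Delta list: the n chunks. Nabla list: the input minus one chunk each.
    deltas = split(data, n)
    nablas = ["".join(deltas[:i] + deltas[i + 1:]) for i in range(n)]
    return deltas, nablas


deltas, nablas = deltas_and_nablas("abcdefgh", 4)
print(deltas)  # ['ab', 'cd', 'ef', 'gh']
print(nablas)  # ['cdefgh', 'abefgh', 'abcdgh', 'abcdef']
```

Each nabla is larger than its delta whenever n > 2, which is why testing complements counts as the "test larger subsets" strategy.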

When we find a failing subset Δ_i in the delta list, we immediately know that we can reduce the search space and start a new round of partitioning from Δ_i.

When no failing subset is found in the delta list but a failing one appears in the nabla list (the list of complements), we reduce the search space and start a new round of partitioning from ∇_i (recall that ∇_i is a larger subset, since it is the complement of Δ_i).

When nothing fails in either the delta list or the nabla list for the current failing input, we increase the granularity and test smaller subsets (compared with the size of Δ_i in the previous iteration).
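
The three cases can be put together in a short Python sketch. It is a minimal version under stated assumptions, not a reference implementation: the input is treated as a string of characters, `split` is a helper invented here, and `fails` is a hypothetical oracle standing in for the real browser test. The sample line is an assumed 40-character failing input.

```python
import re
from typing import Callable


def split(data: str, n: int) -> list:
    # Cut data into n contiguous chunks of near-equal length.
    chunks, start = [], 0
    for i in range(n):
        end = start + (len(data) - start) // (n - i)
        chunks.append(data[start:end])
        start = end
    return chunks


def ddmin(data: str, fails: Callable[[str], bool]) -> str:
    # Reduce a failing input to a 1-minimal failing input.
    assert fails(data), "ddmin needs a failing input to start from"
    n = 2  # current number of partitions
    while len(data) >= 2:
        deltas = split(data, n)
        # Case 1: some subset fails, continue from it with n reset to 2.
        failing = next((d for d in deltas if fails(d)), None)
        if failing is not None:
            data, n = failing, 2
            continue
        # Case 2: some complement fails, continue from it with n - 1.
        nablas = ["".join(deltas[:i] + deltas[i + 1:]) for i in range(n)]
        failing = next((c for c in nablas if fails(c)), None)
        if failing is not None:
            data, n = failing, max(n - 1, 2)
        elif n < len(data):
            # Case 3: nothing fails, so refine the partition.
            n = min(n * 2, len(data))
        else:
            # Single-element granularity reached: data is 1-minimal.
            break
    return data


def fails(html: str) -> bool:
    # Hypothetical oracle: "crash" on any complete <SELECT ...> tag.
    return re.search(r"<SELECT[^>]*>", html) is not None


print(ddmin('<SELECT NAME="priority" MULTIPLE SIZE=7>', fails))  # <SELECT>
```

With this oracle the only 1-minimal failing subsequence of the sample line is `<SELECT>`, so the sketch gets past the 40-character point where plain halving stops. A real oracle would run the program under test, and caching its results is a common way to avoid repeating identical tests.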

Closing

The “Minimizing Delta Debugging Algorithm” is an interesting and smart technique for reducing a complex failing input to a smaller, more manageable size while preserving the same error-inducing characteristics. The algorithm enhances the basic binary search, or divide-and-conquer, idea by introducing the complementary delta and nabla lists. I think its application could extend beyond testing to any feedback-enabled search problem: if we can get feedback on whether the current partition contains the information we are looking for, we can systematically search smaller or larger subsets using this delta debugging algorithm.

Translate the algorithm pseudocode into a Java snippet:
