A New Software Testing Dataset [from Nanjing University]

Latest recommended article published on 2024-05-08 04:12:28

宇内虹游

Link to this post: https://blog.csdn.net/weixin_39278265/article/details/102943857

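This post discusses the observations and lessons learned from software testing contests. It covers the significant positive correlation between branch coverage and mutation score, which suggests that branch coverage can serve as an alternative when mutation testing is not feasible. It also covers the effectiveness of different test orders: the easiest-first strategy outperformed the most-difficult-first strategy, and forward UML-based test orders outperformed backward ones. The contest dataset is publicly available and provides a valuable resource for follow-up research.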

Preface

Today I came across an article in IEEE Computer, "Software-Testing Contests: Observations and Lessons Learned", which also comes with a software testing database. I think it is quite valuable, so I am noting it down here.

Overview of the article

Article link:
https://ieeexplore.ieee.org/abstract/document/8848154/authors#authors

Citation:

@article{wang_software-testing_2019,
  title = {Software-{Testing} {Contests}: {Observations} and {Lessons} {Learned}},
  volume = {52},
  number = {10},
  journal = {Computer},
  author = {Wang, Xingya and Sun, Weisong and Hu, Linghuan and Zhao, Yuan and Wong, W Eric and Chen, Zhenyu},
  year = {2019},
  pages = {61--69}
}

I won't go into the details; here is the outline:

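Background: The authors have organized many software testing contests, so they share their experience here and also carry out an empirical study based on the contest data.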

RQs:

Does branch coverage have a strong correlation with mutation score in unit testing?
Does test order at class level have an impact on the effectiveness of unit testing?

Evaluation metrics:

branch coverage
mutation score
combination of both scores.
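
To make the three metrics concrete, here is a minimal Python sketch. It uses the common textbook definitions (branch coverage as covered branches over total branches, mutation score as killed mutants over total mutants). The `SuiteResult` type, its field names, and the equal-weight `combined_score` are my own assumptions for illustration; the article excerpt above does not say how the contest actually combines the two scores.

```python
from dataclasses import dataclass


@dataclass
class SuiteResult:
    """Raw counts for one test suite (hypothetical field names)."""
    branches_covered: int
    branches_total: int
    mutants_killed: int
    mutants_total: int


def branch_coverage(r: SuiteResult) -> float:
    """Percentage of branches executed by the test suite."""
    if r.branches_total == 0:
        return 0.0
    return 100.0 * r.branches_covered / r.branches_total


def mutation_score(r: SuiteResult) -> float:
    """Percentage of mutants killed by the test suite."""
    if r.mutants_total == 0:
        return 0.0
    return 100.0 * r.mutants_killed / r.mutants_total


def combined_score(r: SuiteResult, weight: float = 0.5) -> float:
    """Assumed combination: a weighted average of the two metrics.

    The default weight of 0.5 is illustrative only, not the contest's rule.
    """
    return weight * branch_coverage(r) + (1.0 - weight) * mutation_score(r)


# Made-up counts, purely for demonstration.
example = SuiteResult(branches_covered=30, branches_total=40,
                      mutants_killed=12, mutants_total=48)
print(branch_coverage(example), mutation_score(example), combined_score(example))
```

A suite can score high on branch coverage and still kill few mutants, which is why the relationship between the two is worth measuring rather than assuming.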

Conclusion:

Mutation testing has been proposed to measure the fault detection strength of test cases based on the mutation score. However, mutation testing might not be feasible due to its high execution cost. Our analysis using 846 manually created test suites shows that there is a significant and moderate to strongly positive correlation between branch coverage and mutation score. This suggests that branch coverage can still be used as an alternative when mutation testing is not feasible.
In addition to the correlation analysis between branch coverage and mutation score, we also analyzed the testing effectiveness of different test orders and their popularity. Three interesting observations were drawn from the analysis results. First, the answering the easiest question first strategy performed better than the answering the most difficult question first strategy. Second, forward UML-based test orders performed better than backward UML-based test orders. Third, the test order “Other” achieved the highest average score of 37.94, and it is noticeably higher than the second highest average score of 27.20. This observation raises the question of whether there were specific test orders we did not identify. In our experimental design, we analyzed eight test orders based on feedback from practitioners in the industry, researchers in academia, and some contestants. We are confident that we did not miss any specific test order in our analysis. If this is true, this could suggest that flexible test order could achieve good effectiveness. Nevertheless, we will conduct more experiments to further investigate this observation.
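
The correlation analysis in the first conclusion can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the excerpt does not state which correlation coefficient the authors used, so the sketch computes both Pearson and Spearman (rank-based) coefficients, and the two lists of scores are made-up numbers, not data from STCDR.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-suite scores (percent), one entry per test suite.
coverage = [20, 35, 50, 65, 80, 95]
mutation = [10, 18, 15, 40, 38, 60]

print("Pearson :", round(pearson(coverage, mutation), 3))
print("Spearman:", round(spearman(coverage, mutation), 3))
```

With the real data, each pair would be the branch coverage and mutation score of one of the 846 test suites. A significance test (p-value) would also be needed to support the "significant" part of the claim; it is omitted here to keep the sketch dependency-free.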

The software testing contest dataset (Software Testing Contest Data Repository)

Our contests are more valuable because we have created a data repository, the Software Testing Contest Data Repository (STCDR) at http://www.iselab.cn/contest/data/. It includes data collected from STC 2016 and STC 2017 that can be accessed by the public.

URL: http://www.iselab.cn/contest/data/