A New Software Testing Dataset [from Nanjing University]

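This post looks at observations and lessons learned from software testing contests. It covers the significant positive correlation between branch coverage and mutation score, which suggests that branch coverage can serve as an alternative when mutation testing is not feasible. It also covers the effectiveness of different test orders: an easiest-first strategy outperformed a hardest-first one, and forward UML-based test orders outperformed backward ones. The contest dataset is publicly available, which makes it a valuable resource for follow-up research.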


Preface

Today I came across an article in IEEE Computer: Software-Testing Contests: Observations and Lessons Learned. It also comes with a software testing database, which I find quite valuable, so I am making a note of it here.

Overview of the paper

Paper link:
https://ieeexplore.ieee.org/abstract/document/8848154/authors#authors

Citation:

@article{wang_software-testing_2019,
title = {Software-{Testing} {Contests}: {Observations} and {Lessons} {Learned}},
volume = {52},
number = {10},
journal = {Computer},
author = {Wang, Xingya and Sun, Weisong and Hu, Linghuan and Zhao, Yuan and Wong, W Eric and Chen, Zhenyu},
year = {2019},
pages = {61–69}
}

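I won't go into the details; here is the outline:

Background: The authors have organized many software testing contests, so they share their experience here and also carry out an empirical study based on the contest data.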

RQs:

  • Does branch coverage have a strong correlation with mutation score in unit testing?
  • Does test order at class level have an impact on the effectiveness of unit testing?

Evaluation metrics:

  • branch coverage
  • mutation score
  • combination of both scores.
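For readers unfamiliar with the two metrics, here is a minimal sketch using their common textbook definitions: branch coverage as covered branches over total branches, and mutation score as killed mutants over total mutants (ignoring equivalent mutants for simplicity). The outline above does not say how the paper combines the two scores, so the equal-weight average in `combined_score` is purely an assumption for illustration, and all function names are hypothetical.

```python
def branch_coverage(covered_branches: int, total_branches: int) -> float:
    """Percentage of branches exercised by a test suite."""
    if total_branches <= 0:
        raise ValueError("total_branches must be positive")
    return 100.0 * covered_branches / total_branches


def mutation_score(killed_mutants: int, total_mutants: int) -> float:
    """Percentage of mutants killed (equivalent mutants are ignored here)."""
    if total_mutants <= 0:
        raise ValueError("total_mutants must be positive")
    return 100.0 * killed_mutants / total_mutants


def combined_score(coverage: float, mutation: float, weight: float = 0.5) -> float:
    """ASSUMPTION: a weighted average; the paper's actual formula may differ."""
    return weight * coverage + (1.0 - weight) * mutation


if __name__ == "__main__":
    # Made-up numbers for one hypothetical test suite.
    cov = branch_coverage(covered_branches=30, total_branches=40)
    mut = mutation_score(killed_mutants=12, total_mutants=48)
    print(cov, mut, combined_score(cov, mut))
```

Changing `weight` shows how much a combined ranking depends on the weighting choice, which is one reason to check the paper for the exact formula before reusing the contest scores.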

Conclusion:

Mutation testing has been proposed to measure the fault detection strength of test cases based on the mutation score. However, mutation testing might not be feasible due to its high execution cost. Our analysis using 846 manually created test suites shows that there is a significant and moderate to strongly positive correlation between branch coverage and mutation score. This suggests that branch coverage can still be used as an alternative when mutation testing is not feasible.
In addition to the correlation analysis between branch coverage and mutation score, we also analyzed the testing effectiveness of different test orders and their popularity. Three interesting observations were drawn from the analysis results. First, the answering the easiest question first strategy performed better than the answering the most difficult question first strategy. Second, forward UML-based test orders performed better than backward UML-based test orders. Third, the test order “Other” achieved the highest average score of 37.94, and it is noticeably higher than the second highest average score of 27.20. This observation raises the question of whether there were specific test orders we did not identify. In our experimental design, we analyzed eight test orders based on feedback from practitioners in the industry, researchers in academia, and some contestants. We are confident that we did not miss any specific test order in our analysis. If this is true, this could suggest that flexible test order could achieve good effectiveness. Nevertheless, we will conduct more experiments to further investigate this observation.
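To make the first finding concrete, the following sketch shows how one might check the correlation between branch coverage and mutation score on per-suite data such as the STCDR records. The quoted text does not say which correlation coefficient the authors used, so computing both Pearson and Spearman is an assumption, and the sample numbers are made up for illustration rather than taken from the contest data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical per-suite percentages, NOT real contest data.
    coverage = [20, 35, 50, 65, 80, 95]
    mutation = [10, 30, 25, 50, 60, 85]
    print("Pearson :", round(pearson(coverage, mutation), 3))
    print("Spearman:", round(spearman(coverage, mutation), 3))
```

Spearman only assumes a monotonic relationship, so it is a reasonable choice when scores cluster near 0 or 100; a significance test (for example via SciPy) would be needed before making the kind of claim the paper makes.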

The contest dataset (Software Testing Contest Data Repository)

Our contests are more valuable because we have created a data repository, the Software Testing Contest Data Repository (STCDR) at http://www.iselab.cn/contest/data/. It includes data collected from STC 2016 and STC 2017 that can be accessed by the public.

URL: http://www.iselab.cn/contest/data/

Summary

I think this database is valuable. The authors have already mined some patterns from it, but there is still plenty of room for further work.
