Code Smell Benchmark

There are almost no public benchmarks for code smells. Papers that report recall and precision typically build their ground truth either (1) by relying on developers familiar with the system, or (2) by first running detection tools with a low threshold and then manually validating the results, but the validated data is not released...
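
As a reference point, here is a minimal sketch of how precision and recall are computed against such a manually validated ground truth; the class names below are hypothetical placeholders, not real data.

```python
# Hypothetical example: precision/recall of a smell detector against
# a manually validated ground truth. Class names are placeholders.
detected = {"ClassA", "ClassB", "ClassC", "ClassD"}   # reported by the tool (low threshold)
validated = {"ClassA", "ClassC", "ClassE"}            # instances confirmed by manual inspection

true_positives = detected & validated
precision = len(true_positives) / len(detected)       # share of reported instances that are real
recall = len(true_positives) / len(validated)         # share of real instances that were found

print(f"precision = {precision:.2f}, recall = {recall:.2f}")  # 0.50, 0.67
```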

The paper "Code Smells and Refactoring: A Tertiary Systematic Review of Challenges and Observations" (JSS, 2020) mentions the lack of benchmarks as an open challenge:

We do not have until now a benchmark of projects [S22, S39, S40]. Some studies [S13, S18, S39, S40] point a lack of benchmark definitions for smells validated by experts. It happens with refactoring, too. The large set of tools and systems used in the experimental settings suggest the lack of well-designed benchmarks should be better addressed. The benchmarks could be constructed, having the same characteristics as the most used systems.


However, the following paper does mention an online benchmark. Note that the datasets in it are all named candidate_(code smell), so they still need further verification:

Palomba F, Bavota G, Di Penta M, et al. On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empirical Software Engineering, 2018, 23(3): 1188–1221. DOI: 10.1007/s10664-017-9535-z. (Also published at ICSE.)

However, this is not a threat for our study, because the manual validation of the instances detected by the tool aims at discarding the false positives, while keeping the true positive smell instances. A detailed overview of the results obtained by the tools on Apache Cassandra is available in our online appendix (Palomba et al. 2017) 
Palomba F, Bavota G, Oliveto R, Fasano F, Di Penta M, De Lucia A (2017) Bad code smells study - online appendix. https://dibt.unimol.it/fpalomba/reports/badSmell-analysis/index.html


The following paper uses exactly the dataset above:

Pecorelli F, Palomba F, Di Nucci D, et al. Comparing heuristic and machine learning approaches for metric-based code smell detection. IEEE/ACM International Conference on Program Comprehension (ICPC), 2019: 93–104. DOI: 10.1109/ICPC.2019.00023.

Additionally: this paper concludes that the performance of machine learning techniques is not as good as that of heuristic approaches.


There are also approaches that construct a benchmark themselves. In the first paper below, methods m taken from open-source applications regarded as high quality are moved to other classes, and the result after the move is assumed to exhibit the smell (feature envy). In the second, several existing detectors are applied first, and then stratified random sampling is used for manual validation.

Liu H, Xu Z, Zou Y. Deep Learning Based Feature Envy Detection. 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018: 385–396. DOI: 10.1145/3238147.3238166.

First, we download well-known and high quality open-source applications. Second, for each method m from such applications, we generate a labeled training sample as follows.
(1) We test whether m could be moved to other classes with move method refactoring. The test is accomplished with the APIs provided by Eclipse JDT.
(2) Suppose that method m could be moved to a set of classes noted as ptc = {tc1, tc2, ..., tck}. If ptc is empty, i.e., the method could not be moved, we discard it and turn to the next method. Otherwise, we turn to the next step to generate a labeled training item.
(3) We randomly (fifty-fifty chance) decide to generate a positive training item (with feature envy) or a negative item (without feature envy).
(4) We generate a negative item as follows. First, we randomly select a potential target class tci from ptc. Second, we compute the distance dist(m, ec) and dist(m, tci), where ec is the enclosing class of m. Third, we create a negative item (ngItem) and add it to the training data set:
ngItem = <input, output>, where input = <name(m), name(ec), name(tci), dist(m, ec), dist(m, tci)> and output = 0.
(5) We generate a positive item as follows. First, we randomly select a potential target class tci from ptc. Second, we move m from its enclosing class ec to tci by Eclipse APIs. Third, we create a positive item whose input is <name(m), name(tci), name(ec), dist(m, tci), dist(m, ec)> and output is 1. Notably the distances are computed after the method is moved.
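
The gist of this sample-generation procedure can be sketched as follows. This is not the authors' code: movable_target_classes(), distance() and move_method() are hypothetical stand-ins for the Eclipse JDT precondition check, the distance measure used in the paper, and the actual refactoring.

```python
import random

def movable_target_classes(method):
    """Hypothetical stand-in for the JDT check of which classes m could be moved to."""
    return method.get("targets", [])

def distance(method, cls):
    """Hypothetical stand-in for the distance between a method and a class."""
    return method["dist"].get(cls, 1.0)

def move_method(method, cls):
    """Hypothetical stand-in for the move-method refactoring (the real step rewrites code)."""
    method["enclosing"] = cls

def generate_sample(method):
    ec = method["enclosing"]
    ptc = movable_target_classes(method)
    if not ptc:
        return None  # step (2): discard methods that cannot be moved
    tc = random.choice(ptc)
    if random.random() < 0.5:
        # step (4): negative item, the method stays in its enclosing class
        return (method["name"], ec, tc, distance(method, ec), distance(method, tc)), 0
    # step (5): positive item, move the method first and measure distances afterwards,
    # so the chosen target class now plays the role of the enclosing class
    move_method(method, tc)
    return (method["name"], tc, ec, distance(method, tc), distance(method, ec)), 1

# Toy usage with made-up distances:
m = {"name": "render", "enclosing": "Order", "targets": ["Invoice"],
     "dist": {"Order": 0.4, "Invoice": 0.2}}
print(generate_sample(m))
```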

Di Nucci D, Palomba F, Tamburri D A, Serebrenik A, De Lucia A. Detecting code smells using machine learning techniques: Are we there yet? 25th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2018: 612–621. DOI: 10.1109/SANER.2018.8330266.

To establish the dependent variable for code smell prediction models, the authors applied for each code smell the set of automatic detectors shown in Table I. However, code smell detectors cannot usually achieve 100% recall, meaning that an automatic detection process might not identify actual code smell instances (i.e., false negatives) even in the case that multiple detectors are combined. To cope with false positives and to increase their confidence in validity of the dependent variable, the authors applied a stratified random sampling of the classes/methods of the considered systems: this sampling produced 1,986 instances (826 smelly elements and 1,160 nonsmelly ones), which were manually validated by the authors in order to verify the results of the detectors. As a final step, the sampled dataset was normalized for size: the authors randomly removed smelly and non-smelly elements building four disjoint datasets, i.e., one for each code smell type, composed of 140 smelly instances and 280 nonsmelly ones (for a total of 420 elements). These four datasets represented the training set for the ML techniques above.
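
The size-normalization step can be sketched as below, assuming a hypothetical list `validated` of (element_id, smell_type, is_smelly) tuples produced by the manual validation; it simply downsamples each per-smell dataset to 140 smelly and 280 non-smelly elements.

```python
import random

def build_balanced_dataset(validated, smell_type, n_smelly=140, n_clean=280, seed=0):
    """Randomly drop elements of one smell type until 140 smelly / 280 non-smelly remain."""
    rng = random.Random(seed)
    smelly = [eid for eid, stype, flag in validated if stype == smell_type and flag]
    clean  = [eid for eid, stype, flag in validated if stype == smell_type and not flag]
    # random.sample raises if fewer elements than requested are available
    elements = rng.sample(smelly, n_smelly) + rng.sample(clean, n_clean)
    labels   = [1] * n_smelly + [0] * n_clean
    return elements, labels
```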

Next direction to look into: machine-learning-based code smell detection. Evaluating such approaches necessarily requires a benchmark, so let's see whether any publicly available ones exist.

September 14, 2021, 11:35:22

To be continuously updated...

