2021-11-08: Software Engineering Paper Reading: Text Filtering and Ranking for Security Bug Report Prediction

qq_43771887

Article link: https://blog.csdn.net/qq_43771887/article/details/121206424

Text Filtering and Ranking for Security Bug Report Prediction

Contents

- Text Filtering and Ranking for Security Bug Report Prediction
Preface
I. The Basic Problem
II. Paper Content
- 1. Purpose and Contributions
- 2. Overall Framework
III. Experimental Setup

Preface

Text Filtering and Ranking for Security Bug Report Prediction. IEEE Trans. Software Eng. 45(6): 615-631 (2019)

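Security vulnerabilities let users access a system inappropriately and thereby harm the software. Reporters are usually asked not to disclose a suspected security bug publicly, so bug reports need to be classified.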

I. The Basic Problem

there is one underlying issue not fully explored in these models, which we call security cross words. Security cross words denote the use of the same security related keywords in both security and non-security bug reports.

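In short: these models leave one underlying issue unresolved, namely security cross words (the same security related keywords appear in both security and non-security bug reports).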

II. Paper Content

1. Purpose and Contributions

An approach to automatically identify security related keywords and security cross words from security bug reports.
An automatic filtering and ranking method to build better text-based prediction models for security bug reports by removing NSBRs with security cross words from the prediction model. Better prediction models reduce the mislabelling of security bug reports.
A tractable method to use both bug reports from within a single project and bug reports from other projects to build text-based prediction models for security bug reports.
A ranking capability used to generate a useful ranked list of bug reports where most of the actual security bug reports are closer to the top of the list.

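In short:
1) An approach that automatically identifies security related keywords and security cross words from security bug reports.
2) An automatic filtering and ranking method that builds better text-based prediction models by removing NSBRs that contain security cross words from the prediction model.
3) Bug reports from within a single project and from other projects can both be used to build text-based prediction models for security bug reports.
4) A ranking capability that generates a useful ranked list of bug reports in which most of the actual security bug reports sit close to the top.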

2. Overall Framework

The framework has three parts:

Identifying Security Related Keywords
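Compute the tf-idf value of every term in the SBRs. The terms with the highest tf-idf values are treated as security related keywords; any of these keywords that also appears in NSBRs is a security cross word. The security related keywords are then used to build a term-document matrix.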
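The keyword step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: reports are tokenized by whitespace, each term's tf-idf is summed over all SBRs, and a smoothed idf is used. The paper's exact weighting may differ, and the names `top_tfidf_terms` and `security_cross_words` are made up for this example.

```python
import math
from collections import Counter


def top_tfidf_terms(sbrs, top_n):
    """Return the top_n terms of the SBR corpus, ranked by summed tf-idf."""
    docs = [text.lower().split() for text in sbrs]
    n_docs = len(docs)
    # Document frequency: number of SBRs that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            # Smoothed idf (an assumption, not taken from the paper).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += (count / len(doc)) * idf
    # Sort by score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


def security_cross_words(keywords, nsbrs):
    """Return the security related keywords that also occur in NSBRs."""
    nsbr_vocab = {term for text in nsbrs for term in text.lower().split()}
    return [k for k in keywords if k in nsbr_vocab]


# Hypothetical toy data, for illustration only.
sbrs = ["buffer overflow crash", "overflow remote attack", "password leak"]
nsbrs = ["button overflow on small screen", "typo in docs"]
keywords = top_tfidf_terms(sbrs, 3)
print(keywords, security_cross_words(keywords, nsbrs))
```

In this toy data "overflow" is both a top keyword and a word used in an ordinary UI bug report, which is exactly the cross-word situation the paper targets.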
Filtering Bug Reports
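Remove NSBRs that contain security related keywords. The filter is based on a score for each term in the feature set; these term scores are then combined into an overall score for each bug report.
Scoring the terms in the feature set:

The support function on line 5 of the term-scoring algorithm (listing not reproduced):
Support function: describes how the extreme value changes as x varies.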

farsecsq, applying the Jalali et al. [31] support function to the frequency of words found in SBRs;
farsectwo, the Graham [30] version of multiplying the frequency by two and;
farsec, which offers no support.
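Because the equation on line 6 is a poor ranking heuristic for low frequencies, line 5 is varied as listed above.
Words that appear in SBRs but not in NSBRs are assigned a score of 0.99. Conversely, words that appear in NSBRs but not in SBRs are assigned a score of 0.01.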
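The scoring rules can be sketched in Python. This is a rough illustration, not the paper's algorithm listing. It assumes that "frequency" means the number of reports containing a word, that farsecsq squares and farsectwo doubles the SBR frequency, and that scores are clipped to [0.01, 0.99] in the style of Graham's spam filter. The function name `term_scores` and its parameters are hypothetical.

```python
def term_scores(keywords, sbr_tokens, nsbr_tokens, variant="farsec"):
    """Score each keyword by how strongly it indicates a security bug report.

    sbr_tokens / nsbr_tokens: lists of token sets, one set per report.
    variant: "farsec" (no support), "farsecsq" (squared SBR frequency)
             or "farsectwo" (doubled SBR frequency). Assumed semantics.
    """
    if variant not in ("farsec", "farsecsq", "farsectwo"):
        raise ValueError(f"unknown variant: {variant}")
    scores = {}
    for word in keywords:
        s_freq = sum(word in doc for doc in sbr_tokens)
        ns_freq = sum(word in doc for doc in nsbr_tokens)
        if variant == "farsecsq":
            s_freq = s_freq ** 2
        elif variant == "farsectwo":
            s_freq = s_freq * 2
        # Relative frequencies, capped at 1.
        s = min(1.0, s_freq / len(sbr_tokens))
        ns = min(1.0, ns_freq / len(nsbr_tokens))
        if s + ns == 0:
            continue  # word unseen in both corpora: no evidence either way
        # Clipping yields 0.99 for SBR-only words and 0.01 for NSBR-only words.
        scores[word] = max(0.01, min(0.99, s / (s + ns)))
    return scores


# Hypothetical toy data, for illustration only.
sbr_tokens = [{"overflow", "crash"}, {"overflow", "attack"}]
nsbr_tokens = [{"overflow", "button"}, {"typo"}, {"layout"}, {"button"}]
print(term_scores(["overflow", "attack", "button"], sbr_tokens, nsbr_tokens))
```

Clipping to 0.01 and 0.99 keeps a single word from forcing a report's combined score to exactly 0 or 1.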

Overall score of a bug report:
NSBRs are selected using the threshold score of >= 0.75. The threshold was chosen empirically: reports with mid-range scores, i.e. 0.4 to 0.6, rarely turn out to be SBRs.
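A minimal sketch of the filtering step, assuming the per-term scores are combined with Graham's product formula, prod(p) / (prod(p) + prod(1 - p)). The exact formula is not given in the text of these notes, so treat it as an assumption. The names `report_score` and `filter_nsbrs` and the example scores are hypothetical.

```python
import math


def report_score(tokens, scores):
    """Combine the scores of the keywords found in one bug report.

    Assumes Graham-style combination. A report with no scored keyword
    gets the neutral value 0.5, because both empty products equal 1.
    """
    probs = [scores[t] for t in tokens if t in scores]
    p = math.prod(probs)
    q = math.prod(1 - x for x in probs)
    return p / (p + q)


def filter_nsbrs(nsbrs, scores, threshold=0.75):
    """Keep only NSBRs whose score is below the threshold.

    NSBRs scoring >= threshold look too much like SBRs and are dropped
    from the training data.
    """
    return [r for r in nsbrs if report_score(r, scores) < threshold]


# Hypothetical term scores and tokenized NSBRs, for illustration only.
scores = {"overflow": 0.8, "attack": 0.99, "button": 0.01}
nsbrs = [
    ["overflow", "attack"],  # reads like a security report
    ["button", "overflow"],  # cross word offset by a non-security word
    ["typo"],                # no scored keyword
]
print(filter_nsbrs(nsbrs, scores))
```

Because term scores are assumed to lie in [0.01, 0.99], the denominator stays positive for reports of ordinary length.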

Ranking Bug Reports
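With imbalanced data, a prediction model can produce many false positives, that is, NSBRs predicted as SBRs. Therefore, once the predicted SBRs have been identified, a useful ranked list of bug reports is generated.
Bug reports are ranked following the idea of ensemble learning, in which the results of several machine learning models are combined for better prediction. The predictions of classifiers trained on FARSEC-filtered or unfiltered data are sorted in turn; when predictions are equal, reports are ordered chronologically, as in the bug tracking system.
For example, according to the prediction results of the farsecsq filter:
Step 1: (Sort by prediction in descending order)
When the other filters, or the unfiltered data, predict fewer SBRs than farsecsq:
Step 2: (Sort by prediction of farsecsq)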
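One possible reading of this ranking can be sketched as follows. It is a simplification, not the paper's procedure: each report is ranked by how many models predict it to be an SBR, and ties are broken by report id, which is assumed here to reflect chronological order in the tracker. The name `rank_reports` and the input layout are hypothetical.

```python
def rank_reports(predictions):
    """Rank report ids so that likely SBRs come first.

    predictions: dict mapping report id -> list of 0/1 predictions,
    one per model (e.g. trained with different filters or no filter).
    """
    return sorted(
        predictions,
        # More SBR votes first; smaller (older) report id wins ties.
        key=lambda rid: (-sum(predictions[rid]), rid),
    )


# Hypothetical predictions from three models, for illustration only.
predictions = {
    101: [1, 0, 1],
    102: [1, 1, 1],
    103: [0, 0, 0],
    104: [1, 1, 0],
}
print(rank_reports(predictions))
```

A reviewer reading such a list from the top meets the reports that most models agree on before the doubtful ones.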

III. Experimental Setup

Dataset: uses JIRA as its bug tracking system
Five machine learning algorithms:

Random Forest, Naive Bayes, Logistic Regression, Multilayer Perceptron and K-Nearest Neighbor.

Performance Measures
Probability of detection (pd), probability of false alarm (pf), precision, f-measure and g-measure.
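These measures can be computed from a confusion matrix. The sketch below uses the definitions commonly found in the defect prediction literature: pd is recall, pf is the false positive rate, the f-measure is the harmonic mean of pd and precision, and the g-measure is the harmonic mean of pd and 1 - pf. The original definitions were in an image that is not available here, so treat these formulas as an assumption. The function name `performance_measures` is made up.

```python
def performance_measures(tp, fp, tn, fn):
    """Compute pd, pf, precision, f-measure and g-measure from raw counts."""

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    pd = ratio(tp, tp + fn)    # probability of detection (recall)
    pf = ratio(fp, fp + tn)    # probability of false alarm
    prec = ratio(tp, tp + fp)  # precision
    f = ratio(2 * pd * prec, pd + prec)
    g = ratio(2 * pd * (1 - pf), pd + (1 - pf))
    return {"pd": pd, "pf": pf, "precision": prec, "f": f, "g": g}


# Hypothetical counts: 10 real SBRs among 100 reports.
print(performance_measures(tp=8, fp=2, tn=88, fn=2))
```

The g-measure rewards a model only when it detects SBRs and also keeps false alarms low, which matters when SBRs are a small minority.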
