1. Introduction
The goal of Bug Report Classification (BRC) is to classify bug reports based on the software report log, determining whether a given report describes an anomaly.
2. Exploration
2.1 Data Preprocessing
The preprocessing tricks for the BRC task can be summarized as follows.
- Convert all letters in the log to lowercase.
- Extract the root of every word so that we can compress the vocabulary, for example: [interesting -> interest, interested -> interest]. The Porter Stemmer algorithm is one standard implementation of this (see reference blogs).
- Other methods you can try (blog).
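A minimal sketch of the preprocessing steps above. The `crude_stem` function here is a toy suffix stripper standing in for the full Porter Stemmer (the real algorithm has many more rules; `nltk.stem.PorterStemmer` is a common ready-made implementation), and `preprocess` is a hypothetical helper name:

```python
import re

def crude_stem(word):
    # Toy suffix stripping, NOT the full Porter algorithm:
    # it only handles a few common English suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

def preprocess(log_line):
    # 1) lowercase the whole log line
    # 2) keep alphanumeric tokens only
    # 3) stem each token to shrink the vocabulary
    tokens = re.findall(r"[a-z0-9]+", log_line.lower())
    return [crude_stem(t) for t in tokens]

print(preprocess("ERROR: Interesting timeouts interested the watchdog"))
# -> ['error', 'interest', 'timeout', 'interest', 'the', 'watchdog']
```

Note how "interesting" and "interested" collapse to the same vocabulary entry, which is exactly the compression effect described above.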
2.2 Classification Model
So far I have tried many methods, so below I list the main approaches that achieved notably better performance. Note that some of the methodology is inspired by the anomaly detection chapter of Hung-yi Lee's course (link).
- Treat the BRC task as a binary classification task. However, the class distribution may be highly imbalanced; in particular, anomaly reports are rare, so we must deal with the class imbalance. In Keras I have tried two methods. The first is setting the `class_weight` argument in `model.fit()`, so that the two classes' examples are weighted more evenly during training. The second is to design a cost-sensitive loss function, giving the anomaly class a larger weight in its contribution to the loss. (Note: with a cost-sensitive loss you should not also use balanced weighting/sampling; you can confirm this conclusion experimentally.)
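The `class_weight` dictionary for `model.fit()` can be derived from the label counts. Below is a sketch using the common "balanced" heuristic (`n_samples / (n_classes * n_c)`), assuming labels 0 = normal and 1 = anomaly; `balanced_class_weights` is a hypothetical helper name:

```python
from collections import Counter

def balanced_class_weights(labels):
    # "Balanced" heuristic: weight_c = n_samples / (n_classes * n_c),
    # so the rare anomaly class contributes more to the loss.
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

# 90 normal reports (label 0) vs. 10 anomaly reports (label 1)
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)  # the anomaly class gets ~9x the weight of the normal class
# In Keras this dictionary would be passed as:
#   model.fit(x, y, class_weight=weights)
```

With a 9:1 imbalance, the anomaly class ends up weighted nine times as heavily, which compensates for its rarity in the loss.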
- In the method above, the output of the neural network is a single real number, so we must set a threshold to split the results into two classes. That threshold becomes a hyperparameter, and some skill is needed to tune it. You can avoid this tuning entirely by turning the binary classification into a two-class (multi-class style) classification: change the network's output to two neurons, add a softmax at the end of the output to get a probability score for each class, and then use `argmax()` to obtain the prediction.
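The two-neuron softmax + argmax idea can be sketched in plain NumPy (in Keras the softmax would simply be the final activation layer, and the logits below are made-up example values):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs (logits) of a two-neuron final layer
# for three bug reports: columns = [normal, anomaly].
logits = np.array([[ 2.0, -1.0],
                   [ 0.3,  0.4],
                   [-2.0,  3.0]])

probs = softmax(logits)        # probability score for each class
preds = probs.argmax(axis=-1)  # 0 = normal, 1 = anomaly, no threshold needed
print(preds)  # [0 1 1]
```

Because `argmax` just picks the larger of the two probabilities, the decision boundary is fixed at 0.5 and no threshold hyperparameter remains to tune.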