Bug Report Classification Summary

1. Introduction

The goal of Bug Report Classification (BRC) is to classify a bug report, based on the software's report log, and decide whether the report is an anomaly.

2. Exploration
2.1 Data Preprocessing

The data preprocessing tricks for the BRC task can be summarized as follows.

  • Convert all letters in the log to lowercase.
  • Reduce every word to its root (stem) so that the vocabulary list shrinks, for example [interesting->interest, interested->interest]. For the implementation, refer to the Porter Stemmer algorithm (some reference blogs).
  • Some other methods you can try (blog).
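
The first two steps can be sketched as follows. This is only a minimal illustration: `toy_stem`, `preprocess`, and `build_vocab` are hypothetical helper names, and the suffix stripper is a deliberately crude stand-in for the Porter Stemmer, not an implementation of it. In practice a library stemmer such as NLTK's `PorterStemmer` would be the more sensible choice.

```python
import re

# Assumption: a tiny suffix list for illustration only.
# The real Porter Stemmer applies many more rules in several phases.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(log_line):
    """Lowercase the line, keep alphabetic tokens, and stem each token."""
    tokens = re.findall(r"[a-z]+", log_line.lower())
    return [toy_stem(token) for token in tokens]


def build_vocab(log_lines):
    """Map every distinct stem to an integer id, in order of appearance."""
    vocab = {}
    for line in log_lines:
        for token in preprocess(line):
            vocab.setdefault(token, len(vocab))
    return vocab


if __name__ == "__main__":
    print(preprocess("Interesting INTERESTED interest"))
    print(build_vocab(["Connection failed", "connection FAILING"]))
```

Because several surface forms collapse into one stem, the vocabulary built this way is smaller than one built from the raw tokens.
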
2.2 Classification model

I have tried many methods so far, so I will list only the main algorithms that achieved much better performance. Note that some of the methodology is inspired by the anomaly detection chapter of Lee Hongyi's course (link).

  • Treat the BRC task as a binary classification task. However, the class distribution may be heavily imbalanced; in particular, anomaly reports are far fewer than normal ones, so the class imbalance problem has to be solved. In Keras I have tried two methods. The first is to set class_weight in the model.fit() function so that the two kinds of examples are balanced during training (Keras applies these weights to each sample's contribution to the loss). The second is to design a cost-sensitive loss function that gives the anomaly class more weight in the loss. (Note: with the second method you should not also balance the classes; you can confirm this through experiment.)
  • With the method above, the output of the neural network is a single real number, so we must set a threshold to divide the results into two classes. That threshold becomes a hyperparameter that has to be tuned. To avoid tuning it, you can turn the binary classification into a multi-class classification: change the output of the network to two neurons, add a softmax function at the end of the output to get a probability score for each class, and then use the argmax() function to get the result.
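
The two ideas above can be sketched in plain Python. This is only a sketch under assumptions: the helper names are hypothetical, the weight formula n_samples / (n_classes * class_count) is one common "balanced" heuristic rather than the setting used in the experiments, and label 1 is assumed to mean "anomaly". A dictionary like the one returned by `balanced_class_weights` has the shape that the class_weight argument of model.fit() expects.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_bce(y_true, p_anomaly, weights, eps=1e-7):
    """Cost-sensitive binary cross-entropy averaged over the batch."""
    total = 0.0
    for y, p in zip(y_true, p_anomaly):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss  # rare class gets the larger weight
    return total / len(y_true)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_threshold(p_anomaly, threshold=0.5):
    """Single-output variant: the threshold is a hyperparameter."""
    return int(p_anomaly >= threshold)


def predict_argmax(logits):
    """Two-output variant: pick the class with the highest softmax score."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    labels = [0] * 9 + [1]  # 9 normal reports, 1 anomaly
    weights = balanced_class_weights(labels)
    print(weights)
    print(weighted_bce([1], [0.1], weights), weighted_bce([0], [0.9], weights))
    print(predict_argmax([0.2, 1.5]))
```

With these weights, a missed anomaly costs more than an equally confident mistake on a normal report. Note also that taking argmax over two softmax outputs amounts to an implicit threshold of 0.5 on the anomaly probability, so the two-neuron design removes the tuning step rather than the decision boundary itself.
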