In machine learning we often need to evaluate the performance of a learning algorithm, which calls for some evaluation criteria.
Following the description at http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion_matrix/confusion_matrix.html, a simple tool is the confusion matrix. It is defined as follows:
The entries in the confusion matrix have the following meaning in the context of our study:
- a is the number of correct predictions that an instance is negative (actual negatives correctly classified as negative),
- b is the number of incorrect predictions that an instance is positive (actual negatives incorrectly classified as positive),
- c is the number of incorrect predictions that an instance is negative (actual positives incorrectly classified as negative),
- d is the number of correct predictions that an instance is positive (actual positives correctly classified as positive).
|                 | Predicted Negative | Predicted Positive |
| --------------- | ------------------ | ------------------ |
| Actual Negative | a                  | b                  |
| Actual Positive | c                  | d                  |
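The four cells above can be tallied directly from paired label lists. The sketch below is illustrative (the function name and the 0/1 encoding for negative/positive are my own choices, not from the source):

```python
# Sketch: counting the four confusion-matrix cells from paired label lists.
# Labels: 0 = negative, 1 = positive. Names here are illustrative.

def confusion_cells(actual, predicted):
    """Return (a, b, c, d) as defined in the list above."""
    a = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 0)  # true negatives
    b = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 1)  # false positives
    c = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 0)  # false negatives
    d = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 1)  # true positives
    return a, b, c, d

actual    = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 1, 0, 1, 0, 1, 1, 0]
print(confusion_cells(actual, predicted))  # (3, 1, 1, 3)
```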
Several standard terms have been defined for the 2 class matrix:
- The accuracy (AC) is the proportion of the total number of predictions that were correct, i.e. the probability that positive and negative samples are classified correctly. It is determined using the equation:

  AC = (a + d) / (a + b + c + d)    [1]
- The recall or true positive rate (TP) is the proportion of positive cases that were correctly identified, i.e. the probability that a positive sample is recognized. It is calculated using the equation:

  TP = d / (c + d)    [2]
- The false positive rate (FP), also called the false alarm rate, is the proportion of negative cases that were incorrectly classified as positive, i.e. the probability that a negative sample is misclassified as positive. It is calculated using the equation:

  FP = b / (a + b)    [3]
- The true negative rate (TN) is defined as the proportion of negative cases that were classified correctly, as calculated using the equation:

  TN = a / (a + b)    [4]
- The false negative rate (FN), also called the miss rate, is the proportion of positive cases that were incorrectly classified as negative, i.e. the probability that a positive sample is misclassified as negative. It is calculated using the equation:

  FN = c / (c + d)    [5]
- Finally, precision (P) is the proportion of the predicted positive cases that were correct, i.e. how reliable a positive prediction is. It is calculated using the equation:

  P = d / (b + d)    [6]
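The six rates above follow mechanically from the four cells. A minimal sketch (the function name and dictionary keys are my own):

```python
# Sketch: the six rates of equations [1]-[6], computed from cells a, b, c, d.
def rates(a, b, c, d):
    return {
        "AC": (a + d) / (a + b + c + d),  # accuracy, eq. [1]
        "TP": d / (c + d),                # recall / true positive rate, eq. [2]
        "FP": b / (a + b),                # false positive rate, eq. [3]
        "TN": a / (a + b),                # true negative rate, eq. [4]
        "FN": c / (c + d),                # false negative rate, eq. [5]
        "P":  d / (b + d),                # precision, eq. [6]
    }

print(rates(3, 1, 1, 3))
```

Note that TP + FN = 1 and FP + TN = 1, so only four of the six numbers are independent.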
The accuracy determined using equation 1 may not be an adequate performance measure when the number of negative cases is much greater than the number of positive cases (Kubat et al., 1998). Suppose there are 1000 cases, 995 of which are negative cases and 5 of which are positive cases. If the system classifies them all as negative, the accuracy would be 99.5%, even though the classifier missed all positive cases.
In other words: accuracy may be a poor way to evaluate a classifier, especially when negative samples account for a large share of the data.
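The 1000-case example above can be reproduced in a few lines. This sketch classifies everything as negative, so a = 995 and c = 5:

```python
# Sketch of the 995/5 example: a classifier that predicts negative for everything.
a, b, c, d = 995, 0, 5, 0   # a: true negatives; c: positives missed entirely
accuracy = (a + d) / (a + b + c + d)
recall = d / (c + d)
print(accuracy)  # 0.995
print(recall)    # 0.0
```

Accuracy is 99.5% even though recall is zero, which is exactly the failure mode described.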
Other performance measures account for this by including TP in a product: for example, geometric mean (g-mean) (Kubat et al., 1998), as defined in equations 7 and 8, and F-Measure (Lewis and Gale, 1994), as defined in equation 9.
  g-mean1 = sqrt(TP × P)    [7]

  g-mean2 = sqrt(TP × TN)    [8]

  F = ((β² + 1) × P × TP) / (β² × P + TP)    [9]
In equation 9, β ranges from 0 to infinity and is used to control the relative weight assigned to TP and P (β = 1 gives the familiar F1 score). Any classifier evaluated using equations 7, 8 or 9 will have a measure value of 0 if all positive cases are classified incorrectly.
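The imbalance-robust measures of equations [7]–[9] can be sketched as follows (function names are my own; TP, TN and P denote the rates defined earlier):

```python
import math

# Sketch of equations [7]-[9]; tp_rate, tn_rate and precision are the
# TP, TN and P rates defined above.
def g_mean1(tp_rate, precision):
    return math.sqrt(tp_rate * precision)             # eq. [7]

def g_mean2(tp_rate, tn_rate):
    return math.sqrt(tp_rate * tn_rate)               # eq. [8]

def f_measure(tp_rate, precision, beta=1.0):
    # eq. [9]; beta controls the weight given to recall (TP) versus precision (P)
    return ((beta**2 + 1) * precision * tp_rate) / (beta**2 * precision + tp_rate)

print(f_measure(0.75, 0.75))  # 0.75
print(g_mean1(0.0, 0.5))      # 0.0 -- zero whenever all positives are missed
```

Because each measure multiplies by TP, a classifier that misses every positive case scores 0, unlike plain accuracy in the 995/5 example.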