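Model Performance Metrics
We need to compare two classification models, M1 and M2. Their scores on a test set of 10 binary (+ or -) samples are shown in the table below. Assume that we care more about whether positive samples are detected correctly.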
Instance | True Class | Scores from M1 | Scores from M2 |
1 | + | 0.73 | 0.61 |
2 | + | 0.69 | 0.03 |
3 | - | 0.44 | 0.68 |
4 | - | 0.55 | 0.31 |
5 | - | 0.67 | 0.45 |
6 | + | 0.47 | 0.09 |
7 | - | 0.08 | 0.38 |
8 | - | 0.15 | 0.05 |
9 | + | 0.45 | 0.01 |
10 | - | 0.35 | 0.04 |
(1) For classification model M1 with a threshold of 0.5, compute the accuracy, precision, recall (also called the true positive rate, TPR), false positive rate (FPR), and F-measure.
(2) For classification model M2 with a threshold of 0.5, compute the accuracy, precision, recall (TPR), false positive rate (FPR), and F-measure; then compare with model M1 and analyze which model performs better on this test set.
(3) For classification model M1 with a threshold of 0.2, compute the accuracy, precision, recall (TPR), false positive rate (FPR), and F-measure; then discuss whether a threshold of 0.2 or 0.5 gives the better classification result for M1.
(4) Discuss whether a better threshold exists; if so, find the optimal threshold and explain why.
Answer:
(1)
class | - | - | - | - | + | + | - | - | + | + |
M1 score | 0.08 | 0.15 | 0.35 | 0.44 | 0.45 | 0.47 | 0.55 | 0.67 | 0.69 | 0.73 |
TP | 2 |
FP | 2 |
TN | 4 |
FN | 2 |
accuracy | 0.6 |
precision | 0.5 |
TPR(recall) | 0.5 |
FPR | 1/3 |
F-measure | 0.5 |
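The values in this table follow from the standard definitions applied to TP = 2, FP = 2, TN = 4, FN = 2, where a sample is predicted positive when its score is at least the threshold. The worked arithmetic is sketched below; it assumes that "F-measure" means the balanced F1 score (the harmonic mean of precision and recall), which is consistent with the tabulated value of 0.5.

```latex
\begin{align*}
\text{accuracy} &= \frac{TP + TN}{TP + FP + TN + FN} = \frac{2 + 4}{10} = 0.6 \\
\text{precision} &= \frac{TP}{TP + FP} = \frac{2}{2 + 2} = 0.5 \\
\text{recall (TPR)} &= \frac{TP}{TP + FN} = \frac{2}{2 + 2} = 0.5 \\
\text{FPR} &= \frac{FP}{FP + TN} = \frac{2}{2 + 4} = \frac{1}{3} \\
F &= \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
   = \frac{2 \cdot 0.5 \cdot 0.5}{0.5 + 0.5} = 0.5
\end{align*}
```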
(2)
class | + | + | - | - | + | - | - | - | + | - |
M2 score | 0.01 | 0.03 | 0.04 | 0.05 | 0.09 | 0.31 | 0.38 | 0.45 | 0.61 | 0.68 |
TP | 1 |
FP | 1 |
TN | 5 |
FN | 3 |
accuracy | 0.6 |
precision | 0.5 |
TPR(recall) | 0.25 |
FPR | 1/6 |
F-measure | 1/3 |
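At threshold 0.5 both models have the same accuracy (0.6) and precision (0.5), but M1 has the higher TPR (0.5 vs. 0.25) and the higher F-measure (0.5 vs. 1/3), at the cost of a higher FPR (1/3 vs. 1/6). Since we care more about detecting positive samples, M1 performs better on this test set.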
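The comparison in parts (1) and (2) can be reproduced with a small script. This is a minimal sketch, not a reference implementation: the helper name `metrics` and the `DATA` layout are hypothetical, a sample is predicted positive when its score is at least the threshold, and the F-measure is assumed to be the balanced F1 score. Exact fractions are used so the results match the tables digit for digit.

```python
from fractions import Fraction

# (true class, M1 score, M2 score) for the 10 test instances in the problem table
DATA = [
    ("+", 0.73, 0.61), ("+", 0.69, 0.03), ("-", 0.44, 0.68), ("-", 0.55, 0.31),
    ("-", 0.67, 0.45), ("+", 0.47, 0.09), ("-", 0.08, 0.38), ("-", 0.15, 0.05),
    ("+", 0.45, 0.01), ("-", 0.35, 0.04),
]

def metrics(labels, scores, threshold):
    """Confusion counts and metrics, predicting '+' when score >= threshold.

    Assumes the data contains at least one '+' and one '-' sample.
    """
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == "+")
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == "-")
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == "+")
    tn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == "-")
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "accuracy": Fraction(tp + tn, len(labels)),
        # precision is undefined when nothing is predicted positive
        "precision": Fraction(tp, tp + fp) if tp + fp else None,
        "recall": Fraction(tp, tp + fn),
        "FPR": Fraction(fp, fp + tn),
        # F1 written as 2TP / (2TP + FP + FN), equivalent to the harmonic mean
        "F1": Fraction(2 * tp, 2 * tp + fp + fn),
    }

labels = [row[0] for row in DATA]
m1 = metrics(labels, [row[1] for row in DATA], 0.5)
m2 = metrics(labels, [row[2] for row in DATA], 0.5)

for name in ("accuracy", "precision", "recall", "FPR", "F1"):
    print(f"{name:9s}  M1 = {str(m1[name]):4s}  M2 = {m2[name]}")
```

Calling `metrics` with a threshold of 0.2 on the M1 scores gives the counts used in part (3).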
(3)
class | - | - | - | - | + | + | - | - | + | + |
M1 score | 0.08 | 0.15 | 0.35 | 0.44 | 0.45 | 0.47 | 0.55 | 0.67 | 0.69 | 0.73 |
TP | 4 |
FP | 4 |
TN | 2 |
FN | 0 |
accuracy | 0.6 |
precision | 0.5 |
TPR(recall) | 1 |
FPR | 2/3 |
F-measure | 2/3 |
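At both thresholds M1 has accuracy 0.6 and precision 0.5. Lowering the threshold from 0.5 to 0.2 raises TPR from 0.5 to 1 and F-measure from 0.5 to 2/3, while FPR rises from 1/3 to 2/3. Since detecting positive samples matters most here, the threshold of 0.2 gives the better result.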
(4)
For model M1:
class | - | - | - | - | + | + | - | - | + | + | |
Threshold >= | 0.08 | 0.15 | 0.35 | 0.44 | 0.45 | 0.47 | 0.55 | 0.67 | 0.69 | 0.73 | 1.0 |
TP | 4 | 4 | 4 | 4 | 4 | 3 | 2 | 2 | 2 | 1 | 0 |
FP | 6 | 5 | 4 | 3 | 2 | 2 | 2 | 1 | 0 | 0 | 0 |
TN | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 5 | 6 | 6 | 6 |
FN | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 2 | 3 | 4 |
accuracy | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.7 | 0.6 | 0.7 | 0.8 | 0.7 | 0.6 |
precision | 0.4 | 4/9 | 0.5 | 4/7 | 2/3 | 0.6 | 0.5 | 2/3 | 1 | 1 | undefined |
TPR(recall) | 1 | 1 | 1 | 1 | 1 | 0.75 | 0.5 | 0.5 | 0.5 | 0.25 | 0 |
FPR | 1 | 5/6 | 2/3 | 0.5 | 1/3 | 1/3 | 1/3 | 1/6 | 0 | 0 | 0 |
F-measure | 4/7 | 8/13 | 2/3 | 8/11 | 0.8 | 2/3 | 0.5 | 4/7 | 2/3 | 0.4 | 0 |
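The optimal threshold is 0.45, where accuracy = 0.8, TPR = 1, and FPR = 1/3. A threshold of 0.69 reaches the same accuracy but detects only half of the positive samples (TPR = 0.5), so 0.45 is preferred because we care more about positive samples.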
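The threshold search for M1 can be sketched as a sweep over the candidate thresholds in the table. The helper `sweep` is hypothetical, and the selection rule (highest accuracy, ties broken by higher TPR) is an assumption chosen to reflect the stated preference for detecting positive samples.

```python
from fractions import Fraction

# True classes and M1 scores for instances 1..10 from the problem table
labels = ["+", "+", "-", "-", "-", "+", "-", "-", "+", "-"]
m1_scores = [0.73, 0.69, 0.44, 0.55, 0.67, 0.47, 0.08, 0.15, 0.45, 0.35]

def sweep(labels, scores):
    """Yield (threshold, accuracy, TPR, FPR) for each candidate threshold.

    Candidates are the distinct scores plus 1.0 (everything predicted
    negative), matching the table columns; '+' is predicted when
    score >= threshold.
    """
    n_pos = labels.count("+")
    n_neg = labels.count("-")
    for t in sorted(set(scores)) + [1.0]:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == "+")
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == "-")
        tn = n_neg - fp
        yield (t, Fraction(tp + tn, len(labels)),
               Fraction(tp, n_pos), Fraction(fp, n_neg))

rows = list(sweep(labels, m1_scores))

# Assumed selection rule: highest accuracy, ties broken by higher TPR
best = max(rows, key=lambda r: (r[1], r[2]))
print("threshold, accuracy, TPR, FPR:", best)
```

Under this rule the tie between thresholds 0.45 and 0.69 (both with accuracy 0.8) is resolved in favor of 0.45, because its TPR is higher.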
For model M2:
class | + | + | - | - | + | - | - | - | + | - |
Threshold >= | 0.01 | 0.03 | 0.04 | 0.05 | 0.09 | 0.31 | 0.38 | 0.45 | 0.61 | 0.68 |
TP | 4 | 3 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 0 |
FP | 6 | 6 | 6 | 5 | 4 | 4 | 3 | 2 | 1 | 1 |
TN | 0 | 0 | 0 | 1 | 2 | 2 | 3 | 4 | 5 | 5 |
FN | 0 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 4 |
accuracy | 0.4 | 0.3 | 0.2 | 0.3 | 0.4 | 0.3 | 0.4 | 0.5 | 0.6 | 0.5 |
precision | 0.4 | 1/3 | 0.25 | 2/7 | 1/3 | 0.2 | 0.25 | 1/3 | 0.5 | 0 |
TPR(recall) | 1 | 0.75 | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | 0 |
FPR | 1 | 1 | 1 | 5/6 | 2/3 | 2/3 | 0.5 | 1/3 | 1/6 | 1/6 |
F-measure | 4/7 | 6/13 | 1/3 | 4/11 | 0.4 | 2/9 | 0.25 | 2/7 | 1/3 | 0 |
The optimal threshold is 0.61, which gives the highest accuracy in the table: accuracy = 0.6, TPR = 0.25, FPR = 1/6.
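The choice of 0.61 for M2 rests on accuracy. As a hedged cross-check, the sketch below recomputes accuracy for the ten thresholds in the table and also reports which threshold would maximize the F-measure. Using the F-measure as a selection criterion is an assumption added for illustration, not part of the answer above; the helper name `accuracy_and_f1` is hypothetical, and F-measure is taken to be the balanced F1 score.

```python
from fractions import Fraction

# True classes and M2 scores for instances 1..10 from the problem table
labels = ["+", "+", "-", "-", "-", "+", "-", "-", "+", "-"]
m2_scores = [0.61, 0.03, 0.68, 0.31, 0.45, 0.09, 0.38, 0.05, 0.01, 0.04]

def accuracy_and_f1(labels, scores, t):
    """Accuracy and F1 when '+' is predicted for score >= t."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == "+")
    fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == "-")
    fn = labels.count("+") - tp
    tn = labels.count("-") - fp
    return Fraction(tp + tn, len(labels)), Fraction(2 * tp, 2 * tp + fp + fn)

thresholds = sorted(m2_scores)  # the ten thresholds used in the table
results = {t: accuracy_and_f1(labels, m2_scores, t) for t in thresholds}

best_by_accuracy = max(thresholds, key=lambda t: results[t][0])
best_by_f1 = max(thresholds, key=lambda t: results[t][1])

print("best by accuracy: ", best_by_accuracy, results[best_by_accuracy])
print("best by F-measure:", best_by_f1, results[best_by_f1])
```

On this data the two criteria disagree: accuracy selects 0.61, while the F-measure selects 0.01, where every sample is predicted positive. Which one counts as "best" depends on how much weight is placed on detecting positive samples relative to avoiding false alarms.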