[Machine Learning] Understanding Evaluation Metrics

X.嘻嘻哈哈

Last modified: 2024-08-01 16:31:14

Category: Machine Learning. Tags: machine learning, artificial intelligence

First published: 2024-08-01 16:27:30

Article link: https://blog.csdn.net/weixin_44702519/article/details/140850988

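What are the common evaluation metrics?

Classification metrics

Accuracy
Precision (P)
Recall (R)
F1-Score (F1)
ROC curve
Area under the ROC curve (AUC)
Log loss
Precision at k (P@k)
Average precision at k (AP@k)
Mean average precision at k (MAP@k)

Regression metrics

Mean absolute error (MAE)
Mean squared error (MSE)
Root mean squared error (RMSE)
Root mean squared logarithmic error (RMSLE)
Mean percentage error (MPE)
Mean absolute percentage error (MAPE)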

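Classification metrics in detail

In binary classification, when the numbers of positive and negative samples are roughly equal, accuracy, precision, recall, and F1 are the usual choices. All four are computed from TP, TN, FP, and FN.

True positive (TP): predicted positive, actually positive
True negative (TN): predicted negative, actually negative
False positive (FP): predicted positive, actually negative
False negative (FN): predicted negative, actually positive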

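def true_positive(y_true, y_pred):
    # Counter for true positives
    tp = 0
    # Walk through the true and predicted labels in pairs
    for yt, yp in zip(y_true, y_pred):
        # Count samples whose true label and predicted label are both positive
        if yt == 1 and yp == 1:
            tp += 1
    # Return the number of true positives
    return tp

def true_negative(y_true, y_pred):
    # Counter for true negatives
    tn = 0
    # Walk through the true and predicted labels in pairs
    for yt, yp in zip(y_true, y_pred):
        # Count samples whose true label and predicted label are both negative
        if yt == 0 and yp == 0:
            tn += 1
    # Return the number of true negatives
    return tn

def false_positive(y_true, y_pred):
    # Counter for false positives
    fp = 0
    # Walk through the true and predicted labels in pairs
    for yt, yp in zip(y_true, y_pred):
        # Count samples that are actually negative but predicted positive
        if yt == 0 and yp == 1:
            fp += 1
    # Return the number of false positives
    return fp

def false_negative(y_true, y_pred):
    # Counter for false negatives
    fn = 0
    # Walk through the true and predicted labels in pairs
    for yt, yp in zip(y_true, y_pred):
        # Count samples that are actually positive but predicted negative
        if yt == 1 and yp == 0:
            fn += 1
    # Return the number of false negatives
    return fn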
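
The four counters above each scan the data once. As a minimal sketch, the same counts can also be gathered in a single pass. The helper name `confusion_counts` and the toy labels are assumptions made for illustration and are not part of the original post.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    # Count every (true label, predicted label) pair once
    pairs = Counter(zip(y_true, y_pred))
    tp = pairs[(1, 1)]  # predicted positive, actually positive
    tn = pairs[(0, 0)]  # predicted negative, actually negative
    fp = pairs[(0, 1)]  # predicted positive, actually negative
    fn = pairs[(1, 0)]  # predicted negative, actually positive
    return tp, tn, fp, fn

# Toy labels, made up for this example
y_true = [0, 1, 1, 1, 0, 0, 0, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 3, 1, 2)
```

The four counts always sum to the number of samples, which is a cheap consistency check.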

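Accuracy

The fraction of all samples that are predicted correctly. This metric is strongly affected by class balance; when the classes are imbalanced, prefer other metrics.

$Accuracy = \frac {TP + TN} {TP + TN + FP + FN}$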

def Accuracy(y_true, y_pred):
    # Number of true positives
    tp = true_positive(y_true, y_pred)
    # Number of false positives
    fp = false_positive(y_true, y_pred)
    # Number of false negatives
    fn = false_negative(y_true, y_pred)
    # Number of true negatives
    tn = true_negative(y_true, y_pred)
    # Accuracy: correct predictions over all predictions
    accuracy_score = (tp + tn) / (tp + tn + fp + fn)
    return accuracy_score
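
To see why accuracy can mislead on imbalanced data, here is a small sketch. The dataset (90 negatives, 10 positives), the always-negative "model", and the helper name `accuracy_and_recall` are all assumptions made for illustration.

```python
def accuracy_and_recall(y_true, y_pred):
    """Compute accuracy and recall for binary 0/1 labels."""
    correct = sum(1 for yt, yp in zip(y_true, y_pred) if yt == yp)
    tp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 1)
    positives = sum(y_true)
    accuracy = correct / len(y_true)
    # Guard against data that contains no positive samples
    recall = tp / positives if positives else 0.0
    return accuracy, recall

# Hypothetical imbalanced dataset: 90 negatives, 10 positives
y_true = [0] * 90 + [1] * 10
# A "model" that always predicts the negative class
y_pred = [0] * 100

acc, rec = accuracy_and_recall(y_true, y_pred)
print(acc, rec)  # 0.9 0.0
```

The accuracy looks respectable at 0.9 even though not a single positive sample is found.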

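Precision

Among the samples predicted as positive, the fraction that are actually positive. This metric remains applicable when the classes are imbalanced. Low precision means a high false-alarm rate (many false positives). For medical tasks we want as few false alarms as possible, so precision should be as high as possible.

$Precision = \frac {TP} {TP + FP}$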

def precision(y_true, y_pred):
    # Number of true positives
    tp = true_positive(y_true, y_pred)
    # Number of false positives
    fp = false_positive(y_true, y_pred)
    # Precision: true positives over everything predicted positive
    score = tp / (tp + fp)
    return score

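Recall

Among the samples that are actually positive, the fraction predicted as positive. This metric is also suitable for imbalanced data. In medical tasks, the fewer false negatives the better, because we do not want a patient who is ill (positive) to be misdiagnosed as healthy (negative).

$Recall = \frac {TP} {TP + FN}$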

def recall(y_true, y_pred):
    # Number of true positives
    tp = true_positive(y_true, y_pred)
    # Number of false negatives
    fn = false_negative(y_true, y_pred)
    # Recall: true positives over everything actually positive
    score = tp / (tp + fn)
    return score
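
Both `precision` and `recall` divide by a count that can be zero, for example when nothing is predicted positive at a very high threshold. One common convention, assumed here rather than taken from the original post, is to return 0.0 in that case. The helper name `safe_precision_recall` is hypothetical.

```python
def safe_precision_recall(tp, fp, fn):
    """Precision and recall from raw counts, returning 0.0 on an empty denominator."""
    # No positive predictions at all: precision is undefined, report 0.0 (assumed convention)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # No positive samples at all: recall is undefined, report 0.0 (assumed convention)
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

print(safe_precision_recall(tp=2, fp=1, fn=2))  # (0.6666666666666666, 0.5)
print(safe_precision_recall(tp=0, fp=0, fn=5))  # (0.0, 0.0)
```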

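`Precision - Recall` curve

Most models output a predicted probability and then apply a threshold: if a sample's predicted value for a class exceeds the threshold, the sample is assigned to that class. Many different thresholds can be tried. For the predictions obtained at each threshold, compute one pair of P (Precision) and R (Recall). This yields a list of P values and a list of R values, from which the PR curve is plotted.

Note: a good model should have both high precision and high recall. Both range from 0 to 1, and the closer to 1 the better.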

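import matplotlib.pyplot as plt

# y_true (0/1 labels) and y_pred (predicted probabilities of the positive class)
# are assumed to be defined already
precisions = []
recalls = []
thresholds = [0.0490937 , 0.05934905, 0.079377,
            0.08584789, 0.11114267, 0.11639273,
            0.15952202, 0.17554844, 0.18521942,
            0.27259048, 0.31620708, 0.33056815,
            0.39095342, 0.61977213]  # choose thresholds to suit your own data

# Loop over the prediction thresholds
for i in thresholds:
    # Predict 1 if the positive-class probability reaches the threshold, else 0
    temp_prediction = [1 if x >= i else 0 for x in y_pred]
    # Compute precision
    p = precision(y_true, temp_prediction)
    # Compute recall
    r = recall(y_true, temp_prediction)
    # Store precision
    precisions.append(p)
    # Store recall
    recalls.append(r)

# Create the figure
plt.figure(figsize=(7, 7))
# Recall on the x-axis, precision on the y-axis
plt.plot(recalls, precisions)
# x-axis label, font size 15
plt.xlabel('Recall', fontsize=15)
# y-axis label, font size 15
plt.ylabel('Precision', fontsize=15)
plt.show()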
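
The threshold sweep can also be checked numerically without plotting. The sketch below reuses the example labels and probabilities that appear in the ROC section of this post. The helper name `pr_points`, the four thresholds, and the zero-denominator convention are assumptions made for illustration.

```python
def pr_points(y_true, y_score, thresholds):
    """Return a list of (threshold, precision, recall) tuples."""
    points = []
    for t in thresholds:
        # Predict positive when the score reaches the threshold
        y_hat = [1 if s >= t else 0 for s in y_score]
        tp = sum(1 for yt, yp in zip(y_true, y_hat) if yt == 1 and yp == 1)
        fp = sum(1 for yt, yp in zip(y_true, y_hat) if yt == 0 and yp == 1)
        fn = sum(1 for yt, yp in zip(y_true, y_hat) if yt == 1 and yp == 0)
        # Report 0.0 when a denominator is empty (assumed convention)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        points.append((t, p, r))
    return points

y_true = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.1, 0.3, 0.2, 0.6, 0.8, 0.05, 0.9, 0.5,
           0.3, 0.66, 0.3, 0.2, 0.85, 0.15, 0.99]

points = pr_points(y_true, y_score, [0.2, 0.5, 0.8, 0.95])
for t, p, r in points:
    print(f"{t:.2f}  P={p:.3f}  R={r:.3f}")
# 0.20  P=0.417  R=1.000
# 0.50  P=0.571  R=0.800
# 0.80  P=0.750  R=0.600
# 0.95  P=1.000  R=0.200
```

Raising the threshold trades recall for precision on this data, which is the behavior the PR curve visualizes.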

`F1-Score`

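F1-Score combines precision and recall into a single number (their harmonic mean). It also ranges from 0 to 1, and a perfect model has an F1 score of 1.

$F1_{score} = 2\times \frac {Precision \times Recall} {Precision+Recall}$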

def f1(y_true, y_pred):
    # Compute precision
    p = precision(y_true, y_pred)
    # Compute recall
    r = recall(y_true, y_pred)
    # Compute the F1 score
    score = 2 * p * r / (p + r)
    return score
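
Because F1 is a harmonic mean, it is pulled toward the smaller of precision and recall. The sketch below illustrates this and also checks the equivalent count-based form $\frac{2TP}{2TP + FP + FN}$. The helper names `f1_from_pr` and `f1_from_counts` and the zero-denominator convention are assumptions made for illustration.

```python
def f1_from_pr(p, r):
    """F1 from precision and recall; 0.0 when both are zero (assumed convention)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

def f1_from_counts(tp, fp, fn):
    """Equivalent form of F1 written directly in terms of the counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

# Balanced precision and recall: F1 equals both
print(f1_from_pr(0.5, 0.5))            # 0.5
# Perfect precision but poor recall: F1 stays close to the poor value
print(round(f1_from_pr(1.0, 0.1), 4))  # 0.1818
# The two forms agree (tp=2, fp=1, fn=2 gives P=2/3, R=1/2)
print(round(f1_from_counts(2, 1, 2), 4), round(f1_from_pr(2 / 3, 0.5), 4))  # 0.5714 0.5714
```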

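`ROC` curve

The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) over a list of thresholds.

True positive rate (TPR):

$TPR = \frac {TP} {TP + FN}$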

def tpr(y_true, y_pred):
    # True positive rate (TPR); same formula as recall
    return recall(y_true, y_pred)

False positive rate (FPR):

$FPR = \frac {FP} {TN + FP}$

def fpr(y_true, y_pred):
    # Number of false positives
    fp = false_positive(y_true, y_pred)
    # Number of true negatives
    tn = true_negative(y_true, y_pred)
    # Return the false positive rate (FPR)
    return fp / (tn + fp)

# List of true positive rates
tpr_list = []
# List of false positive rates
fpr_list = []
# True labels
y_true = [0, 0, 0, 0, 1, 0, 1,
        0, 0, 1, 0, 1, 0, 0, 1]
# Predicted probability that each sample is positive (1)
y_pred = [0.1, 0.3, 0.2, 0.6, 0.8, 0.05,
        0.9, 0.5, 0.3, 0.66, 0.3, 0.2,
        0.85, 0.15, 0.99]
# Prediction thresholds
thresholds = [0, 0.1, 0.2, 0.3, 0.4, 0.5,
            0.6, 0.7, 0.8, 0.85, 0.9, 0.99, 1.0]

# Loop over the prediction thresholds
for thresh in thresholds:
    # Predict 1 if the positive-class probability reaches the threshold, else 0
    temp_pred = [1 if x >= thresh else 0 for x in y_pred]
    # True positive rate
    temp_tpr = tpr(y_true, temp_pred)
    # False positive rate
    temp_fpr = fpr(y_true, temp_pred)
    # Store the true positive rate
    tpr_list.append(temp_tpr)
    # Store the false positive rate
    fpr_list.append(temp_fpr)
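
As a quick sanity check on the loop above, the sketch below computes a single (FPR, TPR) point directly from the same example data. The helper name `roc_point` is hypothetical, and the code assumes both classes are present so that neither denominator is zero.

```python
def roc_point(y_true, y_score, thresh):
    """Return (FPR, TPR) for one threshold; assumes both classes occur in y_true."""
    # Predict positive when the score reaches the threshold
    y_hat = [1 if s >= thresh else 0 for s in y_score]
    tp = sum(1 for yt, yp in zip(y_true, y_hat) if yt == 1 and yp == 1)
    fn = sum(1 for yt, yp in zip(y_true, y_hat) if yt == 1 and yp == 0)
    fp = sum(1 for yt, yp in zip(y_true, y_hat) if yt == 0 and yp == 1)
    tn = sum(1 for yt, yp in zip(y_true, y_hat) if yt == 0 and yp == 0)
    return fp / (fp + tn), tp / (tp + fn)

y_true = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.1, 0.3, 0.2, 0.6, 0.8, 0.05, 0.9, 0.5,
           0.3, 0.66, 0.3, 0.2, 0.85, 0.15, 0.99]

print(roc_point(y_true, y_score, 0))    # (1.0, 1.0): everything predicted positive
print(roc_point(y_true, y_score, 0.5))  # (0.3, 0.8)
print(roc_point(y_true, y_score, 1.0))  # (0.0, 0.0): nothing predicted positive
```

The curve therefore runs from (1, 1) at the lowest threshold down to (0, 0) at the highest.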

AUC

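AUC is the area under the ROC curve.

AUC = 1: a perfect model. In most cases, this means you made a mistake during validation and should re-examine your data processing and validation pipeline.

AUC = 0: the model is very bad (or very good, since inverting its predictions would give a perfect ranking). An AUC like this may also mean there is a problem in your validation or data processing.

AUC = 0.5: your predictions are random. For any binary classification problem, if I predict 0.5 for every target, I get an AUC of 0.5.

An AUC between 0 and 0.5 means your model is worse than random. Most of the time this happens because the classes have been swapped. If you invert the predictions, the AUC will likely rise above 0.5. AUC values close to 1 are considered good.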

In [X]: from sklearn import metrics
In [X]: y_true = [0, 0, 0, 0, 1, 0, 1,
   ...:           0, 0, 1, 0, 1, 0, 0, 1]
In [X]: y_pred = [0.1, 0.3, 0.2, 0.6, 0.8, 0.05,
   ...:           0.9, 0.5, 0.3, 0.66, 0.3, 0.2,
   ...:           0.85, 0.15, 0.99]
In [X]: metrics.roc_auc_score(y_true, y_pred)
Out[X]: 0.8300000000000001
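
AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below uses that pairwise definition on the same data. The helper name `pairwise_auc` is hypothetical, and this is a plain-Python illustration, not how scikit-learn computes the value internally.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for yt, s in zip(y_true, y_score) if yt == 1]
    neg = [s for yt, s in zip(y_true, y_score) if yt == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0.1, 0.3, 0.2, 0.6, 0.8, 0.05, 0.9, 0.5,
          0.3, 0.66, 0.3, 0.2, 0.85, 0.15, 0.99]

# Matches the scikit-learn result shown above
print(round(pairwise_auc(y_true, y_pred), 2))                   # 0.83
# A constant prediction makes every pair a tie
print(pairwise_auc(y_true, [0.5] * len(y_true)))                # 0.5
# Inverting the predictions mirrors the AUC around 0.5
print(round(pairwise_auc(y_true, [1 - s for s in y_pred]), 2))  # 0.17
```

The last two lines illustrate the points made earlier: constant predictions give 0.5, and inverting the scores turns an AUC of a into 1 - a.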

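Log loss

For a dataset with multiple samples, the overall log loss is simply the mean of the individual log losses. One thing to remember is that log loss penalizes incorrect or far-off predictions quite heavily; that is, it punishes predictions that are both very confident and very wrong.

$LogLoss = -1.0 \times (target \times \log(prediction) + (1 - target) \times \log(1 - prediction))$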

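import numpy as np

def log_loss(y_true, y_proba):
    # Small constant that keeps probabilities away from exactly 0 or 1, avoiding log(0)
    epsilon = 1e-15
    # Per-sample log losses
    loss = []
    # Walk through the true labels and predicted probabilities in pairs
    for yt, yp in zip(y_true, y_proba):
        # Clip yp to the range [epsilon, 1 - epsilon]
        yp = np.clip(yp, epsilon, 1 - epsilon)
        # Log loss of this sample
        temp_loss = - 1.0 * (yt * np.log(yp) + (1 - yt) * np.log(1 - yp))
        # Store it
        loss.append(temp_loss)
    # Mean log loss over all samples
    return np.mean(loss)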
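
To make the penalty concrete, the sketch below evaluates the per-sample loss for a positive sample (target = 1) at several predicted probabilities. It uses only the standard library. The helper name `single_log_loss` and the chosen probabilities are assumptions made for illustration.

```python
import math

def single_log_loss(target, prediction, epsilon=1e-15):
    """Log loss of one sample, clipping the prediction away from 0 and 1."""
    p = min(max(prediction, epsilon), 1 - epsilon)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

# The true label is 1; the loss grows quickly as the prediction moves toward 0
for p in (0.9, 0.6, 0.1, 0.01):
    print(p, round(single_log_loss(1, p), 3))
# 0.9 0.105
# 0.6 0.511
# 0.1 2.303
# 0.01 4.605
```

A confident wrong prediction of 0.01 costs roughly 44 times as much as a confident correct prediction of 0.9, and the clipping keeps the loss finite even for a prediction of exactly 0.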