Fundamentals: Scikit-learn Model Evaluation (Part 3)

Author: 自学AI的鲨鱼儿

Last modified: 2024-03-15 23:47:58

First published: 2021-01-15 19:09:02

Article link: https://blog.csdn.net/qq_16555103/article/details/112364370

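I. Parameter selection

1.0 Official documentation

1.1 Cross-validation: estimating model performance

1.1.1 Computing cross-validated metrics

1.1.2 Cross validation iterators (commonly KFold and StratifiedKFold)

1.2 Grid Search: finding the best hyperparameters (includes cross-validation)

1.2.1 Grid Search: important parameters

1.2.2 Grid Search: code example

II. Evaluation metrics

2.1 Macro-averaged F1 vs. micro-averaged F1 (with micro averaging, accuracy == precision == recall == F1) (see: The difference between micro and macro in f1_score/precision_score/recall_score)

2.2 ROC curve and AUC for binary and multiclass problems (multiclass ROC comes in macro and micro variants) (see: An introduction to ROC and plotting binary and multiclass ROC curves in Python)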

Data preparation

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import GridSearchCV

model_datas = pd.read_csv('adultTest.csv',sep=',',header="infer")
model_datas.info()

out:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             32561 non-null  int64 
 1   workclass       32561 non-null  object
 2   fnlwgt          32561 non-null  int64 
 3   education       32561 non-null  object
 4   education-num   32561 non-null  int64 
 5   marital-status  32561 non-null  object
 6   occupation      32561 non-null  object
 7   relationship    32561 non-null  object
 8   race            32561 non-null  object
 9   sex             32561 non-null  object
 10  capital-gain    32561 non-null  int64 
 11  capital-loss    32561 non-null  int64 
 12  hours-per-week  32561 non-null  int64 
 13  native-country  32561 non-null  object
 14  class           32561 non-null  object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB

print(model_datas.head())

out:
   age          workclass  fnlwgt  ... hours-per-week  native-country   class
0   39          State-gov   77516  ...             40   United-States   <=50K
1   50   Self-emp-not-inc   83311  ...             13   United-States   <=50K
2   38            Private  215646  ...             40   United-States   <=50K
3   53            Private  234721  ...             40   United-States   <=50K
4   28            Private  338409  ...             40            Cuba   <=50K

[5 rows x 15 columns]

x_data = model_datas.drop(labels='class',axis=1)
y = model_datas['class']

# Label encoding
Le = LabelEncoder()
y_new = Le.fit_transform(y)
classes = Le.classes_  # 1-D array indexed by the encoded label; classes[y_new] recovers the original class names

# One-hot encoding of the categorical columns
x_data = pd.get_dummies(data=x_data,columns=['workclass','education','marital-status','occupation',
                                         'relationship','race','sex','native-country'])
print(x_data.head())

out:
   age  fnlwgt  ...  native-country_ Vietnam  native-country_ Yugoslavia
0   39   77516  ...                        0                           0
1   50   83311  ...                        0                           0
2   38  215646  ...                        0                           0
3   53  234721  ...                        0                           0
4   28  338409  ...                        0                           0

[5 rows x 108 columns]

print(y_new)

out:
[0 0 0 ... 0 0 1]
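
The label-encoding step above can be checked in isolation. The following is a minimal sketch on a made-up label array (`y_toy` is an illustrative stand-in for the `class` column, not the real data); it shows that indexing `classes_` with the encoded labels and calling `inverse_transform` both recover the original strings.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for the 'class' column (values chosen for illustration).
y_toy = np.array(['<=50K', '>50K', '<=50K', '>50K', '>50K'])

le = LabelEncoder()
y_encoded = le.fit_transform(y_toy)      # classes are sorted, then numbered from 0

# Two equivalent ways to map the integers back to the original labels.
decoded_by_index = le.classes_[y_encoded]
decoded_by_api = le.inverse_transform(y_encoded)

print(le.classes_)
print(y_encoded)
print(decoded_by_index)
```

Because the classes are sorted before numbering, the integer assigned to each label does not depend on the order in which labels appear in the data.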

I. Parameter selection

1.0 Official documentation

See the official scikit-learn documentation.

1.1 Cross-validation: estimating model performance

1.1.1 Computing cross-validated metrics

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

gbc = GradientBoostingClassifier(learning_rate=0.01,n_estimators=50,max_depth=2)
"""
    learning_rate=0.01,     # learning rate
    n_estimators=50,        # number of CART regression trees (boosting stages)
    max_depth=2             # maximum depth of each CART regression tree
"""

scores_f1_macro = cross_val_score(estimator=gbc,X=x_data,y=y_new,scoring='f1_macro',cv=5,n_jobs=-1)
"""
    estimator,      # the estimator to cross-validate
    X,              # feature matrix
    y=None,         # target labels
    scoring=None,   # scoring metric, given as a string such as 'accuracy', 'f1', 'f1_macro' or 'roc_auc' (the full list is given under the scoring parameter below)
                      Tip: per the scikit-learn manual [https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter], 'f1' uses average='binary' and only handles binary targets; for multiclass problems 'f1_macro' (the unweighted mean of the per-class F1 scores) is a common choice
                      
    cv=None,   # int, cross-validation generator, or iterable (e.g. KFold, StratifiedKFold, ...)
    n_jobs=-1  # -1 uses all available CPU cores
"""
print(scores_f1_macro)

out:
[0.46670152 0.47061916 0.4692159  0.46991753 0.47243059]

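Two important parameters in detail: cv and scoring

The cv parameter accepts an int, a cross-validation generator, or an iterable (the generators are covered under Cross validation iterators below):
    1. int: the number of folds. For a classifier an int selects StratifiedKFold; otherwise KFold is used.
    2. One of the cross-validation iterators that ship with sklearn, for example:
        2.1 K-fold
        2.2 Stratified k-fold
        2.3 Label k-fold (named GroupKFold since scikit-learn 0.18)
        2.4 Leave-One-Out (LOO)
        2.5 Leave-P-Out (LPO)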
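
The rule for an integer cv can be checked directly. This is a minimal sketch on synthetic data (the dataset and the `DecisionTreeClassifier` are assumptions made to keep it fast, not the author's setup): passing `cv=5` for a classifier should give the same scores as passing `StratifiedKFold(n_splits=5)` explicitly, while plain `KFold` generally gives different ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data (an assumption made for this sketch).
X_demo, y_demo = make_classification(n_samples=300, n_features=8,
                                     weights=[0.8, 0.2], random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# An int, an explicit StratifiedKFold, and an explicit KFold as the cv argument.
scores_int = cross_val_score(clf, X_demo, y_demo, scoring='f1_macro', cv=5)
scores_skf = cross_val_score(clf, X_demo, y_demo, scoring='f1_macro',
                             cv=StratifiedKFold(n_splits=5))
scores_kf = cross_val_score(clf, X_demo, y_demo, scoring='f1_macro',
                            cv=KFold(n_splits=5))

print(np.allclose(scores_int, scores_skf))  # same splits, same scores
print(scores_kf)
```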

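The scoring parameter (accepted not only by cross_val_score but by every sklearn API that takes a scoring argument) can be any of the following strings; see the official documentation: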

'accuracy', 
'adjusted_mutual_info_score',
'adjusted_rand_score', 
'average_precision', 
'completeness_score', 
'explained_variance', 
'f1', 
'f1_macro',  # unweighted mean over classes (multiclass)
'f1_micro', 
'f1_samples',
'f1_weighted', 
'fowlkes_mallows_score', 
'homogeneity_score', 
'mutual_info_score', 
'neg_log_loss', 
'neg_mean_absolute_error', 
'neg_mean_squared_error', 
'neg_mean_squared_log_error', 
'neg_median_absolute_error', 
'normalized_mutual_info_score', 
'precision', 
'precision_macro',  # unweighted mean over classes (multiclass)
'precision_micro', 
'precision_samples', 
'precision_weighted', 
'r2', 
'recall', 
'recall_macro',  # unweighted mean over classes (multiclass)
'recall_micro', 
'recall_samples', 
'recall_weighted', 
'roc_auc', 
'v_measure_score'
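
Several of these strings can be evaluated in one pass with `cross_validate`, which accepts a list of scoring names. The sketch below uses the iris dataset and a small decision tree purely as stand-ins (both are assumptions for illustration); each metric comes back under a `test_<name>` key.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# One cross-validation run, three metrics.
cv_results = cross_validate(clf, X_iris, y_iris, cv=5,
                            scoring=['accuracy', 'f1_macro', 'f1_micro'])

for name in ('test_accuracy', 'test_f1_macro', 'test_f1_micro'):
    print(name, cv_results[name].round(3))
```

On a single-label multiclass problem such as this one, the `test_f1_micro` column should match `test_accuracy` fold by fold.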

1.1.2 Cross validation iterators (commonly KFold and StratifiedKFold)

KFold usage

from sklearn.model_selection import KFold

datas = np.array(list(range(2,10)))
y = np.array([0,0,0,0,1,1,1,1])
print(datas)

out:
[2 3 4 5 6 7 8 9]

kf = KFold(n_splits=4)
for train_index,test_index in kf.split(datas,y):
    print('train indices: %s   test indices: %s ' %(train_index,test_index))
    x_train,x_test = datas[train_index],datas[test_index]
    y_train,y_test = y[train_index],y[test_index]

out:
train indices: [2 3 4 5 6 7]   test indices: [0 1] 
train indices: [0 1 4 5 6 7]   test indices: [2 3] 
train indices: [0 1 2 3 6 7]   test indices: [4 5] 
train indices: [0 1 2 3 4 5]   test indices: [6 7]

# Pass the KFold splitter as the cv argument of cross_val_score

from sklearn.model_selection import KFold,cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

kf = KFold(n_splits=4)
gbc = GradientBoostingClassifier(learning_rate=0.01,n_estimators=50,max_depth=2)
scores_f1_macro = cross_val_score(estimator=gbc,X=x_data,y=y_new,scoring='f1_macro',cv=kf,n_jobs=-1)
print(scores_f1_macro)

out:
[0.4682921  0.47002771 0.47175635 0.46945807]

StratifiedKFold usage

from sklearn.model_selection import StratifiedKFold

datas = np.array(list(range(2,10)))
y = np.array([0,0,0,0,1,1,1,1])
print(datas)

out:
[2 3 4 5 6 7 8 9]

skf = StratifiedKFold(n_splits=4)
for train_index,test_index in skf.split(datas,y):
    print('train indices: %s   test indices: %s ' %(train_index,test_index))
    x_train,x_test = datas[train_index],datas[test_index]
    y_train,y_test = y[train_index],y[test_index]

out:
train indices: [1 2 3 5 6 7]   test indices: [0 4] 
train indices: [0 2 3 4 6 7]   test indices: [1 5] 
train indices: [0 1 3 4 5 7]   test indices: [2 6] 
train indices: [0 1 2 4 5 6]   test indices: [3 7]
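
The practical difference between the two splitters shows up on imbalanced, class-sorted labels. The following sketch uses a made-up label vector (15 negatives followed by 5 positives, an assumption for illustration) and counts the positives that land in each test fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# 20 samples with a 3:1 class imbalance, sorted by class (illustrative data).
y_demo = np.array([0] * 15 + [1] * 5)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

def positives_per_test_fold(splitter):
    """Count class-1 samples in each test fold produced by the splitter."""
    return [int(y_demo[test_idx].sum())
            for _, test_idx in splitter.split(X_demo, y_demo)]

kf_counts = positives_per_test_fold(KFold(n_splits=5))
skf_counts = positives_per_test_fold(StratifiedKFold(n_splits=5))

print('KFold          :', kf_counts)   # [0, 0, 0, 1, 4]
print('StratifiedKFold:', skf_counts)  # [1, 1, 1, 1, 1]
```

Without shuffling, KFold cuts the data into consecutive blocks, so three of its test folds contain no positive sample at all, whereas StratifiedKFold keeps the 3:1 ratio in every fold.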

1.2 Grid Search: finding the best hyperparameters (includes cross-validation)

1.2.1 Grid Search: important parameters

A search consists of:

an estimator (regressor or classifier such as sklearn.svm.SVC());
a parameter space;
a method for searching or sampling candidates;
a cross-validation scheme; and
a score function.

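class sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, 
iid='deprecated', 
refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)

"""
Parameters (signature as of scikit-learn 0.22/0.23):
estimator: the model to tune
param_grid: dict whose keys are the parameter names to tune and whose values are lists of candidate values
scoring: the metric used for evaluation (by default the estimator's own score method, which is accuracy for classifiers; can be changed to 'f1', 'roc_auc', etc.)
cv: number of cross-validation folds (default 5, typically 5 to 10); a KFold or StratifiedKFold iterator can also be passed. For a classifier, an integer selects StratifiedKFold
n_jobs: number of jobs to run in parallel; the default None means 1 (setting -1 uses all cores)
verbose: how much progress information to print (0, 1, 2, 3, ...; the larger the number, the more detail). Some models have no training progress to report
iid: whether the samples are assumed to be independent and identically distributed; deprecated in 0.22 and removed in 0.24
refit: whether to refit the best estimator on the whole dataset; default True, which lets the GridSearchCV instance itself be used for predict
error_score: the score assigned to a parameter combination whose fit raises an error; default nan, or 'raise' to raise the error instead

"""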

1.2.2 Grid Search: code example

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
gbc = GradientBoostingClassifier(learning_rate=0.01,n_estimators=150,max_depth=4)
param_grid = {'learning_rate':[0.1,1,0.01],
              'n_estimators':[20,50,100],
              'max_depth':[2,3,4],}
gbc_cv = GridSearchCV(estimator=gbc, param_grid=param_grid, scoring='f1_macro', n_jobs=-1, cv=5, verbose=1)  # iid is omitted: it was removed in scikit-learn 0.24
gbc_cv.fit(x_data,y_new)

best_score = gbc_cv.best_score_
print(best_score)   # mean cross-validated f1_macro of the best parameter combination (not a training score)

out:
0.809569551196101

best_params = gbc_cv.best_params_
print(best_params)  # best parameter combination

out:
{'learning_rate': 1, 'max_depth': 3, 'n_estimators': 50}

gbc_model = gbc_cv.best_estimator_
print(gbc_model)    # best estimator, refit on the whole dataset

out:
GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=1, loss='deviance', max_depth=3,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=50,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=None, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)
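
Beyond the three best_* attributes, the per-candidate scores are available in `cv_results_`. The sketch below is a small stand-alone illustration on iris with a decision tree (both are assumptions chosen so it runs in well under a second, not the author's model); it tabulates the results with pandas and uses the fitted search object for prediction, which works because `refit=True` is the default.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)
param_grid = {'max_depth': [1, 2, 3], 'min_samples_leaf': [1, 5]}

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid=param_grid, scoring='f1_macro', cv=5)
search.fit(X_iris, y_iris)

# One row per parameter combination: 3 * 2 = 6 candidates.
results = pd.DataFrame(search.cv_results_)
print(results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']])

# With refit=True (the default) the search object itself can predict.
print(search.best_params_)
print(search.predict(X_iris[:5]))
```

Looking at `mean_test_score` together with `std_test_score` helps judge whether the top-ranked combination is meaningfully better than the runners-up or just within fold-to-fold noise.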

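II. Evaluation metrics

For details, see the scoring metrics listed in the sklearn documentation:
3.3. Metrics and scoring: quantifying the quality of predictions (scikit-learn 1.4.1 documentation)

For classification tasks, the scoring strings from the list in section 1.1.1 include 'accuracy', 'f1', 'f1_macro', 'f1_micro', 'f1_weighted', 'precision', 'recall', 'roc_auc' and 'neg_log_loss'.

2.1 Macro-averaged F1 vs. micro-averaged F1 (with micro averaging, accuracy == precision == recall == F1) (see: The difference between micro and macro in f1_score/precision_score/recall_score)

1. macro: use the confusion matrix to compute the score of each class separately (treating all other classes as negatives), then average them, i.e. an equally weighted mean of the per-class F1 scores.
2. micro: use the confusion matrix to count the total TP, FN and FP over all classes, then compute a single F1 from those totals. In single-label classification with every class included, each misclassified sample counts as exactly one FP and one FN, so micro averaging reduces to the accuracy calculation and accuracy == precision == recall == F1 always holds.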
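
The two definitions can be worked through by hand from a confusion matrix. This is a minimal sketch, assuming the same toy label lists as the four-class example further down; it derives TP, FP and FN per class with NumPy and compares the results with `f1_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy 4-class labels (illustrative values).
y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_pred = [1, 1, 1, 3, 3, 2, 2, 3, 3, 3, 4, 3, 4, 3]

cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4])
tp = np.diag(cm)                 # correct predictions per class
fp = cm.sum(axis=0) - tp         # predicted as the class, but wrong
fn = cm.sum(axis=1) - tp         # belonging to the class, but missed

# Macro: per-class F1 first, then an unweighted mean.
per_class_f1 = 2 * tp / (2 * tp + fp + fn)
macro_f1 = per_class_f1.mean()

# Micro: pool the counts over all classes, then compute one F1.
micro_f1 = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())

print(macro_f1, f1_score(y_true, y_pred, average='macro'))
print(micro_f1, f1_score(y_true, y_pred, average='micro'))
print(accuracy_score(y_true, y_pred))
```

Since the pooled FP and FN totals are both equal to the number of misclassified samples, the micro F1 collapses to correct / total, which is the accuracy.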

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
gbc = GradientBoostingClassifier(learning_rate=1,n_estimators=50,max_depth=3)
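gbc.fit(x_data,y_new)
y_pre = gbc.predict(x_data)   # predictions on the training data itself, so the scores below are training-set scores
f1_score_1 = f1_score(y_true=y_new,y_pred=y_pre,average='macro')    # unweighted mean of the per-class F1 scores
"""
y_true, 
y_pred, 
labels=None, 
pos_label=1,         # which class counts as positive; only used with average='binary', where the F1 of this class is returned
average='binary',    # 'binary' for binary problems; 'macro', 'micro', 'weighted', ... for multiclass problems
sample_weight=None,  # optional per-sample weights; see the official manual for details
zero_division="warn"
"""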
print(f1_score_1)

out:
0.8322981046559539

f1_score_2 = f1_score(y_true=y_new,y_pred=y_pre,average='micro')    
print(f1_score_2)

out:
0.8835416602684194

f1_score_3 = f1_score(y_true=y_new,y_pred=y_pre,average=None)  # returns the F1 score of every class
print(f1_score_3)

out:
[0.925      0.73959621]

f1_score_4 = f1_score(y_true=y_new,y_pred=y_pre,average="binary")   # 'binary' is the default and only handles binary targets; it returns the F1 of the positive class, which is set by pos_label (default 1)
print(f1_score_4)
print(f1_score_4)

out:
0.7395962093119078


from sklearn import metrics
 
y_test    = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
y_predict = [1, 1, 1, 3, 3, 2, 2, 3, 3, 3, 4, 3, 4, 3]
 
print('Accuracy:', metrics.accuracy_score(y_test, y_predict)) # accuracy
 
print('Macro precision:',metrics.precision_score(y_test,y_predict,average='macro')) # macro-averaged precision
print('Micro precision:', metrics.precision_score(y_test, y_predict, average='micro')) # micro-averaged precision
print('Weighted precision:', metrics.precision_score(y_test, y_predict, average='weighted')) # support-weighted precision
 
print('Macro recall:',metrics.recall_score(y_test,y_predict,average='macro')) # macro-averaged recall
print('Micro recall:',metrics.recall_score(y_test,y_predict,average='micro')) # micro-averaged recall
print('Weighted recall:',metrics.recall_score(y_test,y_predict,average='weighted')) # support-weighted recall (the original passed 'micro' here; the value is the same because weighted recall always equals accuracy)
 
print('Macro F1-score:',metrics.f1_score(y_test,y_predict,labels=[1,2,3,4],average='macro')) # macro-averaged F1
print('Micro F1-score:',metrics.f1_score(y_test,y_predict,labels=[1,2,3,4],average='micro')) # micro-averaged F1
print('Weighted F1-score:',metrics.f1_score(y_test,y_predict,labels=[1,2,3,4],average='weighted')) # support-weighted F1
 
print('Confusion matrix:\n',metrics.confusion_matrix(y_test,y_predict,labels=[1,2,3,4])) # confusion matrix
print('Classification report:\n', metrics.classification_report(y_test, y_predict,labels=[1,2,3,4])) # classification report
 
 
Output:
Accuracy: 0.571428571429
Macro precision: 0.696428571429
Micro precision: 0.571428571429
Weighted precision: 0.775510204082
Macro recall: 0.566666666667
Micro recall: 0.571428571429
Weighted recall: 0.571428571429
Macro F1-score: 0.579166666667
Micro F1-score: 0.571428571429
Weighted F1-score: 0.615476190476
Confusion matrix:
 [[3 0 2 0]
 [0 2 2 0]
 [0 0 2 1]
 [0 0 1 1]]
Classification report:
              precision    recall  f1-score   support
 
          1       1.00      0.60      0.75         5
          2       1.00      0.50      0.67         4
          3       0.29      0.67      0.40         3
          4       0.50      0.50      0.50         2
 
avg / total       0.78      0.57      0.62        14

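2.2 ROC curve and AUC for binary and multiclass problems (multiclass ROC comes in macro and micro variants) (see: An introduction to ROC and plotting binary and multiclass ROC curves in Python)

ROC curve and AUC for a binary problem; by default class 1 is treated as the positive class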

# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc  # compute the ROC curve and the AUC
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20

# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Keep only classes 0 and 1 to make it a binary problem
X, y = X[y != 2], y[y != 2]

# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3,random_state=0)

# Fit a linear SVM (named clf so that the imported svm module is not shadowed)
clf = svm.SVC(kernel='linear', probability=True,random_state=random_state)

# decision_function() returns the scores (y_score) that roc_curve() expects
y_score = clf.fit(X_train, y_train).decision_function(X_test)

# Compute the ROC curve and the area under it
fpr,tpr,threshold = roc_curve(y_test, y_score) # false positive rate and true positive rate
roc_auc = auc(fpr,tpr) # AUC value

lw = 2
plt.figure(figsize=(10,10))
plt.plot(fpr, tpr, color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc) # FPR on the x-axis, TPR on the y-axis
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()
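
When only the number is needed and not the plot, `roc_auc_score` gives the AUC in one call. The sketch below is an illustration on synthetic data with a logistic regression (both are assumptions, not the SVM setup above); it compares the area computed from `roc_curve` plus `auc` with `roc_auc_score`, using either the decision-function margin or the class-1 probability as the score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Two valid scores for class 1: a signed margin and a probability.
margin = model.decision_function(X_te)
proba = model.predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, margin)
auc_from_curve = auc(fpr, tpr)
auc_from_margin = roc_auc_score(y_te, margin)
auc_from_proba = roc_auc_score(y_te, proba)

print(auc_from_curve, auc_from_margin, auc_from_proba)
```

AUC depends only on how the samples are ranked, so any monotonic transformation of the score (here the sigmoid that turns the margin into a probability) should leave it unchanged. Hard 0/1 predictions from `predict` discard that ranking and are not a good input for ROC analysis.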

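ROC curve and AUC for a multiclass problem: micro-averaged ROC yields a single pooled curve directly, whereas computing macro-averaged ROC produces each class's ROC curve along the way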

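# # Plot the ROC curves # #
# Imports needed by this snippet. y_test_cls, y_logits_cls, file_path and config
# come from the author's surrounding project and are not defined here.
import os
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

y_test_one_hot = label_binarize(y_test_cls, classes=np.arange(3))   # binarize the labels (one column per class)
y_predict_one_hot = y_logits_cls   # score matrix, e.g. the confidence values returned by .decision_function(X_test)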

plt.figure()
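# Plot settings: use a font that can render CJK characters and keep the minus sign intact
mpl.rcParams['font.sans-serif'] = u'SimHei'
mpl.rcParams['axes.unicode_minus'] = False
# FPR is the x-axis, TPR is the y-axis
# Compute the ROC curves
fpr_dict, tpr_dict, roc_auc = dict(), dict(), dict()
for i in range(3):  # false positive rate (fpr) and true positive rate (tpr) for each class
    fpr_dict[i], tpr_dict[i], _ = roc_curve(y_test_one_hot[:, i], y_predict_one_hot[:, i])
    roc_auc[i] = auc(fpr_dict[i], tpr_dict[i])
# Two ways to average:
# Method 1 (micro): flatten the binarized labels, e.g. [[0,0,1],[0,1,0]] becomes [0,0,1,0,1,0], and solve a single binary problem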
fpr_dict["micro"], tpr_dict["micro"], _ = roc_curve(y_test_one_hot.ravel(),
                                                    y_predict_one_hot.ravel())
roc_auc["micro"] = auc(fpr_dict["micro"], tpr_dict["micro"])

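# Method 2 (macro): interpolate each class's ROC curve onto a common FPR grid, sum the TPRs and divide by the number of classes to get the averaged ROC curve
n_classes = 3
# np.interp is used here; scipy.interp was a deprecated alias of it and is absent from recent SciPy releases
all_fpr = np.unique(np.concatenate([fpr_dict[i] for i in range(n_classes)]))
# Then interpolate all ROC curves at these points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr_dict[i], tpr_dict[i])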
# Finally average it and compute AUC
mean_tpr /= n_classes
fpr_dict["macro"] = all_fpr
tpr_dict["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr_dict["macro"], tpr_dict["macro"])
print(roc_auc)


# Plot the micro-averaged curve; the figure is saved as a PNG further down
lw = 2
# plt.plot(fpr_dict[2], tpr_dict[2], color='darkorange',  # ROC curve for the positive class only
#          lw=lw, label='ROC curve (area = %0.3f)' % roc_auc["micro"])
plt.plot(fpr_dict["micro"], tpr_dict["micro"], color='darkorange',
         lw=lw, label='ROC curve (area = %0.3f)' % roc_auc["micro"])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.title(u'text_rnn ROC and AUC', fontsize=17)   # overrides the title set on the earlier plt.title call
path = os.path.join(file_path, "img")
os.makedirs(path, exist_ok=True)
plt.savefig(os.path.join(path, "{}的ROC和AUC.png".format("model_" + str(config.model_num) + "_")))
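
The snippet above depends on variables from the author's project, so here is a self-contained sketch of the same micro and macro computations. It assumes the iris dataset and a logistic regression as stand-ins and skips the plotting; it also cross-checks the hand-computed values against `roc_auc_score`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_score = model.predict_proba(X_te)                  # shape (n_samples, 3)
y_onehot = label_binarize(y_te, classes=[0, 1, 2])   # shape (n_samples, 3)

# Per-class one-vs-rest AUC, as in the loop above.
per_class_auc = []
for i in range(3):
    fpr_i, tpr_i, _ = roc_curve(y_onehot[:, i], y_score[:, i])
    per_class_auc.append(auc(fpr_i, tpr_i))

# Micro: flatten labels and scores into one binary problem.
fpr_micro, tpr_micro, _ = roc_curve(y_onehot.ravel(), y_score.ravel())
micro_auc = auc(fpr_micro, tpr_micro)

# Built-in equivalents.
macro_auc_builtin = roc_auc_score(y_te, y_score, multi_class='ovr', average='macro')
micro_auc_builtin = roc_auc_score(y_onehot, y_score, average='micro')

print(per_class_auc)
print(np.mean(per_class_auc), macro_auc_builtin)
print(micro_auc, micro_auc_builtin)
```

Note that the macro value from `roc_auc_score` is the plain mean of the per-class AUCs, which can differ slightly from the area under the interpolated mean curve built in Method 2.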