ML with sklearn: A detailed guide to the confusion_matrix and make_scorer functions in sklearn.metrics, with usage examples
Contents
The sklearn.metrics.confusion_matrix function
The sklearn.metrics.make_scorer() function
Recommended articles
ML: Evaluation metrics for classification problems (error rate, confusion matrix, P/R/F1, ROC-AUC, PR, mAP): introduction, usage, code, and examples
CNN performance metrics: Common metrics for convolutional neural networks (IoU, AP/mAP, confusion matrix): introduction and usage
Commonly used function parameters in sklearn.metrics
The sklearn.metrics.confusion_matrix function
Function explanation
Return value: the confusion matrix, whose entry in row i and column j is the number of samples whose true label is class i and whose predicted label is class j.
              Predicted 0   Predicted 1
True 0        TN            FP
True 1        FN            TP
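Under this row/column convention, a binary confusion matrix can be unpacked into TN/FP/FN/TP with ravel(). A minimal sketch (the toy labels below are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# Row i = true class i, column j = predicted class j, so for binary labels:
# cm[0, 0] = TN, cm[0, 1] = FP, cm[1, 0] = FN, cm[1, 1] = TP
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # → 2 1 1 2
```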
The confusion_matrix function is defined in sklearn.metrics._classification (decorated with @_deprecate_positional_args).
Examples
--------
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

>>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
>>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
>>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

In the binary case, we can extract true positives, etc. as follows:

>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)
""" y_type, y_true, y_pred = _check_targets(y_true, y_pred) if y_type not in ("binary", "multiclass"): raise ValueError("%s is not supported" % y_type) if labels is None: labels = unique_labels(y_true, y_pred) else: labels = np.asarray(labels) n_labels = labels.size if n_labels == 0: raise ValueError("'labels' should contains at least one label.") elif y_true.size == 0: return np.zeros((n_labels, n_labels), dtype=np.int) elif np.all([l not in y_true for l in labels]): raise ValueError("At least one label specified must be in y_true") if sample_weight is None: sample_weight = np.ones(y_true.shape[0], dtype=np.int64) else: sample_weight = np.asarray(sample_weight) check_consistent_length(y_true, y_pred, sample_weight) if normalize not in ['true', 'pred', 'all', None]: raise ValueError("normalize must be one of {'true', 'pred', " "'all', None}") n_labels = labels.size label_to_ind = {y:x for x, y in enumerate(labels)} # convert yt, yp into index y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred]) y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true]) # intersect y_pred, y_true with labels, eliminate items not in labels ind = np.logical_and(y_pred < n_labels, y_true < n_labels) y_pred = y_pred[ind] y_true = y_true[ind] # also eliminate weights of eliminated items sample_weight = sample_weight[ind] # Choose the accumulator dtype to always have high precision if sample_weight.dtype.kind in {'i', 'u', 'b'}: dtype = np.int64 else: dtype = np.float64 cm = coo_matrix((sample_weight, (y_true, y_pred)), shape=(n_labels, n_labels), dtype=dtype).toarray() with np.errstate(all='ignore'): if normalize == 'true': cm = cm / cm.sum(axis=1, keepdims=True) elif normalize == 'pred': cm = cm / cm.sum(axis=0, keepdims=True) elif normalize == 'all': cm = cm / cm.sum() cm = np.nan_to_num(cm) return cm |
The sklearn.metrics.make_scorer() function
Function explanation
def make_scorer(score_func, *, greater_is_better=True, needs_proba=False, needs_threshold=False, **kwargs)

Make a scorer from a performance metric or loss function. This factory function wraps scoring functions for use in GridSearchCV and cross_val_score. It takes a score function, such as accuracy_score, mean_squared_error, adjusted_rand_score or average_precision_score, and returns a callable that scores an estimator's output. The call signature of that callable is (estimator, X, y), where estimator is the model to be evaluated, X is the data, and y is the ground-truth labels (None in the unsupervised case). Read more in the User Guide <scoring>.
Parameters
----------
score_func : callable. Score function (or loss function) with signature score_func(y, y_pred, **kwargs).
greater_is_better : bool, default=True. Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of score_func.
needs_proba : bool, default=False. Whether score_func requires predict_proba to get probability estimates out of a classifier. If True, for binary y_true, the score function is supposed to accept a 1D y_pred (i.e., probability of the positive class, shape (n_samples,)).
needs_threshold : bool, default=False. Whether score_func takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or a predict_proba method. If True, for binary y_true, the score function is supposed to accept a 1D y_pred (i.e., probability of the positive class or the decision function, shape (n_samples,)). For example, average_precision or the area under the ROC curve cannot be computed using discrete predictions alone.
**kwargs : additional arguments. Additional parameters to be passed to score_func.

Returns
-------
scorer : callable. Callable object that returns a scalar score; greater is better.
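As an illustration of greater_is_better (a sketch with synthetic data, not from the original post): wrapping mean_squared_error with greater_is_better=False produces a scorer whose values are sign-flipped, so model selection can still maximize the score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# MSE is a loss (lower is better), so the scorer sign-flips it
neg_mse = make_scorer(mean_squared_error, greater_is_better=False)
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring=neg_mse)
print(scores)  # five non-positive values; closer to 0 is better
```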
Function usage example
Using it with a log_transfer function
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, mean_absolute_error, r2_score

def log_transfer(func):
    # Wrap a metric so it is evaluated on log-transformed targets
    def wrapper(y, y_hat):
        # nan_to_num guards against log() of non-positive predictions
        result = func(np.log(y), np.nan_to_num(np.log(y_hat)))
        return result
    return wrapper

# LiR_Model, X_train and y_train are assumed to be defined beforehand
cv_scores = cross_val_score(LiR_Model, X=X_train, y=y_train, verbose=1, cv=5, scoring=make_scorer(log_transfer(r2_score)))
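The snippet above assumes a fitted setup with LiR_Model, X_train and y_train already defined. A self-contained sketch of the same idea, using synthetic data whose targets are rescaled to be strictly positive so np.log is valid (the data generation is my own assumption, not from the original post):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import cross_val_score

def log_transfer(func):
    # Evaluate the wrapped metric on log-transformed targets
    def wrapper(y, y_hat):
        return func(np.log(y), np.nan_to_num(np.log(y_hat)))
    return wrapper

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
y = np.exp(y / np.abs(y).max())  # rescale so targets are strictly positive

cv_scores = cross_val_score(LinearRegression(), X, y, cv=5,
                            scoring=make_scorer(log_transfer(r2_score)))
print(cv_scores.shape)  # (5,)
```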