tf.keras.metrics.AUC
Commonly used in TensorFlow 2.x
tf.keras.metrics.AUC(
num_thresholds=200, curve='ROC',
summation_method='interpolation', name=None, dtype=None,
thresholds=None, multi_label=False, num_labels=None, label_weights=None,
from_logits=False
)
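This metric creates four local variables, true_positives, true_negatives, false_positives and false_negatives, that are used to compute the AUC. To discretize the AUC curve, a linearly spaced set of thresholds is used to compute pairs of recall and precision values. The area under the ROC curve is therefore computed using the height of the recall values by the false positive rate, while the area under the PR curve is computed using the height of the precision values by the recall.
This value is ultimately returned as auc, an idempotent operation that computes the area under a discretized curve of precision versus recall values (computed using the aforementioned variables).
The num_thresholds variable controls the degree of discretization, with larger numbers of thresholds more closely approximating the true AUC. The quality of the approximation may vary dramatically depending on num_thresholds.
The thresholds parameter can be used to manually specify thresholds which split the predictions more evenly.
If sample_weight is None, weights default to 1. Use a sample_weight of 0 to mask values.
For best results, predictions should be distributed approximately uniformly in the range [0, 1] and not peaked around 0 or 1. If this is not the case, the quality of the AUC approximation may be poor. Setting summation_method to 'minoring' or 'majoring' can help quantify the error in the approximation by providing a lower or upper bound estimate of the AUC.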
[https://runebook.dev/zh-CN/docs/tensorflow/keras/metrics/auc]
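To make the thresholding described above concrete, here is a minimal plain-Python sketch of a thresholded ROC AUC. It is an illustration under assumptions, not the Keras implementation: the function name approx_roc_auc, the small eps padding at both ends of the threshold range, and the trapezoidal summation are all choices made for this sketch.

```python
def approx_roc_auc(labels, scores, num_thresholds=200):
    """Approximate ROC AUC from confusion counts at evenly spaced thresholds."""
    # Assumption of this sketch: pad the end thresholds slightly so that scores
    # of exactly 0.0 or 1.0 fall on the intended side of the comparison.
    eps = 1e-7
    inner = [i / (num_thresholds - 1) for i in range(1, num_thresholds - 1)]
    thresholds = [-eps] + inner + [1.0 + eps]

    points = []  # (false positive rate, true positive rate) per threshold
    for t in thresholds:
        tp = fp = tn = fn = 0
        for y, s in zip(labels, scores):
            predicted_positive = s > t
            if y == 1 and predicted_positive:
                tp += 1
            elif y == 1:
                fn += 1
            elif predicted_positive:
                fp += 1
            else:
                tn += 1
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((fpr, tpr))

    # Thresholds ascend, so FPR descends; sum trapezoids between neighbours.
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x0 - x1) * (y0 + y1) / 2.0
    return auc


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_value = approx_roc_auc(labels, scores)
print(auc_value)  # 0.75
```

With very few thresholds the curve is sampled coarsely, which is why the quality of the approximation depends on num_thresholds and on how the scores are spread over [0, 1].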
tf.metrics.auc
Commonly used in TensorFlow 1.x
tf.metrics.auc(
labels, predictions, weights=None, num_thresholds=200, metrics_collections=None,
updates_collections=None, curve='ROC', name=None,
summation_method='trapezoidal', thresholds=None
)
[Module: tf.keras.metrics]
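When using tf.estimator, if the Estimator's evaluate method is called, model_fn receives mode = ModeKeys.EVAL. In this case the model function must return a tf.estimator.EstimatorSpec containing the model's loss and, optionally, one or more metrics. Although returning metrics is optional, most custom Estimators return at least one. TensorFlow provides a metrics module, tf.metrics, for computing common metrics.
Some commonly used metrics
These may only apply to binary classification
The documentation says that both labels and predictions are cast to bool, so these metrics only deal with binary classification. Perhaps they would work if the examples were one-hot encoded? I am not sure about that. [Class-wise precision and recall for multi-class classification in TensorFlow?]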
accuracy(...): Calculates how often predictions match labels. The accuracy function creates two local variables, total and count, that are used to compute the frequency with which predictions match labels. This frequency is ultimately returned as accuracy: an idempotent operation that simply divides total by count. tf.metrics.accuracy compares our predictions against the true values, that is, against the labels supplied by the input function, and requires labels and predictions to have the same shape.
auc(...): Computes the approximate AUC via a Riemann sum.
average_precision_at_k(...): Computes average precision@k of predictions with respect to sparse labels.
precision(...): Computes the precision of the predictions with respect to the labels.
precision_at_k(...): Computes precision@k of the predictions with respect to sparse labels.
recall(...): Computes the recall of the predictions with respect to the labels.
recall_at_k(...): Computes recall@k of the predictions with respect to sparse labels.
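[Evaluation]
Initialization
All of these functions create local variables, so initialize them with sess.run(tf.local_variables_initializer()) rather than tf.global_variables_initializer(). Skipping initialization can fail with: Attempting to use uninitialized value total_confusion_matrix.
Parameters
1 If the output is a sequence of labels (as in an NER model), a mask is usually required. [Tensorflow: tensor transformations]
2 For classification models,
2.1 When computing precision and recall, pred_ids must be in one-hot form, for example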
labels = [[0, 1, 0],
[1, 0, 0],
[0, 0, 1]],
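[tensorflow: How to use tf.metrics.accuracy correctly?]
note:
1 The pred_ids being compared must not be logits containing negative values, otherwise it fails with [`predictions` contains negative values] # [Condition x >= 0 did not hold element-wise:] [x (Reshape_2:0) = ] [0 -6 3...].
2 If you insist on the non-one-hot form, be careful with the argmax axis. If the axis is omitted or set to 0, an input of shape (batch_size, num_labels) yields an output of shape (num_labels,) instead of the intended (batch_size,). When num_labels > batch_size this usually raises no error; when num_labels < batch_size it fails with "(batch_size, num_labels) tf_metircs [`labels` out of bound] [Condition x < y did not hold element-wise:]". Either way the result is wrong.
2.2 Computing acc and auc (I am not sure how the latter works internally) requires no such conversion; pass the inputs directly.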
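The argmax pitfall in note 2 can be reproduced without TensorFlow. The sketch below uses a hypothetical helper, argmax_along, on nested lists with batch_size = 5 and num_labels = 3; the data is made up for illustration.

```python
def argmax_along(matrix, axis):
    """Index of the largest entry along `axis` of a 2-D nested list."""
    if axis == 1:
        # One result per row: shape (batch_size,)
        return [max(range(len(row)), key=row.__getitem__) for row in matrix]
    # axis == 0: one result per column: shape (num_labels,)
    columns = list(zip(*matrix))
    return [max(range(len(col)), key=col.__getitem__) for col in columns]


# batch_size = 5, num_labels = 3 (hypothetical one-hot predictions)
one_hot = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]
num_labels = 3

per_example = argmax_along(one_hot, axis=1)  # intended: one class id per example
per_class = argmax_along(one_hot, axis=0)    # mistake: one row index per class

print(per_example)  # [1, 1, 2, 2, 0]
print(per_class)    # [4, 0, 2]
```

With axis 0 the values are row indices in [0, batch_size), so a value such as 4 exceeds num_labels and triggers the out-of-bound check. When batch_size is smaller than num_labels the check passes, but the values are still not class ids.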
Testing multi-class classification
When computing precision and recall, pred_ids must be in one-hot form, for example
labels = [[0, 1, 0],
[1, 0, 0],
[0, 0, 1]],
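Extensive testing shows that the computation is in effect a micro average, i.e. precision = recall = acc. The built-in metric is also equivalent to the multi-class metrics described below: tf.metrics.accuracy(labels=labels, predictions=pred_ids) is equivalent to tf_metrics.accuracy(labels=tf.argmax(labels, 1), predictions=tf.argmax(pred_ids,1)).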
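Why precision and recall collapse to the same number on one-hot inputs can be checked by hand. The following plain-Python sketch casts every entry to a boolean, as the binary metrics are documented to do, and counts over the whole matrix; binary_counts is a hypothetical helper, and the data is the same as in the multi-class example further down.

```python
def binary_counts(labels, predictions):
    """Count TP/FP/FN after treating every matrix entry as a boolean."""
    tp = fp = fn = 0
    for label_row, pred_row in zip(labels, predictions):
        for y, p in zip(label_row, pred_row):
            y, p = bool(y), bool(p)
            tp += int(y and p)
            fp += int((not y) and p)
            fn += int(y and (not p))
    return tp, fp, fn


label_ids = [[0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
pred_ids = [[0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

tp, fp, fn = binary_counts(label_ids, pred_ids)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# Accuracy over class ids, i.e. after taking the argmax of each one-hot row.
accuracy = sum(l.index(1) == p.index(1) for l, p in zip(label_ids, pred_ids)) / len(label_ids)

print(tp, fp, fn)                   # 2 3 3
print(precision, recall, accuracy)  # 0.4 0.4 0.4
```

Each wrong example contributes exactly one false positive and one false negative, so both denominators equal the number of examples.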
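Return values
Taking the return values of accuracy as an example:
accuracy: A Tensor representing the accuracy, the value of total divided by count. Evaluating accuracy does not update the metric with new inputs; it only returns the value computed from the two local variables. (The first example below makes this concrete.)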
update_op: An operation that increments the total and count variables appropriately and whose value matches accuracy.
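The split between a read-only value and an updating operation can be mimicked in a few lines of plain Python. This is only an analogy for the two return values described above, not TensorFlow code; the class and method names are made up for the sketch.

```python
class StreamingAccuracy:
    """Keeps running `total` and `count`, like the metric's two local variables."""

    def __init__(self):
        self.total = 0.0  # number of correct predictions seen so far
        self.count = 0.0  # number of predictions seen so far

    def value(self):
        # Read-only, like `accuracy`: derives the metric from the stored state.
        return self.total / self.count if self.count else 0.0

    def update(self, labels, predictions):
        # Like `update_op`: accumulates a batch, then returns the new metric.
        self.total += sum(1.0 for y, p in zip(labels, predictions) if y == p)
        self.count += float(len(labels))
        return self.value()


metric = StreamingAccuracy()
before = metric.value()                      # nothing accumulated yet
after = metric.update([3, 1, 5], [3, 2, 5])  # 2 of 3 predictions are correct
print(before, metric.total, metric.count)    # 0.0 2.0 3.0
print(round(after, 4))                       # 0.6667
```

Calling value() any number of times leaves total and count untouched, which is the sense in which the metric is idempotent.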
Multi-class metrics for Tensorflow: tf_metrics
precision(labels, predictions, num_classes, pos_indices=None, weights=None, average='micro'):
Parameters:
labels : Tensor of tf.int32 or tf.int64
    The true labels, given as a non-one-hot list of labels with shape=(batch,).
predictions : Tensor of tf.int32 or tf.int64
    The predictions, same shape as labels
num_classes : int
    The number of classes
pos_indices : list of int, optional
    The indices of the positive classes, default is all
weights : Tensor of tf.int32, optional
    Mask, must be of compatible shape with labels
average : str, optional
    'micro': counts the total number of true positives, false
        positives, and false negatives for the classes in
        `pos_indices` and infer the metric from it.
    'macro': will compute the metric separately for each class in
        `pos_indices` and average. Will not account for class
        imbalance.
    'weighted': will compute the metric separately for each class in
        `pos_indices` and perform a weighted average by the total
        number of true labels for each class.
recall(labels, predictions, num_classes, pos_indices=None, weights=None, average='micro')
f1(labels, predictions, num_classes, pos_indices=None, weights=None, average='micro')
If the inputs are in one-hot form, convert them to predicted class indices first:
acc, acc_op = tf_metrics.accuracy(labels=tf.argmax(labels, 1), predictions=tf.argmax(logits,1))
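The three averaging modes listed in the parameter description can be compared on a small example. This is a plain-Python sketch of the idea for precision only, not the tf_metrics code: the function name is hypothetical, pos_indices is assumed to be all classes, and a class that is never predicted is assumed to score 0.

```python
def precision_by_average(labels, predictions, num_classes, average="micro"):
    tp = [0] * num_classes         # correct predictions per class
    predicted = [0] * num_classes  # how often each class was predicted
    support = [0] * num_classes    # how often each class is the true label
    for y, p in zip(labels, predictions):
        predicted[p] += 1
        support[y] += 1
        if y == p:
            tp[p] += 1

    if average == "micro":
        # Pool the counts of all classes, then divide once.
        return sum(tp) / sum(predicted) if sum(predicted) else 0.0

    # Per-class precision; assumed 0.0 when the class is never predicted.
    per_class = [tp[c] / predicted[c] if predicted[c] else 0.0
                 for c in range(num_classes)]
    if average == "macro":
        return sum(per_class) / num_classes
    if average == "weighted":
        return sum(v * n for v, n in zip(per_class, support)) / sum(support)
    raise ValueError("unknown average: %r" % average)


labels = [3, 2, 0, 1, 1]
predictions = [3, 1, 0, 0, 0]
micro = precision_by_average(labels, predictions, 4, "micro")
macro = precision_by_average(labels, predictions, 4, "macro")
weighted = precision_by_average(labels, predictions, 4, "weighted")
print(round(micro, 4), round(macro, 4), round(weighted, 4))  # 0.4 0.3333 0.2667
```

Here the per-class precisions are 1/3, 0, 0 and 1, so the macro average is 1/3 while the micro average equals the plain accuracy of 2/5.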
Examples
Example 1
import tensorflow as tf  # TF 1.x API

label_ids = tf.constant([[3, 1, 5]])
pred_ids = tf.constant([[3, 2, 5]])
acc, acc_op = tf.metrics.accuracy(label_ids, pred_ids)
stream_vars = [i for i in tf.local_variables()]
print(stream_vars)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    print('[total, count]:', sess.run(stream_vars))
    print(acc.eval())  # reads the two local variables only (not yet updated, so 0)
    print(acc_op.eval())
    print('[total, count]:', sess.run(stream_vars))
    print(acc.eval())  # reads the two local variables only (now updated, non-zero)
Output
[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>, <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]
[total, count]: [0.0, 0.0]
0.0
0.6666667
[total, count]: [2.0, 3.0]
0.6666667
[tensorflow: How to use tf.metrics.accuracy correctly?]
[Understanding the tf.metrics ops in TensorFlow in depth]
Example 2
# Compute evaluation metrics.
acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1), predictions=tf.argmax(logits,1))
Example 3: multi-class classification
label_ids = tf.constant([[0, 0, 0, 1],
[0, 0, 1, 0],
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0]])
pred_ids = tf.constant([[0, 0, 0, 1],
[0, 1, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0]])
num_labels = label_ids.shape[1]
label_arg_ids = tf.argmax(label_ids, 1)
pred_arg_ids = tf.argmax(pred_ids, 1)
# _, tp_op = tf.metrics.true_positives(label_ids, pred_ids)
# _, fp_op = tf.metrics.false_positives(label_ids, pred_ids)
_, acc_op = tf.metrics.precision(label_ids, pred_ids)  # despite its name, acc_op is the precision on the one-hot inputs
_, acc_op1 = tf.metrics.accuracy(label_arg_ids, pred_arg_ids)
_, pre_op = tf.metrics.precision(label_ids, pred_ids)
# _, pre_op1 = tf.metrics.precision(label_arg_ids, pred_arg_ids)
_, rec_op = tf.metrics.recall(label_ids, pred_ids)
# _, rec_op1 = tf.metrics.recall(label_arg_ids, pred_arg_ids)
# _, pre_op_ = tf_metrics.precision(label_ids, pred_ids, num_labels)
_, pre_op1_ = tf_metrics.precision(label_arg_ids, pred_arg_ids, num_labels, average='macro')
# _, rec_op_ = tf_metrics.recall(label_ids, pred_ids, num_labels)
_, rec_op1_ = tf_metrics.recall(label_arg_ids, pred_arg_ids, num_labels, average='macro')
_, f1_op1_ = tf_metrics.f1(label_arg_ids, pred_arg_ids, num_labels, average='macro')
stream_vars = [i for i in tf.local_variables()]
print(stream_vars)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    print(label_arg_ids.eval())
    print(pred_arg_ids.eval())
    # print(tp_op.eval()) # 2
    # print(fp_op.eval()) # 3
    print('acc_op:', acc_op.eval())
    print('acc_op1:', acc_op1.eval())
    print('pre_op:', pre_op.eval())
    # print('pre_op1:', pre_op1.eval()) # 1.0
    print('rec_op:', rec_op.eval())
    # print('rec_op1:', rec_op1.eval()) # 0.5
    # print(pre_op_.eval()) # 0.7
    print('pre_op1_:', pre_op1_.eval())
    # print(rec_op_.eval()) # 0.7
    print('rec_op1_:', rec_op1_.eval())
    print('f1_op1_:', f1_op1_.eval())
Output (recorded with the commented-out lines enabled)
[3 2 0 1 1]
[3 1 0 0 0]
2.0
3.0
acc_op: 0.4
acc_op1: 0.4
pre_op: 0.4
pre_op1: 1.0
rec_op: 0.4
rec_op1: 0.5
0.7
pre_op1_: 0.33333334
0.7
rec_op1_: 0.5
f1_op1_: 0.375
average_precision_at_k example
In later TF versions, tf.metrics.average_precision_at_k replaces tf.metrics.sparse_average_precision_at_k.
y_true = tf.constant([[2], [1], [0], [3], [0]])
y_true = tf.cast(y_true, tf.int64)
y_pred = tf.constant([[0.1, 0.2, 0.6, 0.1],
[0.8, 0.05, 0.1, 0.05],
[0.3, 0.4, 0.1, 0.2],
[0.6, 0.25, 0.1, 0.05],
[0.1, 0.2, 0.6, 0.1]
])
_, m_ap = tf.metrics.average_precision_at_k(y_true, y_pred, 3)
stream_vars = [i for i in tf.local_variables()]
tmp_rank = tf.nn.top_k(y_pred, 3)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    print("TF_MAP", sess.run(m_ap))
    print("STREAM_VARS", sess.run(stream_vars))
    print("TMP_RANK", sess.run(tmp_rank))
Output
TF_MAP 0.4333333333333333
STREAM_VARS [5.0, 2.1666666666666665]
TMP_RANK TopKV2(values=array([[0.6 , 0.2 , 0.1 ],
[0.8 , 0.1 , 0.05],
[0.4 , 0.3 , 0.2 ],
[0.6 , 0.25, 0.1 ],
[0.6 , 0.2 , 0.1 ]], dtype=float32),
indices=array(
[[2, 1, 0],
[0, 2, 1],
[1, 0, 3],
[0, 1, 2],
[2, 1, 0]], dtype=int32))
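The logic is as follows. The first label, 2, hits the top-1 position of [2 1 0], giving 1. The second label, 1, hits the third position of [0 2 1], giving 1/3. Likewise the third gives 1/2. The fourth is not in the top 3 at all, giving 0. The fifth gives 1/3. Averaging: (1+1/3+1/2+0+1/3)/5 = 13/30 ≈ 0.433.
[Evaluation methods for search ranking]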
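The arithmetic above can be replayed in plain Python. The sketch below covers only the single-label case described in the text (each row scores 1/rank if its label appears in the top k, else 0); the function names are made up for illustration, and this is not how TensorFlow implements the metric.

```python
def top_k_indices(scores, k):
    # Stable sort by descending score, so ties keep the lower index first.
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def mean_average_precision_at_k(true_labels, score_rows, k):
    """Single-label case: each row scores 1/rank if its label is in the top k."""
    total = 0.0
    for label, scores in zip(true_labels, score_rows):
        ranked = top_k_indices(scores, k)
        if label in ranked:
            total += 1.0 / (ranked.index(label) + 1)
    return total / len(true_labels)


y_true = [2, 1, 0, 3, 0]
y_pred = [
    [0.1, 0.2, 0.6, 0.1],
    [0.8, 0.05, 0.1, 0.05],
    [0.3, 0.4, 0.1, 0.2],
    [0.6, 0.25, 0.1, 0.05],
    [0.1, 0.2, 0.6, 0.1],
]
map_at_3 = mean_average_precision_at_k(y_true, y_pred, 3)
print(round(map_at_3, 4))  # 0.4333
```

The per-row contributions are 1, 1/3, 1/2, 0 and 1/3, matching the STREAM_VARS sum of about 2.1667 over 5 rows.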
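precision_at_k example
Replacing average_precision_at_k with precision_at_k in the code above changes the logic: each hit contributes 1/k. The first label hits [2 1 0], giving 1/k = 1/3; the second label, 1, also hits, giving 1/3; likewise the third gives 1/3; the fourth misses, giving 0; the fifth gives 1/3. Averaging: (1/3+1/3+1/3+0+1/3)/5 = 4/15 ≈ 0.2667.
Output: TF_MAP 0.26666666666666666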
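The same check for precision@k, again as a plain-Python sketch of the single-label case rather than TensorFlow's implementation. The function name is hypothetical; ties are broken in favour of the lower index.

```python
def mean_precision_at_k(true_labels, score_rows, k):
    """Single-label case: a row scores 1/k if its label is in the top k."""
    total = 0.0
    for label, scores in zip(true_labels, score_rows):
        # Stable sort by descending score keeps the lower index first on ties.
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
        hits = 1 if label in ranked else 0  # only one relevant label per row
        total += hits / k
    return total / len(true_labels)


y_true = [2, 1, 0, 3, 0]
y_pred = [
    [0.1, 0.2, 0.6, 0.1],
    [0.8, 0.05, 0.1, 0.05],
    [0.3, 0.4, 0.1, 0.2],
    [0.6, 0.25, 0.1, 0.05],
    [0.1, 0.2, 0.6, 0.1],
]
p_at_3 = mean_precision_at_k(y_true, y_pred, 3)
print(round(p_at_3, 4))  # 0.2667
```

Unlike average precision, the position of the hit inside the top k does not matter here; only whether it is there.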
Other methods and examples
Computing the accuracy of softmax outputs
import os

# Set before importing TensorFlow so the log level takes effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf  # TF 1.x API


def evaluation(sess, outputs, labels):
    # True where the label is the top-1 prediction of its row
    correct = tf.nn.in_top_k(outputs, labels, 1)
    print(sess.run(correct))
    # Number of correct predictions in the batch
    return tf.reduce_sum(tf.cast(correct, tf.int32))


with tf.Graph().as_default():
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    a = evaluation(sess, [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.7, 0.1, 0.2]], [0, 1, 2])
    print(sess.run(a))
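For readers without a TF 1.x environment, the top-k check used above can be approximated in plain Python. This is a sketch of the idea only: the helper below is a stand-in written for this note, and it assumes that a class tied with the target at the top-k boundary still counts as a hit.

```python
def in_top_k(score_rows, targets, k):
    """True for each row whose target class is among the k highest scores."""
    result = []
    for scores, target in zip(score_rows, targets):
        # Count classes scored strictly higher than the target class.
        higher = sum(1 for s in scores if s > scores[target])
        result.append(higher < k)
    return result


outputs = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.7, 0.1, 0.2]]
labels = [0, 1, 2]
correct = in_top_k(outputs, labels, 1)
num_correct = sum(correct)
print(correct)      # [True, True, False]
print(num_correct)  # 2
```

Dividing num_correct by the number of rows gives the top-1 accuracy of the batch, here 2/3.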
from: -柚子皮-