TensorFlow cross-entropy loss functions: comparing the cross_entropy_with_logits variants in the library

Cross-entropy loss functions covered:

  • softmax_cross_entropy_with_logits_v2
  • sparse_softmax_cross_entropy_with_logits
  • sigmoid_cross_entropy_with_logits [sigmoid_cross_entropy_with_logits_v2]

First, in theory:
Binary classification:
-[y * log(p) + (1 - y) * log(1 - p)]
Just use sigmoid_cross_entropy_with_logits [or sigmoid_cross_entropy_with_logits_v2]: one label and one raw logit per example.
Multi-class classification:
-Sum(Yt * log(Yp))
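
As a quick sanity check of the two formulas, a minimal sketch (assuming TF 2.x; the tensor values are made up for illustration):

import tensorflow as tf

# Binary case: one raw logit x and one label y per example.
y = tf.constant([1., 0., 1.])
x = tf.constant([2.0, -1.0, 0.5])
p = tf.math.sigmoid(x)
manual_bce = -(y * tf.math.log(p) + (1 - y) * tf.math.log(1 - p))
lib_bce = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=x)
# manual_bce and lib_bce agree up to floating-point error

# Multi-class case: -Sum(Yt * log(Yp)) with Yp = softmax(logits).
yt = tf.constant([[0., 0., 1.], [1., 0., 0.]])
logits = tf.constant([[0.3, 0.2, 2.1], [1.5, -0.5, 0.1]])
yp = tf.nn.softmax(logits, axis=-1)
manual_ce = -tf.reduce_sum(yt * tf.math.log(yp), axis=-1)
lib_ce = tf.nn.softmax_cross_entropy_with_logits(labels=yt, logits=logits)
# in TF 1.x the same call is tf.nn.softmax_cross_entropy_with_logits_v2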
In fact, this multi-class functionality covers two distinct cases (see the sketch after the list):

  • Measures the probability error in discrete classification tasks in which each
    class is independent and not mutually exclusive.  For instance, one could
    perform multilabel classification where a picture can contain both an elephant
    and a dog at the same time.
  • Measures the probability error in discrete classification tasks in which the
    classes are mutually exclusive (each entry is in exactly one class).  For
    example, each CIFAR-10 image is labeled with one and only one label: an image
    can be a dog or a truck, but not both.
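
A minimal sketch of how these two cases map onto the library calls (TF 2.x names; in TF 1.x the softmax variant is softmax_cross_entropy_with_logits_v2; tensors are illustrative only):

import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])

# Case 1: classes are independent / not mutually exclusive (multi-label),
# e.g. the picture contains both an elephant and a dog.
multi_label = tf.constant([[1., 0., 1.]])          # several 1s per row are allowed
loss_multilabel = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=multi_label, logits=logits)             # shape (1, 3): one loss per class

# Case 2: classes are mutually exclusive (each entry is in exactly one class),
# e.g. a CIFAR-10 image is a dog OR a truck, never both.
one_hot = tf.constant([[0., 0., 1.]])              # each row is a valid probability distribution
loss_exclusive = tf.nn.softmax_cross_entropy_with_logits(
    labels=one_hot, logits=logits)                 # shape (1,): one loss per example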

  **NOTE:**  While the classes are mutually exclusive, their probabilities
  need not be.  All that is required is that each row of `labels` is
  a valid probability distribution.  If they are not, the computation of the
  gradient will be incorrect.
  For brevity, let `x = logits`, `z = labels`.  The logistic loss is

        z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
      = z * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x)))
      = z * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
      = z * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)))
      = (1 - z) * x + log(1 + exp(-x))
      = x - x * z + log(1 + exp(-x))

  For x < 0, to avoid overflow in exp(-x), we reformulate the above

        x - x * z + log(1 + exp(-x))
      = log(exp(x)) - x * z + log(1 + exp(-x))
      = - x * z + log(1 + exp(x))

  Hence, to ensure stability and avoid overflow, the implementation uses this
  equivalent formulation

      max(x, 0) - x * z + log(1 + exp(-abs(x)))

  `logits` and `labels` must have the same type and shape.
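
A small sketch (assuming TF 2.x) checking that the stable formulation above is what the library actually computes:

import tensorflow as tf

x = tf.constant([-100.0, -2.0, 0.0, 3.0, 80.0])   # logits, including extreme values
z = tf.constant([   1.0,  0.0, 1.0, 1.0,  0.0])   # labels in [0, 1]

# The naive form x - x*z + log(1 + exp(-x)) overflows float32 at x = -100.
stable = tf.maximum(x, 0.) - x * z + tf.math.log(1. + tf.exp(-tf.abs(x)))
lib = tf.nn.sigmoid_cross_entropy_with_logits(labels=z, logits=x)
# `stable` and `lib` match element-wise, with no inf/nan even at the extremes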

How the dimensions change:
Binary classification, as it appears in real business settings:
label: (batch_size,)
predict: (batch_size,)
=> loss: (batch_size,) => after tf.reduce_mean(loss) the shape is (), a single number
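
A shape-only sketch of the binary case (values are arbitrary):

import tensorflow as tf

label = tf.constant([1., 0., 1., 0.])       # (batch_size,) = (4,)
logits = tf.constant([0.3, -1.2, 2.0, 0.7])

loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=label, logits=logits)
print(loss.shape)                  # (4,): one loss per example
print(tf.reduce_mean(loss).shape)  # (): a single scalar for the batch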

Multi-class: first the probability error between each raw logit and its label is computed.
Per sample, one tf.reduce_mean(res, axis=-1) over the class dimension is needed,
i.e. (batch_size, sample_len, class_len) => (batch_size, sample_len, class_len) => (batch_size, sample_len).
Computing the loss for the whole batch takes two more tf.reduce_mean reductions,
i.e. (batch_size, sample_len) => (batch_size,) => shape (), a single number.
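
And a shape-only sketch of the multi-class case, here (batch_size, sample_len, class_len) = (2, 3, 4) with random values:

import tensorflow as tf

batch_size, sample_len, class_len = 2, 3, 4
logits = tf.random.normal((batch_size, sample_len, class_len))
labels = tf.one_hot(
    tf.random.uniform((batch_size, sample_len), maxval=class_len, dtype=tf.int32),
    depth=class_len)                                  # (2, 3, 4)

res = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
print(res.shape)                                      # (2, 3, 4): element-wise error
per_position = tf.reduce_mean(res, axis=-1)           # (2, 3): reduce over classes
per_sample = tf.reduce_mean(per_position, axis=1)     # (2,)
batch_loss = tf.reduce_mean(per_sample)               # (): a single number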

A combined example

import tensorflow as tf

class SigmoidBinaryCrossEntropyLoss(tf.keras.losses.Loss):
    def __init__(self):  # none mean sum
        super(SigmoidBinaryCrossEntropyLoss, self).__init__()
    def __call__(self, inputs, targets, mask=None):
        # Called as loss(labels, logits, mask): `inputs` are the labels, `targets` the raw logits.
        # In TensorFlow, setting the mask via tf.nn.weighted_cross_entropy_with_logits did not work;
        # multiplying element-wise by the mask directly gives "no loss where mask == 0".
        inputs = tf.cast(inputs, dtype=tf.float32)
        targets = tf.cast(targets, dtype=tf.float32)
        mask = tf.cast(mask, dtype=tf.float32)
        res = tf.nn.sigmoid_cross_entropy_with_logits(labels=inputs, logits=targets) * mask
        return tf.reduce_mean(res, axis=1)

if __name__ == '__main__':
    ## Binary classification (fragment from an ESMM-style model: labels, ctr_key, ctcvr_key,
    ## ctr_logits and ctcvr_prob come from the surrounding model and are not defined here)
    # ctr_label
    ctr_label = labels[ctr_key]  # (None, )
    # ctcvr_label
    ctcvr_label = labels[ctcvr_key]  # (None, )
    # esmm loss
    epsilon = 1e-8
    ctr_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=ctr_logits, labels=ctr_label))
    # ctcvr_prob is already a probability, so the cross entropy is written out by hand
    ctcvr_loss = tf.reduce_mean(
        - ctcvr_label * tf.math.log(ctcvr_prob + epsilon)
        - (1 - ctcvr_label) * tf.math.log(1 - ctcvr_prob + epsilon))
    
    ## Multi-class
    loss = SigmoidBinaryCrossEntropyLoss()

    pred = tf.convert_to_tensor([[1.5, 0.3, -1, 2],
                                 [1.1, -0.6, 2.2, 0.4]], dtype=tf.float32)

    # In `label`, 1 and 0 mark context words and noise words respectively
    label = tf.convert_to_tensor([[1, 0, 0, 0],
                                  [1, 1, 0, 0]], dtype=tf.float32)

    mask = tf.convert_to_tensor([[1, 1, 1, 1],
                                 [1, 1, 1, 0]], dtype=tf.float32)  # mask variable

    # print("tf.reduce_sum(mask,axis=1):", tf.reduce_sum(mask, axis=1))

    print(loss(label, pred, mask) * mask.shape[1] / tf.reduce_sum(mask, axis=1))
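
A note on the final line: __call__ averages res over all class_len positions, masked or not, so multiplying by mask.shape[1] and dividing by tf.reduce_sum(mask, axis=1) rescales the result into a mean over only the unmasked positions; the second row, whose last position is masked out, therefore contributes no loss from that position.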

For the multi-class case, going from logits to predictions usually also involves this supporting function:
tf.nn.softmax(
logits,
axis=None,
name=None,
dim=None
)
Purpose: the softmax function normalizes its input.
Input: the values of a fully connected layer (usually the model's last layer), commonly called logits in code.
Output: normalized values, interpreted as the probability of belonging to each position; commonly called probs in code. For example, given the input [0.4, 0.1, 0.2, 0.3], the sample most likely belongs to position 0, i.e. class 0, because the size of the logits dimension is set to the number of classes, so position 0 corresponds to class 0. Softmax does not change the shape of its input.
Usage: for a single-label problem, take the top-1 of the output (the maximum, argmax); for a multi-label (N-label) problem, take the top N.
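
A small sketch of that logits -> probs -> prediction step (assuming TF 2.x):

import tensorflow as tf

logits = tf.constant([[0.4, 0.1, 0.2, 0.3]])   # output of the last fully connected layer
probs = tf.nn.softmax(logits, axis=-1)          # same shape, each row sums to 1

top1 = tf.argmax(probs, axis=-1)                # single-label: class 0 in this example
top2 = tf.math.top_k(probs, k=2)                # multi-label: values and indices of the 2 best classes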

Difference from tf.nn.softmax_cross_entropy_with_logits_v2: tf.nn.softmax only turns logits into normalized probabilities, while softmax_cross_entropy_with_logits_v2 takes the raw logits, applies the softmax internally, and computes the cross-entropy loss in one step.
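
A sketch of that difference (assuming TF 2.x, where tf.nn.softmax_cross_entropy_with_logits already has the _v2 behavior):

import tensorflow as tf

logits = tf.constant([[1.5, 0.3, -1.0, 2.0]])
labels = tf.constant([[0., 0., 0., 1.]])

probs = tf.nn.softmax(logits, axis=-1)                          # normalization only
manual = -tf.reduce_sum(labels * tf.math.log(probs), axis=-1)   # cross entropy by hand

fused = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
# manual and fused agree up to floating-point error; the fused form is numerically more stable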
