Loss Functions in PyTorch


0. Preface

  Optimization methods in deep learning act directly on the loss function. The loss function measures how far the predictions are from the ground truth, and the goal of the optimization problem is to minimize it. For classification, this intuitively means classifying as many samples correctly as possible; for regression, it means making the error between the predicted and actual values as small as possible.

  • Loss function (per sample):
    $Loss = f(\hat{y}, y)$
  • Cost function (average over the dataset):
    $Cost = \frac{1}{N}\sum_{i=0}^{N} f(\hat{y}_i, y_i)$
  PyTorch's nn module provides a number of ready-to-use loss functions, such as cross entropy and mean squared error. For a given problem you can call an existing loss directly; common losses and the problems they suit are listed in the table below.
| Loss function | Name | Suitable problem |
|---|---|---|
| torch.nn.L1Loss() | Mean absolute error loss | Regression |
| torch.nn.MSELoss() | Mean squared error loss | Regression |
| torch.nn.CrossEntropyLoss() | Cross-entropy loss | Multi-class classification |
| torch.nn.CTCLoss() | Connectionist temporal classification loss | Sequence (time-series) classification |
| torch.nn.NLLLoss() | Negative log-likelihood loss | Multi-class classification |
| torch.nn.KLDivLoss() | KL-divergence loss | Regression |
| torch.nn.BCELoss() | Binary cross-entropy loss | Binary classification |
| torch.nn.MarginRankingLoss | Margin ranking (similarity) loss | Ranking |
| torch.nn.MultiLabelMarginLoss | Multi-label margin loss | Multi-label classification |
| torch.nn.SmoothL1Loss | Smooth L1 loss | Regression |
| torch.nn.SoftMarginLoss | Two-class logistic loss | Binary classification |
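
  All of these losses follow the same usage pattern: instantiate the loss as a module, call it on (prediction, target), and back-propagate from the resulting scalar. A minimal sketch (the tensors below are made-up values for illustration only):

import torch
import torch.nn as nn

# made-up regression example: 4 predictions and their targets
pred = torch.tensor([[0.5], [1.2], [0.8], [2.0]], requires_grad=True)
target = torch.tensor([[1.0], [1.0], [1.0], [2.0]])

criterion = nn.MSELoss(reduction='mean')  # any loss from the table can be swapped in
loss = criterion(pred, target)            # a scalar because reduction='mean'
loss.backward()                           # gradients flow back into pred
print(loss.item())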

The rest of this post walks through some of these loss functions and their PyTorch APIs.

1. Loss Functions

1.1 The _Loss base class

  The loss classes under PyTorch's nn module are built on the source classes below: a base _Loss class, and a _WeightedLoss subclass that additionally carries per-class weights.

from .module import Module
from .. import functional as F
from .. import _reduction as _Reduction

from torch import Tensor
from typing import Optional


class _Loss(Module):
    reduction: str

    def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction


class _WeightedLoss(_Loss):
    def __init__(self, weight: Optional[Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(_WeightedLoss, self).__init__(size_average, reduce, reduction)
        self.register_buffer('weight', weight)
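
  A custom loss can be written the same way by inheriting from these base classes. The short sketch below is only an illustration (the class MyL1Loss is not part of PyTorch) and assumes it lives next to the definitions above:

class MyL1Loss(_Loss):
    def __init__(self, reduction: str = 'mean') -> None:
        super(MyL1Loss, self).__init__(reduction=reduction)

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        diff = (input - target).abs()      # element-wise |x - y|
        if self.reduction == 'mean':
            return diff.mean()
        if self.reduction == 'sum':
            return diff.sum()
        return diff                        # 'none': per-element loss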

1.2 nn.CrossEntropyLoss

1.2.1 Basics of cross entropy, information entropy, and relative entropy

Cross entropy measures the difference between two probability distributions: the lower the cross entropy, the more similar the two distributions are.
$\text{cross entropy} = \text{information entropy} + \text{relative entropy}$
1. Cross entropy
$H(P,Q) = -\sum_{i=1}^{N} P(x_i)\log Q(x_i)$
2. Self-information, which measures the uncertainty of a single event
$I(x) = -\log[p(x)]$
3. Entropy (information entropy): simply put, the more uncertain an event is, the larger the entropy; it is the expectation of self-information
$H(P) = E_{x\sim P}[I(x)] = -\sum_{i}^{N} P(x_i)\log P(x_i)$
4. Relative entropy (KL divergence), which measures the difference between two distributions and is not symmetric
$D_{KL}(P,Q) = E_{x\sim P}\left[\log\frac{P(x)}{Q(x)}\right] = E_{x\sim P}[\log P(x)-\log Q(x)] = \sum_{i=1}^{N} P(x_i)[\log P(x_i)-\log Q(x_i)] = \sum_{i=1}^{N} P(x_i)\log P(x_i) - \sum_{i=1}^{N} P(x_i)\log Q(x_i) = H(P,Q) - H(P)$

  Combining the formulas above: $H(P,Q) = D_{KL}(P,Q) + H(P)$, where P is the distribution of the actual samples (labels) and Q is the distribution of the predictions.
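
  A quick numeric check of this identity (the two distributions below are made-up values):

import numpy as np

P = np.array([0.7, 0.2, 0.1])   # distribution of the actual samples
Q = np.array([0.5, 0.3, 0.2])   # distribution of the predictions

H_PQ = -np.sum(P * np.log(Q))                 # cross entropy H(P, Q)
H_P = -np.sum(P * np.log(P))                  # information entropy H(P)
D_KL = np.sum(P * (np.log(P) - np.log(Q)))    # relative entropy D_KL(P, Q)

print(H_PQ, D_KL + H_P)   # both print the same value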

1.2.2 Cross entropy in PyTorch

  Function: combines nn.LogSoftmax() and nn.NLLLoss() to compute the cross entropy. This loss differs slightly from the textbook cross entropy above: nn.LogSoftmax is applied first, which normalizes the raw scores to the [0, 1] range.

  The formulas from the official documentation are:

  1. Without weights
    $loss(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_{j}\exp(x[j])}\right) = -x[class] + \log\left(\sum_{j}\exp(x[j])\right)$
  2. With weights
    $loss(x, class) = weight[class]\left(-x[class] + \log\left(\sum_{j}\exp(x[j])\right)\right)$

where $x$ is the vector of output scores and $class$ is the class index.
  Compared with the original cross entropy $H(P,Q) = -\sum_{i=1}^{N} P(x_i)\log Q(x_i)$, the PyTorch definition has no summation and no factor $P(x_i)$: PyTorch computes the cross entropy of one sample at a time, so no summation is needed, and since the true class is already known, $P(x_i) = 1$. The PyTorch cross entropy therefore simplifies to $H(P,Q) = -\log Q(x_i)$.
Main parameters:


torch.nn.CrossEntropyLoss(weight: Optional[torch.Tensor] = None,  # per-class loss weights
                        size_average=None,                          
                        ignore_index: int = -100,                   # class index to ignore
                        reduce=None, 
                        reduction: str = 'mean')                    # reduction mode: none/sum/mean; none - element-wise loss; sum - sum over all elements; mean - (weighted) average, returns a scalar

  The code example below illustrates how the parameters of this loss behave.

import torch
import torch.nn as nn

import numpy as np
#------fake data

inputs =torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)

# ------------
flag = 1
if flag:
    
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

    # forward
    loss_none = loss_f_none(inputs, targets)
    loss_sum = loss_f_sum(inputs, targets)
    loss_mean = loss_f_mean(inputs, targets)

    # view
    print(f'Cross Entropy loss: \n{loss_none, loss_sum, loss_mean}')
>>>
Cross Entropy loss: 
(tensor([1.3133, 0.1269, 0.1269]), tensor(1.5671), tensor(0.5224))

  To get a better feel for how CrossEntropyLoss is computed in PyTorch, the calculation is reproduced by hand below:

##--------------compute by hand
flag = 1
if flag:
    idx = 0
    #inputs =torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
    #targets = torch.tensor([0, 1, 1], dtype=torch.long)

    inputs_1 = inputs.detach().numpy()[idx]
    targets_1 = targets.numpy()[idx]

    # first term: x[class]
    x_class = inputs_1[targets_1]

    # second term: log(sum_j exp(x[j]))
    sigma_exp_x = np.sum(list(map(np.exp, inputs_1)))
    log_sigma_exp_x = np.log(sigma_exp_x)

    # resulting loss
    loss_1 = -x_class + log_sigma_exp_x
    print('loss of the first sample:', loss_1)
>>>
'''
Walking through the computation: take the first input [1, 2]; loss = -x[class] + log(sum_j exp(x[j])), where log is the natural log (ln)
 log(sum_j exp(x[j])) = ln(e + e^2)
 x[class] = 1
 >>> loss = ln(e + e^2) - 1
'''
loss of the first sample: 1.3132617

  Comparing with the output of the previous code block, the two results agree.

1.3 nn.NLLLoss

  Function: implements the negation step of the negative log-likelihood; the formula is
$\ell(x, y) = L = (l_1, ..., l_N)^T, \quad l_n = -w_{y_n} x_{n, y_n}$
Main parameters:


nn.NLLLoss(weight=None, # per-class loss weights
    size_average=None, 
    ignore_index=-100,  # class index to ignore
    reduce=None,
    reduction='mean')   # reduction mode

  The behaviour of this loss is easiest to see directly from code:


import torch
import torch.nn as nn

import numpy as np
#------fake data

inputs =torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)

flag = 1
if flag:
    weights = torch.tensor([1, 1], dtype=torch.float)
    
    loss_f_none_w =nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, targets)
    loss_sum = loss_f_sum(inputs, targets)
    loss_mean = loss_f_mean(inputs, targets)

    # view
    print('\nweights:', weights)
    print('nll loss', loss_none_w, loss_sum, loss_mean)
>>>>
weights: tensor([1., 1.])
nll loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
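
  As the output shows, NLLLoss only picks out the input at the target index, scales it by the class weight and negates it; it does not apply log-softmax itself, which is why it is normally fed log-probabilities (e.g. from nn.LogSoftmax). A minimal hand check of the first sample, reusing the tensors defined above:

# ------------compute by hand
flag = 1
if flag:
    idx = 0
    # l_n = -w[y_n] * x[n, y_n]
    loss_1 = -weights[targets[idx]] * inputs[idx, targets[idx]]
    print('loss of the first sample', loss_1)   # tensor(-1.)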

1.4 nn.BCELoss

  Function: binary cross-entropy loss. Note: the input values must lie in [0, 1].
$l_n = -w_n[y_n \cdot \log x_n + (1-y_n)\cdot \log(1-x_n)]$

where $x_n$ is the probability output by the model and $y_n$ is the label; since this is a binary task, $y_n$ can only be 0 or 1.

Main parameters:

    nn.BCELoss(weight=None,  # per-class weights
            size_average=None,
            reduce=None,
            reduction='mean')  # reduction mode

Code example

flag =1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    
    target_bce = target

    # squash the inputs with sigmoid so they lie in [0, 1]
    inputs = torch.sigmoid(inputs)
    
    weights = torch.tensor([1, 1], dtype=torch.float)
    
    loss_f_none = nn.BCELoss(weights, reduction='none')
    loss_f_sum = nn.BCELoss(weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    print(f'\nweights: {weights}')
    print(f'BCELoss ', loss_none_w, loss_sum, loss_mean)
>>>>
weights: tensor([1., 1.])
BCELoss  tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
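
  To see where the first entry (0.3133) comes from, the formula can be applied by hand; a minimal sketch reusing the tensors above (inputs has already been passed through sigmoid):

# ------------compute by hand
flag = 1
if flag:
    idx = 0
    x_0 = inputs[idx, idx]        # sigmoid(1) ≈ 0.7311
    y_0 = target_bce[idx, idx]    # label 1
    loss_1 = -(y_0 * torch.log(x_0) + (1 - y_0) * torch.log(1 - x_0))
    print('loss of the first element', loss_1)   # tensor(0.3133)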

1.5 nn.BCEWithLogitsLoss

  Function: combines sigmoid with the binary cross entropy. Note: the network should not apply sigmoid itself at the output. The formula is:
$l_n = -w_n[y_n \cdot \log\sigma(x_n) + (1-y_n)\cdot \log(1-\sigma(x_n))]$

Main parameters and code example

'''
nn.BCEWithLogitsLoss()
'''
flag =1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
    
    target_bce = target
    weights = torch.tensor([1], dtype=torch.float)
    pos_w = torch.tensor([3],dtype=torch.float)
    
    loss_f_none = nn.BCEWithLogitsLoss(weights, reduction='none',pos_weight=pos_w)
    loss_f_sum = nn.BCEWithLogitsLoss(weights, reduction='sum', pos_weight=pos_w)
    loss_f_mean = nn.BCEWithLogitsLoss(weights, reduction='mean', pos_weight=pos_w)

    # forward
    loss_none_w = loss_f_none(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    print(f'\npos_w: {pos_w}')
    print(f'BCEWithLogitsLoss ', loss_none_w, loss_sum, loss_mean)

>>>
pos_w: tensor([3.])
BCEWithLogitsLoss  tensor([[0.9398, 2.1269],
        [0.3808, 2.1269],
        [3.0486, 0.0544],
        [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
# With pos_w = torch.tensor([1], dtype=torch.float) the output is shown below; comparing the two, the positive-sample losses above are multiplied by 3, so the model pays more attention to positive samples
>>>>pos_w: tensor([1.])
BCEWithLogitsLoss  tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
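
  The effect of pos_weight can also be verified by hand; a minimal sketch reusing the tensors above (inputs here are raw logits, sigmoid is applied inside the loss, and pos_w = 3):

# ------------compute by hand
flag = 1
if flag:
    x_0 = torch.sigmoid(inputs[0, 0])   # the loss applies sigmoid internally
    loss_1 = -pos_w * torch.log(x_0)    # positive target: the log term is scaled by pos_weight
    print('loss of the first element', loss_1)   # tensor([0.9398])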

1.6 nn.L1Loss (regression)

  Function: computes the absolute difference between inputs and target; the formula is:
$l_n = |x_n - y_n|$
Main parameters and code example

'''
nn.L1Loss(reduction='none')
'''
flag =1
if flag:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    
    loss_f = nn.L1Loss(reduction='none')
    loss = loss_f(inputs, target)

    print(f'input:{inputs}\ntarget:{target}\nL1Loss:{loss}')
#>>> The output below matches what the formula gives

input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
L1Loss:tensor([[2., 2.],
        [2., 2.]])

1.7 nn.MSELoss (regression)

  Function: computes the squared difference between inputs and target; the formula is
$l_n = (x_n - y_n)^2$
Main parameters and code example:

flag =1
if flag:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3
    
    loss_f = nn.MSELoss(reduction='none')
    loss = loss_f(inputs, target)

    print(f'input:{inputs}\ntarget:{target}\nMSELoss:{loss}')
>>>>
input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
MSELoss:tensor([[4., 4.],
        [4., 4.]])
#>>> With nn.MSELoss(reduction='sum') instead:
MSELoss:16.0

1.8 nn.SmoothL1Loss (regression)

  Function: a smoothed L1 loss. Its formula is:
$loss(x, y)=\frac{1}{n}\sum_{i} z_i$
$z_i=\begin{cases} 0.5(x_i-y_i)^2, & \text{if } |x_i-y_i|<1 \\ |x_i-y_i|-0.5, & \text{otherwise} \end{cases}$
SmoothL1Loss is shown in Figure 1:
        [Figure 1: smooth L1 loss compared with L1 loss - ./out_imgs/loss/l1_smooth_l1.png]
Main parameters and code example:

import matplotlib.pyplot as plt  # needed for the plot below

flag = 1

if flag:
    inputs = torch.linspace(-3, 3, steps=500)
    target = torch.zeros_like(inputs)

    loss_f = nn.SmoothL1Loss(reduction='none')
    loss_smooth = loss_f(inputs, target)
    loss_l1 = np.abs(inputs.numpy())
    plt.plot(inputs.numpy(), loss_smooth.numpy(), label='smooth_l1_loss')
    plt.plot(inputs.numpy(), loss_l1, label='l1 loss')
    plt.xlabel('x_i - y_i')
    plt.ylabel('loss')
    plt.legend()
    plt.grid()
    plt.savefig('../out_imgs/loss/l1_smooth_l1.png')  ## saves the figure shown above

1.9 nn.PoissonNLLLoss

  Function: negative log-likelihood loss for a Poisson-distributed target; the formulas are:

log_input = True: $loss(input, target)=\exp(input) - target \cdot input$

log_input = False: $loss(input, target)= input - target \cdot \log(input+eps)$

Parameters and code example:

'''---------------------------PoissonNLLLoss
nn.PoissonNLLLoss(log_input=True,   # whether the input is already in log space; selects the formula used
                full=False,         # whether to compute the full loss (with the Stirling approximation term), default False
                reduction='mean',
                eps=1e-8            # small constant to avoid log(0) = nan when log_input=False
                )
'''
flag = 1
if flag:
    inputs = torch.randn((2, 2))
    target = torch.randn((2, 2))
    # the other reduction modes are not shown again in the remaining examples
    loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
    loss = loss_f(inputs, target)
    print('inputs :{}\ntarget is{}\nPoissonNLLLoss :{}'.format(inputs, target, loss))

#---------------compute by hand 
flag = 1
if flag:
    idx = 0
    # formula used when log_input=True and full=False
    loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx]* inputs[idx, idx]
    print('loss of the first element', loss_1)
#>>>> The hand-computed result matches the output of the PyTorch API
inputs :tensor([[ 0.0553,  0.2444],
        [-0.5864,  0.1678]])
target istensor([[-1.1071, -0.4799],
        [ 1.1683, -1.4043]])
PoissonNLLLoss :tensor([[1.1180, 1.3942],
        [1.2415, 1.4185]])
loss of the first element tensor(1.1180)

1.10 nn.KLDivLoss

  Function: computes the KL divergence, i.e. the relative entropy introduced in the cross-entropy section (a measure of the distance between two distributions). Note: the input must be given as log-probabilities, e.g. by applying nn.LogSoftmax first. The formula is:
$D_{KL}(P||Q) = E_{x\sim P}\left[\log\frac{P(x)}{Q(x)}\right]=E_{x\sim P}[\log P(x)-\log Q(x)]=\sum_{i=1}^{N} P(x_i)(\log P(x_i)-\log Q(x_i))$
where P is the true data distribution and Q the fitted distribution. In practice, however, PyTorch uses the formula:
$l_n = y_n \cdot (\log y_n - x_n)$
where $y_n$ is the label and $x_n$ the model output.
  Comparing the two formulas term by term, the input (the model prediction) inside the bracket is not passed through a log as in the theoretical formula. Since the KL divergence compares two probability distributions, the note above applies: the input data must already be log-probabilities.
  Parameters and code example

flag =1 
if flag:
    # input tensor of size (2, 3); think of it as the 3-neuron output of the last fully connected layer for a batch of 2
    inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.2, 0.5]])  
    inputs_log = torch.log(inputs)
    target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

    loss_f_none = nn.KLDivLoss(reduction='none')
    loss_f_mean = nn.KLDivLoss(reduction='mean')
    # batchmean divides by the batch size, which is 2 given the shape of inputs
    loss_f_batch_mean = nn.KLDivLoss(reduction='batchmean')

    loss_none = loss_f_none(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    loss_bs_mean = loss_f_batch_mean(inputs, target)

    print('loss_none:{}\nloss_mean:{}\nloss_bs_mean:{}'.format(loss_none, loss_mean, loss_bs_mean))
#-----------------compute by hand
flag = 1
if flag:
    idx = 0
    
    # In theory the second term should use the log of inputs[idx, idx]; here the inputs are values in [0, 1] used directly as probabilities, mimicking the formula PyTorch actually applies.
    loss_1 = target[idx, idx]*(torch.log(target[idx, idx])-inputs[idx, idx])
    print('loss_1', loss_1)
# >>> The hand-computed loss of the first element matches the API result
loss_none:tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.3897, -0.4219]])
loss_mean:-0.3218694031238556
loss_bs_mean:-0.9656082391738892
loss_1 tensor(-0.5448)

1.11 nn.MarginRankingLoss

  Function: computes a ranking loss between two vectors (their relative similarity), used in ranking tasks. The formula is:
$loss(x, y) = max(0, -y \cdot (x_1-x_2) + margin)$
$y$ is the label and can only be 1 or -1; $x_1$ and $x_2$ are elements of the two vectors, which leads to the following behaviour:

  • y = 1: we want $x_1 > x_2$; when $x_1 > x_2$, no loss is produced
  • y = -1: we want $x_2 > x_1$; when $x_2 > x_1$, no loss is produced

Note in particular that this loss compares two groups of data and returns an n*n loss matrix.
Main parameters and code example:

flag = 1
if flag:
    x1 = torch.tensor([[1], [2], [3]],dtype=torch.float)
    x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)

    target = torch.tensor([1, 1, -1], dtype=torch.float)

    loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
    
    loss =loss_f_none(x1, x2, target)
    print('MarginRankingLoss', loss)
#>>>
MarginRankingLoss tensor([[1., 1., 0.],
        [0., 0., 0.],
        [0., 0., 1.]])
'''
1. A brief note on the result: each input has 3 elements; every element of x1 is compared with every element of x2, and each comparison yields one loss value, so the output is a 3*3 loss matrix.
2. Take the first element of x1 as an example: 1 is compared with every element of x2. Since target[0]=1, the formula gives loss 0 when x1 > x2 and x2 - x1 + margin(0) otherwise. Comparing element by element: 1 < 2, so loss[0][0] = 2 - 1.
'''

1.12 nn.MultiLabelMarginLoss (multi-label classification)

  Function: multi-label margin loss; "multi-label" means one image can belong to several classes.
For example, in a four-class task a sample x belonging to classes 0 and 3 has the label [0, 3, -1, -1], not [1, 0, 0, 1].
The formula is:
$loss(x, y)=\sum_{ij}\frac{max(0, 1-(x[y[j]]-x[i]))}{x.size(0)}$

$\text{where } i = 0 \text{ to } x.size(0)-1,\ j = 0 \text{ to } y.size(0)-1,\ y[j] \geq 0, \text{ and } i \neq y[j] \text{ for all } i \text{ and } j$
The numerator can be read as subtracting a non-label neuron's output from a label neuron's output. The reason for this design: in multi-label classification we want the outputs of the label neurons to be larger than those of the non-label neurons, hence $max(0, 1-(x[y[j]]-x[i]))$.
Main parameters and code example:

flag = 1
if flag:
    x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
    y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

    loss_f = nn.MultiLabelMarginLoss(reduction='none')
    loss = loss_f(x, y)
    print('MultiLabelMarginLoss', loss)
# ------------compute by hand
flag = 1
if flag:
    x = x[0]

    item_1 = (1-(x[0]-x[1])) + (1 - (x[0]-x[2]))
    item_2 = (1-(x[3]-x[1])) + (1-(x[3]-x[2]))
    loss_h = (item_1 + item_2) / x.shape[0]
    print('compute by hand ', loss_h)
# >>>
MultiLabelMarginLoss tensor([0.8500])
compute by hand  tensor(0.8500)

1.13 nn.SoftMarginLoss (binary classification)

  Function: computes the two-class logistic loss; the formula is:
$loss(x, y)=\sum_{i}\frac{log(1+exp(-y[i] \cdot x[i]))}{x.nelement()}$
Main parameters and code example:

flag = 1
if flag:
    
    inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
    target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)

    loss_f = nn.SoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)

    print('SoftMarginLoss', loss)

#-----------compute by hand
flag = 1
if flag:
    idx = 0

    inputs_i = inputs[idx, idx]
    target_i = target[idx, idx]

    loss_h = np.log(1+ np.exp(-target_i * inputs_i))
    
    print('compute by hand', loss_h)
# >>>
SoftMarginLoss tensor([[0.8544, 0.4032],
        [0.4741, 0.9741]])
compute by hand tensor(0.8544)

1.14 nn.MultiLabelSoftMarginLoss

  Function: the multi-label version of SoftMarginLoss; the formula is:
$loss(x, y)=-\frac{1}{C} \cdot \sum_{i} y[i]\cdot log\left((1+exp(-x[i]))^{-1}\right)+(1-y[i])\cdot log\left(\frac{exp(-x[i])}{1+exp(-x[i])}\right)$
Here C is the number of classes, $y[i]$ is the label and $x[i]$ the model output. Taking a four-class task as an example, $y[i]$ must be given in the form [1, 0, 0, 1]; from the formula, when $y[i]$ is a label the first term is used, otherwise the second term is used.
  Main parameters and code example:

flag = 1
if flag:
    # three-class task
    inputs = torch.tensor([[0.3, 0.7, 0.8]])
    target = torch.tensor([[0, 1, 1]], dtype=torch.float)

    loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)
    print('MultiLabelSoftMarginLoss', loss)
# --------------compute by hand
flag = 1
if flag:
    # MultiLabelSoftMarginLoss computes one term per neuron

    # non-label neuron: second term of the formula
    i_0 = torch.log(torch.exp(-inputs[0, 0])/ (1+torch.exp(-inputs[0, 0])))

    # label neurons: first term of the formula
    i_1 = torch.log(1 / (1+ torch.exp(-inputs[0, 1])))
    i_2 = torch.log(1 / (1+ torch.exp(-inputs[0, 2])))

    loss_h = (i_0 + i_1 + i_2) / -3
    print('compute by hand', loss_h)
>>>>
MultiLabelSoftMarginLoss tensor([0.5429])
compute by hand tensor(0.5429)

1.15 nn.MultiMarginLoss (multi-class classification)

  Function: computes a multi-class hinge loss; the formula is:
$loss(x, y) = \frac{\sum_{i}max(0, margin-x[y]+x[i])^p}{x.size(0)}$

$\text{where } i \in \{0, ..., x.size(0)-1\},\ y \in \{0,...,y.size(0)-1\},\ 0 \leq y[j] \leq x.size(0)-1, \text{ and } i \neq y[j] \text{ for all } i \text{ and } j$
where $x[y]$ is the neuron of the true class and $x[i]$ a non-label neuron.
  Main parameters and code example:

# nn.MultiMarginLoss(p=1,    # 1 or 2
#                 margin=1.0,  
#                 weight=None,  # per-class loss weights
#                 reduction='none'  # reduction mode: none/sum/mean)

flag = 1
if flag:
    x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
    y = torch.tensor([1, 2], dtype=torch.long)
    
    loss_f = nn.MultiMarginLoss(reduction='none')
    
    loss = loss_f(x, y)
    print('MultiMarginLoss', loss)

#--------compute by hand
flag = 1
if flag:
    # Take the first sample, in: [0.1, 0.2, 0.7] - the prediction scores of a three-class output; the label is 1, so 0.2 is the true-class score.
    # Following the formula, compute margin minus the difference between 0.2 (label score) and each of 0.1 and 0.7 (non-label scores), add the terms, and divide by the number of classes.
    x = x[0]

    margin = 1

    i_0 = margin - (x[1] -x[0])

    i_2 = margin - (x[1] - x[2])
    
    loss_h = (i_0 + i_2) / x.shape[0]
    print('compute by hand',loss_h)
>>>>
MultiMarginLoss tensor([0.8000, 0.7000])
compute by hand tensor(0.8000)

1.16 nn.TripletMarginLoss (triplet loss)

  Function: computes the triplet loss, commonly used in face verification. The formula is:
$L(a,p,n)=max\{d(a_i, p_i) - d(a_i, n_i) + margin, 0\}, \quad d(x_i, y_i) = ||x_i-y_i||_p$

Main parameters and code example:

# --------------
# nn.TripletMarginLoss(margin=1.0, # margin value
#                     p =2.0,   # degree of the norm, default 2
#                     eps=1e-6,
#                     swap=False,
#                     reduction='none'  # reduction mode none/sum/mean)
flag = 1
if flag:
    anchor =torch.tensor([[1.]])
    pos = torch.tensor([[2.]])
    neg = torch.tensor([[0.5]])

    loss_f = nn.TripletMarginLoss(margin=1.0, p=1)
    loss = loss_f(anchor, pos, neg)

    print('TripletMarginLoss:', loss)
>>>>
TripletMarginLoss: tensor(1.5000)
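
  A hand check of the result, following the same pattern as the earlier sections (with p=1 the distance is the absolute difference), reusing the tensors above:

# --------compute by hand
flag = 1
if flag:
    margin = 1.0
    d_ap = torch.abs(anchor - pos)   # d(a, p) = |1 - 2| = 1
    d_an = torch.abs(anchor - neg)   # d(a, n) = |1 - 0.5| = 0.5
    loss_h = torch.clamp(d_ap - d_an + margin, min=0)
    print('compute by hand', loss_h)   # tensor([[1.5000]]), matching the 1.5 above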

1.17 nn.HingeEmbeddingLoss (nonlinear embeddings and semi-supervised learning)

  Function: measures the similarity of two inputs. Note in particular that the input x should be the absolute difference of the two inputs. The formula is:
$l_n = \begin{cases} x_n, & \text{if } y_n = 1\\ max\{0, \Delta-x_n\}, & \text{if } y_n = -1 \end{cases}$

Main parameters and code example:


# nn.HingeEmbeddingLoss(margin=1.0,  # margin value
#                 reduction='none'  # reduction mode: none/sum/mean
#                 )

flag = 1
if flag:
    inputs = torch.tensor([[1., 0.8, 0.5]])
    target = torch.tensor([[1, 1, -1]])
    
    loss_f = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')

    loss = loss_f(inputs, target)
    print('HingeEmbeddingLoss:', loss)
# >>> When the label is 1 the loss is x itself; when the label is -1 it is max(0, margin - x)
HingeEmbeddingLoss: tensor([[1.0000, 0.8000, 0.5000]])
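
  A minimal hand check of the three elements, reusing the tensors above:

# --------compute by hand
flag = 1
if flag:
    margin = 1.0
    # y = 1: the loss is x itself; y = -1: the loss is max(0, margin - x)
    loss_h = torch.where(target == 1, inputs, torch.clamp(margin - inputs, min=0))
    print('compute by hand', loss_h)   # tensor([[1.0000, 0.8000, 0.5000]])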

1.18 nn.CosineEmbeddingLoss (embeddings and semi-supervised learning)

  Function: measures the similarity of two inputs using cosine similarity, which mainly captures the difference in direction between the two feature vectors. The formula is:
$loss(x, y) = \begin{cases} 1-cos(x_1, x_2), & \text{if } y =1\\ max(0, cos(x_1, x_2)-margin), & \text{if } y =-1 \end{cases}$

$cos(\theta)=\frac{A\cdot B}{||A||\,||B||}=\frac{\sum_{i=1}^n A_i\times B_i}{\sqrt{\sum_{i=1}^n (A_i)^2}\times\sqrt{\sum_{i=1}^n (B_i)^2}}$
Main parameters and code example:

flag = 1
if flag:

    x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
    x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])

    target = torch.tensor([[1, -1]], dtype=torch.float)

    loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
    loss = loss_f(x1, x2,target)
    print('CosineEmbeddingLoss:', loss)

# --------------------compute by hand
flag = 1
if flag:
    
    margin = 0.

    def cosine(a, b):
        numerator = torch.dot(a, b)
        denominator = torch.norm(a, 2)* torch.norm(b,2)
        return float(numerator / denominator)

    l_1 = 1-(cosine(x1[0], x2[0]))
    l_2 = max(0, cosine(x1[0], x2[0]))

    print(l_1, l_2)
>>>>
CosineEmbeddingLoss: tensor([[0.0167, 0.9833]])
0.016662120819091797 0.9833378791809082

1.19 nn.CTCLoss

  Function: computes the CTC (Connectionist Temporal Classification) loss, used for classifying sequential (time-series) data.
Main parameters and code example:

flag = 1
if flag:
    T = 50   # input sequence length
    C = 20   # number of classes (including blank)
    N = 16   # batch size 
    S = 30   # target sequence length of longest target in batch
    S_min = 10  # minimum target length, for demonstration purposes

    # initialize random batch of input vectors, with size = (T, N, C)
    inputs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

    # initialize random batch of targets (0 = blank, 1:C = classes)
    target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)

    input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)

    target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)

    ctc_loss = nn.CTCLoss()
    loss = ctc_loss(inputs, target, input_lengths, target_lengths)
    print('ctc loss:', loss)
>>>
ctc loss: tensor(6.6770, grad_fn=<MeanBackward0>)