PyTorch (5): Loss Functions

Loss Functions in PyTorch
nn.L1Loss: Creates a criterion that measures the mean absolute error (MAE) between each element in the input $x$ and target $y$.
nn.MSELoss: Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $x$ and target $y$.
nn.CrossEntropyLoss: This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
nn.CTCLoss: The Connectionist Temporal Classification loss.
nn.NLLLoss: The negative log likelihood loss.
nn.PoissonNLLLoss: Negative log likelihood loss with Poisson distribution of target.
nn.KLDivLoss: The Kullback-Leibler divergence loss.
nn.BCELoss: Creates a criterion that measures the Binary Cross Entropy between the target and the output.
nn.BCEWithLogitsLoss: This loss combines a Sigmoid layer and the BCELoss in one single class.
nn.MarginRankingLoss: Creates a criterion that measures the loss given inputs $x_1$, $x_2$ (two 1-D mini-batch Tensors) and a label 1-D mini-batch tensor $y$ (containing 1 or -1).
nn.HingeEmbeddingLoss: Measures the loss given an input tensor $x$ and a labels tensor $y$ (containing 1 or -1).
nn.MultiLabelMarginLoss: Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input $x$ (a 2-D mini-batch Tensor) and output $y$ (a 2-D Tensor of target class indices).
nn.SmoothL1Loss: Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise.
nn.SoftMarginLoss: Creates a criterion that optimizes a two-class classification logistic loss between input tensor $x$ and target tensor $y$ (containing 1 or -1).
nn.MultiLabelSoftMarginLoss: Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input $x$ and target $y$ of size $(N, C)$.
nn.CosineEmbeddingLoss: Creates a criterion that measures the loss given input tensors $x_1$, $x_2$ and a Tensor label $y$ with values 1 or -1.
nn.MultiMarginLoss: Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2-D mini-batch Tensor) and output $y$ (a 1-D tensor of target class indices, $0 \leq y \leq \text{x.size}(1)-1$).
nn.TripletMarginLoss: Creates a criterion that measures the triplet loss given input tensors $x_1$, $x_2$, $x_3$ and a margin with a value greater than 0.

Basic usage:

# the constructor takes its own arguments
criterion = LossCriterion() 

# calling the criterion also takes arguments
loss = criterion(x, y) 

Explanation:

First line: in PyTorch, every loss function is defined as a class, so the first step in using a loss function is to instantiate it.

Second line:

In PyTorch, all loss functions inherit from the parent class _Loss, which in turn inherits from Module. As introduced earlier, Module is callable, so a loss instance is callable too; at call time you pass the required arguments, such as the prediction $x$ and the ground truth $y$.

Some notes

1.

In early versions of PyTorch, the bool argument size_average decided whether the computed total loss was averaged (i.e., divided by the number of samples n). It is now deprecated in favor of reduction:

  • The default is reduction='mean', i.e., average.
  • reduction='sum' is also available.
  • reduction='none' returns the unreduced per-sample loss.

A simple example:

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

# forward
loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)

Output

Cross Entropy Loss:
  tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)

2.

Similarly, in early versions of PyTorch, the bool argument reduce decided whether the total loss over the mini-batch was averaged. It is also deprecated; by default the loss is averaged over the mini-batch.

3.

In the descriptions of the loss functions below, size_average and reduce are omitted.

4.

Below, 1-D means one-dimensional.

5.

Below, Parameters refers to the arguments passed at instantiation.

Below, shape covers the arguments passed at call time as well as the output.

L1Loss
torch.nn.L1Loss(size_average=None, reduce=None, reduction: str = 'mean')

Creates a criterion that measures the mean absolute error between each element of the input prediction x and the target y:
$loss(x, y) = \frac{1}{n}\sum_i |x_i - y_i|$

Parameters (at instantiation):

  • size_average (bool, optional) – Deprecated (see reduction). If size_average=False, the sum of absolute differences is not divided by n.
  • reduce (bool, optional) – Deprecated (see reduction). Controls whether the loss is averaged over the mini-batch.
  • reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

shape

  • Input: (N, *) where * means any number of additional dimensions
  • Target: (N, *) , same shape as the input
  • Output: scalar. If reduction is 'none', then (N, *), same shape as the input.

Examples:

>>> loss = nn.L1Loss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
MSELoss
torch.nn.MSELoss(reduction: str = 'mean')

Mean squared error:
$loss(x, y) = \frac{1}{n}\sum_i (x_i - y_i)^2$

Parameters and shape are the same as above.

Examples:

>>> loss = nn.MSELoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
CrossEntropyLoss
torch.nn.CrossEntropyLoss(weight: Optional[torch.Tensor] = None, 
                          ignore_index: int = -100, 
                          reduction: str = 'mean')

This criterion combines LogSoftmax and NLLLoss in a single class; it is very useful when training a multi-class classifier.

Parameters

  • weight: a 1-D tensor with n elements, one weight per class; very useful when your training samples are imbalanced. Defaults to None.
  • ignore_index: a class index to be ignored.
  • reduction: see above.

shape

  • Input: (N, C), where C is the number of classes
  • Target: (N), where N is the mini-batch size, i.e., the number of samples in a mini-batch.

Examples:

>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()

The loss can be written as:
$loss(x, class) = -\log\frac{\exp(x[class])}{\sum_j \exp(x[j])} = -x[class] + \log\Big(\sum_j \exp(x[j])\Big)$
When the weight argument is specified, the loss becomes:
$loss(x, class) = weight[class] \Big(-x[class] + \log\big(\sum_j \exp(x[j])\big)\Big)$
The computed loss is averaged over the mini-batch.
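
As a quick illustration of the weight argument (the class weights and sizes below are made-up values), each class can be given its own weight so that under-represented classes contribute more to the loss:

>>> import torch
>>> import torch.nn as nn
>>> weights = torch.tensor([1.0, 2.0])             # hypothetical weights: class 1 counts twice as much
>>> loss = nn.CrossEntropyLoss(weight=weights)
>>> input = torch.randn(3, 2, requires_grad=True)  # 3 samples, 2 classes
>>> target = torch.tensor([0, 1, 1])
>>> output = loss(input, target)
>>> output.backward()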

CTCLoss
torch.nn.CTCLoss(blank: int = 0, 
                 reduction: str = 'mean', 
                 zero_infinity: bool = False)

CTC loss (Connectionist Temporal Classification loss) is mainly used for training on sequence data that has not been pre-aligned, e.g., speech recognition and OCR.

More details will be added when it comes up later.
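
As a rough sketch in the meantime (the dimensions below are arbitrary), nn.CTCLoss expects per-frame log-probabilities of shape (T, N, C) together with the per-sample input and target lengths:

>>> import torch
>>> import torch.nn as nn
>>> T, N, C, S = 50, 4, 20, 10   # input length, batch size, classes (incl. blank), max target length
>>> ctc_loss = nn.CTCLoss(blank=0)
>>> log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
>>> targets = torch.randint(1, C, (N, S), dtype=torch.long)           # index 0 is reserved for the blank
>>> input_lengths = torch.full((N,), T, dtype=torch.long)
>>> target_lengths = torch.randint(5, S + 1, (N,), dtype=torch.long)
>>> loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
>>> loss.backward()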

NLLLoss
torch.nn.NLLLoss(weight: Optional[torch.Tensor] = None, 
                 ignore_index: int = -100, 
                 reduction: str = 'mean')

The negative log likelihood loss for training a classification problem with C classes.

CrossEntropyLoss 相差了一个 LogSoftmax 层。

Examples:

>>> m = nn.LogSoftmax(dim=1)
>>> loss = nn.NLLLoss()
>>> # input is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = loss(m(input), target)
>>> output.backward()
>>>
>>>
>>> # 2D loss example (used, for example, with image inputs)
>>> N, C = 5, 4
>>> loss = nn.NLLLoss()
>>> # input is of size N x C x height x width
>>> data = torch.randn(N, 16, 10, 10)
>>> conv = nn.Conv2d(16, C, (3, 3))
>>> m = nn.LogSoftmax(dim=1)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
>>> output = loss(m(conv(data)), target)
>>> output.backward()
PoissonNLLLoss

Negative log likelihood loss with a Poisson-distributed target.

torch.nn.PoissonNLLLoss(log_input: bool = True, 
                        full: bool = False, 
                        eps: float = 1e-08, 
                        reduction: str = 'mean')
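
A minimal sketch of its usage, assuming the default log_input=True so that the input is interpreted as the log of the Poisson rate:

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.PoissonNLLLoss()                        # log_input=True by default
>>> log_rate = torch.randn(5, 2, requires_grad=True)  # log of the predicted Poisson rate
>>> target = torch.poisson(torch.rand(5, 2) * 4)      # synthetic Poisson-distributed counts
>>> output = loss(log_rate, target)
>>> output.backward()
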
KLDivLoss

Computes the Kullback-Leibler divergence loss.

torch.nn.KLDivLoss(reduction: str = 'mean', 
                   log_target: bool = False)

KL divergence is commonly used to measure the distance between two distributions, and it is useful when performing direct regression over the space of output distributions.

As with NLLLoss, the input is expected to contain log-probabilities. Unlike NLLLoss, however, the input is not restricted to a 2-D tensor, because this criterion is applied element-wise.

The target should have the same shape as the input.

The loss can be written as:
$loss(x, target) = \frac{1}{n}\sum_i target_i \big(\log(target_i) - x_i\big)$
By default the loss is averaged over the elements; with reduction='sum' (formerly size_average=False) the losses are summed instead.
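
A minimal sketch: the input holds log-probabilities and the target holds probabilities (with the default log_target=False); reduction='batchmean' matches the mathematical definition of KL divergence:

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> kl = nn.KLDivLoss(reduction='batchmean')
>>> input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)  # log-probabilities
>>> target = F.softmax(torch.randn(3, 5), dim=1)                         # probabilities
>>> output = kl(input, target)
>>> output.backward()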

BCELoss
torch.nn.BCELoss(weight: Optional[torch.Tensor] = None, 
                 reduction: str = 'mean')

Computes the binary cross entropy between the target and the output:
$loss(o, t) = -\frac{1}{n}\sum_i \big(t[i]\log(o[i]) + (1 - t[i])\log(1 - o[i])\big)$
If weight is specified:
$loss(o, t) = -\frac{1}{n}\sum_i weight[i]\big(t[i]\log(o[i]) + (1 - t[i])\log(1 - o[i])\big)$
This is used, for example, to compute the reconstruction error of an auto-encoder. Note that 0 <= target[i] <= 1.

By default the loss is averaged over the elements; with reduction='sum' (formerly size_average=False) the losses are summed instead.
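
A minimal sketch; the input must already be a probability, so a Sigmoid is applied first:

>>> import torch
>>> import torch.nn as nn
>>> m = nn.Sigmoid()                      # BCELoss expects values in [0, 1]
>>> loss = nn.BCELoss()
>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)    # binary targets, 0 or 1
>>> output = loss(m(input), target)
>>> output.backward()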

BCEWithLogitsLoss
torch.nn.BCEWithLogitsLoss(weight: Optional[torch.Tensor] = None, 
                           reduction: str = 'mean', 
                           pos_weight: Optional[torch.Tensor] = None)
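
This loss applies the Sigmoid internally, so raw logits are passed in directly; it is more numerically stable than a separate Sigmoid followed by BCELoss. A minimal sketch:

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.BCEWithLogitsLoss()
>>> logits = torch.randn(3, requires_grad=True)   # raw scores, no Sigmoid applied by hand
>>> target = torch.empty(3).random_(2)
>>> output = loss(logits, target)
>>> output.backward()
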
MarginRankingLoss
torch.nn.MarginRankingLoss(margin: float = 0.0, 
                           reduction: str = 'mean')

Creates a criterion that measures the loss given inputs $x_1$, $x_2$ (two 1-D mini-batch Tensors) and a label $y$ (a 1-D mini-batch tensor whose values can only be -1 or 1).

If y = 1, the first input should be ranked higher (have a larger value) than the second; if y = -1, the opposite holds.

The loss for each sample in the mini-batch is:
$loss(x, y) = \max(0, -y (x_1 - x_2) + margin)$
With reduction='mean' (formerly size_average=True, the default) the loss is averaged over the mini-batch; with reduction='sum' it is summed.
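
A minimal sketch, where y = 1 means x1 should score higher than x2:

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.MarginRankingLoss(margin=0.5)
>>> x1 = torch.randn(3, requires_grad=True)
>>> x2 = torch.randn(3, requires_grad=True)
>>> y = torch.tensor([1.0, -1.0, 1.0])    # 1: x1 should rank higher, -1: x2 should rank higher
>>> output = loss(x1, x2, y)
>>> output.backward()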

HingeEmbeddingLoss
torch.nn.HingeEmbeddingLoss(margin: float = 1.0, 
                            reduction: str = 'mean')

Given an input $x$ (a 2-D mini-batch tensor) and corresponding labels $y$ (a 1-D tensor containing 1 or -1), this criterion computes the loss between them. It is usually used to measure whether two inputs are similar, e.g., by feeding in the pairwise L1 distance, and is typically used for learning nonlinear embeddings or in semi-supervised learning:
$loss(x, y) = \frac{1}{n}\sum_i \begin{cases} x_i, & \text{if } y_i = 1 \\ \max(0, margin - x_i), & \text{if } y_i = -1 \end{cases}$
$x$ and $y$ can have any shape, each with n elements; the loss is summed over all elements and then divided by n. If you do not want the division by n, set reduction='sum' (formerly size_average=False).

The default margin is 1 and can be set via the constructor.
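
A minimal sketch, using the pairwise L1 distance between two sets of embeddings as the input x (the batch and embedding sizes below are arbitrary):

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.HingeEmbeddingLoss(margin=1.0)
>>> a = torch.randn(4, 8, requires_grad=True)
>>> b = torch.randn(4, 8)
>>> x = (a - b).abs().sum(dim=1)              # L1 distance for each of the 4 pairs
>>> y = torch.tensor([1.0, -1.0, 1.0, -1.0])  # 1: similar pair, -1: dissimilar pair
>>> output = loss(x, y)
>>> output.backward()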

MultiLabelMarginLoss
torch.nn.MultiLabelMarginLoss(reduction: str = 'mean')

Computes the multi-label classification hinge loss (margin-based loss). It takes two inputs: input x (a 2-D mini-batch Tensor) and target y (a 2-D Tensor of target class indices for the samples in the mini-batch).
$loss(x, y) = \frac{1}{x.size(0)}\sum_{i=0,\,j=0}^{I,\,J} \max\big(0, 1 - (x[y[j]] - x[i])\big)$
where I = x.size(0) and J = y.size(0); the sum runs over all i and j with y[j] >= 0 and $i \neq y[j]$.

x and y must have the same size.

The criterion only considers the first contiguous block of non-negative entries in y (the remaining entries can be padded with -1), which allows each sample to have a different number of target classes.
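
A minimal sketch; each row of y lists that sample's target class indices and is padded with -1:

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.MultiLabelMarginLoss()
>>> x = torch.randn(2, 4, requires_grad=True)
>>> y = torch.tensor([[3, 0, -1, -1],      # sample 0 has target classes 3 and 0
...                   [1, 2, 3, -1]])      # sample 1 has target classes 1, 2 and 3
>>> output = loss(x, y)
>>> output.backward()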

SmoothL1Loss

A smoothed version of the L1 loss.

torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction: str = 'mean')

The loss is:
$loss(x, y) = \frac{1}{n}\sum_i \begin{cases} 0.5\,(x_i - y_i)^2, & \text{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5, & \text{otherwise} \end{cases}$
This loss is less sensitive to outliers than MSELoss and in some cases prevents exploding gradients (see Fast R-CNN). It is also known as the Huber loss.

x and y can be any tensors each containing n elements. By default the loss is divided by n; set reduction='sum' (formerly size_average=False) to sum the losses instead.
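
A minimal sketch:

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.SmoothL1Loss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()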

SoftMarginLoss
torch.nn.SoftMarginLoss(reduction: str = 'mean')

Creates a criterion that optimizes a two-class classification logistic loss between an input x (a 2-D mini-batch Tensor) and a target y (a Tensor containing 1 or -1).
$loss(x, y) = \frac{1}{x.nelement()}\sum_i \log\big(1 + \exp(-y[i]\,x[i])\big)$
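
A minimal sketch, with targets drawn randomly from {-1, 1}:

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.SoftMarginLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randint(0, 2, (3, 5)).float() * 2 - 1   # random labels in {-1, 1}
>>> output = loss(input, target)
>>> output.backward()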

MultiLabelSoftMarginLoss
torch.nn.MultiLabelSoftMarginLoss(weight: Optional[torch.Tensor] = None, 
                                  reduction: str = 'mean')

Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy between input x and target y. x: a 2-D mini-batch Tensor; y: a binary 2-D Tensor. For each sample in the mini-batch, the loss is:
$loss(x, y) = -\frac{1}{x.nElement()}\sum_{i=0}^{I}\Big( y[i]\log\frac{\exp(x[i])}{1 + \exp(x[i])} + (1 - y[i])\log\frac{1}{1 + \exp(x[i])}\Big)$
where I = x.nElement() - 1 and $y[i] \in \{0, 1\}$. y and x must have the same size.
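
A minimal sketch, with a multi-hot binary target so each sample can belong to several classes at once:

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.MultiLabelSoftMarginLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randint(0, 2, (3, 5)).float()   # multi-hot labels of size (N, C)
>>> output = loss(input, target)
>>> output.backward()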

CosineEmbeddingLoss
torch.nn.CosineEmbeddingLoss(margin: float = 0.0, 
                             reduction: str = 'mean')

Given input Tensors x1, x2 and a label Tensor y (with values 1 or -1), this criterion uses the cosine distance to measure whether two inputs are similar. It is typically used for learning nonlinear embeddings or in semi-supervised learning.

margin should be a value between -1 and 1; a value between 0 and 0.5 is suggested. If margin is not given, it defaults to 0.

The loss for each sample is:
$loss(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - margin), & \text{if } y = -1 \end{cases}$
With reduction='mean' (formerly size_average=True, the default) the loss is averaged over the batch; with reduction='sum' it is summed.
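
A minimal sketch (the batch size, embedding size, and margin below are arbitrary):

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.CosineEmbeddingLoss(margin=0.2)
>>> x1 = torch.randn(4, 16, requires_grad=True)
>>> x2 = torch.randn(4, 16, requires_grad=True)
>>> y = torch.tensor([1.0, -1.0, 1.0, -1.0])   # 1: should be similar, -1: should be dissimilar
>>> output = loss(x1, x2, y)
>>> output.backward()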

MultiMarginLoss
torch.nn.MultiMarginLoss(p: int = 1, 
                         margin: float = 1.0, 
                         weight: Optional[torch.Tensor] = None, 
                         reduction: str = 'mean')

Computes the hinge loss (margin-based loss) for multi-class classification. The input is x (a 2-D mini-batch Tensor) and y (a 1-D Tensor of target class indices, 0 <= y <= x.size(1) - 1).

For each mini-batch sample:
$loss(x, y) = \frac{1}{x.size(0)}\sum_{i=0}^{I} \max(0, margin - x[y] + x[i])^p$
where I = x.size(0) and $i \neq y$. Optionally, if you do not want all classes to carry the same weight, you can pass a weight argument to the constructor; weight is a 1-D weight Tensor.

With weight given, the loss becomes:
$loss(x, y) = \frac{1}{x.size(0)}\sum_i \max\big(0, w[y]\,(margin - x[y] + x[i])\big)^p$
By default the loss is averaged over the mini-batch; set reduction='sum' (formerly size_average=False) to disable the averaging.
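
A minimal sketch:

>>> import torch
>>> import torch.nn as nn
>>> loss = nn.MultiMarginLoss(p=1, margin=1.0)
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.tensor([1, 0, 4])     # one target class index per sample
>>> output = loss(input, target)
>>> output.backward()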

Custom loss functions

Let's look at the implementation of the simplest one, L1Loss:

class L1Loss(_Loss):
    
    __constants__ = ['reduction']

    def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(L1Loss, self).__init__(size_average, reduce, reduction)

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return F.l1_loss(input, target, reduction=self.reduction)

As you can see, L1Loss is defined much like an ordinary submodule; its forward simply calls the corresponding function in F (torch.nn.functional):

def l1_loss(input, target, size_average=None, reduce=None, reduction='mean'):
    # type: (Tensor, Tensor, Optional[bool], Optional[bool], str) -> Tensor
    if not torch.jit.is_scripting():
        tens_ops = (input, target)
        if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
            return handle_torch_function(
                l1_loss, tens_ops, input, target, size_average=size_average, reduce=reduce,
                reduction=reduction)
    if not (target.size() == input.size()):
        warnings.warn("Using a target size ({}) that is different to the input size ({}). "
                      "This will likely lead to incorrect results due to broadcasting. "
                      "Please ensure they have the same size.".format(target.size(), input.size()),
                      stacklevel=2)
    if size_average is not None or reduce is not None:
        reduction = _Reduction.legacy_get_string(size_average, reduce)
    if target.requires_grad:
        ret = torch.abs(input - target)
        if reduction != 'none':
            ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
    else:
        expanded_input, expanded_target = torch.broadcast_tensors(input, target)
        ret = torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
    return ret
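
Following the same pattern, a custom loss only needs to subclass nn.Module and implement forward; autograd then handles the backward pass. Below is a hedged sketch of a hypothetical element-wise weighted MSE loss (the name WeightedMSELoss and its weighting scheme are made up for illustration and are not part of PyTorch):

import torch
import torch.nn as nn


class WeightedMSELoss(nn.Module):
    """Hypothetical example: MSE with a per-element weight (not a built-in PyTorch loss)."""

    def __init__(self, reduction: str = 'mean') -> None:
        super().__init__()
        self.reduction = reduction

    def forward(self, input: torch.Tensor, target: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
        # element-wise weighted squared error
        ret = weight * (input - target) ** 2
        if self.reduction == 'mean':
            return ret.mean()
        if self.reduction == 'sum':
            return ret.sum()
        return ret  # reduction == 'none'


# usage
criterion = WeightedMSELoss()
x = torch.randn(3, 5, requires_grad=True)
y = torch.randn(3, 5)
w = torch.rand(3, 5)
loss = criterion(x, y, w)
loss.backward()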