Loss Functions in PyTorch
| Loss function | Description |
|---|---|
| nn.L1Loss | Creates a criterion that measures the mean absolute error (MAE) between each element in the input $x$ and target $y$. |
| nn.MSELoss | Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $x$ and target $y$. |
| nn.CrossEntropyLoss | This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class. |
| nn.CTCLoss | The Connectionist Temporal Classification loss. |
| nn.NLLLoss | The negative log likelihood loss. |
| nn.PoissonNLLLoss | Negative log likelihood loss with Poisson distribution of target. |
| nn.KLDivLoss | The Kullback-Leibler divergence loss. |
| nn.BCELoss | Creates a criterion that measures the Binary Cross Entropy between the target and the output. |
| nn.BCEWithLogitsLoss | This loss combines a Sigmoid layer and the BCELoss in one single class. |
| nn.MarginRankingLoss | Creates a criterion that measures the loss given inputs $x_1$, $x_2$, two 1D mini-batch Tensors, and a label 1D mini-batch tensor $y$ (containing 1 or -1). |
| nn.HingeEmbeddingLoss | Measures the loss given an input tensor $x$ and a labels tensor $y$ (containing 1 or -1). |
| nn.MultiLabelMarginLoss | Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (a 2D Tensor of target class indices). |
| nn.SmoothL1Loss | Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise. |
| nn.SoftMarginLoss | Creates a criterion that optimizes a two-class classification logistic loss between input tensor $x$ and target tensor $y$ (containing 1 or -1). |
| nn.MultiLabelSoftMarginLoss | Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input $x$ and target $y$ of size $(N, C)$. |
| nn.CosineEmbeddingLoss | Creates a criterion that measures the loss given input tensors $x_1$, $x_2$ and a Tensor label $y$ with values 1 or -1. |
| nn.MultiMarginLoss | Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (a 1D tensor of target class indices, $0 \leq y \leq \text{x.size}(1)-1$). |
| nn.TripletMarginLoss | Creates a criterion that measures the triplet loss given input tensors $x_1$, $x_2$, $x_3$ and a margin with a value greater than 0. |
Basic usage:
```python
# the constructor takes its own arguments
criterion = LossCriterion()
# calling the criterion also takes arguments
loss = criterion(x, y)
```
Explanation:
The first line: in PyTorch, every loss function is defined as a class, so the first step in using one is to instantiate it.
The second line: in PyTorch, all loss functions inherit from the base class _Loss, which in turn inherits from Module. As introduced earlier, a Module is callable, so a loss-function instance is callable too; at call time you pass the required arguments, such as the prediction $x$ and the ground truth $y$.
Some notes
1. Early versions of PyTorch used the bool argument size_average to decide whether to average the computed total loss (i.e. divide by the number of samples n). It is now deprecated in favor of reduction:
- reduction='mean' (default): average the loss.
- reduction='sum': sum the loss.
- reduction='none': no reduction.
Here is a simple example:
```python
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

# forward
loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
```
Output:
```
Cross Entropy Loss:
  tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)
```
2. Similarly, early versions of PyTorch used the bool argument reduce to decide whether to average the total loss over the mini-batch. It is also deprecated; by default, the loss is now averaged over the mini-batch.
3. In the descriptions of the loss functions below, size_average and reduce are simply omitted.
4. Below, 1-D means one-dimensional.
5. Parameters below refers to the constructor arguments; shape covers the call-time arguments and the output value.
L1Loss
```python
torch.nn.L1Loss(size_average=None, reduce=None, reduction: str = 'mean')
```
Creates a criterion that measures the mean absolute error between each element of the prediction $x$ and the target $y$:
$$\text{loss}(x, y) = \frac{1}{n}\sum_i |x_i - y_i|$$
Constructor parameters:
- size_average (bool, optional): deprecated (see reduction). If size_average=False, the sum of absolute differences is not divided by n.
- reduce (bool, optional): deprecated (see reduction). Controls whether the mini-batch loss is averaged.
- reduction (string, optional): specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of those two args will override reduction. Default: 'mean'.
shape
- Input: (N, *) where * means any number of additional dimensions
- Target: (N, *), same shape as the input
- Output: scalar. If reduction is 'none', then (N, *), same shape as the input.
Examples:
```python
>>> loss = nn.L1Loss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
```
MSELoss
```python
torch.nn.MSELoss(reduction: str = 'mean')
```
Mean squared error:
$$\text{loss}(x, y) = \frac{1}{n}\sum_i (x_i - y_i)^2$$
Parameters and shape are the same as above.
Examples:
```python
>>> loss = nn.MSELoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
```
CrossEntropyLoss
```python
torch.nn.CrossEntropyLoss(weight: Optional[torch.Tensor] = None,
                          ignore_index: int = -100,
                          reduction: str = 'mean')
```
This criterion combines LogSoftMax and NLLLoss in a single class; it is very useful when training a multi-class classifier.
Parameters
- weight: a 1-D tensor with n elements, one weight per class; very useful when your training samples are unbalanced. Default: None.
- ignore_index: a class index to ignore.
- reduction: as described above.
shape
- Input: (N, C), where C is the number of classes
- Target: (N), where N is the mini-batch size, i.e. the number of samples in one mini-batch
Examples:
```python
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
```
The loss can be written as:
$$\text{loss}(x, \text{class}) = -\log\frac{\exp(x[\text{class}])}{\sum_j \exp(x[j])} = -x[\text{class}] + \log\Big(\sum_j \exp(x[j])\Big)$$
When the weight argument is specified, the loss becomes:
$$\text{loss}(x, \text{class}) = \text{weight}[\text{class}] \cdot \Big(-x[\text{class}] + \log\big(\sum_j \exp(x[j])\big)\Big)$$
The computed loss is averaged over the mini-batch size.
CTCLoss
```python
torch.nn.CTCLoss(blank: int = 0,
                 reduction: str = 'mean',
                 zero_infinity: bool = False)
```
CTC loss (Connectionist Temporal Classification loss) is mainly used for training on sequence data that has no prior alignment, e.g. speech recognition and OCR. More details will be added when it comes up later; a minimal usage sketch follows.
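In the meantime, here is a minimal sketch of a typical call, following the shape conventions of the official docs; the sizes T, C, N, S are arbitrary illustration values:
```python
import torch
import torch.nn as nn

T, C, N, S = 50, 20, 16, 30   # input length, classes (incl. blank), batch size, max target length
ctc_loss = nn.CTCLoss(blank=0)

# (T, N, C) log-probabilities, e.g. the output of an RNN followed by log_softmax
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, C, (N, S), dtype=torch.long)            # labels 1..C-1; 0 is the blank
input_lengths = torch.full((N,), T, dtype=torch.long)              # length of each input sequence
target_lengths = torch.randint(10, S + 1, (N,), dtype=torch.long)  # length of each label sequence

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```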
NLLLoss
```python
torch.nn.NLLLoss(weight: Optional[torch.Tensor] = None,
                 ignore_index: int = -100,
                 reduction: str = 'mean')
```
The negative log likelihood loss for training a classification problem with C classes. It differs from CrossEntropyLoss only by a LogSoftmax layer.
Examples:
```python
>>> m = nn.LogSoftmax(dim=1)
>>> loss = nn.NLLLoss()
>>> # input is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = loss(m(input), target)
>>> output.backward()
>>>
>>> # 2D loss example (used, for example, with image inputs)
>>> N, C = 5, 4
>>> loss = nn.NLLLoss()
>>> # input is of size N x C x height x width
>>> data = torch.randn(N, 16, 10, 10)
>>> conv = nn.Conv2d(16, C, (3, 3))
>>> m = nn.LogSoftmax(dim=1)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
>>> output = loss(m(conv(data)), target)
>>> output.backward()
```
PoissonNLLLoss
The negative log likelihood loss with a Poisson distribution of the target.
```python
torch.nn.PoissonNLLLoss(log_input: bool = True,
                        full: bool = False,
                        eps: float = 1e-08,
                        reduction: str = 'mean')
```
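A minimal usage sketch, following the pattern of the official docs; with the default log_input=True, the input is interpreted as the log of the Poisson rate:
```python
import torch
import torch.nn as nn

loss = nn.PoissonNLLLoss()                          # log_input=True by default
log_input = torch.randn(5, 2, requires_grad=True)   # interpreted as log(lambda)
target = torch.randn(5, 2)
output = loss(log_input, target)
output.backward()
```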
KLDivLoss
Computes the Kullback-Leibler divergence loss.
```python
torch.nn.KLDivLoss(reduction: str = 'mean',
                   log_target: bool = False)
```
KL divergence is commonly used to measure the distance between two distributions, and it is useful for performing direct regression over the space of output distributions.
As with NLLLoss, the given input is expected to contain log-probabilities. Unlike NLLLoss, however, the input is not restricted to a 2-D tensor, because this criterion is applied element-wise.
The target should have the same shape as the input.
The loss can be written as:
$$\text{loss}(x, \text{target}) = \frac{1}{n}\sum_i \text{target}_i \cdot \big(\log(\text{target}_i) - x_i\big)$$
By default, the loss is averaged over elements; with size_average=False it is summed instead.
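A minimal sketch, assuming the input holds log-probabilities and the target holds probabilities; reduction='batchmean' (available in recent PyTorch versions) matches the mathematical definition of KL divergence:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

kl_loss = nn.KLDivLoss(reduction='batchmean')
input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)   # log-probabilities
target = F.softmax(torch.rand(3, 5), dim=1)                           # probabilities
output = kl_loss(input, target)
output.backward()
```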
BCELoss
```python
torch.nn.BCELoss(weight: Optional[torch.Tensor] = None,
                 reduction: str = 'mean')
```
Computes the binary cross entropy between the target and the output:
$$\text{loss}(o, t) = -\frac{1}{n}\sum_i \big(t[i]\cdot\log(o[i]) + (1 - t[i])\cdot\log(1 - o[i])\big)$$
If weight is specified:
$$\text{loss}(o, t) = -\frac{1}{n}\sum_i \text{weight}[i]\cdot\big(t[i]\cdot\log(o[i]) + (1 - t[i])\cdot\log(1 - o[i])\big)$$
This is often used to compute the reconstruction error of an auto-encoder. Note that 0 <= target[i] <= 1.
By default, the loss is averaged over elements; with size_average=False it is summed.
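A minimal sketch; the prediction must first be squashed into (0, 1), e.g. with a Sigmoid:
```python
import torch
import torch.nn as nn

m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)        # binary targets in {0, 1}
output = loss(m(input), target)
output.backward()
```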
BCEWithLogitsLoss
```python
torch.nn.BCEWithLogitsLoss(weight: Optional[torch.Tensor] = None,
                           reduction: str = 'mean',
                           pos_weight: Optional[torch.Tensor] = None)
```
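This loss combines a Sigmoid layer and BCELoss in one single class; operating on raw logits is more numerically stable than a separate Sigmoid followed by BCELoss, since it exploits the log-sum-exp trick. A minimal sketch:
```python
import torch
import torch.nn as nn

loss = nn.BCEWithLogitsLoss()
input = torch.randn(3, requires_grad=True)   # raw logits, no sigmoid applied
target = torch.empty(3).random_(2)           # binary targets in {0, 1}
output = loss(input, target)
output.backward()
```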
MarginRankingLoss
```python
torch.nn.MarginRankingLoss(margin: float = 0.0,
                           reduction: str = 'mean')
```
Creates a criterion given inputs $x_1$ and $x_2$ (two 1-D mini-batch Tensors) and a label $y$ (a 1-D mini-batch tensor) whose values are 1 or -1.
If y = 1, the first input is expected to be ranked higher than the second; if y = -1, the opposite.
The loss for each sample in the mini-batch is:
$$\text{loss}(x, y) = \max\big(0, -y\cdot(x_1 - x_2) + \text{margin}\big)$$
If size_average=True, the loss is averaged over the mini-batch; otherwise it is summed. The default is size_average=True.
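A minimal sketch with random scores:
```python
import torch
import torch.nn as nn

loss = nn.MarginRankingLoss(margin=0.0)
input1 = torch.randn(3, requires_grad=True)
input2 = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()               # labels in {1, -1}
output = loss(input1, input2, target)
output.backward()
```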
HingeEmbeddingLoss
```python
torch.nn.HingeEmbeddingLoss(margin: float = 1.0,
                            reduction: str = 'mean')
```
Given an input $x$ (a 2-D mini-batch tensor) and corresponding labels $y$ (a 1-D tensor containing 1 or -1), this criterion measures the loss between them. It is usually used for measuring whether two inputs are similar, e.g. using the pairwise L1 distance, and is typically used for learning nonlinear embeddings or for semi-supervised learning:
$$\text{loss}(x, y) = \frac{1}{n}\sum_i \begin{cases} x_i, & \text{if } y_i = 1 \\ \max(0, \text{margin} - x_i), & \text{if } y_i = -1 \end{cases}$$
$x$ and $y$ can have arbitrary shapes, as long as each has n elements in total; the sum runs over all elements and is then divided by n. If you do not want the division by n, set size_average=False.
margin defaults to 1 and can be set via the constructor.
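A minimal sketch, where the input plays the role of a distance (e.g. a pairwise L1 distance):
```python
import torch
import torch.nn as nn

loss = nn.HingeEmbeddingLoss(margin=1.0)
distances = torch.randn(4, requires_grad=True)   # stand-in for pairwise distances
target = torch.tensor([1., -1., 1., -1.])        # labels in {1, -1}
output = loss(distances, target)
output.backward()
```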
MultiLabelMarginLoss
```python
torch.nn.MultiLabelMarginLoss(reduction: str = 'mean')
```
Computes a multi-label classification hinge loss (margin-based loss). It takes two inputs: input x (a 2-D mini-batch Tensor) and output y (a 2-D Tensor of target class indices for the samples in the mini-batch).
$$\text{loss}(x, y) = \frac{1}{x.\text{size}(0)}\sum_{i=0,\,j=0}^{I,\,J} \max\big(0, 1 - (x[y[j]] - x[i])\big)$$
where I = x.size(0) and J = y.size(0), and for all i and j,
$$y[j] \neq 0, \quad i \neq y[j]$$
x and y must have the same size.
This criterion only considers the leading block of non-zero y[j] targets, which allows different samples to have different numbers of target classes.
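A small worked sketch, following the pattern in the official docs; note that in current PyTorch versions a -1 in y marks the end of the valid labels for that sample:
```python
import torch
import torch.nn as nn

loss = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[3, 0, -1, 1]])   # only labels 3 and 0 are valid; -1 terminates the list
# 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
print(loss(x, y))                   # tensor(0.8500)
```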
SmoothL1Loss
A smoothed version of the L1 loss.
```python
torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction: str = 'mean')
```
The loss is given by:
$$\text{loss}(x, y) = \frac{1}{n}\sum_i \begin{cases} 0.5\,(x_i - y_i)^2, & \text{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5, & \text{otherwise} \end{cases}$$
This loss is less sensitive to outliers than MSELoss, and in some cases it prevents exploding gradients (see Fast R-CNN). It is also known as the Huber loss.
x and y can be any tensors containing n elements each. By default, the computed loss is divided by n; set size_average=False to have it summed instead.
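A minimal sketch:
```python
import torch
import torch.nn as nn

loss = nn.SmoothL1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()
```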
SoftMarginLoss
```python
torch.nn.SoftMarginLoss(reduction: str = 'mean')
```
Creates a criterion that optimizes a two-class classification logistic loss between an input $x$ (a 2-D mini-batch Tensor) and a target $y$ (a Tensor containing 1 or -1):
$$\text{loss}(x, y) = \frac{1}{x.\text{nelement}()}\sum_i \log\big(1 + \exp(-y[i]\cdot x[i])\big)$$
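A minimal sketch:
```python
import torch
import torch.nn as nn

loss = nn.SoftMarginLoss()
input = torch.randn(3, requires_grad=True)
target = torch.tensor([1., -1., 1.])   # labels in {1, -1}
output = loss(input, target)
output.backward()
```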
MultiLabelSoftMarginLoss
```python
torch.nn.MultiLabelSoftMarginLoss(weight: Optional[torch.Tensor] = None,
                                  reduction: str = 'mean')
```
Creates a criterion that optimizes a multi-label one-versus-all loss between input x and target y, based on max-entropy. x is a 2-D mini-batch Tensor; y is a binary 2-D Tensor. For each sample in the mini-batch, the loss is:
$$\text{loss}(x, y) = -\frac{1}{x.\text{nElement}()}\sum_{i=0}^{I}\left( y[i]\log\frac{\exp(x[i])}{1 + \exp(x[i])} + (1 - y[i])\log\frac{1}{1 + \exp(x[i])}\right)$$
where I = x.nElement() - 1 and $y[i] \in \{0, 1\}$. y and x must have the same size.
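A minimal sketch with multi-hot binary targets:
```python
import torch
import torch.nn as nn

loss = nn.MultiLabelSoftMarginLoss()
input = torch.randn(3, 5, requires_grad=True)   # (N, C) raw scores
target = torch.empty(3, 5).random_(2)           # (N, C) multi-hot binary labels
output = loss(input, target)
output.backward()
```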
CosineEmbeddingLoss
```python
torch.nn.CosineEmbeddingLoss(margin: float = 0.0,
                             reduction: str = 'mean')
```
Given input Tensors x1 and x2 and a label Tensor y (with elements equal to 1 or -1), this criterion uses the cosine distance to measure whether two inputs are similar; it is typically used for learning nonlinear embeddings or for semi-supervised learning.
margin should be a value between -1 and 1; 0 to 0.5 is suggested. If margin is not given, it defaults to 0.
The loss for each sample is:
$$\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max\big(0, \cos(x_1, x_2) - \text{margin}\big), & \text{if } y = -1 \end{cases}$$
If size_average=True, the loss is averaged over the batch; with size_average=False it is summed. The default is size_average=True.
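A minimal sketch comparing pairs of embeddings; the embedding size 128 is an arbitrary illustration value:
```python
import torch
import torch.nn as nn

loss = nn.CosineEmbeddingLoss(margin=0.5)
input1 = torch.randn(3, 128, requires_grad=True)   # batch of embeddings
input2 = torch.randn(3, 128, requires_grad=True)
target = torch.tensor([1., -1., 1.])               # 1: similar pair, -1: dissimilar pair
output = loss(input1, input2, target)
output.backward()
```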
MultiMarginLoss
```python
torch.nn.MultiMarginLoss(p: int = 1,
                         margin: float = 1.0,
                         weight: Optional[torch.Tensor] = None,
                         reduction: str = 'mean')
```
Computes a multi-class classification hinge loss (margin-based loss). The input is x (a 2-D mini-batch Tensor) and y (a 1-D Tensor containing class indices, 0 <= y <= x.size(1) - 1).
For each mini-batch sample:
$$\text{loss}(x, y) = \frac{1}{x.\text{size}(0)}\sum_{i=0}^{I} \max\big(0, \text{margin} - x[y] + x[i]\big)^p$$
where I = x.size(0) and $i \neq y$.
Optionally, if you do not want all classes to have equal weight, you can pass a weight argument to the constructor; weight is a 1-D Tensor of per-class weights.
With weight, the loss becomes:
$$\text{loss}(x, y) = \frac{1}{x.\text{size}(0)}\sum_i \max\big(0, w[y]\cdot(\text{margin} - x[y] + x[i])\big)^p$$
By default, the loss is averaged over the mini-batch; set size_average=False to disable the averaging.
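A small worked sketch, with values following the pattern in the official docs:
```python
import torch
import torch.nn as nn

loss = nn.MultiMarginLoss(p=1, margin=1.0)
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([3])                       # the correct class is index 3
# 0.25 * (max(0, 1-(0.8-0.1)) + max(0, 1-(0.8-0.2)) + max(0, 1-(0.8-0.4)))
print(loss(x, y))                           # tensor(0.3250)
```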
Custom loss functions
Let's look at the implementation of the simplest one, L1Loss:
```python
class L1Loss(_Loss):
    __constants__ = ['reduction']

    def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(L1Loss, self).__init__(size_average, reduce, reduction)

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return F.l1_loss(input, target, reduction=self.reduction)
```
As you can see, L1Loss is defined much like an ordinary submodule; its forward simply calls the corresponding function in F (torch.nn.functional):
```python
def l1_loss(input, target, size_average=None, reduce=None, reduction='mean'):
    # type: (Tensor, Tensor, Optional[bool], Optional[bool], str) -> Tensor
    if not torch.jit.is_scripting():
        tens_ops = (input, target)
        if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
            return handle_torch_function(
                l1_loss, tens_ops, input, target, size_average=size_average,
                reduce=reduce, reduction=reduction)
    if not (target.size() == input.size()):
        warnings.warn("Using a target size ({}) that is different to the input size ({}). "
                      "This will likely lead to incorrect results due to broadcasting. "
                      "Please ensure they have the same size.".format(target.size(), input.size()),
                      stacklevel=2)
    if size_average is not None or reduce is not None:
        reduction = _Reduction.legacy_get_string(size_average, reduce)
    if target.requires_grad:
        ret = torch.abs(input - target)
        if reduction != 'none':
            ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
    else:
        expanded_input, expanded_target = torch.broadcast_tensors(input, target)
        ret = torch._C._nn.l1_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
    return ret
```
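Following the same pattern, a custom loss can be written by subclassing nn.Module and putting the computation in forward(). Below is a minimal sketch; the class name WeightedL1Loss and its weight parameter are hypothetical illustration choices, not a PyTorch API:
```python
import torch
import torch.nn as nn

class WeightedL1Loss(nn.Module):
    """Hypothetical example: L1 loss scaled by a constant weight."""

    def __init__(self, weight: float = 1.0, reduction: str = 'mean'):
        super().__init__()
        self.weight = weight
        self.reduction = reduction

    def forward(self, input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        loss = self.weight * torch.abs(input - target)
        if self.reduction == 'mean':
            return loss.mean()
        if self.reduction == 'sum':
            return loss.sum()
        return loss                      # reduction == 'none'

criterion = WeightedL1Loss(weight=2.0)
output = criterion(torch.randn(3, 5, requires_grad=True), torch.randn(3, 5))
output.backward()
```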