Article link: https://blog.csdn.net/Akun_2217/article/details/134587153

Four Commonly Used Loss Functions in PyTorch

1. torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')
$\zeta(x, y) = L = \{l_1, \dots, l_n\},\quad l_i = (x_i - y_i)^2$
size_average and reduce are deprecated; the single reduction argument replaces both. "reduce" means collapsing the vector of per-element losses into a scalar. size_average only takes effect when reduce=True.

Setting 1:

reduce=True + size_average=True is equivalent to reduction='mean', which returns
$\frac{l_1+\dots+l_n}{n}$

Setting 2:

reduce=True + size_average=False is equivalent to reduction='sum', which returns
$l_1+\dots+l_n$
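The two settings above can be checked on a small example. This is a minimal sketch; the tensors `x` and `y` are made-up values chosen so the arithmetic is easy to follow, and reduction='none' is included only to show the per-element losses before reduction.

```python
import torch
import torch.nn as nn

x = torch.tensor([1.0, 2.0, 4.0])
y = torch.tensor([0.0, 2.0, 1.0])

# Per-element squared errors l_i = (x_i - y_i)^2, with no reduction applied
per_element = nn.MSELoss(reduction='none')(x, y)
# reduction='mean' averages the l_i, reduction='sum' adds them up
mean_loss = nn.MSELoss(reduction='mean')(x, y)
sum_loss = nn.MSELoss(reduction='sum')(x, y)

print(per_element)  # tensor([1., 0., 9.])
print(mean_loss)    # tensor(3.3333)
print(sum_loss)     # tensor(10.)
```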

2. torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')
$\zeta(x, y) = L = \{l_1, \dots, l_n\},\quad l_i = |x_i - y_i|$
It is used the same way as MSELoss; with reduction='mean' it is simply the mean absolute error (MAE) loss.
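As a quick illustration of the MAE reading, the sketch below compares L1Loss with a hand-written mean absolute error. The tensors are made-up example values.

```python
import torch
import torch.nn as nn

x = torch.tensor([1.0, 2.0, 4.0])
y = torch.tensor([0.0, 2.0, 1.0])

# Built-in L1 loss with the default mean reduction
l1_mean = nn.L1Loss(reduction='mean')(x, y)
# Mean absolute error written out by hand: (1 + 0 + 3) / 3
manual_mae = (x - y).abs().mean()

print(l1_mean)     # tensor(1.3333)
print(manual_mae)  # tensor(1.3333)
```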

3. torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)

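The expected input shapes are:
$input: [minibatch, C],\quad target: [minibatch]$
Note: target holds one class index per sample (values in 0..C-1). Conceptually, it is turned into a one-hot matrix of shape [minibatch, C] before the loss is computed.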
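The one-hot view of target can be sketched as follows. This is an illustration rather than how PyTorch implements the loss internally; the names `logits`, `one_hot`, and `manual` are chosen here for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
minibatch, C = 4, 10
logits = torch.randn(minibatch, C)          # input: [minibatch, C]
target = torch.randint(0, C, (minibatch,))  # target: [minibatch], class indices

# Built-in loss with class-index targets (default reduction='mean')
builtin = nn.CrossEntropyLoss()(logits, target)

# Same quantity computed from an explicit one-hot matrix of shape [minibatch, C]
one_hot = F.one_hot(target, num_classes=C).float()
manual = -(F.log_softmax(logits, dim=-1) * one_hot).sum(dim=-1).mean()

print(torch.allclose(builtin, manual))  # True
```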

Parameters:

1. weight: a 1D tensor of shape [C] holding one weight per class.

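2. label_smoothing: a sample is no longer assigned to exactly one class; its target becomes a probability distribution such as the one below, so target is no longer a one-hot matrix. (PyTorch's built-in label_smoothing mixes the one-hot target with a uniform distribution, giving $1 - \varepsilon + \varepsilon/C$ for the true class and $\varepsilon/C$ for every other class; the variant used here spreads $\varepsilon$ over the other $C-1$ classes instead.)
$$
y_{n,c} = \begin{cases}
1 - \varepsilon & \text{if } c = y_n \\
\varepsilon/(C-1) & \text{if } c \neq y_n
\end{cases}
$$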

When the target is a probability distribution like this (rather than class indices), ignore_index does not apply, and the loss is computed as:
$\zeta(x, y) = L = \{l_1, \dots, l_N\}^T,\quad l_n = -\sum_{c=1}^{C} w_c \, y_{n,c} \log\frac{e^{x_{n,c}}}{\sum_{i=1}^{C} e^{x_{n,i}}}$
In code:

# input.size() = [minibatch, C], target.size() = [minibatch]
import torch
import torch.nn.functional as f

reduction = 'mean'
input = torch.randn((4, 10))
target = torch.randint(0, 10, (4,))
label_smoothing = 0.09
weight = torch.randint(0, 100, (10,)).float()

# Convert target to one-hot form (see torch.Tensor.scatter)
target = torch.zeros((4, 10)).scatter(-1, target.unsqueeze(-1), 1)
'''
Example output (depends on the random target):
tensor([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]])
'''
# Apply label_smoothing: 1 - eps for the true class, eps / (C - 1) for the other 9 classes
target = torch.where(target == 1, 1 - label_smoothing, label_smoothing / 9)
'''
tensor([[0.0100, 0.0100, 0.0100, 0.0100, 0.0100, 0.9100, 0.0100, 0.0100, 0.0100,
         0.0100],
        [0.0100, 0.0100, 0.0100, 0.0100, 0.0100, 0.9100, 0.0100, 0.0100, 0.0100,
         0.0100],
        [0.0100, 0.9100, 0.0100, 0.0100, 0.0100, 0.0100, 0.0100, 0.0100, 0.0100,
         0.0100],
        [0.0100, 0.0100, 0.0100, 0.0100, 0.9100, 0.0100, 0.0100, 0.0100, 0.0100,
         0.0100]])
'''
# -log(softmax(input)), computed with the numerically stable log_softmax
input = -f.log_softmax(input, dim=-1)
# Weighted sum over classes: one loss value per sample
L = (input * target * weight).sum(dim=-1)
# L has shape [4]; every entry is non-negative
if reduction == 'sum':
    loss = L.sum()
elif reduction == 'mean':
    loss = L.sum() / 4  # divide by the number of samples
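The walkthrough above can be cross-checked against the built-in function. The sketch below assumes PyTorch 1.10 or later, where `F.cross_entropy` accepts class-probability targets and a `label_smoothing` argument. The variable names (`soft`, `uniform_mix`, and so on) are chosen here for illustration. The first check uses the C - 1 variant from the text; the second shows the uniform mix that the built-in `label_smoothing` argument corresponds to when no class weights are given.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, eps = 4, 10, 0.09
logits = torch.randn(N, C)
target = torch.randint(0, C, (N,))
weight = torch.rand(C) + 0.5  # arbitrary positive class weights

one_hot = F.one_hot(target, num_classes=C).float()

# Variant from the text: 1 - eps on the true class, eps / (C - 1) elsewhere
soft = one_hot * (1 - eps) + (1 - one_hot) * eps / (C - 1)
manual = -(F.log_softmax(logits, dim=-1) * soft * weight).sum(dim=-1)
manual_mean = manual.sum() / N  # probability targets: divide by sample count
builtin_mean = F.cross_entropy(logits, soft, weight=weight, reduction='mean')
print(torch.allclose(manual_mean, builtin_mean))

# Built-in label_smoothing: (1 - eps) * one_hot + eps / C (uniform mix)
uniform_mix = one_hot * (1 - eps) + eps / C
smoothed_builtin = F.cross_entropy(logits, target, label_smoothing=eps)
smoothed_manual = F.cross_entropy(logits, uniform_mix)
print(torch.allclose(smoothed_builtin, smoothed_manual))
```

Each row of `soft` and of `uniform_mix` sums to 1, so both are valid probability targets; they differ only in how the mass eps is distributed.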

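3. ignore_index: when one class label is known to be unreliable, pass it as ignore_index; samples whose target equals that value are excluded from the loss. The computation is:
$\zeta(x, y) = L = \{l_1, \dots, l_N\}^T,\quad l_n = -w_{y_n} \log\frac{e^{x_{n,y_n}}}{\sum_{c=1}^{C} e^{x_{n,c}}} \cdot 1\{y_n \neq \text{ignore\_index}\}$
In other words, the log(softmax(x)) score at each sample's target class is picked out, negated, weighted, and summed; samples whose class is ignore_index do not take part.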

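import torch
import torch.nn.functional as f

reduction = 'mean'
input = torch.randn((4, 10))
target = torch.randint(0, 10, (4,))
ignore_index = 3
weight = torch.randint(0, 100, (10,)).float()

# Build a 0/1 mask over samples, e.g. target = tensor([6, 0, 3, 6]) gives mask = [1, 1, 0, 1]
mask = (target != ignore_index).float()
# Convert target to one-hot form
target = torch.zeros((4, 10)).scatter(-1, target.unsqueeze(-1), 1)

# -log(softmax(input)), computed with the numerically stable log_softmax
input = -f.log_softmax(input, dim=-1)
L = (input * target * weight).sum(dim=-1)
# Zero out the samples whose class is ignore_index
L = mask * L

if reduction == 'sum':
    loss = L.sum()
elif reduction == 'mean':
    # Divide by the summed weights of the non-ignored targets
    loss = L.sum() / ((target * weight).sum(-1) * mask).sum()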
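To confirm that the masked computation above agrees with the built-in, the sketch below runs both on the same data. The target is fixed to [6, 0, 3, 6] so that exactly one sample carries the ignored class; the names `mask`, `per_sample`, and `manual_mean` are chosen here for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, ignore_index = 10, 3
logits = torch.randn(4, C)
target = torch.tensor([6, 0, 3, 6])  # the third sample carries the ignored class
weight = torch.rand(C) + 0.5         # arbitrary positive class weights

mask = (target != ignore_index).float()
one_hot = F.one_hot(target, num_classes=C).float()

# Per-sample weighted loss, zeroed where the target is ignore_index
per_sample = -(F.log_softmax(logits, dim=-1) * one_hot * weight).sum(dim=-1) * mask
# 'mean' divides by the summed weights of the non-ignored targets
manual_mean = per_sample.sum() / ((one_hot * weight).sum(dim=-1) * mask).sum()

builtin_none = F.cross_entropy(logits, target, weight=weight,
                               ignore_index=ignore_index, reduction='none')
builtin_mean = F.cross_entropy(logits, target, weight=weight,
                               ignore_index=ignore_index, reduction='mean')

print(torch.allclose(per_sample, builtin_none))  # True
print(torch.allclose(manual_mean, builtin_mean)) # True
```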

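To summarize:

-log(softmax(input)): [minibatch, C] * target: [minibatch, C] * weight: [C] -> [minibatch, C], then .sum(-1) -> [minibatch]. ignore_index adds one more step: [minibatch] * mask, where mask is 0 for samples whose class is ignore_index. With reduction='mean', the ignore_index (class-index) form divides by the summed weights of the non-ignored targets, whereas the probability-target form used for label smoothing divides by the number of samples.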

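4. torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

size_average/reduce/reduction: these three parameters behave the same way as in the losses above.

NLLLoss skips the softmax and log steps: input is expected to already hold log-probabilities, and the loss is simply -input * target * weight, with the ignore_index mask applied.
$\zeta(x, y) = L = \{l_1, \dots, l_N\}^T,\quad l_n = -w_{y_n} x_{n,y_n},\quad w_c = \text{weight}[c] \cdot 1\{c \neq \text{ignore\_index}\}$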
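The relationship between the two losses can be sketched as follows: feeding log_softmax output into NLLLoss reproduces CrossEntropyLoss on the raw scores. The data and the names `picked`, `w`, and `manual` are made up for this example; `manual` writes out the formula above with the default reduction='mean'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)
target = torch.tensor([6, 0, 3, 6])
weight = torch.rand(10) + 0.5  # arbitrary positive class weights

# NLLLoss expects log-probabilities, so apply log_softmax first
log_probs = F.log_softmax(logits, dim=-1)
nll = nn.NLLLoss(weight=weight, ignore_index=3)(log_probs, target)
ce = nn.CrossEntropyLoss(weight=weight, ignore_index=3)(logits, target)

# Formula written out: l_n = -w_{y_n} * x_{n, y_n}, with w zeroed for ignore_index
picked = log_probs[torch.arange(4), target]
w = weight[target] * (target != 3).float()
manual = -(w * picked).sum() / w.sum()

print(torch.allclose(nll, ce))      # True
print(torch.allclose(nll, manual))  # True
```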