Loss Functions in PyTorch

0. Preface

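In deep learning, the optimizer acts directly on the loss function. The loss function measures how far the predictions are from the ground truth, and the goal of the optimization problem is to minimize it. For classification, this intuitively means getting as many samples right as possible. For regression, it means making the error between predicted and actual values as small as possible.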

• Loss function:
$$\text{Loss} = f(\hat{y}, y)$$
• Cost function:
$$\text{Cost} = \frac{1}{N}\sum_{i=1}^{N} f(\hat{y}_i, y_i)$$
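The nn module in PyTorch provides many ready-to-use loss functions, such as cross-entropy and mean squared error. For a given problem you can call an existing loss function directly. The table below lists common loss functions and the problems they suit.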

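| Loss function | Description | Typical task |
|---|---|---|
| torch.nn.L1Loss() | Mean absolute error loss | Regression |
| torch.nn.MSELoss() | Mean squared error loss | Regression |
| torch.nn.CrossEntropyLoss() | Cross-entropy loss | Multi-class classification |
| torch.nn.CTCLoss() | Connectionist Temporal Classification loss | Sequence classification |
| torch.nn.NLLLoss() | Negative log-likelihood loss | Multi-class classification |
| torch.nn.KLDivLoss() | KL divergence loss | Comparing probability distributions |
| torch.nn.BCELoss() | Binary cross-entropy loss | Binary classification |
| torch.nn.MarginRankingLoss() | Margin ranking loss | Ranking / similarity |
| torch.nn.MultiLabelMarginLoss() | Multi-label margin loss | Multi-label classification |
| torch.nn.SmoothL1Loss() | Smooth L1 loss | Regression |
| torch.nn.SoftMarginLoss() | Two-class logistic (soft margin) loss | Binary classification |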

1. Loss Functions

1.1 The _Loss base class

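Below is the source of the loss base classes in PyTorch's nn module (torch/nn/modules/loss.py). It defines _Loss, the base class of all losses, and _WeightedLoss, the base class of losses that take per-class weight coefficients.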

```python
from .module import Module
from .. import functional as F
from .. import _reduction as _Reduction

from torch import Tensor
from typing import Optional

class _Loss(Module):
    reduction: str

    def __init__(self, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction

class _WeightedLoss(_Loss):
    def __init__(self, weight: Optional[Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(_WeightedLoss, self).__init__(size_average, reduce, reduction)
        self.register_buffer('weight', weight)
```
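You can observe the effect of these base classes from the outside. The sketch below assumes a recent PyTorch release in which the deprecated `size_average`/`reduce` arguments still exist. It checks three things: that every built-in loss exposes the resolved `reduction` string, that the legacy flags are translated into that string, and that `_WeightedLoss` stores `weight` as a buffer.

```python
import warnings
import torch
import torch.nn as nn

# Every _Loss subclass stores the resolved reduction mode as a string
default_reduction = nn.MSELoss().reduction  # 'mean'

# Legacy arguments go through _Reduction.legacy_get_string (its deprecation warning is silenced here)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    legacy_sum = nn.MSELoss(size_average=False).reduction  # 'sum'
    legacy_none = nn.MSELoss(reduce=False).reduction       # 'none'

# _WeightedLoss registers weight as a buffer, so it is saved in state_dict and moved by .to(device)
ce = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0]))
buffer_names = [name for name, _ in ce.named_buffers()]

print(default_reduction, legacy_sum, legacy_none, buffer_names)
print(list(ce.state_dict().keys()))
```

Because `weight` is a buffer rather than a plain attribute, calling `.cuda()` on the loss module moves the class weights along with it.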



1.2 nn.CrossEntropyLoss

1.2.1 Basic concepts: cross-entropy, entropy and relative entropy

1. Cross-entropy:
$$H(P, Q) = -\sum_{i=1}^{N} P(x_i)\log Q(x_i)$$
2. Self-information, which measures the uncertainty of a single event:
$$I(x) = -\log p(x)$$
3. Entropy (information entropy). The more uncertain an event is, the larger its entropy. Entropy is the expectation of the self-information:
$$H(P) = E_{x\sim P}[I(x)] = -\sum_{i=1}^{N} P(x_i)\log P(x_i)$$
4. Relative entropy (KL divergence), which measures the difference between two distributions and is not symmetric:
$$D_{KL}(P\|Q) = E_{x\sim P}\left[\log\frac{P(x)}{Q(x)}\right] = E_{x\sim P}\left[\log P(x) - \log Q(x)\right] = \sum_{i=1}^{N} P(x_i)\left[\log P(x_i) - \log Q(x_i)\right] = \sum_{i=1}^{N} P(x_i)\log P(x_i) - \sum_{i=1}^{N} P(x_i)\log Q(x_i) = H(P, Q) - H(P)$$

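Combining the formulas above gives the cross-entropy as $H(P, Q) = D_{KL}(P\|Q) + H(P)$. Here P is the distribution of the actual samples and Q is the distribution of the predictions. H(P) is fixed by the data and does not depend on the model, so minimizing the cross-entropy is equivalent to minimizing the KL divergence.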
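The identity can be checked numerically. The sketch below uses NumPy, and the two distributions are made-up example values rather than anything from the text. It also shows that swapping P and Q changes the KL divergence.

```python
import numpy as np

P = np.array([0.7, 0.2, 0.1])  # "true" distribution (illustrative values)
Q = np.array([0.5, 0.3, 0.2])  # "predicted" distribution (illustrative values)

cross_entropy = -np.sum(P * np.log(Q))       # H(P, Q)
entropy = -np.sum(P * np.log(P))             # H(P)
kl_pq = np.sum(P * (np.log(P) - np.log(Q)))  # D_KL(P || Q)
kl_qp = np.sum(Q * (np.log(Q) - np.log(P)))  # D_KL(Q || P)

print(cross_entropy, kl_pq + entropy)  # equal: H(P, Q) = D_KL(P || Q) + H(P)
print(kl_pq, kl_qp)                    # different: KL divergence is not symmetric
```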

1.2.2 Cross-entropy in PyTorch

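Function: combines nn.LogSoftmax() and nn.NLLLoss() to compute the cross-entropy. This loss differs from the textbook formula above because its input is raw, unnormalized scores (logits). nn.LogSoftmax first normalizes them with softmax into probabilities in [0, 1] and then takes their log.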

The formulas given in the official documentation are:

1. Without weights:
$$\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right)$$
2. With weights:
$$\text{loss}(x, class) = weight[class]\left(-x[class] + \log\left(\sum_j \exp(x[j])\right)\right)$$

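Compare PyTorch's definition with the original cross-entropy formula $H(P, Q) = -\sum_{i=1}^{N} P(x_i)\log Q(x_i)$. PyTorch's version has no summation and no $P(x_i)$ factor. The reason is that PyTorch computes the cross-entropy of one sample against one known target class. The true distribution P is therefore one-hot: $P(x_i) = 1$ for the target class and 0 for every other class, so all the other terms of the sum vanish. PyTorch's cross-entropy thus simplifies to $H(P, Q) = -\log Q(x_{class})$, where $Q(x_{class})$ is the softmax probability of the target class.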


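```python
torch.nn.CrossEntropyLoss(weight: Optional[torch.Tensor] = None,  # per-class weights for the loss
                          size_average=None,
                          ignore_index: int = -100,  # targets equal to this class index are ignored
                          reduce=None,
                          reduction: str = 'mean')   # reduction mode: 'none' / 'sum' / 'mean'
```

Reduction modes:
• none: computes one loss per element.
• sum: sums the losses of all elements.
• mean: takes the weighted average and returns a scalar.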


The following code example shows how these parameters behave:

```python
import torch
import torch.nn as nn

import numpy as np
# ------ fake data

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)

# ------------
flag = 1
if flag:
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

    # forward
    loss_none = loss_f_none(inputs, targets)
    loss_sum = loss_f_sum(inputs, targets)
    loss_mean = loss_f_mean(inputs, targets)

    # view
    print(f'Cross Entropy loss: \n{loss_none, loss_sum, loss_mean}')
```

Output:

```text
Cross Entropy loss:
(tensor([1.3133, 0.1269, 0.1269]), tensor(1.5671), tensor(0.5224))
```


To get more familiar with how PyTorch's CrossEntropyLoss arrives at its value, here is the same computation done by hand:

```python
# -------------- compute by hand
flag = 1
if flag:
    idx = 0
    # inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
    # targets = torch.tensor([0, 1, 1], dtype=torch.long)

    inputs_1 = inputs.detach().numpy()[idx]
    targets_1 = targets.numpy()[idx]

    # first term: x[class]
    x_class = inputs_1[targets_1]

    # second term: log(sum_j exp(x[j]))
    sigma_exp_x = np.sum(np.exp(inputs_1))
    log_sigma_exp_x = np.log(sigma_exp_x)

    # loss of this sample
    loss_1 = -x_class + log_sigma_exp_x
    print('loss of the first sample:', loss_1)
```

The first sample has x = [1, 2] and class = 0. That gives log(Σ_j exp(x[j])) = ln(e + e^2) and x[class] = 1, so loss = ln(e + e^2) - 1:

```text
loss of the first sample: 1.3132617
```


This matches the first element of loss_none from the previous code block.
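The two properties described above can also be checked directly. First, CrossEntropyLoss equals LogSoftmax followed by NLLLoss. Second, with per-class weights, reduction='mean' divides by the summed weights of the target classes rather than by the number of samples. The sketch below reuses the fake data. The class weights [1.0, 2.0] are an illustrative assumption, not from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)
weights = torch.tensor([1.0, 2.0])  # illustrative class weights

# CrossEntropyLoss == LogSoftmax + NLLLoss
ce = nn.CrossEntropyLoss()(inputs, targets)
nll = nn.NLLLoss()(F.log_softmax(inputs, dim=1), targets)

# With weights, 'mean' is a weighted average: sum(w[y_n] * l_n) / sum(w[y_n])
per_sample = nn.CrossEntropyLoss(weight=weights, reduction='none')(inputs, targets)
weighted_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')(inputs, targets)
manual_mean = per_sample.sum() / weights[targets].sum()

print(ce.item(), nll.item())
print(per_sample, weighted_mean.item(), manual_mean.item())
```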

1.3 nn.NLLLoss

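Function: implements the "negative" part of the negative log-likelihood. It takes the input value at the target class and negates it, scaled by the class weight. Formula:
$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = -w_{y_n} x_{n, y_n}$$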


```python
nn.NLLLoss(weight=None,        # per-class weights for the loss
           size_average=None,
           ignore_index=-100,  # targets equal to this class index are ignored
           reduce=None,
           reduction='mean')   # reduction mode
```


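Let's observe this loss directly in code. The inputs here are raw numbers rather than log-probabilities, which makes it easy to see that NLLLoss only selects and negates.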


```python
import torch
import torch.nn as nn

import numpy as np
# ------ fake data

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)

flag = 1
if flag:
    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, targets)
    loss_sum = loss_f_sum(inputs, targets)
    loss_mean = loss_f_mean(inputs, targets)

    # view
    print('\nweights:', weights)
    print('nll loss', loss_none_w, loss_sum, loss_mean)
```

Output:

```text
weights: tensor([1., 1.])
nll loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
```
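The formula $l_n = -w_{y_n} x_{n, y_n}$ can be reproduced with a gather. The sketch below also shows the effect of `ignore_index`: a sample whose target equals it (here the default -100) contributes nothing and is excluded from the mean.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
targets = torch.tensor([0, 1, 1], dtype=torch.long)
weights = torch.tensor([1.0, 1.0])

# l_n = -w[y_n] * x[n, y_n]: take the target column of each row and negate it
picked = inputs.gather(1, targets.unsqueeze(1)).squeeze(1)  # tensor([1., 3., 3.])
manual_none = -weights[targets] * picked
manual_mean = manual_none.sum() / weights[targets].sum()    # weighted mean

api_none = nn.NLLLoss(weight=weights, reduction='none')(inputs, targets)
api_mean = nn.NLLLoss(weight=weights, reduction='mean')(inputs, targets)

# The second sample is ignored, so the mean is (-1 + -3) / 2
masked_targets = torch.tensor([0, -100, 1])
api_ignored = nn.NLLLoss(reduction='mean')(inputs, masked_targets)

print(manual_none, api_none)
print(manual_mean, api_mean, api_ignored)
```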


1.4 nn.BCELoss

Function: binary cross-entropy loss. Note: the input values must lie in [0, 1], for example the output of a sigmoid.
$$l_n = -w_n\left[y_n \log x_n + (1 - y_n)\log(1 - x_n)\right]$$

```python
nn.BCELoss(weight=None,       # rescaling weights, broadcast to the input shape
           size_average=None,
           reduce=None,
           reduction='mean')  # reduction mode
```


```python
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # map the inputs into (0, 1) with sigmoid, as BCELoss requires
    inputs = torch.sigmoid(inputs)

    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none = nn.BCELoss(weights, reduction='none')
    loss_f_sum = nn.BCELoss(weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    print(f'\nweights: {weights}')
    print('BCELoss ', loss_none_w, loss_sum, loss_mean)
```

Output:

```text
weights: tensor([1., 1.])
BCELoss  tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
```
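To confirm that the formula above is exactly what BCELoss computes, here is a minimal sketch that evaluates it by hand on the same data. All weights w_n are 1.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
probs = torch.sigmoid(logits)  # BCELoss needs values in [0, 1]

# l_n = -[y_n * log(x_n) + (1 - y_n) * log(1 - x_n)]
manual = -(target * torch.log(probs) + (1 - target) * torch.log(1 - probs))
api = nn.BCELoss(reduction='none')(probs, target)

print(manual)
print(api.mean())
```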


1.5 nn.BCEWithLogitsLoss

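Function: combines a sigmoid with binary cross-entropy. Note: do not put a sigmoid at the end of the network, because this loss applies it internally. The formula, where σ is the sigmoid function:
$$l_n = -w_n\left[y_n \log \sigma(x_n) + (1 - y_n)\log(1 - \sigma(x_n))\right]$$
The pos_weight argument p multiplies only the positive term:
$$l_n = -w_n\left[p\, y_n \log \sigma(x_n) + (1 - y_n)\log(1 - \sigma(x_n))\right]$$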

```python
'''
nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None,
                     reduction='mean', pos_weight=None)
'''
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target
    weights = torch.tensor([1], dtype=torch.float)
    pos_w = torch.tensor([3], dtype=torch.float)  # extra weight for positive samples

    loss_f_none = nn.BCEWithLogitsLoss(weights, reduction='none', pos_weight=pos_w)
    loss_f_sum = nn.BCEWithLogitsLoss(weights, reduction='sum', pos_weight=pos_w)
    loss_f_mean = nn.BCEWithLogitsLoss(weights, reduction='mean', pos_weight=pos_w)

    # forward: raw logits, no sigmoid
    loss_none_w = loss_f_none(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    print(f'\npos_w: {pos_w}')
    print('BCEWithLogitsLoss ', loss_none_w, loss_sum, loss_mean)
```

Output with pos_w = 3:

```text
pos_w: tensor([3.])
BCEWithLogitsLoss  tensor([[0.9398, 2.1269],
        [0.3808, 2.1269],
        [3.0486, 0.0544],
        [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
```

Running the same code with pos_w = torch.tensor([1], dtype=torch.float) gives the result below. Comparing the two runs, each positive entry (target = 1) has three times the loss when pos_w = 3, so the model pays more attention to positive samples:

```text
pos_w: tensor([1.])
BCEWithLogitsLoss  tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
```
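The sketch below does two things. First, it reproduces the pos_weight formula by hand, using the identity log(1 - σ(x)) = log σ(-x). Second, it shows why the combined loss is preferred over sigmoid + BCELoss. In float32, sigmoid(200.) rounds to exactly 1.0, so BCELoss would hit log(0). PyTorch documents that BCELoss clamps its log outputs to be at least -100, which caps the loss at 100. BCEWithLogitsLoss works on the logit directly and returns the exact value. The logit 200 is an illustrative value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
pos_w = torch.tensor([3.0])

# pos_weight scales only the positive (y = 1) term; log(1 - sigmoid(x)) == logsigmoid(-x)
manual = -(pos_w * target * F.logsigmoid(logits) + (1 - target) * F.logsigmoid(-logits))
api = nn.BCEWithLogitsLoss(pos_weight=pos_w, reduction='none')(logits, target)

# Numerical stability with a very large logit and a negative label
big_logit = torch.tensor([200.0])
zero_label = torch.tensor([0.0])
with_logits = nn.BCEWithLogitsLoss()(big_logit, zero_label)         # exact: 200
after_sigmoid = nn.BCELoss()(torch.sigmoid(big_logit), zero_label)  # log clamped: 100

print(torch.allclose(manual, api, atol=1e-5), with_logits.item(), after_sigmoid.item())
```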


1.6 nn.L1Loss (regression)

Function: computes the absolute difference between inputs and target. Formula:
$$l_n = |x_n - y_n|$$

```python
'''
nn.L1Loss(reduction='none')
'''
flag = 1
if flag:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3

    # use reduction='none' (not the deprecated reduce argument) to get element-wise losses
    loss_f = nn.L1Loss(reduction='none')
    loss = loss_f(inputs, target)

    print(f'input:{inputs}\ntarget:{target}\nL1Loss:{loss}')
```

The output below agrees with the formula:

```text
input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
L1Loss:tensor([[2., 2.],
        [2., 2.]])
```
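Here is how the three reduction modes differ on the same data, together with a common pitfall. `reduce` is the deprecated boolean flag. The string 'none' is truthy, so passing `reduce='none'` silently resolves to reduction='mean' and returns a scalar. The sketch assumes the legacy argument is still accepted, as it is in current releases (with a deprecation warning, silenced here).

```python
import warnings
import torch
import torch.nn as nn

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

per_elem = nn.L1Loss(reduction='none')(inputs, target)  # 2x2 tensor of 2s
total = nn.L1Loss(reduction='sum')(inputs, target)      # 4 * 2 = 8
average = nn.L1Loss(reduction='mean')(inputs, target)   # 8 / 4 = 2

# Pitfall: reduce='none' is treated as reduce=True, giving reduction='mean'
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    pitfall = nn.L1Loss(reduce='none')

print(per_elem, total, average, pitfall.reduction, pitfall(inputs, target))
```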


1.7 nn.MSELoss (regression)

Function: computes the squared difference between inputs and target. Formula:
$$l_n = (x_n - y_n)^2$$

```python
flag = 1
if flag:
    inputs = torch.ones((2, 2))
    target = torch.ones((2, 2)) * 3

    loss_f = nn.MSELoss(reduction='none')
    loss = loss_f(inputs, target)

    print(f'input:{inputs}\ntarget:{target}\nMSELoss:{loss}')
```

Output:

```text
input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
MSELoss:tensor([[4., 4.],
        [4., 4.]])
```

With nn.MSELoss(reduction='sum'), the four element losses are summed into a scalar:

```text
MSELoss:tensor(16.)
```
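L1Loss and MSELoss also differ in their gradients. The MSE gradient 2(x - y) grows with the error, so large errors (outliers) dominate training. The L1 gradient sign(x - y) has a constant magnitude but is not smooth at zero. The sketch below uses illustrative errors of 0.5 and 3.0.

```python
import torch
import torch.nn as nn

x = torch.tensor([0.5, 3.0], requires_grad=True)  # predictions with errors 0.5 and 3.0
y = torch.zeros(2)                                # targets

nn.MSELoss(reduction='sum')(x, y).backward()
mse_grad = x.grad.clone()  # d/dx (x - y)^2 = 2 * (x - y)

x.grad = None
nn.L1Loss(reduction='sum')(x, y).backward()
l1_grad = x.grad.clone()   # d/dx |x - y| = sign(x - y)

print(mse_grad, l1_grad)   # tensor([1., 6.]) tensor([1., 1.])
```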


1.8 nn.SmoothL1Loss (regression)

Function: a smoothed version of L1Loss. Its formula is:
$$\text{loss}(x, y) = \frac{1}{n}\sum_i z_i$$
$$z_i = \begin{cases} 0.5(x_i - y_i)^2, & \text{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5, & \text{otherwise} \end{cases}$$
Figure 1 compares SmoothL1Loss with L1Loss:
[Figure 1: smooth_l1_loss vs. l1 loss, loss plotted against x_i - y_i (image: ./out_imgs/loss/l1_smooth_l1.png)]

```python
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

flag = 1

if flag:
    inputs = torch.linspace(-3, 3, steps=500)
    target = torch.zeros_like(inputs)

    loss_f = nn.SmoothL1Loss(reduction='none')
    loss_smooth = loss_f(inputs, target)
    loss_l1 = np.abs(inputs.numpy())
    plt.plot(inputs.numpy(), loss_smooth.numpy(), label='smooth_l1_loss')
    plt.plot(inputs.numpy(), loss_l1, label='l1 loss')
    plt.xlabel('x_i - y_i')
    plt.ylabel('loss')
    plt.legend()
    plt.grid()
    plt.savefig('../out_imgs/loss/l1_smooth_l1.png')  # saves Figure 1; the directory must already exist
```
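The piecewise formula can be checked point by point. In newer PyTorch releases (assumed here to be 1.7 or later) the switch point is configurable through a `beta` argument: the quadratic part becomes 0.5(x - y)^2 / beta and the linear part |x - y| - 0.5·beta. beta = 1 reproduces the formula above. `smooth_l1_manual` is a hypothetical helper written for this illustration.

```python
import torch
import torch.nn as nn

diff = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])  # values of x_i - y_i
target = torch.zeros_like(diff)

def smooth_l1_manual(d, beta=1.0):
    # hypothetical helper: quadratic for |d| < beta, linear outside
    abs_d = d.abs()
    return torch.where(abs_d < beta, 0.5 * d ** 2 / beta, abs_d - 0.5 * beta)

api = nn.SmoothL1Loss(reduction='none')(diff, target)
manual = smooth_l1_manual(diff)

# Moving the transition point (assumes the beta argument is available)
api_beta = nn.SmoothL1Loss(reduction='none', beta=0.5)(diff, target)
manual_beta = smooth_l1_manual(diff, beta=0.5)

print(api, manual)
print(api_beta, manual_beta)
```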


1.9 nn.PoissonNLLLoss

Function: negative log-likelihood loss for a target that follows a Poisson distribution. Formulas:

When log_input = True:
$$\text{loss}(input, target) = \exp(input) - target \cdot input$$

When log_input = False:
$$\text{loss}(input, target) = input - target \cdot \log(input + eps)$$

```python
'''--------------------------- PoissonNLLLoss
nn.PoissonNLLLoss(log_input=True,   # whether the input is in log form; selects the formula
                  full=False,       # whether to add the Stirling approximation term; default False
                  reduction='mean',
                  eps=1e-8          # small constant that avoids log(0) when log_input=False
                  )
'''
flag = 1
if flag:
    inputs = torch.randn((2, 2))
    target = torch.randn((2, 2))
    # the other reduction modes are not repeated in the examples that follow
    loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
    loss = loss_f(inputs, target)
    print('inputs :{}\ntarget is {}\nPoissonNLLLoss :{}'.format(inputs, target, loss))

# --------------- compute by hand
flag = 1
if flag:
    idx = 0
    # formula used when log_input=True and full=False
    loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx] * inputs[idx, idx]
    print('loss of the first element', loss_1)
```

The hand-computed result agrees with the API output. The inputs and targets are random, so the numbers change on every run. One run gave:

```text
inputs :tensor([[ 0.0553,  0.2444],
        [-0.5864,  0.1678]])
target is tensor([[-1.1071, -0.4799],
        [ 1.1683, -1.4043]])
PoissonNLLLoss :tensor([[1.1180, 1.3942],
        [1.2415, 1.4185]])
```
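The randn targets above serve only to exercise the formula; a Poisson target is really a non-negative count. The sketch below uses illustrative rates and counts. It checks the log_input=False formula by hand and shows that feeding log(rate) with log_input=True gives the same values.

```python
import torch
import torch.nn as nn

rate = torch.tensor([[0.5, 2.0], [1.0, 4.0]])    # predicted Poisson rates (positive)
counts = torch.tensor([[0.0, 3.0], [1.0, 2.0]])  # observed counts
eps = 1e-8

# log_input=False: the input is the rate itself
api = nn.PoissonNLLLoss(log_input=False, full=False, eps=eps, reduction='none')(rate, counts)
manual = rate - counts * torch.log(rate + eps)

# log_input=True: the input is log(rate); exp(input) - target * input gives the same values
api_log = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')(torch.log(rate), counts)

print(api)
print(api_log)
```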



1.10 nn.KLDivLoss

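Function: computes the KL divergence (KLD). As mentioned in the cross-entropy section, KLD is the relative entropy, which measures the difference between two distributions. Note: the input must already be log-probabilities, for example produced by nn.LogSoftmax. Formulas:
$$D_{KL}(P\|Q) = E_{x\sim P}\left[\log\frac{P(x)}{Q(x)}\right] = E_{x\sim P}\left[\log P(x) - \log Q(x)\right] = \sum_{i=1}^{N} P(x_i)\left(\log P(x_i) - \log Q(x_i)\right)$$

$$l_n = y_n\left(\log y_n - x_n\right)$$

Compare the two formulas term by term. In the second, the input x_n (the model's prediction) is subtracted without taking its log, whereas the first formula subtracts log Q(x_i). PyTorch therefore assumes that x_n is already log Q(x_i). KL divergence compares two probability distributions, so the input must be converted to log-probabilities beforehand, as the note above says.
Parameters and code example: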

```python
flag = 1
if flag:
    # input tensor size (2, 3): think of a fully connected layer with 3 output neurons and a batch of 2
    inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.2, 0.5]])
    inputs_log = torch.log(inputs)  # the correct input for KLDivLoss; not used below
    target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

    loss_f_none = nn.KLDivLoss(reduction='none')
    loss_f_mean = nn.KLDivLoss(reduction='mean')
    # 'batchmean' divides the summed loss by the batch size, which is 2 here
    loss_f_batch_mean = nn.KLDivLoss(reduction='batchmean')

    # the raw probabilities are passed on purpose, to mirror l_n = y_n * (log y_n - x_n)
    loss_none = loss_f_none(inputs, target)
    loss_mean = loss_f_mean(inputs, target)
    loss_bs_mean = loss_f_batch_mean(inputs, target)

    print('loss_none:{}\nloss_mean:{}\nloss_bs_mean:{}'.format(loss_none, loss_mean.item(), loss_bs_mean.item()))

# ----------------- compute by hand
flag = 1
if flag:
    idx = 0

    # In theory inputs[idx, idx] should be log-transformed. Here the values in [0, 1] are used directly
    # as mock probabilities, exactly mirroring the formula PyTorch applies.
    loss_1 = target[idx, idx] * (torch.log(target[idx, idx]) - inputs[idx, idx])
    print('loss_1', loss_1)
```

The hand-computed loss of the first element matches the API result:

```text
loss_none:tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.3897, -0.4219]])
loss_mean:-0.3218694031238556
loss_bs_mean:-0.9656082391738892
loss_1 tensor(-0.5448)
```

The values are negative only because raw probabilities were passed instead of log-probabilities. A real KL divergence is never negative.
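For comparison, here is a sketch of the intended usage on the same example values, passing log-probabilities as the note requires. In a real model these would typically come from log_softmax applied to the logits. The sketch checks that reduction='batchmean' matches the textbook D_KL(P‖Q) averaged over the batch. reduction='mean' additionally divides by the number of classes, and PyTorch warns about this.

```python
import torch
import torch.nn as nn

p_true = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]])  # target distribution P
q_pred = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.2, 0.5]])    # predicted distribution Q

# Correct usage: input = log Q, target = P
kl_batchmean = nn.KLDivLoss(reduction='batchmean')(torch.log(q_pred), p_true)
kl_mean = nn.KLDivLoss(reduction='mean')(torch.log(q_pred), p_true)

# Textbook KL(P || Q) for each sample, then averaged over the batch
kl_per_sample = (p_true * (torch.log(p_true) - torch.log(q_pred))).sum(dim=1)
manual = kl_per_sample.mean()

print(kl_per_sample, kl_batchmean.item(), manual.item(), kl_mean.item())
```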


1.11 nn.MarginRankingLoss

Function: compares two inputs for ranking tasks and penalizes pairs whose order disagrees with the label. Formula:
$$\text{loss}(x_1, x_2, y) = \max(0, -y\cdot(x_1 - x_2) + \text{margin})$$
y is the label and can only be 1 or -1. x_1 and x_2 are the corresponding elements of the two inputs. This leads to the following conclusions:

• When y = 1, we want x_1 > x_2: no loss is produced once x_1 - x_2 ≥ margin.
• When y = -1, we want x_2 > x_1: no loss is produced once x_2 - x_1 ≥ margin.

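```python
flag = 1
if flag:
    # x1, x2 and target share the same shape: one label per pair
    x1 = torch.tensor([1, 2, 3], dtype=torch.float)
    x2 = torch.tensor([2, 2, 2], dtype=torch.float)

    target = torch.tensor([1, 1, -1], dtype=torch.float)

    loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')

    loss = loss_f_none(x1, x2, target)
    print('MarginRankingLoss', loss)
```

Output:

```text
MarginRankingLoss tensor([1., 0., 1.])
```

1. The loss is computed element-wise: x1[i] is compared with x2[i] under the label target[i], so three pairs give three losses. Suppose x1 and x2 were given the shape (3, 1) while target has shape (3,). Depending on the PyTorch version, the call then either raises a dimension-mismatch error or silently broadcasts to a misleading 3×3 result that combines every difference x1[i] - x2[i] with every label target[j].
2. Take the first pair: target[0] = 1, so by the formula the loss is 0 when x1 > x2 and x2 - x1 + margin (margin = 0) otherwise. Since 1 < 2, loss[0] = 2 - 1 = 1. The second pair is tied (2 = 2), giving 0. For the third pair target[2] = -1 while 3 > 2, giving 3 - 2 = 1.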
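The formula is a single clamp, so it is easy to reproduce. The sketch below uses an illustrative margin of 0.5 instead of 0. With a positive margin, the tied pair is also penalized (by exactly the margin). A pair counts as correctly ranked only when it is separated by at least the margin.

```python
import torch
import torch.nn as nn

x1 = torch.tensor([1.0, 2.0, 3.0])
x2 = torch.tensor([2.0, 2.0, 2.0])
y = torch.tensor([1.0, 1.0, -1.0])
margin = 0.5  # illustrative; the example above uses margin=0

api = nn.MarginRankingLoss(margin=margin, reduction='none')(x1, x2, y)
manual = torch.clamp(-y * (x1 - x2) + margin, min=0)

print(api, manual)
```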


1.12 nn.MultiLabelMarginLoss (multi-label classification)

Function: multi-label margin loss, for multi-label tasks in which one image belongs to several classes.

$$\text{loss}(x, y) = \sum_{ij}\frac{\max(0, 1 - (x[y[j]] - x[i]))}{x.size(0)}$$

where i runs over 0, …, x.size(0)-1 and j over 0, …, y.size(0)-1, with 0 ≤ y[j] ≤ x.size(0)-1 and i ≠ y[j] for all i and j. y holds the target class indices padded with -1, and only the indices before the first -1 are used.

```python
flag = 1
if flag:
    x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
    y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)  # target classes 0 and 3, padded with -1

    loss_f = nn.MultiLabelMarginLoss(reduction='none')
    loss = loss_f(x, y)
    print('MultiLabelMarginLoss', loss)

# ------------ compute by hand
flag = 1
if flag:
    x = x[0]

    # target class 0 against non-target classes 1 and 2
    item_1 = (1 - (x[0] - x[1])) + (1 - (x[0] - x[2]))
    # target class 3 against non-target classes 1 and 2
    item_2 = (1 - (x[3] - x[1])) + (1 - (x[3] - x[2]))
    # every term is positive here, so max(0, .) can be skipped
    loss_h = (item_1 + item_2) / x.shape[0]
    print('compute by hand ', loss_h)
```

Output:

```text
MultiLabelMarginLoss tensor([0.8500])
compute by hand  tensor(0.8500)
```
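The hand computation above is hard-coded for one sample. The sketch below generalizes it with a hypothetical helper `multilabel_margin_manual`, which includes the max(0, ·) and the -1 padding rule. The second sample is an illustrative assumption. Its target row [0, -1, 2, 1] counts only class 0, because everything after the first -1 is ignored.

```python
import torch
import torch.nn as nn

def multilabel_margin_manual(x, y):
    # hypothetical helper: x is a (C,) score vector, y a (C,) index vector padded with -1
    C = x.shape[0]
    targets = []
    for idx in y.tolist():
        if idx < 0:  # the first -1 ends the list of target classes
            break
        targets.append(idx)
    non_targets = [i for i in range(C) if i not in targets]
    total = x.new_zeros(())
    for j in targets:
        for i in non_targets:
            total = total + torch.clamp(1 - (x[j] - x[i]), min=0)
    return total / C

x = torch.tensor([[0.1, 0.2, 0.4, 0.8], [0.9, 0.1, 0.5, 0.2]])
y = torch.tensor([[0, 3, -1, -1], [0, -1, 2, 1]], dtype=torch.long)

api = nn.MultiLabelMarginLoss(reduction='none')(x, y)
manual = torch.stack([multilabel_margin_manual(x[n], y[n]) for n in range(x.shape[0])])

print(api, manual)
```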


1.13 nn.SoftMarginLoss (binary classification)

Function: computes the two-class logistic loss. Formula:
$$\text{loss}(x, y) = \sum_i\frac{\log(1 + \exp(-y[i]\cdot x[i]))}{x.nelement()}$$

```python
flag = 1
if flag:
    inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
    target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)  # labels are -1 or 1

    loss_f = nn.SoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)

    print('SoftMarginLoss', loss)

# ----------- compute by hand
flag = 1
if flag:
    idx = 0

    inputs_i = inputs[idx, idx]
    target_i = target[idx, idx]

    loss_h = torch.log(1 + torch.exp(-target_i * inputs_i))

    print('compute by hand', loss_h)
```

Output:

```text
SoftMarginLoss tensor([[0.8544, 0.4032],
        [0.4741, 0.9741]])
compute by hand tensor(0.8544)
```
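SoftMarginLoss is the same quantity as BCEWithLogitsLoss written for {-1, 1} labels: log(1 + exp(-x)) = -log σ(x) and log(1 + exp(x)) = -log(1 - σ(x)). The sketch below maps the labels to {0, 1} and compares the two losses on the same data.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target_pm = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)  # labels in {-1, 1}
target_01 = (target_pm + 1) / 2                                  # same labels mapped to {0, 1}

soft_margin = nn.SoftMarginLoss(reduction='none')(inputs, target_pm)
bce_logits = nn.BCEWithLogitsLoss(reduction='none')(inputs, target_01)

print(soft_margin)
print(bce_logits)
```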


1.14 nn.MultiLabelSoftMarginLoss

Function: the multi-label version of SoftMarginLoss. Formula:
$$\text{loss}(x, y) = -\frac{1}{C}\sum_i\left[y[i]\log\left((1 + \exp(-x[i]))^{-1}\right) + (1 - y[i])\log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right)\right]$$
C is the number of classes, y[i] is the label and x[i] is the model output. For a four-class task, y must be a multi-hot vector such as [1, 0, 0, 1]. When y[i] = 1 (class i is one of the labels) the first term is used; otherwise the second term is used.
Main parameters and code example:

```python
flag = 1
if flag:
    # three-class task
    inputs = torch.tensor([[0.3, 0.7, 0.8]])
    target = torch.tensor([[0, 1, 1]], dtype=torch.float)

    loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
    loss = loss_f(inputs, target)
    print('MultiLabelSoftMarginLoss', loss)

# -------------- compute by hand
flag = 1
if flag:
    # MultiLabelSoftMarginLoss evaluates every output neuron

    # non-label class: second term of the formula
    i_0 = torch.log(torch.exp(-inputs[0, 0]) / (1 + torch.exp(-inputs[0, 0])))

    # label classes: first term of the formula
    i_1 = torch.log(1 / (1 + torch.exp(-inputs[0, 1])))
    i_2 = torch.log(1 / (1 + torch.exp(-inputs[0, 2])))

    loss_h = (i_0 + i_1 + i_2) / -3
    print('compute by hand', loss_h)
```

Output:

```text
MultiLabelSoftMarginLoss tensor([0.5429])
compute by hand tensor(0.5429)
```
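The second term of the formula is log(1 - σ(x[i])), so per sample this loss is just the binary cross-entropy with logits averaged over the C classes. The sketch below checks that equivalence. The second input row is an illustrative addition.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[0.3, 0.7, 0.8], [1.2, -0.4, 0.1]])  # second row added for illustration
target = torch.tensor([[0, 1, 1], [1, 0, 0]], dtype=torch.float)

mlsm = nn.MultiLabelSoftMarginLoss(reduction='none')(inputs, target)           # shape (N,)
bce_mean = nn.BCEWithLogitsLoss(reduction='none')(inputs, target).mean(dim=1)  # average over C classes

print(mlsm, bce_mean)
```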


1.15 nn.MultiMarginLoss (multi-class classification)

Function: computes the multi-class hinge loss. Formula:
$$\text{loss}(x, y) = \frac{\sum_i \max(0, \text{margin} - x[y] + x[i])^p}{x.size(0)}$$

where i ∈ {0, …, x.size(0)-1}, 0 ≤ y ≤ x.size(0)-1, and i ≠ y.

Main parameters and code example:

```python
# nn.MultiMarginLoss(p=1,               # 1 or 2
#                    margin=1.0,
#                    weight=None,        # per-class weights for the loss
#                    reduction='mean')   # reduction mode: 'none' / 'sum' / 'mean'

flag = 1
if flag:
    x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
    y = torch.tensor([1, 2], dtype=torch.long)

    loss_f = nn.MultiMarginLoss(reduction='none')

    loss = loss_f(x, y)
    print('MultiMarginLoss', loss)

# -------- compute by hand
flag = 1
if flag:
    # Take the first sample [0.1, 0.2, 0.7] as the final scores of a 3-class classifier.
    # Its label is 1, so 0.2 is the score of the true class.
    # Following the formula, compute margin - 0.2 + 0.1 and margin - 0.2 + 0.7 (the non-label scores),
    # add them up, and divide by the number of classes x.size(0).
    x = x[0]

    margin = 1

    i_0 = margin - (x[1] - x[0])

    i_2 = margin - (x[1] - x[2])

    loss_h = (i_0 + i_2) / x.shape[0]
    print('compute by hand', loss_h)
```

Output:

```text
MultiMarginLoss tensor([0.8000, 0.7000])
compute by hand tensor(0.8000)
```
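A vectorized version of the hand computation covers every sample at once and also the p = 2 (squared hinge) option. `multi_margin_manual` is a hypothetical helper written for this illustration. It does not model the `weight` argument.

```python
import torch
import torch.nn as nn

def multi_margin_manual(x, y, margin=1.0, p=1):
    # hypothetical helper: x is (N, C) scores, y is (N,) true-class indices
    x_y = x.gather(1, y.unsqueeze(1))                           # score of the true class, (N, 1)
    hinge = torch.clamp(margin - x_y + x, min=0) ** p           # term for every class i, (N, C)
    mask = torch.ones_like(x).scatter_(1, y.unsqueeze(1), 0.0)  # drop the i == y term
    return (hinge * mask).sum(dim=1) / x.size(1)                # divide by the number of classes

x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)

api_p1 = nn.MultiMarginLoss(p=1, margin=1.0, reduction='none')(x, y)
api_p2 = nn.MultiMarginLoss(p=2, margin=1.0, reduction='none')(x, y)
manual_p1 = multi_margin_manual(x, y, p=1)
manual_p2 = multi_margin_manual(x, y, p=2)

print(api_p1, manual_p1)
print(api_p2, manual_p2)
```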


1.16 nn.TripletMarginLoss (triplet loss)

Function: computes the triplet loss, commonly used in face verification. Formula:
$$L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + \text{margin}, 0\}, \qquad d(x_i, y_i) = \|x_i - y_i\|_p$$

```python
# --------------
# nn.TripletMarginLoss(margin=1.0,        # margin
#                      p=2.0,             # order of the norm, default 2
#                      eps=1e-6,
#                      swap=False,
#                      reduction='mean')  # reduction mode: 'none' / 'sum' / 'mean'
flag = 1
if flag:
    anchor = torch.tensor([[1.]])
    pos = torch.tensor([[2.]])
    neg = torch.tensor([[0.5]])

    loss_f = nn.TripletMarginLoss(margin=1.0, p=1)
    loss = loss_f(anchor, pos, neg)

    print('TripletMarginLoss:', loss)
```

Here d(a, p) = |1 - 2| = 1 and d(a, n) = |1 - 0.5| = 0.5, so the loss is max(1 - 0.5 + 1, 0) = 1.5:

```text
TripletMarginLoss: tensor(1.5000)
```
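The same formula applies to a batch of embedding vectors with the default p = 2. The sketch below uses random 8-dimensional embeddings purely for illustration. PyTorch adds a small eps (1e-6 by default) inside its distance computation, so the manual version is compared with a small tolerance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
anchor = torch.randn(4, 8)    # 4 illustrative embeddings of dimension 8
positive = torch.randn(4, 8)
negative = torch.randn(4, 8)
margin = 1.0

api = nn.TripletMarginLoss(margin=margin, p=2, reduction='none')(anchor, positive, negative)

d_ap = torch.norm(anchor - positive, p=2, dim=1)  # distance anchor-positive
d_an = torch.norm(anchor - negative, p=2, dim=1)  # distance anchor-negative
manual = torch.clamp(d_ap - d_an + margin, min=0)

print(api, manual)
```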


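1.17 nn.HingeEmbeddingLoss (nonlinear embeddings and semi-supervised learning)

Function: measures whether two inputs are similar or dissimilar. Note: the input x should be a distance between the two inputs, for example the absolute value of their difference. The formula, where Δ is the margin:
$$l_n = \begin{cases} x_n, & \text{if } y_n = 1 \\ \max\{0, \Delta - x_n\}, & \text{if } y_n = -1 \end{cases}$$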


```python
# nn.HingeEmbeddingLoss(margin=1.0,       # margin
#                       reduction='mean'  # reduction mode: 'none' / 'sum' / 'mean'
#                       )

flag = 1
if flag:
    inputs = torch.tensor([[1., 0.8, 0.5]])
    target = torch.tensor([[1, 1, -1]])

    loss_f = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')

    loss = loss_f(inputs, target)
    print('HingeEmbeddingLoss:', loss)
```

When the label is 1 the loss is x itself. When the label is -1 it is max(margin - x, 0), here max(1 - 0.5, 0) = 0.5:

```text
HingeEmbeddingLoss: tensor([[1.0000, 0.8000, 0.5000]])
```
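The sketch below starts from two illustrative batches of embeddings and builds the distance as suggested in the note, using the summed absolute difference (L1 distance). It then checks the piecewise formula with torch.where. Similar pairs are penalized by their distance; dissimilar pairs are penalized only when they are closer than the margin. The margin of 2.0 is an illustrative choice.

```python
import torch
import torch.nn as nn

emb_a = torch.tensor([[0.0, 0.1], [1.0, 0.5], [2.0, 0.5]])  # illustrative embeddings
emb_b = torch.tensor([[0.1, 0.3], [0.0, 0.0], [0.0, 0.0]])
label = torch.tensor([1.0, -1.0, -1.0])  # 1 = similar pair, -1 = dissimilar pair
margin = 2.0                             # illustrative margin

dist = (emb_a - emb_b).abs().sum(dim=1)  # L1 distance per pair: about [0.3, 1.5, 2.5]

api = nn.HingeEmbeddingLoss(margin=margin, reduction='none')(dist, label)
manual = torch.where(label == 1, dist, torch.clamp(margin - dist, min=0))

print(dist, api, manual)
```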


1.18 nn.CosineEmbeddingLoss (embeddings and semi-supervised learning)

Function: measures the similarity of two inputs using cosine similarity. Cosine is used because it captures the difference in direction between two feature vectors and ignores their magnitudes. Formula:
$$\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1 \end{cases}$$

$$\cos(\theta) = \frac{A \cdot B}{\|A\|\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\sqrt{\sum_{i=1}^{n} B_i^2}}$$

```python
flag = 1
if flag:

    x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
    x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])

    # one label per pair of rows, so target is 1-D with shape (N,)
    target = torch.tensor([1, -1], dtype=torch.float)

    loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
    loss = loss_f(x1, x2, target)
    print('CosineEmbeddingLoss:', loss)

# -------------------- compute by hand
flag = 1
if flag:

    margin = 0.

    def cosine(a, b):
        numerator = torch.dot(a, b)
        denominator = torch.norm(a, 2) * torch.norm(b, 2)
        return float(numerator / denominator)

    l_1 = 1 - cosine(x1[0], x2[0])               # first pair, y = 1
    l_2 = max(0, cosine(x1[1], x2[1]) - margin)  # second pair, y = -1

    print(l_1, l_2)
```

Output:

```text
CosineEmbeddingLoss: tensor([0.0167, 0.9833])
0.016662120819091797 0.9833378791809082
```
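Here is the same check vectorized with torch.nn.functional.cosine_similarity. It uses a nonzero margin (0.5, an illustrative value) so that the margin term in the y = -1 branch is visible. The third pair is orthogonal and added for illustration: its cosine is 0, which is already below the margin, so it produces no loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7], [1.0, 0.0, 0.0]])
x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5], [0.0, 1.0, 0.0]])
target = torch.tensor([1.0, -1.0, -1.0])
margin = 0.5  # illustrative; the example above uses margin=0

api = nn.CosineEmbeddingLoss(margin=margin, reduction='none')(x1, x2, target)

cos = F.cosine_similarity(x1, x2, dim=1)
manual = torch.where(target == 1, 1 - cos, torch.clamp(cos - margin, min=0))

print(cos, api, manual)
```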


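1.19 nn.CTCLoss

Function: computes the CTC (Connectionist Temporal Classification) loss, used to classify sequential data when the alignment between the input sequence and the label sequence is unknown.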

```python
flag = 1
if flag:
    T = 50      # input sequence length
    C = 20      # number of classes (including blank)
    N = 16      # batch size
    S = 30      # target sequence length of longest target in batch
    S_min = 10  # minimum target length, for demonstration purposes

    # initialize random batch of input vectors, of size (T, N, C)
    inputs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

    # initialize random batch of targets (0 = blank, 1:C = classes)
    target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)

    input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)

    target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)

    ctc_loss = nn.CTCLoss()
    loss = ctc_loss(inputs, target, input_lengths, target_lengths)
    print('ctc loss:', loss)
```

Output (the inputs are random, so the value differs between runs):

```text
ctc loss: tensor(6.6770, grad_fn=<MeanBackward0>)
```
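With random data it is hard to see what CTC actually sums over. A tiny case can be enumerated by hand. The sketch below uses 2 frames, 3 classes (index 0 is the blank) and the label sequence "1", with made-up per-frame probabilities. Three frame-level paths collapse to "1": (1, 1), (blank, 1) and (1, blank). The CTC loss is minus the log of their total probability. reduction='sum' is used because the default 'mean' first divides each loss by its target length and then averages over the batch.

```python
import math
import torch
import torch.nn as nn

# T = 2 frames, N = 1 sample, C = 3 classes (index 0 is the blank)
probs = torch.tensor([[[0.2, 0.5, 0.3]],
                      [[0.2, 0.5, 0.3]]])        # shape (T, N, C), same distribution at both frames
log_probs = probs.log()
targets = torch.tensor([[1]], dtype=torch.long)  # label sequence "1"
input_lengths = torch.tensor([2], dtype=torch.long)
target_lengths = torch.tensor([1], dtype=torch.long)

loss = nn.CTCLoss(blank=0, reduction='sum')(log_probs, targets, input_lengths, target_lengths)

# Paths that collapse to "1": (1, 1), (blank, 1), (1, blank)
p_label = 0.5 * 0.5 + 0.2 * 0.5 + 0.5 * 0.2
manual = -math.log(p_label)

print(loss.item(), manual)
```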


