1. L Losses
L losses are used for regression. Suppose tensorA and tensorB are two tensors of the same shape, and tensorC is the difference tensorA − tensorB. $C_i$ denotes the i-th element of tensorC.
1.1 L1 loss
$$l1\_loss = \frac{1}{n} \sum\limits_{i=1}^n |C_i|.$$
```python
import torch
import torch.nn.functional as F

if __name__ == '__main__':
    a = torch.tensor([[11], [12]], dtype=float)
    b = torch.tensor([[11.2], [13]], dtype=float)
    # reduce/size_average are deprecated; reduction='mean' averages over all elements
    loss1 = F.l1_loss(a, b, reduction='mean')
    l1_loss_f = torch.nn.L1Loss(reduction='mean')
    loss2 = l1_loss_f(a, b)
    print(loss1)  # 0.6
    print(loss2)  # 0.6
```
1.2 smooth L1 loss
$$smooth\_l1\_loss = \frac{1}{n} \sum\limits_{i=1}^n \begin{cases} 0.5C_i^2 & |C_i| < 1 \\ |C_i| - 0.5 & |C_i| \geq 1 \end{cases}$$
```python
import torch
import torch.nn.functional as F

if __name__ == '__main__':
    a = torch.tensor([[11], [12]], dtype=float)
    b = torch.tensor([[11.2], [13]], dtype=float)
    # reduce/size_average are deprecated; reduction='mean' averages over all elements
    loss1 = F.smooth_l1_loss(a, b, reduction='mean')
    smooth_l1_loss_f = torch.nn.SmoothL1Loss(reduction='mean')
    loss2 = smooth_l1_loss_f(a, b)
    print(loss1)  # 0.26
    print(loss2)  # 0.26
```
1.3 L2 loss
$$l2\_loss = \frac{1}{n} \sum\limits_{i=1}^n C_i^2.$$
```python
import torch

if __name__ == '__main__':
    a = torch.tensor([[11], [12]], dtype=float)
    b = torch.tensor([[11.2], [13]], dtype=float)
    # L2 loss is MSELoss in PyTorch; reduction='mean' replaces the deprecated arguments
    mse_loss_f = torch.nn.MSELoss(reduction='mean')
    loss = mse_loss_f(a, b)
    print(loss)  # 0.52
```
1.4 Summary
The loss functions above used to take two arguments, reduce and size_average, both defaulting to True, i.e. reduce to a scalar and average over the n elements. Both are deprecated in current PyTorch; the single reduction argument ('none' | 'sum' | 'mean', default 'mean') covers the same behavior.
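A minimal sketch of the three reduction modes, using the same example tensors as above:

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[11.0], [12.0]])
b = torch.tensor([[11.2], [13.0]])

# 'none' keeps the per-element losses, 'sum' adds them, 'mean' divides by n
per_elem = F.l1_loss(a, b, reduction='none')  # [[0.2], [1.0]]
total = F.l1_loss(a, b, reduction='sum')      # 1.2
mean = F.l1_loss(a, b, reduction='mean')      # 0.6

print(per_elem, total, mean)
```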
L1 has no closed-form solution, so it is slower to solve. The gradient of L2 grows with the residual (the difference between tensorA and tensorB), so a large residual can cause exploding gradients; on the other hand, L2 has a closed-form solution and is fast to compute. Smooth L1 is a piecewise function: early in training, when the residual is large, it behaves like L1 and avoids exploding gradients; after some training, when the residual is small, it behaves like L2 and converges quickly.
| | L1 | smooth L1 | L2 |
|---|---|---|---|
| robustness | good | - | poor |
| solution | unstable | - | stable |
| number of solutions | possibly multiple | - | unique |
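The gradient behavior claimed above can be checked directly with autograd. This sketch evaluates the gradient of each loss at a small residual (0.1) and a large one (10):

```python
import torch
import torch.nn.functional as F

def grad_at(diff, loss_fn):
    # gradient of the loss w.r.t. the prediction, at a given residual
    x = torch.tensor([diff], requires_grad=True)
    target = torch.zeros(1)
    loss_fn(x, target).backward()
    return x.grad.item()

for diff in (0.1, 10.0):
    g_l1 = grad_at(diff, F.l1_loss)         # always 1: constant magnitude
    g_l2 = grad_at(diff, F.mse_loss)        # 2*diff: grows with the residual
    g_sm = grad_at(diff, F.smooth_l1_loss)  # diff if |diff| < 1, else 1
    print(diff, g_l1, g_l2, g_sm)
```

At diff = 10 the L2 gradient is 20 while smooth L1 clips it to 1, which is exactly why smooth L1 avoids exploding gradients on large residuals.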
2. Cross Entropy
Cross entropy is used for classification. Suppose the network performs a 3-class task and its output $logits = [y_0, y_1, y_2] = [3, 1, -3]$ scores how likely each class is, and the label is $k$. The cross-entropy loss is:
$$loss = -\ln{\frac{e^{y_k}}{\sum_{i=1}^{3}e^{y_i}}},\ \text{equivalently}\ loss = -y_k + \ln\Big(\sum_{i=1}^{3}e^{y_i}\Big).$$
Here is PyTorch's cross-entropy loss:
```python
import torch

logits = torch.Tensor([[3, 1, -3]])
labels = torch.LongTensor([0])
loss = torch.nn.functional.cross_entropy(logits, labels)
print(loss)  # 0.1291
```
PyTorch offers two cross-entropy interfaces, which differ in how they are called:
- torch.nn.CrossEntropyLoss is a module (type <class 'torch.nn.modules.loss.CrossEntropyLoss'>): instantiate it first, then call the instance on (logits, labels).
- torch.nn.functional.cross_entropy is a plain function that directly returns the loss (type <class 'torch.Tensor'>).
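A minimal sketch showing that both interfaces, and the equivalent formula $-y_k + \ln\sum_i e^{y_i}$, give the same value:

```python
import torch

logits = torch.tensor([[3.0, 1.0, -3.0]])
labels = torch.tensor([0])

# module form: instantiate the loss object, then call it
ce_module = torch.nn.CrossEntropyLoss()
loss_a = ce_module(logits, labels)

# functional form: a single call
loss_b = torch.nn.functional.cross_entropy(logits, labels)

# the formula from the text: -y_k + log(sum_i e^{y_i})
loss_c = -logits[0, 0] + torch.logsumexp(logits[0], dim=0)

print(loss_a, loss_b, loss_c)  # all ~0.1291
```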
2.1 Focal Loss
Focal Loss, proposed in 2017, is a modification of cross entropy.
$$\begin{aligned} loss &= -\alpha \cdot \Big(1 - {\frac{e^{y_k}}{\sum_{i=1}^{3}e^{y_i}}}\Big)^\gamma \cdot \log\Big({\frac{e^{y_k}}{\sum_{i=1}^{3}e^{y_i}}}\Big) \\ &= \alpha \cdot (1-e^{-L_{cross\_entropy}})^{\gamma} \cdot L_{cross\_entropy} \end{aligned}$$
```python
import torch

def FocalLoss(logit, target, gamma=2, alpha=0.5):
    criterion = torch.nn.CrossEntropyLoss()
    CEloss = criterion(logit, target.long())
    # p_k = e^{-CE}, so (1 - p_k)^gamma down-weights easy examples;
    # use torch.exp rather than np.exp so this stays a tensor op
    loss = alpha * ((1 - torch.exp(-CEloss)) ** gamma) * CEloss
    return loss

if __name__ == '__main__':
    logits = torch.Tensor([[3, 1, -3]])
    labels = torch.LongTensor([0])
    print(FocalLoss(logits, labels))  # tensor(0.0009)
```
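A quick sanity check on the formula: with $\gamma = 0$ and $\alpha = 1$ the modulating factor disappears and Focal Loss reduces to plain cross entropy, while the default $\gamma = 2$ sharply down-weights this already well-classified example. A self-contained sketch:

```python
import torch

def focal_loss(logit, target, gamma=2.0, alpha=0.5):
    # alpha * (1 - p_k)^gamma * CE, with p_k = e^{-CE}
    ce = torch.nn.functional.cross_entropy(logit, target)
    return alpha * (1 - torch.exp(-ce)) ** gamma * ce

logits = torch.tensor([[3.0, 1.0, -3.0]])
labels = torch.tensor([0])

ce = torch.nn.functional.cross_entropy(logits, labels)
# gamma=0, alpha=1 recovers plain cross entropy (~0.1291)
print(focal_loss(logits, labels, gamma=0.0, alpha=1.0))
# default gamma=2 yields a much smaller loss (~0.0009)
print(focal_loss(logits, labels))
```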