PyTorch Basics: Loss Functions
Formally, a loss can be written as $L_{loss} = \sum_{i=1}^{N_b} criterion(y^{*}, y)$, where $y^{*}\in \R^{B\times C}$ is the model's predicted output, $y\in \R^{B\times C}$ is the ground-truth label or annotation, $B$ is the batch size ($batch\_size$), and $C$ is the output dimension of the fully connected layer. When the loss is evaluated one sample at a time, $B=1$. Depending on the nature of the problem, loss functions fall broadly into two groups: classification losses and regression losses.
1. Classification losses
1.1 Binary classification: the BCELoss family
$$L_{loss}(y', y) = -\frac{1}{N_b}\sum_{i=1}^{N_b}\left[y_i\log(y'_i) + (1-y_i)\log(1-y'_i)\right]$$
import torch
import torch.nn as nn

def tensor_info(tensor):
    # Print the type, values, and shape of a tensor.
    print('tensor type: {}'.format(tensor.type()))
    print('tensor value: {}'.format(tensor.data))
    print('tensor shape: {}'.format(tensor.shape))

criterion = nn.BCELoss()
batchsize = 2
num_class = 2
y_ = torch.randn(batchsize, num_class)                     # raw model scores
y = torch.empty(batchsize, num_class).random_(num_class)   # random 0/1 float targets
loss = criterion(nn.Sigmoid()(y_), y)                      # BCELoss expects probabilities in [0, 1]
tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.FloatTensor
tensor value: tensor([[-0.0734, 1.1474],
[-0.1513, -0.3409]])
tensor shape: torch.Size([2, 2])
tensor type: torch.FloatTensor
tensor value: tensor([[0., 1.],
[0., 0.]])
tensor shape: torch.Size([2, 2])
tensor type: torch.FloatTensor
tensor value: 0.5225892663002014
tensor shape: torch.Size([])
"""
note:
- BCELoss is intended for binary classification. Both the prediction and the target are of type torch.FloatTensor, and the model output must first pass through a sigmoid so that every value lies in $[0,1]$;
- For binary problems, BCELoss tends to train more stably than CrossEntropyLoss;
- When the two classes are imbalanced, consider BCEWithLogitsLoss. It takes the model output as raw logits (the sigmoid is applied inside the loss), and a weight tensor is passed in to rebalance the classes:
w_0 = 1
w_1 = 5
class_weights = torch.tensor([w_0, w_1], dtype=torch.float)  # one weight per output column
criterion = nn.BCEWithLogitsLoss(weight=class_weights)
...
loss = criterion(y_, y)  # y_ holds raw logits; no sigmoid is applied beforehand
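The binary cross entropy can be recomputed by hand from the tensors printed in the example above, which also makes the role of the weights concrete. The following is a minimal torch-free sketch: the helper names `bce_from_prob` and `bce_from_logit` are illustrative and not PyTorch functions, and it assumes the default mean reduction (a plain average over all elements, also when weights are given). The logit version uses one common numerically stable rewriting, max(x, 0) - x*y + log(1 + exp(-|x|)); it is not claimed that PyTorch evaluates exactly this expression internally.

```python
import math

def bce_from_prob(p, y):
    """Binary cross entropy for one probability p in (0, 1) and one 0/1 target y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(x, y, w=1.0):
    """Same loss computed directly from a raw logit x, scaled by a weight w."""
    return w * (max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x))))

# Values copied from the printed tensors (rounded to 4 decimals).
logits = [[-0.0734, 1.1474], [-0.1513, -0.3409]]
targets = [[0.0, 1.0], [0.0, 0.0]]
weights = [1.0, 5.0]  # w_0, w_1: one weight per output column

pairs = [(x, y, w)
         for row_x, row_y in zip(logits, targets)
         for x, y, w in zip(row_x, row_y, weights)]

# Sigmoid followed by BCE, averaged over all four elements.
plain = sum(bce_from_prob(1.0 / (1.0 + math.exp(-x)), y)
            for x, y, _ in pairs) / len(pairs)
# Logit form without weights: agrees with `plain` up to rounding.
from_logits = sum(bce_from_logit(x, y) for x, y, _ in pairs) / len(pairs)
# Logit form with per-column weights: errors in column 1 count five times.
weighted = sum(bce_from_logit(x, y, w) for x, y, w in pairs) / len(pairs)

print(round(plain, 3), round(from_logits, 3), round(weighted, 3))
```

The unweighted value agrees with the printed loss to about three decimal places. The logit form also stays finite for extreme inputs such as `bce_from_logit(-800.0, 1.0)`, where taking the sigmoid first would overflow.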
1.2 Multi-class classification: the CrossEntropyLoss family
$$L_{loss}(y', y) = -\log\left(\dfrac{\exp(y'_{[y]})}{\sum_{j}\exp(y'_{[j]})}\right) = -y'_{[y]} + \log\left(\sum_{j}\exp(y'_{[j]})\right)$$
criterion = nn.CrossEntropyLoss()
batchsize = 2
num_class = 3
y_ = torch.randn(batchsize, num_class)                            # raw logits
y = torch.empty(batchsize, dtype=torch.long).random_(num_class)   # class indices
# CrossEntropyLoss applies log-softmax internally, so it receives the raw logits.
loss = criterion(y_, y)
tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.FloatTensor
tensor value: tensor([[ 0.9964, 0.7243, -1.0832],
[ 1.2502, 0.9600, -0.1909]])
tensor shape: torch.Size([2, 3])
tensor type: torch.LongTensor
tensor value: tensor([2, 0])
tensor shape: torch.Size([2])
tensor type: torch.FloatTensor
tensor value: 1.70 (approx.)
tensor shape: torch.Size([])
"""
note:
- CrossEntropyLoss works for both binary and multi-class problems. The target is a torch.LongTensor of class indices with shape $y\in \R^{B}$, and the one-hot conversion is handled internally. The input must be the raw logits of shape $B\times C$: the loss applies log-softmax itself, so the model output should not be passed through softmax first;
- Because BCELoss tends to train more stably than CrossEntropyLoss, binary classification mostly uses the former, while multi-class classification has to use the latter;
- When the classes of a multi-class dataset are imbalanced, consider the negative log-likelihood loss nn.NLLLoss, which takes log-probabilities as input:
criterion = nn.NLLLoss()
...
loss = criterion(nn.LogSoftmax(dim=1)(y_), y)
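The cross-entropy formula can be checked by hand on the logits and targets printed in the example above. The sketch below is torch-free and only illustrative: `log_softmax`, `softmax`, and `cross_entropy` are hypothetical helper names, and the weighted branch assumes the mean is normalized by the sum of the target weights. It also shows why the criterion should receive raw logits: feeding it softmax probabilities applies softmax twice and gives a different value.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one row of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def softmax(row):
    return [math.exp(v) for v in log_softmax(row)]

def cross_entropy(scores, targets, weight=None):
    """Mean of -log_softmax(scores)[target]; with class weights the mean is
    weighted, i.e. divided by the sum of the weights of the targets."""
    if weight is None:
        weight = [1.0] * len(scores[0])
    num = sum(-weight[t] * log_softmax(row)[t] for row, t in zip(scores, targets))
    den = sum(weight[t] for t in targets)
    return num / den

# Values copied from the printed tensors (rounded to 4 decimals).
logits = [[0.9964, 0.7243, -1.0832], [1.2502, 0.9600, -0.1909]]
targets = [2, 0]

on_logits = cross_entropy(logits, targets)                        # about 1.70
on_probs = cross_entropy([softmax(r) for r in logits], targets)   # about 1.16
# Hypothetical class weights that emphasize class 2.
weighted = cross_entropy(logits, targets, weight=[1.0, 1.0, 5.0])

print(round(on_logits, 2), round(on_probs, 2), round(weighted, 2))
```

Up-weighting class 2 raises the loss here because the sample whose target is class 2 is the one the logits get wrong.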
2. Regression losses
2.1 L1 loss (MAE)
$$L_{loss}(y', y) = |y - y'|$$
criterion = nn.L1Loss()
batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize,data_dim)
y = torch.randn(batchsize, data_dim)
loss = criterion(y_, y)
tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.FloatTensor
tensor value: tensor([[-0.8535, -0.3021, 0.2806, 0.6997, -0.3428],
[ 1.0466, -0.7761, 1.5299, 1.8677, 0.3375]])
tensor shape: torch.Size([2, 5])
tensor type: torch.FloatTensor
tensor value: tensor([[ 0.4172, 0.3862, 1.9460, 0.3330, -0.6183],
[ 0.4837, -0.8353, 0.4653, -0.3128, 1.7366]])
tensor shape: torch.Size([2, 5])
tensor type: torch.FloatTensor
tensor value: 0.953281581401825
tensor shape: torch.Size([])
"""
note:
- For L1 loss, the input (prediction) and the target must have the same shape;
- L1 loss is not smooth at zero, which is also why L1 regularization tends to produce sparse features. L2 loss is sensitive to outliers, and with gradient descent a few large errors can make the gradients explode;
- nn.SmoothL1Loss is a compromise between L1 loss and L2 loss. It is defined as $L_{loss}(y', y) = \begin{cases} 0.5\,(y'-y)^2 & \text{if } |y'-y|<1 \\ |y'-y|-0.5 & \text{otherwise} \end{cases}$
criterion = nn.SmoothL1Loss()
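The piecewise definition is easier to read next to its two parents. The following torch-free sketch evaluates the per-element smooth L1 value and its slope for a few error sizes, assuming the threshold of 1 used in the expression above; `smooth_l1` and `smooth_l1_grad` are illustrative names, not PyTorch functions.

```python
def smooth_l1(pred, target):
    """Quadratic for small errors, linear for large ones (threshold 1)."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def smooth_l1_grad(pred, target):
    """Slope with respect to pred: the error itself when small, otherwise +/-1."""
    d = pred - target
    if abs(d) < 1.0:
        return d
    return 1.0 if d > 0 else -1.0

# Columns: error, smooth L1, half squared error, absolute error, slope.
for err in (0.1, 0.5, 1.0, 3.0):
    print(err, smooth_l1(err, 0.0), 0.5 * err * err, abs(err), smooth_l1_grad(err, 0.0))
```

For small errors the loss follows the quadratic branch and is smooth at zero; for large errors the slope is capped at 1, which is what limits the influence of outliers.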
2.2 L2 loss (MSE)
$$L_{loss}(y', y) = (y' - y)^2$$
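The formula can be checked by hand. The torch-free sketch below uses the prediction and target values printed in this section's example run and assumes the default mean reduction of nn.MSELoss (an average over all elements). It also measures how much of the total comes from the single largest error, which illustrates why L2 loss is more sensitive to outliers than L1 loss. The variable names are illustrative.

```python
# Values copied from the printed tensors (rounded to 4 decimals).
y_pred = [[-0.9645, -1.3637, -0.3499, 0.1778, 1.4501],
          [0.0399, -0.7981, 0.2331, -0.8327, -0.1414]]
y_true = [[0.6230, 0.6931, 0.0585, -0.1514, -1.6614],
          [-0.8120, -0.3299, -0.0762, -1.5901, 1.2696]]

errors = [p - t
          for row_p, row_t in zip(y_pred, y_true)
          for p, t in zip(row_p, row_t)]
squared = [e * e for e in errors]
absolute = [abs(e) for e in errors]

mse = sum(squared) / len(squared)      # mean squared error
mae = sum(absolute) / len(absolute)    # mean absolute error on the same data

# Fraction of each total that is due to the largest single error.
share_l2 = max(squared) / sum(squared)
share_l1 = max(absolute) / sum(absolute)

print(round(mse, 3), round(mae, 3), round(share_l2, 2), round(share_l1, 2))
```

The mean squared error agrees with the printed loss to about three decimal places. The largest error supplies nearly half of the squared total but only a little over a quarter of the absolute total.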
criterion = nn.MSELoss()
batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize,data_dim)
y = torch.randn(batchsize, data_dim)
loss = criterion(y_, y)
tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.FloatTensor
tensor value: tensor([[-0.9645, -1.3637, -0.3499, 0.1778, 1.4501],
[ 0.0399, -0.7981, 0.2331, -0.8327, -0.1414]])
tensor shape: torch.Size([2, 5])
tensor type: torch.FloatTensor
tensor value: tensor([[ 0.6230, 0.6931, 0.0585, -0.1514, -1.6614],
[-0.8120, -0.3299, -0.0762, -1.5901, 1.2696]])
tensor shape: torch.Size([2, 5])
tensor type: torch.FloatTensor
tensor value: 2.0312931537628174
tensor shape: torch.Size([])
"""
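3. One-hot encoding
When we want to add a new loss term to a model that already uses CrossEntropyLoss, the labels need to be one-hot encoded so that they match the shape of the model output and can be combined with the other losses. This is the groundwork for designing custom loss functions.
A compact and efficient one-hot conversion looks like this: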
def tensor_info(tensor):
    print('tensor type: {}'.format(tensor.type()))
    print('tensor value: {}'.format(tensor.data))
    print('tensor shape: {}'.format(tensor.shape))

def make_one_hot(label, classes):
    # label: LongTensor of class indices with shape (B, H, W)
    label = label.unsqueeze(dim=1)    # -> (B, 1, H, W)
    tensor_info(label)
    # scatter_ writes a 1 into the channel selected by each label value
    tensor = torch.zeros(label.size()[0], classes,
                         label.size()[2], label.size()[3]).scatter_(1, label, 1)
    tensor_info(tensor)
    return tensor                     # -> (B, classes, H, W)

class_num = 2
batch_size = 2
label = torch.LongTensor(batch_size, 3, 3).random_() % class_num
tensor = make_one_hot(label, class_num)
# tensor now holds the (B, classes, H, W) encoding already printed by tensor_info
"""
tensor type: torch.LongTensor
tensor value: tensor([[[[1, 0, 0],
[0, 1, 0],
[1, 0, 1]]],
[[[0, 0, 0],
[0, 0, 0],
[0, 1, 1]]]])
tensor shape: torch.Size([2, 1, 3, 3])
tensor type: torch.FloatTensor
tensor value: tensor([[[[0., 1., 1.],
[1., 0., 1.],
[0., 1., 0.]],
[[1., 0., 0.],
[0., 1., 0.],
[1., 0., 1.]]],
[[[1., 1., 1.],
[1., 1., 1.],
[1., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 1., 1.]]]])
tensor shape: torch.Size([2, 2, 3, 3])
"""
note:
- This example is mostly used to one-hot encode segmentation annotations. Typically the ground truth has shape $y\in \R^{B \times H\times W}$ while the prediction has shape $y'\in \R^{B\times C \times H \times W}$, so $y$ must be one-hot encoded;
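What the scatter-based conversion produces can be spelled out with plain nested lists. The sketch below is torch-free and purely illustrative (`make_one_hot_lists` is a hypothetical name); it uses the label values from the example output above and turns a `[B][H][W]` list of class indices into a `[B][C][H][W]` list of 0/1 values.

```python
def make_one_hot_lists(label, classes):
    """label: nested list [B][H][W] of class indices.
    Returns a nested list [B][classes][H][W] with 1.0 where the pixel has that class."""
    return [[[[1.0 if value == c else 0.0 for value in row] for row in image]
             for c in range(classes)]
            for image in label]

# Label values copied from the printed example (B=2, H=W=3, two classes).
label = [[[1, 0, 0], [0, 1, 0], [1, 0, 1]],
         [[0, 0, 0], [0, 0, 0], [0, 1, 1]]]
encoded = make_one_hot_lists(label, 2)

# Channel 0 of the first image marks the pixels whose label is 0.
for row in encoded[0][0]:
    print(row)
```

Channel 0 is the complement of the label map and channel 1 reproduces it, so the channels of every pixel sum to 1.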
4. Two ways to define a custom loss
4.1 Subclassing nn.Module
class MyLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input, target):
        # mean squared error written with plain torch operations
        return torch.mean(torch.pow(input - target, 2))

criterion = MyLoss()
batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize, data_dim)
y = torch.randn(batchsize, data_dim)
loss = criterion(y_, y)
tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.FloatTensor
tensor value: tensor([[-1.0173, 0.4739, -0.7022, -1.2392, -0.9483],
[-0.8169, 1.3850, -0.5899, -0.1689, -0.6612]])
tensor shape: torch.Size([2, 5])
tensor type: torch.FloatTensor
tensor value: tensor([[ 0.6348, -0.9740, 1.2326, 0.5315, -1.0824],
[-0.8435, 0.6862, 0.3101, -0.1409, 0.8937]])
tensor shape: torch.Size([2, 5])
tensor type: torch.FloatTensor
tensor value: 1.543942928314209
tensor shape: torch.Size([])
"""
4.2 Defining a plain loss function
def myLoss(input, target):
    return torch.mean(torch.pow(input - target, 2))
...
loss = myLoss(y_, y)
...
note:
- A loss that subclasses nn.Module has to override the forward method and define the computation there with torch operations, which makes the design fairly flexible. A loss written as a plain function uses torch operations in the same way, but there is no forward method to maintain, and using it is just a function call;
- Backpropagation always goes through loss.backward(). Both kinds of custom loss support it, because in both cases the loss is built from torch operations that autograd can differentiate;
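For the mean-squared-error loss used in both variants above, the gradient that backpropagation has to deliver to each prediction is $2(y'_i - y_i)/N$. The torch-free sketch below compares this analytic gradient with a finite-difference estimate. It is only an illustration of what the backward pass computes for this particular loss, not a description of how autograd is implemented, and all function names and values are hypothetical.

```python
def mse(pred, target):
    """Mean of squared differences, the same quantity as the custom loss above."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mse_grad(pred, target):
    """Analytic gradient of mse with respect to each prediction."""
    n = len(pred)
    return [2.0 * (p - t) / n for p, t in zip(pred, target)]

def numeric_grad(f, pred, target, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grads = []
    for i in range(len(pred)):
        up = pred[:i] + [pred[i] + eps] + pred[i + 1:]
        down = pred[:i] + [pred[i] - eps] + pred[i + 1:]
        grads.append((f(up, target) - f(down, target)) / (2 * eps))
    return grads

pred = [-1.0173, 0.4739, -0.7022]     # illustrative values
target = [0.6348, -0.9740, 1.2326]

analytic = mse_grad(pred, target)
numeric = numeric_grad(mse, pred, target)
for a, n in zip(analytic, numeric):
    print(round(a, 6), round(n, 6))
```

The two columns agree, which is the kind of check that can also be applied to a new custom loss before trusting it in training.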
4.3 Two common custom loss functions
FocalLoss
$$FL(p_t) = -(1-p_t)^{\gamma}\log(p_t)$$
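To see what the modulating factor $(1-p_t)^{\gamma}$ does, the formula can be evaluated by hand. The torch-free sketch below is a minimal illustration: the helper names are hypothetical, $p_t$ is taken to be the softmax probability of the true class, and the two five-class logit rows are the ones that appear in this section's example run. It also shows that the inputs should be raw logits, since applying softmax to them beforehand changes the result.

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(scores, targets, gamma=2.0):
    """Mean of -(1 - p_t)^gamma * log(p_t), with p_t from a softmax over scores."""
    losses = []
    for row, t in zip(scores, targets):
        pt = softmax(row)[t]
        losses.append(-((1.0 - pt) ** gamma) * math.log(pt))
    return sum(losses) / len(losses)

# Values copied from the example run (rounded to 4 decimals).
logits = [[0.1728, 1.1785, 0.2764, -0.3511, 0.4180],
          [0.3613, 0.7521, 1.2390, 2.0650, -0.6268]]
targets = [2, 2]

on_logits = focal_loss(logits, targets)                        # about 1.078
on_probs = focal_loss([softmax(r) for r in logits], targets)   # about 1.049
ce_only = focal_loss(logits, targets, gamma=0.0)               # gamma = 0 is plain cross entropy

print(round(on_logits, 3), round(on_probs, 3), round(ce_only, 3))
```

With gamma set to 0 the factor disappears and the value equals the ordinary cross entropy; with gamma equal to 2 every sample is scaled down, and confidently classified samples are scaled down the most.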
class FocalLoss(nn.Module):
    def __init__(self, gamma=2, alpha=None, ignore_index=255, size_average=True):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.size_average = size_average
        # reduction='none' keeps one loss value per sample (replaces the deprecated reduce=False)
        self.CE_loss = nn.CrossEntropyLoss(reduction='none',
                                           ignore_index=ignore_index, weight=alpha)

    def forward(self, output, target):
        logpt = self.CE_loss(output, target)   # per-sample cross entropy, i.e. -log(p_t)
        pt = torch.exp(-logpt)                 # probability of the true class
        loss = ((1 - pt) ** self.gamma) * logpt
        if self.size_average:
            return loss.mean()
        return loss.sum()

criterion = FocalLoss()
batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize, data_dim)
y = torch.empty(batchsize, dtype=torch.long).random_(data_dim)
# CrossEntropyLoss inside FocalLoss applies log-softmax itself, so pass the raw logits.
loss = criterion(y_, y)
tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.FloatTensor
tensor value: tensor([[ 0.1728, 1.1785, 0.2764, -0.3511, 0.4180],
[ 0.3613, 0.7521, 1.2390, 2.0650, -0.6268]])
tensor shape: torch.Size([2, 5])
tensor type: torch.LongTensor
tensor value: tensor([2, 2])
tensor shape: torch.Size([2])
tensor type: torch.FloatTensor
tensor value: 1.078 (approx.)
tensor shape: torch.Size([])
"""
DICE Loss
$$L_{loss}(y', y) = 1 - 2\times\dfrac{|\,y'\cap y\,|}{|y'|+|y|}$$
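A tiny worked example helps to read the formula. In the soft version used for training, $|y' \cap y|$ is the sum of element-wise products between the predicted probabilities and the one-hot target, and $|y'|$ and $|y|$ are plain sums. The sketch below is torch-free; the function name and the `smooth` term (a constant added to numerator and denominator to avoid division by zero, set to 1 here as an assumption) are illustrative.

```python
def dice_loss(probs, onehot, smooth=1.0):
    """Soft Dice loss on two flat lists of equal length."""
    intersection = sum(p * t for p, t in zip(probs, onehot))
    return 1.0 - (2.0 * intersection + smooth) / (sum(probs) + sum(onehot) + smooth)

# Two pixels, two classes, flattened one-hot target:
# pixel 0 has class 0 and pixel 1 has class 1.
target = [1.0, 0.0, 0.0, 1.0]

perfect = dice_loss(target, target)                   # full overlap
wrong = dice_loss([0.0, 1.0, 1.0, 0.0], target)       # no overlap at all
uncertain = dice_loss([0.5, 0.5, 0.5, 0.5], target)   # undecided prediction

print(perfect, round(wrong, 3), round(uncertain, 3))
```

The loss is 0 for a perfect prediction, grows as the overlap shrinks, and stays below 1 for a completely wrong prediction only because of the smoothing constant.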
import torch.nn.functional as F   # needed for F.softmax below

class DiceLoss(nn.Module):
    def __init__(self, smooth=1., ignore_index=255):
        super(DiceLoss, self).__init__()
        self.ignore_index = ignore_index
        self.smooth = smooth

    def forward(self, output, target):
        # Map ignored pixels to the smallest label so that scatter_ receives valid indices.
        if self.ignore_index not in range(int(target.min()), int(target.max())):
            if (target == self.ignore_index).sum() > 0:
                target[target == self.ignore_index] = target.min()
        target = make_one_hot(target, classes=output.size()[1])   # (B, H, W) -> (B, C, H, W)
        output = F.softmax(output, dim=1)                         # logits -> class probabilities
        output_flat = output.contiguous().view(-1)
        target_flat = target.contiguous().view(-1)
        intersection = (output_flat * target_flat).sum()
        loss = 1 - ((2. * intersection + self.smooth) /
                    (output_flat.sum() + target_flat.sum() + self.smooth))
        return loss

criterion = DiceLoss()
batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize, data_dim, 3, 3)
y = torch.empty(batchsize, 3, 3, dtype=torch.long).random_(data_dim)
loss = criterion(y_, y)
tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.LongTensor
tensor value: tensor([[[[0, 3, 1],
[0, 2, 2],
[1, 0, 1]]],
[[[2, 1, 2],
[3, 3, 2],
[1, 3, 4]]]])
tensor shape: torch.Size([2, 1, 3, 3])
tensor type: torch.FloatTensor
tensor value: tensor([[[[1., 0., 0.],
[1., 0., 0.],
[0., 1., 0.]],
[[0., 0., 1.],
[0., 0., 0.],
[1., 0., 1.]],
[[0., 0., 0.],
[0., 1., 1.],
[0., 0., 0.]],
[[0., 1., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]],
[[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 1., 0.],
[0., 0., 0.],
[1., 0., 0.]],
[[1., 0., 1.],
[0., 0., 1.],
[0., 0., 0.]],
[[0., 0., 0.],
[1., 1., 0.],
[0., 1., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 1.]]]])
tensor shape: torch.Size([2, 5, 3, 3])
tensor type: torch.FloatTensor
tensor value: tensor([[[[ 0.2699, 2.0570, 0.3527],
[ 0.1577, -0.4064, 0.1343],
[ 1.5966, 1.7491, 1.0151]],
[[-0.8926, 0.1622, 1.9066],
[ 0.5218, 0.4823, -1.1344],
[-1.0118, -0.8615, -2.1888]],
[[-0.3432, -0.3939, 0.1995],
[-0.1927, 0.1906, -0.9791],
[-0.7473, -1.4993, 0.3817]],
[[ 1.9844, -0.3772, 0.0379],
[-0.3522, 0.3117, 3.4582],
[ 0.1093, -1.1035, 1.7196]],
[[-0.3047, -0.0412, 0.4407],
[ 0.1961, 0.7687, 0.2264],
[-0.7968, -3.2159, 1.1114]]],
[[[ 0.2529, -0.2005, 1.4892],
[-0.6280, -0.5346, -0.8372],
[ 2.1497, -0.9360, 0.4647]],
[[ 0.1600, -0.4615, -0.0581],
[-0.8772, -2.2099, -0.4701],
[-0.0854, -0.6858, 1.1420]],
[[-0.5037, -1.4045, 0.3457],
[ 0.4000, 0.8670, 0.2310],
[ 0.1687, 2.2899, 1.3715]],
[[ 0.6839, 0.0109, -1.9138],
[-0.9788, -0.9355, 0.8609],
[ 1.4093, -0.5079, 0.1082]],
[[ 0.8306, -0.9631, -0.8329],
[-0.0351, -1.1003, 0.2656],
[-1.8068, -0.5764, -1.0488]]]])
tensor shape: torch.Size([2, 5, 3, 3])
tensor type: torch.LongTensor
tensor value: tensor([[[0, 3, 1],
[0, 2, 2],
[1, 0, 1]],
[[2, 1, 2],
[3, 3, 2],
[1, 3, 4]]])
tensor shape: torch.Size([2, 3, 3])
tensor type: torch.FloatTensor
tensor value: 0.8061555624008179
tensor shape: torch.Size([])
"""