PyTorch Basics: Loss Functions

qq_37172182

Category: Pytorch. Tags: pytorch, loss, focalLoss, diceLoss

Article link: https://blog.csdn.net/qq_37172182/article/details/108895111

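A loss function can be written formally as $L_{loss} = \sum_{i=1}^{N_b}criterion(y^{'},y)$, where $y^{'}\in \R^{B\times C}$ is the model's predicted output, $y\in \R^{B\times C}$ is the ground-truth label or annotation, $B$ is the $batch\_size$, and $C$ is the output dimension of the fully connected layer. When the loss is evaluated one sample at a time, $B = 1$. Depending on the nature of the problem, loss functions fall roughly into two groups: classification losses and regression losses.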

1. Classification losses

1.1 Binary classification family

$L_{loss} (y^{'},y)= -\sum_{i=1}^{N_b}\left[y_i\log(y_{i}^{'})+(1-y_i)\log(1-y_{i}^{'})\right]$

import torch
import torch.nn as nn

def tensor_info(tensor):
    print('tensor type: {}'.format(tensor.type()))
    print('tensor value: {}'.format(tensor.data))
    print('tensor shape: {}'.format(tensor.shape))

criterion = nn.BCELoss()
batchsize = 2
num_class = 2
y_ = torch.randn(batchsize,num_class)
y = torch.empty(batchsize,num_class).random_(num_class)
loss = criterion(nn.Sigmoid()(y_), y)

tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.FloatTensor
tensor value: tensor([[-0.0734,  1.1474],
        			  [-0.1513, -0.3409]])
        			  
tensor shape: torch.Size([2, 2])
tensor type: torch.FloatTensor
tensor value: tensor([[0., 1.],
        			  [0., 0.]])
        			  
tensor shape: torch.Size([2, 2])
tensor type: torch.FloatTensor
tensor value: 0.5225892663002014
tensor shape: torch.Size([])
"""

note:

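$BCELoss$ is intended for binary classification. Both tensors are of type $torch.FloatTensor$, and the model output must first pass through a $sigmoid$ so that every value lies in $[0, 1]$;
For binary problems, $BCELoss$ tends to train more stably than $CrossEntropyLoss$;
When the two classes are imbalanced, consider $BCEWithLogitsLoss$. It takes the raw $logits$ as input (the sigmoid is applied internally), and a $weight$ tensor is passed in to rescale the loss: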
```
w_0 = 1
w_1 = 5
# Per-class weights; torch.tensor replaces the deprecated Variable wrapper
class_weights = torch.tensor([w_0, w_1], dtype=torch.float)
criterion = nn.BCEWithLogitsLoss(weight=class_weights)
...
loss = criterion(y_, y)
```
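
One practical reason to hand raw logits to the loss is numerical robustness. The plain-Python sketch below is not PyTorch's implementation; it uses a standard algebraic rearrangement of the binary cross-entropy, and the function names `bce_naive` and `bce_with_logits` are hypothetical helpers introduced only for this illustration.

```python
import math

def bce_naive(z, t):
    """Sigmoid followed by the BCE formula; breaks down for large |z|."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def bce_with_logits(z, t):
    """Algebraically equivalent form that never evaluates log(0)."""
    return max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))

# Moderate logit: both forms give the same value.
print(bce_naive(1.1474, 1.0), bce_with_logits(1.1474, 1.0))

# Confident but wrong prediction: the sigmoid rounds to exactly 1.0 in
# floating point, so the naive form ends up taking log(0).
try:
    bce_naive(40.0, 0.0)
except ValueError as err:
    print("naive form failed:", err)

print(bce_with_logits(40.0, 0.0))
```

For a logit of 40 with label 0 the rearranged form returns a loss close to 40, while the two-step form cannot be evaluated at all.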

1.2 Multi-class classification family

$L_{loss} (y^{'},y)= -\log\left(\dfrac{\exp(y^{'}_{[y]})}{\sum_{j}\exp(y^{'}_{[j]})}\right) = -y^{'}_{[y]}+\log\left(\sum_{j}\exp(y^{'}_{[j]})\right)$

criterion = nn.CrossEntropyLoss()
batchsize = 2
num_class = 3
y_ = torch.randn(batchsize,num_class)
y = torch.empty(batchsize, dtype=torch.long).random_(num_class)
# CrossEntropyLoss applies log-softmax internally, so it receives the raw logits
loss = criterion(y_, y)

tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.FloatTensor
tensor value: tensor([[ 0.9964,  0.7243, -1.0832],
                      [ 1.2502,  0.9600, -0.1909]])
tensor shape: torch.Size([2, 3])

tensor type: torch.LongTensor
tensor value: tensor([2, 0])
tensor shape: torch.Size([2])

tensor type: torch.FloatTensor
tensor value: approximately 1.70
tensor shape: torch.Size([])
"""

note:

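$CrossEntropyLoss$ works for both binary and multi-class problems. The target tensor is of type $torch.LongTensor$ with shape $y\in \R^{B}$ and holds class indices, so no explicit $one\_hot$ encoding of the target is needed. The model output passed to the loss should be the raw $logits$ of shape $B\times C$: $CrossEntropyLoss$ applies $log\_softmax$ internally, so no separate $softmax$ layer should be placed in front of it;
Because $BCELoss$ tends to train more stably than $CrossEntropyLoss$, the former is usually preferred for binary classification, while multi-class problems use the latter;
When the classes of a multi-class dataset are imbalanced, consider the multi-class negative log-likelihood loss $nn.NLLLoss$, which expects log-probabilities as input: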
```
criterion = nn.NLLLoss()
...
loss = criterion(nn.LogSoftmax(dim=1)(y_), y)
```
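
The relation between the cross-entropy formula and the negative log-likelihood of log-softmax outputs can be checked by hand. The sketch below is plain Python rather than torch. It reuses the logits and class indices printed in the CrossEntropyLoss example and assumes mean reduction over the batch. The helper names are illustrative, not library functions.

```python
import math

def log_softmax(row):
    """Log-probabilities for one row of logits (max-shifted for stability)."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum_exp for v in row]

def cross_entropy(logits, targets):
    """Mean of -y'[y] + log(sum_j exp(y'[j])) over the batch."""
    losses = []
    for row, t in zip(logits, targets):
        log_sum_exp = math.log(sum(math.exp(v) for v in row))
        losses.append(-row[t] + log_sum_exp)
    return sum(losses) / len(losses)

def nll(log_probs, targets):
    """Mean negative log-likelihood, given log-probabilities."""
    return -sum(row[t] for row, t in zip(log_probs, targets)) / len(targets)

# Logits and class indices taken from the CrossEntropyLoss example above.
logits = [[0.9964, 0.7243, -1.0832],
          [1.2502, 0.9600, -0.1909]]
targets = [2, 0]

ce = cross_entropy(logits, targets)
nll_value = nll([log_softmax(row) for row in logits], targets)
print(ce, nll_value)
```

Both numbers coincide, which is why a log-softmax followed by $nn.NLLLoss$ plays the same role as $nn.CrossEntropyLoss$ applied to raw logits.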

2. Regression losses

2.1 $L1\ loss\ (MAE)$

$L_{loss} (y^{'},y)= |y-y^{'}|$

criterion = nn.L1Loss()
batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize,data_dim)
y = torch.randn(batchsize, data_dim)
loss = criterion(y_, y)

tensor_info(y_)
tensor_info(y)
tensor_info(loss)
"""
tensor type: torch.FloatTensor
tensor value: tensor([[-0.8535, -0.3021,  0.2806,  0.6997, -0.3428],
        			  [ 1.0466, -0.7761,  1.5299,  1.8677,  0.3375]])
tensor shape: torch.Size([2, 5])

tensor type: torch.FloatTensor
tensor value: tensor([[ 0.4172,  0.3862,  1.9460,  0.3330, -0.6183],
        			  [ 0.4837, -0.8353,  0.4653, -0.3128,  1.7366]])
tensor shape: torch.Size([2, 5])

tensor type: torch.FloatTensor
tensor value: 0.953281581401825
tensor shape: torch.Size([])
"""

note:

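For $L1\ loss$, the prediction and the target must have the same shape;
$L1\ loss$ is not smooth at zero, which is also why $L1$ regularization tends to produce sparse features; $L2\ loss$ is sensitive to outliers, which can cause exploding gradients under gradient descent;
$nn.SmoothL1Loss$ is a compromise between $L1\ loss$ and $L2\ loss$. Its expression is: $L_{loss}(y^{'},y) = \begin{cases} 0.5(y^{'}-y)^2 & \text{if } |y^{'}-y|<1 \\ |y^{'}-y|-0.5 & \text{otherwise} \end{cases}$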
```
criterion = nn.SmoothL1Loss()
```
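
To make the compromise concrete, the three penalties can be compared on a few residuals. This is a plain-Python sketch of the piecewise expression with the threshold fixed at 1; `l1`, `l2` and `smooth_l1` are hypothetical helper names, not torch functions.

```python
def l1(diff):
    """Absolute error."""
    return abs(diff)

def l2(diff):
    """Squared error."""
    return diff ** 2

def smooth_l1(diff):
    """Quadratic below the threshold 1, linear above it."""
    a = abs(diff)
    return 0.5 * a ** 2 if a < 1 else a - 0.5

# Columns: residual, L1, L2, smooth L1
for diff in (0.1, 0.5, 1.0, 3.0, 10.0):
    print(diff, l1(diff), l2(diff), smooth_l1(diff))
```

For small residuals the smooth variant behaves like a scaled L2 penalty and is differentiable at zero. For large residuals it grows linearly like L1, so a single outlier does not dominate the loss.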

2.2 $L2\ loss\ (MSE)$

$L_{loss}(y^{'}, y) = (y^{'}-y)^2$

criterion = nn.MSELoss()
batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize,data_dim)
y = torch.randn(batchsize, data_dim)
loss = criterion(y_, y)

tensor_info(y_)
tensor_info(y)
tensor_info(loss)

"""
tensor type: torch.FloatTensor
tensor value: tensor([[-0.9645, -1.3637, -0.3499,  0.1778,  1.4501],
        			  [ 0.0399, -0.7981,  0.2331, -0.8327, -0.1414]])
tensor shape: torch.Size([2, 5])

tensor type: torch.FloatTensor
tensor value: tensor([[ 0.6230,  0.6931,  0.0585, -0.1514, -1.6614],
        			  [-0.8120, -0.3299, -0.0762, -1.5901,  1.2696]])
tensor shape: torch.Size([2, 5])

tensor type: torch.FloatTensor
tensor value: 2.0312931537628174
tensor shape: torch.Size([])
"""

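3. one_hot encoding

When an additional loss is combined with $CrossEntropyLoss$, the labels usually need to be $one\_hot$ encoded so that they have the same shape as the model output and can be used by the other loss terms. This is also the groundwork for designing custom loss functions.

A concise and efficient $one\_hot$ conversion looks like this: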

def tensor_info(tensor):
    print('tensor type: {}'.format(tensor.type()))
    print('tensor value: {}'.format(tensor.data))
    print('tensor shape: {}'.format(tensor.shape))

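def make_one_hot(label, classes):
    # (B, H, W) -> (B, 1, H, W) so the labels can index the class dimension
    label = label.unsqueeze(dim=1)
    tensor_info(label)
    # Write a 1 at the class index of every pixel along dim 1
    tensor = torch.zeros(label.size()[0], classes, 
                         label.size()[2], label.size()[3]).scatter_(1, label, 1)
    tensor_info(tensor)
    return tensor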

class_num = 2
batch_size = 2
label = torch.LongTensor(batch_size, 3, 3).random_() % class_num
tensor = make_one_hot(label, class_num)
print(tensor)

"""
tensor type: torch.LongTensor
tensor value: tensor([[[[1, 0, 0],
          				[0, 1, 0],
          				[1, 0, 1]]],

        			[[[0, 0, 0],
          				[0, 0, 0],
          				[0, 1, 1]]]])
tensor shape: torch.Size([2, 1, 3, 3])

tensor type: torch.FloatTensor
tensor value: tensor([[[[0., 1., 1.],
                          [1., 0., 1.],
                          [0., 1., 0.]],
                         [[1., 0., 0.],
                          [0., 1., 0.],
                          [1., 0., 1.]]],

                        [[[1., 1., 1.],
                          [1., 1., 1.],
                          [1., 0., 0.]],
                         [[0., 0., 0.],
                          [0., 0., 0.],
                          [0., 1., 1.]]]])
tensor shape: torch.Size([2, 2, 3, 3])
"""

note:

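The example above is mostly used to $one\_hot$ encode segmentation annotations. Typically the $GroundTruth$ annotation has shape $y\in \R^{B \times H\times W}$ while the predicted output has shape $y^{'}\in \R^{B\times C \times H \times W}$, so $y$ has to be $one\_hot$ encoded;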
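
The same encoding can also be obtained with `torch.nn.functional.one_hot`. The sketch below assumes a label tensor of shape (B, H, W) holding int64 class indices; the sizes and the seed are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed so repeated runs use the same labels

batch_size, class_num, height, width = 2, 3, 4, 4
label = torch.randint(0, class_num, (batch_size, height, width))  # (B, H, W)

# scatter_-based encoding: write 1.0 at each pixel's class index along dim 1
by_scatter = torch.zeros(batch_size, class_num, height, width).scatter_(
    1, label.unsqueeze(1), 1.0)

# F.one_hot appends the class axis last, giving (B, H, W, C);
# permute moves it to dim 1 to match the (B, C, H, W) model output
by_one_hot = F.one_hot(label, num_classes=class_num).permute(0, 3, 1, 2).float()

print(by_one_hot.shape)
print(torch.equal(by_scatter, by_one_hot))
```

Either route yields a float tensor in which exactly one channel is 1 at every pixel.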

4. Two ways to define a custom loss

4.1 Subclassing $nn.Module$

class MyLoss(nn.Module):
    def __init__(self):
        super().__init__()
    
    def forward(self, input, target):
        return torch.mean(torch.pow(input-target, 2))

criterion = MyLoss()

batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize,data_dim)
y = torch.randn(batchsize, data_dim)
loss = criterion(y_, y)

tensor_info(y_)
tensor_info(y)
tensor_info(loss)

"""
tensor type: torch.FloatTensor
tensor value: tensor([[-1.0173,  0.4739, -0.7022, -1.2392, -0.9483],
        			  [-0.8169,  1.3850, -0.5899, -0.1689, -0.6612]])
tensor shape: torch.Size([2, 5])

tensor type: torch.FloatTensor
tensor value: tensor([[ 0.6348, -0.9740,  1.2326,  0.5315, -1.0824],
        		      [-0.8435,  0.6862,  0.3101, -0.1409,  0.8937]])
tensor shape: torch.Size([2, 5])

tensor type: torch.FloatTensor
tensor value: 1.543942928314209
tensor shape: torch.Size([])
"""

4.2 Defining a custom loss function

def myLoss(input, target):
    return torch.mean(torch.pow(input-target, 2))
...
loss = myLoss(y_, y)
...

note:

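A loss that subclasses $nn.Module$ must override the $forward$ method and define the $torch$ operations there, which makes the design fairly flexible. A plain custom function is built from the same $torch$ operations but has no $forward$ method to maintain; using it is an ordinary function call;
Backpropagation always goes through $loss.backward$. Both kinds of custom loss support it, because both are composed of $torch$ operations that autograd can differentiate;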
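
That a hand-written loss supports backpropagation can be checked directly. In the sketch below, `my_loss` repeats the function-style loss from 4.2 under a different name, and a tensor with `requires_grad=True` stands in for a model output. The analytic gradient used for comparison is derived here and is an assumption of the example.

```python
import torch

def my_loss(input, target):
    # Mean squared error written with basic torch operations
    return torch.mean(torch.pow(input - target, 2))

torch.manual_seed(0)
y_ = torch.randn(2, 5, requires_grad=True)  # stands in for a model output
y = torch.randn(2, 5)

loss = my_loss(y_, y)
loss.backward()  # autograd fills y_.grad with d(loss)/d(y_)

# Analytic gradient of mean((y_ - y)^2) with respect to y_
expected_grad = 2 * (y_.detach() - y) / y_.numel()
print(torch.allclose(y_.grad, expected_grad))
```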

4.3 Two common custom loss functions

$Focal\ Loss$
$FL(p_t)=-(1-p_t)^{\gamma}\log(p_t)$
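
The effect of the modulating factor $(1-p_t)^{\gamma}$ is easiest to see on a few probabilities. This plain-Python sketch evaluates the formula directly with $\gamma = 2$; `cross_entropy` and `focal` are hypothetical helper names used only here.

```python
import math

def cross_entropy(p_t):
    """Cross-entropy of a sample whose true class has probability p_t."""
    return -math.log(p_t)

def focal(p_t, gamma=2.0):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t)."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)

# Columns: p_t, cross-entropy, focal loss, fraction of the CE that is kept
for p_t in (0.9, 0.5, 0.1):
    ce, fl = cross_entropy(p_t), focal(p_t)
    print(p_t, round(ce, 4), round(fl, 4), round(fl / ce, 4))
```

With $\gamma = 2$, an easy sample ($p_t = 0.9$) keeps only 1% of its cross-entropy, while a hard sample ($p_t = 0.1$) keeps 81%, so training focuses on the hard samples. Setting $\gamma = 0$ recovers the plain cross-entropy.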

class FocalLoss(nn.Module):
    def __init__(self, gamma=2, alpha=None, ignore_index=255, size_average=True):
        super(FocalLoss, self).__init__()
        self.gamma = gamma
        self.size_average = size_average
        # reduction='none' keeps the per-sample losses (reduce=False is deprecated)
        self.CE_loss = nn.CrossEntropyLoss(reduction='none', 
                                           ignore_index=ignore_index, weight=alpha)

    def forward(self, output, target):
        logpt = self.CE_loss(output, target)
        pt = torch.exp(-logpt)
        loss = ((1-pt)**self.gamma) * logpt
        if self.size_average:
            return loss.mean()
        return loss.sum()

criterion = FocalLoss()

batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize,data_dim)
y = torch.empty(batchsize,dtype=torch.long).random_(data_dim)
# Pass raw logits: the CrossEntropyLoss inside FocalLoss applies log-softmax itself
loss = criterion(y_, y)

tensor_info(y_)
tensor_info(y)
tensor_info(loss)

"""
tensor type: torch.FloatTensor
tensor value: tensor([[ 0.1728,  1.1785,  0.2764, -0.3511,  0.4180],
                      [ 0.3613,  0.7521,  1.2390,  2.0650, -0.6268]])
tensor shape: torch.Size([2, 5])

tensor type: torch.LongTensor
tensor value: tensor([2, 2])
tensor shape: torch.Size([2])

tensor type: torch.FloatTensor
tensor value: approximately 1.08
tensor shape: torch.Size([])
"""

$DICE\ Loss$
$L_{loss}(y', y) = 1 - 2\times\dfrac{|\ y'\bigcap y\ |}{|y'|+|y|}$
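
A small worked case helps to read the formula. The plain-Python sketch below evaluates the smoothed Dice loss on flattened binary masks. The masks are made up for illustration, `dice_loss` is a hypothetical helper, and the smoothing constant of 1 is an assumption of the example.

```python
def dice_loss(pred, target, smooth=1.0):
    """1 - (2 * overlap + smooth) / (sum(pred) + sum(target) + smooth)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)

target  = [1, 1, 0, 0, 1, 0]   # flattened ground-truth mask
perfect = [1, 1, 0, 0, 1, 0]   # identical prediction
partial = [1, 0, 0, 1, 1, 0]   # two of the three foreground pixels are hit
missed  = [0, 0, 1, 1, 0, 1]   # no overlap with the ground truth

for name, pred in (("perfect", perfect), ("partial", partial), ("missed", missed)):
    print(name, round(dice_loss(pred, target), 4))
```

The loss is 0 for a perfect match and grows as the overlap shrinks. The smoothing term keeps the ratio defined when both masks are empty.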

import torch.nn.functional as F  # provides F.softmax used in forward

class DiceLoss(nn.Module):
    def __init__(self, smooth=1., ignore_index=255):
        super(DiceLoss, self).__init__()
        self.ignore_index = ignore_index
        self.smooth = smooth

    def forward(self, output, target):
        if self.ignore_index not in range(target.min(), target.max()):
            if (target == self.ignore_index).sum() > 0:
                target[target == self.ignore_index] = target.min()
        target = make_one_hot(target, classes=output.size()[1])
        output = F.softmax(output, dim=1)
        output_flat = output.contiguous().view(-1)
        target_flat = target.contiguous().view(-1)
        intersection = (output_flat * target_flat).sum()
        loss = 1 - ((2. * intersection + self.smooth) /
                    (output_flat.sum() + target_flat.sum() + self.smooth))
        return loss

criterion = DiceLoss()
batchsize = 2
data_dim = 5
y_ = torch.randn(batchsize,data_dim, 3, 3)
y = torch.empty(batchsize,3, 3, dtype=torch.long).random_(data_dim)
loss = criterion(y_, y)

tensor_info(y_)
tensor_info(y)
tensor_info(loss)

"""
tensor type: torch.LongTensor
tensor value: tensor([[[[0, 3, 1],
          [0, 2, 2],
          [1, 0, 1]]],
        [[[2, 1, 2],
          [3, 3, 2],
          [1, 3, 4]]]])
tensor shape: torch.Size([2, 1, 3, 3])

tensor type: torch.FloatTensor
tensor value: tensor([[[[1., 0., 0.],
                      [1., 0., 0.],
                      [0., 1., 0.]],
                     [[0., 0., 1.],
                      [0., 0., 0.],
                      [1., 0., 1.]],
                     [[0., 0., 0.],
                      [0., 1., 1.],
                      [0., 0., 0.]],
                     [[0., 1., 0.],
                      [0., 0., 0.],
                      [0., 0., 0.]],
                     [[0., 0., 0.],
                      [0., 0., 0.],
                      [0., 0., 0.]]],
                    [[[0., 0., 0.],
                      [0., 0., 0.],
                      [0., 0., 0.]],
                     [[0., 1., 0.],
                      [0., 0., 0.],
                      [1., 0., 0.]],
                     [[1., 0., 1.],
                      [0., 0., 1.],
                      [0., 0., 0.]],
                     [[0., 0., 0.],
                      [1., 1., 0.],
                      [0., 1., 0.]],
                     [[0., 0., 0.],
                      [0., 0., 0.],
                      [0., 0., 1.]]]])
tensor shape: torch.Size([2, 5, 3, 3])

tensor type: torch.FloatTensor
tensor value: tensor([[[[ 0.2699,  2.0570,  0.3527],
                      [ 0.1577, -0.4064,  0.1343],
                      [ 1.5966,  1.7491,  1.0151]],

                     [[-0.8926,  0.1622,  1.9066],
                      [ 0.5218,  0.4823, -1.1344],
                      [-1.0118, -0.8615, -2.1888]],

                     [[-0.3432, -0.3939,  0.1995],
                      [-0.1927,  0.1906, -0.9791],
                      [-0.7473, -1.4993,  0.3817]],

                     [[ 1.9844, -0.3772,  0.0379],
                      [-0.3522,  0.3117,  3.4582],
                      [ 0.1093, -1.1035,  1.7196]],

                     [[-0.3047, -0.0412,  0.4407],
                      [ 0.1961,  0.7687,  0.2264],
                      [-0.7968, -3.2159,  1.1114]]],


                    [[[ 0.2529, -0.2005,  1.4892],
                      [-0.6280, -0.5346, -0.8372],
                      [ 2.1497, -0.9360,  0.4647]],

                     [[ 0.1600, -0.4615, -0.0581],
                      [-0.8772, -2.2099, -0.4701],
                      [-0.0854, -0.6858,  1.1420]],

                     [[-0.5037, -1.4045,  0.3457],
                      [ 0.4000,  0.8670,  0.2310],
                      [ 0.1687,  2.2899,  1.3715]],

                     [[ 0.6839,  0.0109, -1.9138],
                      [-0.9788, -0.9355,  0.8609],
                      [ 1.4093, -0.5079,  0.1082]],

                     [[ 0.8306, -0.9631, -0.8329],
                      [-0.0351, -1.1003,  0.2656],
                      [-1.8068, -0.5764, -1.0488]]]])
tensor shape: torch.Size([2, 5, 3, 3])

tensor type: torch.LongTensor
tensor value: tensor([[[0, 3, 1],
         [0, 2, 2],
         [1, 0, 1]],

        [[2, 1, 2],
         [3, 3, 2],
         [1, 3, 4]]])
tensor shape: torch.Size([2, 3, 3])

tensor type: torch.FloatTensor
tensor value: 0.8061555624008179
tensor shape: torch.Size([])

"""