Understanding dim, initialization functions, cat(), empty(), the Adam optimizer, optim.lr_scheduler, CrossEntropyLoss and LabelSmoothing explained, and KL divergence

ad转化器

Last modified on 2023-10-06 10:18:42

Column: Pytorch. Tags: python, deep learning, development language

First published on 2023-03-12 12:44:10

Article link: https://blog.csdn.net/gqrblnp/article/details/129473807

6 Adjusting the learning rate with torch.optim.lr_scheduler

7 CrossEntropyLoss and LabelSmoothing explained

8 Two ways to call KL divergence and their input requirements

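1 Understanding dim in PyTorch

Summary:

(1) dim denotes a dimension. The size of a dim is the number of parallel (sibling) bracket pairs found one level deeper, i.e. in the next dim. For example: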

import torch

x = torch.randn(2, 3, 3)

print(x)
print(x.size())  # shape of the tensor
print(x.dim())   # number of dimensions

Output:

tensor([[[-1.6943, -2.1487,  1.2332],
         [-0.2261, -0.1596,  1.5513],
         [ 2.0383, -0.6982, -2.1481]],

        [[ 0.4201, -2.7373,  0.2424],
         [-1.1152,  1.3682, -1.8322],
         [ 0.1957, -0.2920,  0.1845]]])
torch.Size([2, 3, 3])
3

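Here the first dim has size 2 because the second level contains two parallel bracket pairs. Iterating over x yields exactly these two blocks: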

import torch

x = torch.randn(2, 3, 3)
print(x)

for i in x:
    print(i)
    print(i.size())

Output:

tensor([[[-1.4251, -0.8321,  1.0230],
         [ 0.2008,  0.5929, -0.7696],
         [-0.3721, -1.0837, -0.6642]],

        [[-0.5337,  0.7808,  0.4419],
         [-0.4683,  0.3847,  0.0747],
         [ 1.0156, -0.4933,  1.5340]]])


tensor(
    [
        [-1.4251, -0.8321,  1.0230],
        [ 0.2008,  0.5929, -0.7696],
        [-0.3721, -1.0837, -0.6642]
    ]
)
torch.Size([3, 3])

tensor(
    [
        [-0.5337,  0.7808,  0.4419],
        [-0.4683,  0.3847,  0.0747],
        [ 1.0156, -0.4933,  1.5340]
    ]
)
torch.Size([3, 3])
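A quick way to confirm the bracket-counting rule without reading the printout is to query the sizes directly. A minimal sketch (the variable names are only for illustration):

```python
import torch

x = torch.randn(2, 3, 3)   # 2 blocks, each with 3 rows of 3 numbers

print(x.dim())     # 3: the brackets are nested three levels deep
print(x.size(0))   # 2: size of dim 0, i.e. how many blocks iterating over x yields
print(len(x))      # 2: len() of a tensor is also the size of dim 0

# Iterating over x walks along dim 0; each item has dim 0 removed
slices = list(x)
print(len(slices), slices[0].shape)  # 2 torch.Size([3, 3])

# Fixing one index of dim 1 removes dim 1 and keeps dims 0 and 2
print(x[:, 0].shape)  # torch.Size([2, 3])
```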

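(2) When a function is given a dim, it operates along that dimension while all other dimensions are kept unchanged; the chosen dimension is either removed or has its size changed. For example: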

import torch

x = torch.randn(2,3)

print(x)

y = torch.argmax(x, dim=0)

print(y)
print(y.size())

Output:

tensor(
    [
        [ 0.0251, -0.3640,  0.1965],
        [ 0.6902,  0.9846,  0.2035]
    ]
)

tensor([1, 1, 1])
torch.Size([3])

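Removing dim = 0 means that [0.0251, -0.3640, 0.1965] and [0.6902, 0.9846, 0.2035] are compared element by element (column by column). The second row is larger in every column, so every result is 1.

Shape: (2, 3) -> (3)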
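The same rule applies to other dims and other reductions. A minimal sketch reusing the values from the output above; keepdim=True is an extra option shown for contrast, which keeps the reduced dim with size 1 instead of removing it:

```python
import torch

# The values from the output above, shape (2, 3)
x = torch.tensor([[0.0251, -0.3640, 0.1965],
                  [0.6902,  0.9846, 0.2035]])

# dim=0: the two rows are compared column by column, (2, 3) -> (3)
print(torch.argmax(x, dim=0))        # tensor([1, 1, 1])

# dim=1: the three entries inside each row are compared, (2, 3) -> (2)
print(torch.argmax(x, dim=1))        # tensor([2, 1])

# Other reductions follow the same rule
print(torch.sum(x, dim=0).shape)     # torch.Size([3])

# keepdim=True keeps the reduced dim with size 1 instead of removing it
print(torch.sum(x, dim=1, keepdim=True).shape)  # torch.Size([2, 1])
```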

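3 The torch.cat() function

Note: following the dim rule in section 1, the dim you pass to cat() is the one whose size changes (it increases to the sum of the input sizes along that dim); all other dims must have the same size in every input.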
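For example, with hypothetical tensors a, b and c, only the concatenated dim grows, and a size mismatch in any other dim is rejected. A minimal sketch:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(4, 3)
c = torch.ones(2, 5)

# cat along dim=0: sizes of dim 0 add up (2 + 4), dim 1 must match (3 == 3)
ab = torch.cat([a, b], dim=0)
print(ab.shape)  # torch.Size([6, 3])

# cat along dim=1: sizes of dim 1 add up (3 + 5), dim 0 must match (2 == 2)
ac = torch.cat([a, c], dim=1)
print(ac.shape)  # torch.Size([2, 8])

# Mismatched sizes in a non-concatenated dim raise an error
try:
    torch.cat([a, c], dim=0)  # dim 1 differs: 3 vs 5
except RuntimeError as err:
    print("RuntimeError:", err)
```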

6 Adjusting the learning rate with torch.optim.lr_scheduler

Note: what gets adjusted is usually the learning rate held by the optimizer.
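A scheduler wraps the optimizer and rewrites the lr stored in optimizer.param_groups as training progresses. A minimal sketch using StepLR; the toy model, random data and hyperparameters (lr=0.1, step_size=2, gamma=0.5) are arbitrary choices for illustration:

```python
import torch
from torch import nn, optim

model = nn.Linear(4, 1)                            # toy model, only for illustration
optimizer = optim.SGD(model.parameters(), lr=0.1)
# StepLR multiplies the lr of every param group by gamma every step_size epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(6):
    lrs.append(optimizer.param_groups[0]["lr"])    # the lr actually used in this epoch
    loss = model(torch.randn(8, 4)).pow(2).mean()  # dummy loss on random data
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()      # update the weights first...
    scheduler.step()      # ...then let the scheduler adjust the lr for the next epoch

print(lrs)  # [0.1, 0.1, 0.05, 0.05, 0.025, 0.025]
```

Calling optimizer.step() before scheduler.step() in each epoch is the order PyTorch expects; the scheduler only changes the lr value, and the optimizer uses whatever value is stored when it next steps.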

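7 CrossEntropyLoss and LabelSmoothing explained

Summary:

(1) Create the criterion with nn.CrossEntropyLoss() and call it as criterion(x, y), where x is the raw prediction (logits, not passed through a softmax layer) and y holds the ground-truth labels as integer class indices, not one-hot vectors.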
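A minimal sketch of the call with made-up logits and labels. It also checks that the loss equals log_softmax followed by nll_loss, which is why no softmax should be applied beforehand, and shows one way to turn hypothetical one-hot labels into class indices:

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw scores: 4 samples, 3 classes, no softmax applied
labels = torch.tensor([0, 2, 1, 2])   # integer class indices (int64), not one-hot

criterion = nn.CrossEntropyLoss()     # instantiate the loss module first...
loss = criterion(logits, labels)      # ...then call it on (prediction, target)

# CrossEntropyLoss applies log_softmax internally: it equals log_softmax followed by nll_loss
ref = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(loss.item(), ref.item())

# One-hot labels can be turned back into class indices with argmax along dim=1
one_hot = F.one_hot(labels, num_classes=3)
indices = one_hot.argmax(dim=1)
print(indices)  # tensor([0, 2, 1, 2])
```

Passing softmax outputs instead of logits would apply softmax twice and give a different loss. Recent PyTorch versions (1.10 and later) also accept class probabilities of shape (N, C) as the target, but the integer-index form is the one described here.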

(2) A quick way to compute CrossEntropyLoss:

$\operatorname{loss}(x, \text { class })=-\log \left(\frac{\exp (x[\text { class }])}{\sum_j \exp (x[j])}\right)=-x[\text { class }]+\log \left(\sum_j \exp (x[j])\right)$

where class is the ground-truth label (an integer index, not a one-hot vector) and the sum runs over all classes j.
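The formula can be checked numerically against the built-in loss. A minimal sketch with random logits and arbitrary targets; reduction="none" returns the per-sample loss so it can be compared term by term, and torch.logsumexp computes log(sum_j exp(x[j])) in a numerically stable way:

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(3, 5)                  # logits: 3 samples, 5 classes
target = torch.tensor([1, 0, 4])       # ground-truth class index of each sample

# loss(x, class) = -x[class] + log(sum_j exp(x[j])), computed for every sample
picked = x[torch.arange(3), target]    # x[i, class_i] for each sample i
manual = -picked + torch.logsumexp(x, dim=1)

per_sample = nn.CrossEntropyLoss(reduction="none")(x, target)
print(manual)
print(per_sample)

# The default reduction="mean" averages the per-sample losses
print(nn.CrossEntropyLoss()(x, target).item(), manual.mean().item())
```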

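(3) Theoretical analysis of LabelSmoothing

1) The derivation in the second-to-last paragraph of the referenced text is flawed. Here $P_1=\mathcal{N}(\mathbf{u}_1,\Sigma_1)$ and $P_2=\mathcal{N}(\mathbf{u}_2,\Sigma_2)$ are $n$-dimensional Gaussians. Under $P_1$ we have $\mathbb{E}_{P_1}\left[\left(\mathbf{x}-\mathbf{u}_1\right)\left(\mathbf{x}-\mathbf{u}_1\right)^T\right]=\Sigma_1$, so by the linearity and cyclic property of the trace, $\mathbb{E}_{P_1}\left[\left(\mathbf{x}-\mathbf{u}_1\right)^T \Sigma_1^{-1}\left(\mathbf{x}-\mathbf{u}_1\right)\right]=\operatorname{Tr}\left(\Sigma_1^{-1}\mathbb{E}_{P_1}\left[\left(\mathbf{x}-\mathbf{u}_1\right)\left(\mathbf{x}-\mathbf{u}_1\right)^T\right]\right)=\operatorname{Tr}(I_n)=n$. The derivation therefore becomes: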

$\begin{aligned} & \mathbb{E}_{P_1}\left[\left(\mathbf{x}-\mathbf{u}_2\right)^T \Sigma_2^{-1}\left(\mathbf{x}-\mathbf{u}_2\right)-\left(\mathbf{x}-\mathbf{u}_1\right)^T \Sigma_1^{-1}\left(\mathbf{x}-\mathbf{u}_1\right)\right] \\ & =\mathbb{E}_{P_1}\left[\operatorname{Tr}\left(\Sigma_2^{-1}\left(\mathbf{x}-\mathbf{u}_1+\left(\mathbf{u}_1-\mathbf{u}_2\right)\right)\left(\mathbf{x}-\mathbf{u}_1+\left(\mathbf{u}_1-\mathbf{u}_2\right)\right)^T\right)-n\right] \\ & =\mathbb{E}_{P_1}\left[\operatorname{Tr}\left(\Sigma_2^{-1}\left(\Sigma_1+2\left(\mathbf{x}-\mathbf{u}_1\right)\left(\mathbf{u}_1-\mathbf{u}_2\right)^T+\left(\mathbf{u}_1-\mathbf{u}_2\right)\left(\mathbf{u}_1-\mathbf{u}_2\right)^T\right)\right)-n\right] \\ & =\operatorname{Tr}\left(\Sigma_2^{-1} \Sigma_1\right)+\left(\mathbf{u}_1-\mathbf{u}_2\right)^T \Sigma_2^{-1}\left(\mathbf{u}_1-\mathbf{u}_2\right)-n \end{aligned}$
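The corrected derivation can be sanity-checked by Monte Carlo. A minimal sketch, assuming arbitrary example values for $\mathbf{u}_1$, $\mathbf{u}_2$, $\Sigma_1$ (S1) and $\Sigma_2$ (S2) with $n = 3$: sample $\mathbf{x}$ from $P_1$, average the bracketed expression, and compare it with the closed form; the last print also shows that $\mathbb{E}_{P_1}[(\mathbf{x}-\mathbf{u}_1)^T\Sigma_1^{-1}(\mathbf{x}-\mathbf{u}_1)]$ is close to $n$:

```python
import torch
from torch.distributions import MultivariateNormal

torch.manual_seed(0)
n = 3

# Arbitrary example parameters for P1 = N(u1, S1) and P2 = N(u2, S2)
u1 = torch.zeros(n)
u2 = torch.tensor([1.0, -0.5, 0.2])
A = torch.randn(n, n)
S1 = A @ A.T + n * torch.eye(n)     # symmetric positive definite
S1 = (S1 + S1.T) / 2                # remove tiny asymmetry from rounding
S2 = torch.diag(torch.tensor([1.0, 2.0, 0.5]))

def quad(x, u, S):
    """(x - u)^T S^{-1} (x - u) for every row of x."""
    d = x - u
    return ((d @ torch.linalg.inv(S)) * d).sum(dim=1)

x = MultivariateNormal(u1, covariance_matrix=S1).sample((200_000,))  # samples from P1

mc = (quad(x, u2, S2) - quad(x, u1, S1)).mean()   # Monte Carlo estimate of the expectation
S2_inv = torch.linalg.inv(S2)
closed = torch.trace(S2_inv @ S1) + (u1 - u2) @ S2_inv @ (u1 - u2) - n
print(mc.item(), closed.item())           # should be close to each other
print(quad(x, u1, S1).mean().item())      # should be close to n = 3
```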

2) Some results:

$D\left(P_1 \| P_2\right)=\mathbb{E}_{P_1}\left[\log \left(P_1\right)-\log \left(P_2\right)\right]$

$\mathbb{E}_{P_1}\left[-\log \left(P_1\right)\right]=\frac{n}{2}(1+\log 2 \pi)+\frac{1}{2} \log \operatorname{det}(\Sigma_1)$

$\mathbb{E}_{P_1}\left[-\log \left(P_2\right)\right]=\frac{1}{2}\left[n\log 2 \pi+\log \operatorname{det}(\Sigma_2)+\operatorname{Tr}\left(\Sigma_2^{-1} \Sigma_1\right)+\left(\mathbf{u}_1-\mathbf{u}_2\right)^T \Sigma_2^{-1}\left(\mathbf{u}_1-\mathbf{u}_2\right)\right]$

$\begin{aligned} & D\left(P_1 \| P_2\right)=\frac{1}{2}\left[\log \left(\frac{\operatorname{det} \Sigma_2}{\operatorname{det} \Sigma_1}\right)+\operatorname{Tr}\left(\Sigma_2^{-1} \Sigma_1\right)+\left(\mathbf{u}_1-\mathbf{u}_2\right)^T \Sigma_2^{-1}\left(\mathbf{u}_1-\mathbf{u}_2\right)-n\right] \end{aligned}$

When $P_2$ is the standard normal distribution ($\mathbf{u}_2=\mathbf{0}$, $\Sigma_2=I$), the result simplifies to:

$\begin{aligned} & D\left(P_1 \| P_2\right)=\frac{1}{2} \left[-\log \left(\operatorname{det} \Sigma_1\right)+\operatorname{Tr}\left(\Sigma_1\right)+\left\|\mathbf{u}_1\right\|^2-n\right] \end{aligned}$
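These closed forms can be compared with PyTorch's own implementations: torch.distributions.kl_divergence has a registered rule for a pair of MultivariateNormal distributions, and MultivariateNormal.entropy() returns $\mathbb{E}_{P_1}[-\log P_1]$. A minimal sketch with randomly generated means and covariances (arbitrary example values), in float64 so the comparison can be tight:

```python
import math
import torch
from torch.distributions import MultivariateNormal, kl_divergence

torch.manual_seed(0)
n, dt = 3, torch.float64            # double precision so the comparison can be tight

# Arbitrary example parameters (random symmetric positive definite covariances)
A = torch.randn(n, n, dtype=dt)
B = torch.randn(n, n, dtype=dt)
S1 = A @ A.T + torch.eye(n, dtype=dt)
S2 = B @ B.T + torch.eye(n, dtype=dt)
S1, S2 = (S1 + S1.T) / 2, (S2 + S2.T) / 2     # remove rounding asymmetry
u1 = torch.randn(n, dtype=dt)
u2 = torch.randn(n, dtype=dt)

P1 = MultivariateNormal(u1, covariance_matrix=S1)
P2 = MultivariateNormal(u2, covariance_matrix=S2)
std_normal = MultivariateNormal(torch.zeros(n, dtype=dt), covariance_matrix=torch.eye(n, dtype=dt))

# E_{P1}[-log P1] = n/2 (1 + log 2*pi) + 1/2 log det S1
entropy = n / 2 * (1 + math.log(2 * math.pi)) + 0.5 * torch.logdet(S1)

# D(P1 || P2) = 1/2 [log(det S2 / det S1) + Tr(S2^{-1} S1) + (u1-u2)^T S2^{-1} (u1-u2) - n]
S2_inv = torch.linalg.inv(S2)
d = u1 - u2
kl = 0.5 * (torch.logdet(S2) - torch.logdet(S1) + torch.trace(S2_inv @ S1) + d @ S2_inv @ d - n)

# P2 = N(0, I): D = 1/2 [-log det S1 + Tr(S1) + ||u1||^2 - n]
kl_std = 0.5 * (-torch.logdet(S1) + torch.trace(S1) + u1 @ u1 - n)

print(entropy.item(), P1.entropy().item())
print(kl.item(), kl_divergence(P1, P2).item())
print(kl_std.item(), kl_divergence(P1, std_normal).item())
```

Each printed pair should match up to floating-point rounding.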