PyTorch Notes (Part 1)

Latest recommended article published on 2024-07-28 09:26:52

躺鸡小能手

Category: Notes. Tags: pytorch notes

Article link: https://blog.csdn.net/wu_x_j_/article/details/84192257


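1. Variable is a concept specific to neural networks: it provides automatic differentiation, which numpy does not have, and it lives at torch.autograd.Variable. (Since PyTorch 0.4, Variable has been merged into Tensor, so a tensor created with requires_grad=True behaves the same way.)
A Variable's data attribute gives the underlying tensor values; grad_fn records the operation that produced the Variable (for example an addition, subtraction, multiplication, or division); and grad holds the gradient computed for this Variable during backpropagation.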

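Differentiating a scalar
import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([1]),requires_grad = True)
w = Variable(torch.Tensor([2]),requires_grad = True)
b = Variable(torch.Tensor([3]),requires_grad = True)

Y = w*x+b

Y.backward()   # automatic differentiation: there is no need to say which function is differentiated with respect to which;
               # this single call computes gradients for every variable that requires them, and each one is then available as variable.grad

print(x.grad)   # tensor([2.])
print(x.data)   # tensor([1.])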

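Differentiating a non-scalar (vector) output
x = torch.randn(3)
x = Variable(x,requires_grad = True)
y = x*2
#y.backward()  # y is a vector, so this call fails; a gradient argument of the same shape must be passed
y.backward(torch.FloatTensor([1,0.1,0.01]))   # the resulting gradients are multiplied by 1, 0.1 and 0.01 respectively
print(x)    # sample output (random values): tensor([ 1.0583,  2.0384,  0.4226])
print(x.grad)   # output: tensor([ 2.0000,  0.2000,  0.0200])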
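
As a cross-check, the same computation can be written with the merged Tensor API, assuming PyTorch 0.4 or later. The tensor passed to backward() acts as the weight vector v in the vector-Jacobian product; since y = 2x has Jacobian 2I, x.grad should come out as 2v. The name v is only illustrative.

```python
import torch

# Leaf tensor that tracks gradients (no Variable wrapper needed in 0.4+).
x = torch.randn(3, requires_grad=True)
y = x * 2  # non-scalar output

# y.backward() alone would raise, because y is not a scalar.
v = torch.tensor([1.0, 0.1, 0.01])  # weight applied to each output's gradient
y.backward(v)

print(x.grad)  # equals 2 * v
```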

2. The steps of an optimization iteration, explained

criterion = nn.CrossEntropyLoss()
loss = criterion(output,target)
optimizer = torch.optim.SGD(mynet.parameters(),lr = 0.01,momentum = 0.9)
optimizer.zero_grad()       # reset the gradients to zero
loss.backward()             # compute the gradient of every parameter
optimizer.step()            # update the parameters
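
The snippet above leaves output, target and mynet undefined. A minimal runnable sketch of the same three steps inside a loop might look like the following; the toy linear model, the random data and the sizes are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: 8 samples, 4 features, 3 classes.
mynet = nn.Linear(4, 3)
inputs = torch.randn(8, 4)
target = torch.randint(0, 3, (8,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(mynet.parameters(), lr=0.01, momentum=0.9)

losses = []
for step in range(50):
    optimizer.zero_grad()              # clear gradients left over from the previous step
    output = mynet(inputs)             # forward pass
    loss = criterion(output, target)   # scalar loss
    loss.backward()                    # populate .grad on every parameter
    optimizer.step()                   # apply the SGD update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

On this fixed batch the loss should go down over the 50 steps, which is an easy sanity check that the three calls are in a working order.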

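3. Saving and loading a model
Below, the_model is the model instance.
There are two ways to save and load a model:
(1) Save and load only the model parameters, which is the more flexible option
torch.save(the_model.state_dict(), PATH)
the_model.load_state_dict(torch.load(PATH))
(2) Save and load the entire model
torch.save(the_model, PATH)
the_model = torch.load(PATH)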
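
A small round trip for option (1) could be sketched as below. The nn.Linear model and the file name params.pth are assumptions for illustration. Only the state_dict route is shown here because, as far as I know, recent PyTorch releases make torch.load default to weights_only=True, which can reject a fully pickled model unless that flag is changed.

```python
import os
import tempfile

import torch
import torch.nn as nn

the_model = nn.Linear(4, 2)  # hypothetical model

with tempfile.TemporaryDirectory() as tmp_dir:
    PATH = os.path.join(tmp_dir, "params.pth")  # hypothetical file name

    # Save only the parameters.
    torch.save(the_model.state_dict(), PATH)

    # The architecture must be rebuilt in code before the parameters are loaded.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(PATH))

same = all(
    torch.equal(p, q)
    for p, q in zip(the_model.state_dict().values(), restored.state_dict().values())
)
print(same)
```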

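4. Zero the gradients before every backward pass; otherwise the gradients accumulate across iterations and training may fail to converge. Note also that loss is a Variable, so loss.data extracts the Tensor inside it and loss.data[0] then yields a Python int or float that can be printed. (This is the pre-0.4 idiom; item 11 covers the loss.item() replacement.)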
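
The accumulation behaviour is easy to see on a single tensor. This is a minimal sketch assuming PyTorch 0.4 or later; the variable names are illustrative.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()        # gradient of one pass: [3., 3.]

(w * 3).sum().backward()      # no zeroing in between
accumulated = w.grad.clone()  # the second gradient was added on top: [6., 6.]

w.grad.zero_()                # manual reset, comparable in effect to optimizer.zero_grad()
(w * 3).sum().backward()
after_zero = w.grad.clone()   # back to [3., 3.]

print(first, accumulated, after_zero)
```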

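5. Setting volatile=True on a Variable means no gradients are computed for it; the default is False, and any Variable derived from it inherits the same volatile value. (The volatile flag has been deprecated since PyTorch 0.4 in favor of torch.no_grad().)
6. detach cuts off gradient propagation: the detached result is treated as a constant, so no gradient flows back past it. For example: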

# y=A(x), z=B(y): compute gradients for the parameters of B but not for those of A
# Method 1
y = A(x)
z = B(y.detach())
z.backward()

# Method 2
y = A(x)
y.detach_()
z = B(y)
z.backward()
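
To make the effect of detach concrete, here is a minimal sketch in which A and B are two small nn.Linear modules (an assumption; the original leaves them unspecified). The output is reduced with sum() so that backward() needs no argument. The last lines show torch.no_grad(), the replacement for the old volatile flag.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
A = nn.Linear(3, 3)   # hypothetical upstream module
B = nn.Linear(3, 1)   # hypothetical downstream module
x = torch.randn(5, 3)

y = A(x)
z = B(y.detach()).sum()   # reduce to a scalar so backward() needs no argument
z.backward()

print(A.weight.grad)              # None: nothing flowed back into A
print(B.weight.grad is not None)  # B received gradients

# torch.no_grad() plays the role of the old volatile=True flag.
with torch.no_grad():
    out = B(A(x))
print(out.requires_grad)
```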

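7. What self.loss.backward(retain_graph=retain_graph) does when gradients are computed during backpropagation:

import torch.nn as nn

class ContentLoss(nn.Module):
    def __init__(self, target, weight):
        super(ContentLoss, self).__init__()
        self.target = target.detach() * weight
        # Only the value of target is needed here; it is a fixed state and is not part of the computation graph.
        # It is treated purely as a constant, so detach() ensures that backward() has no effect on the graph that originally produced target.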
        self.weight = weight
        self.criterion = nn.MSELoss()
    def forward(self, input):
        self.loss = self.criterion(input * self.weight, self.target)
        self.output = input
        return self.output
    def backward(self, retain_graph=True):
        self.loss.backward(retain_graph=retain_graph)
        return self.loss

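In the code above, the content loss layer defines its own backward() method. When it is called during the backward pass of the whole network, it runs backward() on the stored loss, which computes the gradients of that loss.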

This code sets retain_graph=True. What does that parameter do? The official documentation defines it as:

retain_graph (bool, optional) – If False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of create_graph.

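In short, if it is set to False, the intermediate buffers of the computation graph are freed once the gradients have been computed. In everyday use the parameter defaults to False for efficiency, the same value as create_graph.
In fact we rarely need retain_graph in ordinary code, but it is needed in some special situations: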

Suppose we have an input x with y = x ** 2 and z = y * 4, and two outputs: output_1 = z.mean() and output_2 = z.sum(). We then call backward on both outputs.

In[3]: import torch
In[5]: x = torch.randn((1,4),dtype=torch.float32,requires_grad=True)
In[6]: y = x ** 2
In[7]: z = y * 4
In[8]: output1 = z.mean()
In[9]: output2 = z.sum()
In[10]: output1.backward()    # this runs fine, but the intermediate buffers are freed afterwards, so the next call fails
In[11]: output2.backward()    # this raises an error
Traceback (most recent call last):
  File "/home/prototype/anaconda3/envs/pytorch-env/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-11-32d5139229de>", line 1, in <module>
    output2.backward()
  File "/home/prototype/anaconda3/envs/pytorch-env/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/prototype/anaconda3/envs/pytorch-env/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

If we write it this way instead:

In[3]: import torch
  ...: from torch.autograd import Variable
  ...: x = torch.randn((1,4),dtype=torch.float32,requires_grad=True)
  ...: y = x ** 2
  ...: z = y * 4
  ...: output1 = z.mean()
  ...: output2 = z.sum()
  ...: output1.backward(retain_graph=True)   # this argument keeps the intermediate buffers after backward.
  ...: output2.backward()
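
Both sessions can be combined into one script that also checks the resulting gradient. This is a sketch assuming PyTorch 0.4 or later; build_outputs is a hypothetical helper that simply rebuilds the graph. Because z = 4x² over four elements, the mean contributes 2x and the sum contributes 8x, so the accumulated gradient should be 10x.

```python
import torch

x = torch.randn(1, 4, requires_grad=True)

def build_outputs(x):
    """Rebuild the graph: y = x ** 2, z = y * 4, then two scalar outputs."""
    z = (x ** 2) * 4
    return z.mean(), z.sum()

# Without retain_graph, the second backward finds the buffers already freed.
output1, output2 = build_outputs(x)
output1.backward()
try:
    output2.backward()
    second_call_failed = False
except RuntimeError:
    second_call_failed = True
print(second_call_failed)

# With retain_graph=True on the first call, both backward passes succeed.
x.grad = None  # discard the gradient left by the failed attempt
output1, output2 = build_outputs(x)
output1.backward(retain_graph=True)
output2.backward()

# d(mean)/dx = 8x / 4 = 2x and d(sum)/dx = 8x, which accumulate to 10x.
print(torch.allclose(x.grad, 10 * x.detach()))
```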

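This parameter is needed when there are two outputs, which explains why the Content Loss layer in style transfer uses it: style transfer has Style Loss layers in addition to Content Loss layers. Both kinds of layer share the parameters of one network but produce two separate losses, so retain_graph=True is needed to keep the intermediate buffers and let the two backward() calls run without interfering with each other.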

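# Suppose you have two losses: run backward on the first, then on the second
loss1.backward(retain_graph=True)
loss2.backward() # after this call all intermediate buffers are freed, ready for the next iteration
optimizer.step() # update the parameters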

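This makes things easier to understand. retain_graph (called retain_variables in older releases) defaults to False, meaning the memory held by the computation graph is freed after backpropagation and a second backward pass becomes impossible. Because two backward passes are needed here, it has to be set to True.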
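
The two-loss pattern can be tried on a toy network. The sketch below uses a hypothetical nn.Linear as the shared network and two made-up losses. It also shows one way around retain_graph, in the spirit of the documentation's remark that it can often be avoided: summing the losses and calling backward once gives the same parameter gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 2)          # hypothetical shared network
x = torch.randn(3, 4)

def two_losses():
    out = net(x).tanh()        # shared graph feeding both losses
    return out.pow(2).mean(), out.abs().sum()

# Option A: two backward calls; the first must retain the graph.
net.zero_grad()
loss1, loss2 = two_losses()
loss1.backward(retain_graph=True)
loss2.backward()
grad_two_calls = net.weight.grad.clone()

# Option B: one backward call on the sum; no retain_graph needed.
net.zero_grad()
loss1, loss2 = two_losses()
(loss1 + loss2).backward()
grad_summed = net.weight.grad.clone()

print(torch.allclose(grad_two_calls, grad_summed, atol=1e-6))
```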

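8. Style transfer: there are two ways to understand how the Gram matrix is computed, one in terms of depth (channels) and one as a product with the transpose.
First way: for the style image, a given convolutional layer produces a C × M × N feature map. Take channel i and channel j, multiply them elementwise and sum the products; the result is entry (i, j) of the Gram matrix (so the channel pairs are 1*1, 1*2, 1*3, ...). To picture it, flatten each channel into a 1-D vector of shape (1, M*N), which gives a matrix of size C × (M*N); multiplying it by its transpose yields the C × C Gram (style) matrix. This is exactly the second way.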
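
The equivalence of the two views can be checked numerically. In this sketch the feature map is random and the sizes C, M, N are arbitrary assumptions; any normalization that a particular style-transfer implementation applies to the Gram matrix is left out.

```python
import torch

torch.manual_seed(0)
C, M, N = 3, 4, 5
features = torch.randn(C, M, N)   # stand-in for one layer's feature map

# First view: entry (i, j) is the sum of the elementwise product of channels i and j.
gram_loop = torch.zeros(C, C)
for i in range(C):
    for j in range(C):
        gram_loop[i, j] = (features[i] * features[j]).sum()

# Second view: flatten each channel and multiply by the transpose.
flat = features.view(C, M * N)
gram_matmul = flat @ flat.t()

print(gram_matmul.shape)
print(torch.allclose(gram_loop, gram_matmul, atol=1e-5))
```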

import torch

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 3, 1, 1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2))
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(32, 64, 3, 1, 1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2)
        )
        self.conv3 = torch.nn.Sequential(
            torch.nn.Conv2d(64, 64, 3, 1, 1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2)
        )
        self.dense = torch.nn.Sequential(
            torch.nn.Linear(64 * 3 * 3, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10)
        )
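In 64*3*3, the 3*3 is the spatial size of the feature map after the convolutional stages. In this example every convolution uses kernel_size=3, padding=1 and stride=1, so by the formula H_out = (H_in - F + 2P)/S + 1 each convolution leaves the size unchanged. Each MaxPool2d(2), however, halves the size (rounding down), so the three pooling layers shrink the input by a factor of 8. Working backwards from 3*3, the input image must therefore be between 24*24 and 31*31, for example 28*28 -> 14 -> 7 -> 3.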
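
A quick way to see which input sizes produce a 3*3 feature map is to push dummy inputs through the conv stack. The sketch below collapses the three conv stages into one nn.Sequential with the same layer settings; output_shape is a hypothetical helper.

```python
import torch
import torch.nn as nn

# Same layer settings as the conv stages above, collapsed into one Sequential.
features = nn.Sequential(
    nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, 1, 1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 64, 3, 1, 1), nn.ReLU(), nn.MaxPool2d(2),
)

def output_shape(side):
    """Shape of the feature map for one side x side RGB image."""
    with torch.no_grad():
        return tuple(features(torch.zeros(1, 3, side, side)).shape)

for side in (24, 28, 31, 32):
    print(side, output_shape(side))
```

Sizes 24 through 31 all end at 3*3 and therefore match Linear(64 * 3 * 3, 128), while 32 ends at 4*4 and would not.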

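10. About torch.max()
torch.max(a,0) returns the largest element of each column together with its index (the row index of that element within the column)
torch.max(a,1) returns the largest element of each row together with its index (the column index of that element within the row)
c = torch.max(a,1)[1] gives the column index of the largest element in each row. For the MNIST dataset, since the labels start at 0, this column index is directly the predicted class (pred)
Example: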

import torch
a = torch.randn(3,3)
print(a)

b = torch.max(a,1)
print(b)

c = torch.max(a,1)[1]
print(c)

outputs:
  tensor([[ 0.3856, -0.7356,  0.0876],
        [ 0.9246, -0.9074,  0.4152],
        [ 0.7600,  1.1790, -0.8748]])
  (tensor([ 0.3856,  0.9246,  1.1790]), tensor([ 0,  0,  1]))
  tensor([ 0,  0,  1])
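
Because the example above uses random numbers, a fixed tensor makes the two dim arguments easier to follow. This is a small sketch; the values are chosen arbitrarily, and argmax is used only to confirm the index half of the result.

```python
import torch

a = torch.tensor([[0.1, 0.9, 0.3],
                  [0.8, 0.2, 0.5]])

col_values, col_rows = torch.max(a, 0)   # per-column max and its row index
row_values, row_cols = torch.max(a, 1)   # per-row max and its column index

print(col_values, col_rows)  # values [0.8, 0.9, 0.5], indices [1, 0, 1]
print(row_values, row_cols)  # values [0.9, 0.8], indices [1, 0]

# For a batch of class scores, the column index is the predicted label.
pred = torch.max(a, 1)[1]
print(torch.equal(pred, a.argmax(dim=1)))
```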

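11. PyTorch 0.3 and 0.4 differ in how an accumulated loss is computed
Take the widely used pattern total_loss += loss.data[0] as an example. Before PyTorch 0.4.0, loss was a Variable wrapping a tensor of shape (1,), but from PyTorch 0.4.0 on loss is a zero-dimensional scalar. Indexing a scalar makes no sense (it appears to raise an "invalid index to scalar variable" error). Use loss.item() to get the Python number out of the scalar, so the line becomes:
``` total_loss += loss.item()```
If the loss is not converted to a Python number before being accumulated, the program's memory use may grow. That is because the right-hand side of the expression above used to be a Python float and is now a zero-dimensional tensor, so the total accumulates tensors together with their gradient history, which can build a very large autograd graph and waste memory and compute.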
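
The difference between the two accumulation styles can be observed directly. In this sketch the model, the loss function and the random data are assumptions; total_as_tensor is included only to show what happens when .item() is left out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(4, 1)              # hypothetical model
criterion = nn.MSELoss()

total_loss = 0.0                   # stays a plain Python float
total_as_tensor = 0.0              # becomes a tensor that drags the graph along

for _ in range(3):
    inputs, target = torch.randn(8, 4), torch.randn(8, 1)
    loss = criterion(net(inputs), target)     # zero-dimensional tensor
    total_loss += loss.item()                 # Python number, no graph attached
    total_as_tensor = total_as_tensor + loss  # keeps every iteration's graph alive

print(type(total_loss), loss.dim())
print(total_as_tensor.requires_grad)
```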

12. [PyTorch Learning Series (1): Loading Data and Generating Batches - CSDN Blog](https://blog.csdn.net/victoriaw/article/details/72356453)

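**Key points:** When defining a subclass of torch.utils.data.Dataset, the two methods that must be overridden are __len__ and __getitem__. __len__ returns the size of the dataset, and __getitem__ implements indexing into it, returning the image and label at that index (it does not have to be an image and a label; the returned tuple can have any length, depending on what data the network needs).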

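When a DataLoader assembles a batch, it checks the data type of what __getitem__ returns and uses different if/else branches to convert the data to tensors. __getitem__ therefore has many return types to choose from; one workable choice is numpy.array for the image and int for the label.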
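
The points above can be sketched with a tiny dataset that returns a numpy array and a Python int. ToyImageDataset, its sizes and its random contents are all hypothetical; the default collate function is left in place to show the conversion to tensors.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class ToyImageDataset(Dataset):
    """Hypothetical dataset: random 1x8x8 'images' with integer labels."""

    def __init__(self, size=10):
        rng = np.random.RandomState(0)
        self.images = rng.rand(size, 1, 8, 8).astype(np.float32)
        self.labels = [i % 3 for i in range(size)]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        # numpy.ndarray image and plain Python int label
        return self.images[index], self.labels[index]

dataset = ToyImageDataset()
loader = DataLoader(dataset, batch_size=4, shuffle=False)
images, labels = next(iter(loader))

print(type(images), images.shape)   # the default collate stacked the arrays into a tensor
print(labels.dtype, labels)
```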

* * *

13. [PyTorch Study Notes (4): Custom Dataset and Input Pipeline - PyTorch Tutorial](http://www.pytorchtutorial.com/pytorch-note4-input-data-pipeline/)

import os

import torch
import torch.utils.data as data
from PIL import Image

# Excerpt: helper members such as download(), _check_exists(), processed_folder,
# training_file and test_file are not shown here.
class MNIST(data.Dataset):
    def __init__(self, root, train=True, transform=None, target_transform=None, download=False):
        self.root = root
        self.transform = transform
        self.target_transform = target_transform
        self.train = train  # training set or test set

        if download:
            self.download()

        if not self._check_exists():
            raise RuntimeError('Dataset not found.' +
                               ' You can use download=True to download it')

        if self.train:
            self.train_data, self.train_labels = torch.load(
                os.path.join(root, self.processed_folder, self.training_file))
        else:
            self.test_data, self.test_labels = torch.load(os.path.join(root, self.processed_folder, self.test_file))

    def __getitem__(self, index):
        if self.train:
            img, target = self.train_data[index], self.train_labels[index]
        else:
            img, target = self.test_data[index], self.test_labels[index]

        # doing this so that it is consistent with all other datasets
        # to return a PIL Image
        img = Image.fromarray(img.numpy(), mode='L')

        if self.transform is not None:
            img = self.transform(img)

        if self.target_transform is not None:
            target = self.target_transform(target)

        return img, target

    def __len__(self):
        if self.train:
            return 60000
        else:
            return 10000

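14. Use torch.autograd.Variable to wrap a Tensor into the Variable type that the model can actually work with.
Why wrap it in a Variable? In PyTorch, torch.Tensor and torch.autograd.Variable are two important data structures. A Variable can be seen as a wrapper around a tensor: it holds not only the tensor's contents but also information such as gradients, which is why Variable is used so often in neural networks. How do you get the tensor back out of a Variable? That is simple too: if inputs is a Variable produced by wrapping, then inputs.data is the corresponding tensor.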
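
For readers on PyTorch 0.4 or later, the wrapper still exists but no longer creates a separate type. This short sketch, with illustrative names, shows what wrapping and .data return there.

```python
import torch
from torch.autograd import Variable

raw = torch.ones(2, 2)
inputs = Variable(raw)   # in 0.4+ this simply returns a Tensor

print(isinstance(inputs, torch.Tensor))
print(torch.equal(inputs.data, raw))   # .data exposes the underlying values
print(inputs.requires_grad)            # False unless requested
```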

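15. Generative adversarial networks: when training the discriminator, we want fake data to produce outputs as close to 0 as possible; when training the generator, we want fake data to produce outputs as close to 1 as possible. The discriminator should end with a Sigmoid so that its result is mapped to a probability, and the generator should end with a tanh so that the generated pixel values are mapped to the range -1 to 1.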
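
The label conventions described here can be sketched with two one-layer networks. Everything below (the sizes, the single Linear layers, the use of nn.BCELoss and the random stand-in for real data) is an assumption chosen to keep the example short; the optimizer steps are omitted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, data_dim, batch = 8, 16, 4   # hypothetical sizes

G = nn.Sequential(nn.Linear(latent_dim, data_dim), nn.Tanh())   # outputs in [-1, 1]
D = nn.Sequential(nn.Linear(data_dim, 1), nn.Sigmoid())         # outputs a probability
criterion = nn.BCELoss()

real = torch.rand(batch, data_dim) * 2 - 1   # stand-in for real data scaled to [-1, 1]
ones = torch.ones(batch, 1)
zeros = torch.zeros(batch, 1)

fake = G(torch.randn(batch, latent_dim))

# Discriminator step: real -> 1, fake -> 0. detach() keeps this loss out of G.
d_loss = criterion(D(real), ones) + criterion(D(fake.detach()), zeros)

# Generator step: the same fake samples should now be scored as 1.
g_loss = criterion(D(fake), ones)

print(fake.min().item() >= -1.0, fake.max().item() <= 1.0)
print(d_loss.item(), g_loss.item())
```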

16. https://blog.csdn.net/u012609509/article/details/81264687
To load a built-in dataset such as MNIST or CIFAR10, use torchvision.datasets.MNIST(....) and then read the data with torch.utils.data.DataLoader
For a custom dataset, first subclass torch.utils.data.Dataset and override the __len__() and __getitem__() methods

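torch.utils.data.Dataset is an abstract class representing a dataset.
Your own dataset should normally inherit from ``Dataset`` and override the following methods:
1. __len__, so that ``len(dataset)`` returns the size of the dataset
2. __getitem__, to support indexing, so that dataset[i] returns the i-th sample (0-indexed)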

DataLoader in torch.utils.data provides the following for Dataset objects:
1. Reading data in batches
2. Shuffling the data
3. Loading data in parallel with multiprocessing

DataLoader takes a collate_fn argument, which specifies exactly how a batch of samples is assembled:
merges a list of samples to form a mini-batch.
In most cases, however, the default collate_fn works well
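
One case where the default is not enough is a dataset whose samples have different lengths. The sketch below pads them inside a custom collate_fn; the sample data and the pad_collate function are hypothetical, and a plain Python list stands in for the dataset since it already supports len() and indexing.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Hypothetical samples: (sequence, label) pairs whose sequences differ in length.
samples = [
    (torch.tensor([1, 2, 3]), 0),
    (torch.tensor([4, 5]), 1),
    (torch.tensor([6]), 0),
    (torch.tensor([7, 8, 9, 10]), 1),
]

def pad_collate(batch):
    """Merge a list of (sequence, label) samples into one padded mini-batch."""
    sequences, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in sequences])
    padded = pad_sequence(list(sequences), batch_first=True, padding_value=0)
    return padded, lengths, torch.tensor(labels)

# A plain list works as a map-style dataset: it has __len__ and __getitem__.
loader = DataLoader(samples, batch_size=2, shuffle=False, collate_fn=pad_collate)
batches = list(loader)

for padded, lengths, labels in batches:
    print(padded.shape, lengths.tolist(), labels.tolist())
```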

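Note: when using a dataset that ships with torchvision, download can always be left as True, but root must point to the root directory that holds the archive files, without going down to the training-set level. If the archives already exist they are not downloaded again; if they do not, they are downloaded into the directory given by root.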