PyTorch Study Notes (Part 1)

zhazhahuio

Tags: pytorch

Original link: https://blog.csdn.net/u012436149/article/details/54627597

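PyTorch builds its computation graph dynamically. Unlike TensorFlow, where you first build a graph and then execute that fixed graph repeatedly through feed and run, PyTorch constructs the graph as the code runs, which makes it comparatively more flexible.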
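To make "dynamic graph" concrete, here is a minimal sketch (the function name `forward` and the toy inputs are assumptions for illustration, not from the original notes). Ordinary Python control flow decides which operations are recorded on each call, and autograd differentiates whichever branch actually ran.

```python
import torch

def forward(x):
    # Ordinary Python control flow decides the graph on every call.
    if x.sum() > 0:
        return x * 2
    return x ** 2

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([-1.0, -2.0], requires_grad=True)

forward(a).sum().backward()   # took the "x * 2" branch
forward(b).sum().backward()   # took the "x ** 2" branch

print(a.grad.tolist())  # [2.0, 2.0]
print(b.grad.tolist())  # [-2.0, -4.0]
```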

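When writing a deep network, the points to pay attention to are:

What object stores the network's parameters
How to build the network
How to compute gradients and update the parameters

What object holds the data

In newer versions of PyTorch (0.4 and later) there is only one variable type: Tensor.

Tensor: works much like an ndarray. A 1-D Tensor is called a vector, a 2-D Tensor a matrix, and anything with three or more dimensions is simply called a tensor.
There are two kinds of Tensor. One has requires_grad=False, meaning its gradient does not need to be computed. The other has requires_grad=True, meaning its gradient does need to be computed.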

import torch

x = torch.tensor([2, 3, 4], dtype=torch.float)  # create a Tensor with values [2., 3., 4.] and type float
# create a tensor that requires gradients
x2 = torch.tensor([2, 3, 4], dtype=torch.float, requires_grad=True)

x.size()

torch.Size([3])
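The two kinds of Tensor described above can be told apart by inspecting `requires_grad`. The following is a small sketch under the assumption of PyTorch 0.4 or later; the variable names are illustrative.

```python
import torch

# A plain tensor: autograd does not track operations on it.
x = torch.tensor([2, 3, 4], dtype=torch.float)
# A tensor that autograd tracks, so gradients can flow back to it.
x2 = torch.tensor([2, 3, 4], dtype=torch.float, requires_grad=True)

print(x.requires_grad)    # False
print(x2.requires_grad)   # True
print(x.dim(), x.size())  # 1 torch.Size([3])

# The flag can also be switched on for an existing leaf tensor, in place.
x.requires_grad_(True)
print(x.requires_grad)    # True
```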

Some tensor operations

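a.add_(b)  # every operation with a trailing _ modifies the object it is called on
           # e.g. with a = 1 and b = 2, a.add_(b) leaves a equal to 3
           # operations without _ have no such effect and only return the result
torch.cuda.is_available()  # True when a usable CUDA GPU is present

True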
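The difference between `add` and `add_` is easy to check directly. This is a minimal sketch with illustrative one-element tensors.

```python
import torch

a = torch.tensor([1.0])
b = torch.tensor([2.0])

c = a.add(b)    # out-of-place: returns a new tensor, a is unchanged
print(a.item(), c.item())  # 1.0 3.0

a.add_(b)       # in-place: a itself now holds the sum
print(a.item())  # 3.0
```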

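Automatic differentiation

To use PyTorch's automatic differentiation to compute $\partial y / \partial x$, two conditions must hold:

y.requires_grad==True and x.requires_grad==True
The computation graph from x to y is not built inside a with torch.no_grad() block

Both conditions are easy to satisfy. Just set x.requires_grad=True; by PyTorch's rule (when an op combines its inputs, the output Tensor has requires_grad=True as long as at least one input has requires_grad=True), the resulting y has requires_grad=True for differentiable ops run outside no_grad.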
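Both conditions can be observed in a few lines. The sketch below uses illustrative tensors `x` and `w`; it shows the propagation rule and how `torch.no_grad()` switches graph recording off.

```python
import torch

x = torch.ones(3, requires_grad=True)
w = torch.ones(3)               # requires_grad defaults to False

y = x * w                       # one input requires grad, so the output does too
print(y.requires_grad)          # True

with torch.no_grad():
    z = x * w                   # computed inside no_grad: no graph is recorded
print(z.requires_grad)          # False
```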

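import torch

x = torch.tensor([1, 1, 1, 1, 1], dtype=torch.float, requires_grad=True)
y = x * 2
grads = torch.FloatTensor([1, 2, 3, 4, 5])
y.backward(grads)  # if y is a scalar, just call y.backward() and read the gradient from x.grad
x.grad             # if y is not a scalar, a gradient with respect to y must be passed in as an argument

Variable containing:
  2
  4
  6
  8
 10
[torch.FloatTensor of size 5]

(This output is in the print format used before 0.4; newer versions show the same values in a tensor([...]) line.)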
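Why the result is 2, 4, 6, 8, 10: for a non-scalar `y`, the argument of `backward` weights each element of `y`, so `x.grad` is the gradient of the weighted sum of `y`. With `y = 2x`, each entry of `x.grad` is 2 times the corresponding weight. The sketch below checks this against an explicit scalar formulation (the names `x2` and `loss` are illustrative).

```python
import torch

x = torch.ones(5, requires_grad=True)
grads = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])

# Non-scalar output: pass one weight per element of y.
y = x * 2
y.backward(grads)
print(x.grad.tolist())  # [2.0, 4.0, 6.0, 8.0, 10.0]

# Equivalent scalar formulation: backward() on the weighted sum of y.
x2 = torch.ones(5, requires_grad=True)
loss = ((x2 * 2) * grads).sum()
loss.backward()
print(torch.equal(x.grad, x2.grad))  # True
```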

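Neural networks

Use the tools in the torch.nn package to build neural networks.
Building a neural network takes the following steps:

Define the network's weights and assemble the network structure
Iterate over the whole dataset to train
- Feed the data into the network
- Compute the loss
- Compute the gradients of the network weights
- Update the network weights
  - weight = weight - learning_rate * gradient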
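The steps above can be sketched by hand on a toy problem before bringing in torch.nn. Everything here (the single weight `w`, the data, the learning rate) is an assumption chosen for illustration; the point is the order of the steps and the fact that the update moves against the gradient.

```python
import torch

# Toy problem (assumed for illustration): fit w so that w * 3 is close to 6.
w = torch.tensor([0.0], requires_grad=True)
inp, target = torch.tensor([3.0]), torch.tensor([6.0])
learning_rate = 0.05

for step in range(100):
    loss = ((w * inp - target) ** 2).sum()  # feed the data and compute the loss
    loss.backward()                         # compute the gradient of the loss w.r.t. w
    with torch.no_grad():
        w -= learning_rate * w.grad         # step against the gradient
    w.grad.zero_()                          # clear the gradient before the next pass

print(round(w.item(), 4))  # close to 2.0
```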

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):  # must inherit from nn.Module
    def __init__(self):
        super(Net, self).__init__()
        # two convolutional layers, self.conv1 and self.conv2; note that these layers do not include an activation function
        self.conv1 = nn.Conv2d(1, 6, 5)  # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # three fully connected layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # an affine operation: y = Wx + b
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):  # note: a 2D convolution layer expects input of shape batchsize*channel*height*width
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))  # max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # if the window is square, a single number is enough
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
net

Net (
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)

len(list(net.parameters()))  # why 10? each of the 5 layers has a weight and a bias: 10 = 5*2
                             # net.parameters() yields the learnable parameters in the order they were created
                             # each item is an nn.Parameter, a Tensor subclass with requires_grad=True
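The weight-plus-bias count is easier to see with names and shapes printed. The sketch below uses a smaller stand-in model (an assumption for illustration, not the Net above) with two layers, so it has four parameter tensors.

```python
import torch.nn as nn

# A smaller stand-in for Net (assumed for illustration): two layers, four parameter tensors.
model = nn.Sequential(nn.Conv2d(1, 6, 5), nn.Linear(6, 2))

for name, p in model.named_parameters():
    print(name, tuple(p.shape), type(p).__name__)
# 0.weight (6, 1, 5, 5) Parameter
# 0.bias (6,) Parameter
# 1.weight (2, 6) Parameter
# 1.bias (2,) Parameter

print(len(list(model.parameters())))  # 4 = 2 layers * (weight + bias)
```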

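input = torch.randn(1, 1, 32, 32)  # since 0.4 a plain Tensor works; wrapping it in Variable is no longer needed
out = net(input)  # Net defines no __call__(), yet the call works: the parent class nn.Module implements it and calls forward() inside
out               # the source code confirms this, so forward() must be defined, otherwise calling the module raises an error
out.backward(torch.randn(1, 10))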
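The observation about `__call__` can be checked with two tiny modules. Both classes below are hypothetical and exist only for this illustration: one defines `forward`, the other deliberately does not.

```python
import torch
import torch.nn as nn

class Doubler(nn.Module):      # hypothetical module, for illustration only
    def forward(self, x):
        return x * 2

class NoForward(nn.Module):    # deliberately omits forward()
    pass

x = torch.ones(2)
print(Doubler()(x).tolist())   # [2.0, 2.0]; the call is routed to forward()

try:
    NoForward()(x)
    raised = False
except NotImplementedError:
    raised = True
print(raised)                  # True
```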

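Training the network with a loss criterion and an optimizer

The torch.nn package provides many loss criteria, and torch.optim takes care of updating the weights, so the parameters no longer have to be updated by hand.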

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)  # with an optimizer you no longer need to write this

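import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
criterion = nn.MSELoss()      # assumed loss for illustration; the original snippet leaves criterion undefined
target = torch.randn(1, 10)   # dummy target, assumed for illustration

# in your training loop:
optimizer.zero_grad()  # without zeroing, the gradients accumulate on every backward call
output = net(input)    # this is where dynamic graph building shows: you could pass other arguments to change the network structure
loss = criterion(output, target)
loss.backward()
optimizer.step()  # does the update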
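For plain SGD (no momentum or weight decay), `optimizer.step()` performs the same update as the manual loop shown earlier. The sketch below checks that on a small stand-in model; the `nn.Linear` layer, the random data, and the choice of `nn.MSELoss` are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
layer = nn.Linear(3, 1)                       # small stand-in model (assumption)
criterion = nn.MSELoss()
optimizer = optim.SGD(layer.parameters(), lr=0.01)

loss = criterion(layer(torch.randn(4, 3)), torch.randn(4, 1))
loss.backward()

# What the manual update would produce, computed before the optimizer runs.
expected = [p.detach() - 0.01 * p.grad for p in layer.parameters()]
optimizer.step()

same = all(torch.allclose(p, e) for p, e in zip(layer.parameters(), expected))
print(same)  # True
```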

The complete NN structure

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):  # must inherit from nn.Module
    def __init__(self):
        super(Net, self).__init__()
        # two convolutional layers, self.conv1 and self.conv2; note that these layers do not include an activation function
        self.conv1 = nn.Conv2d(1, 6, 5)  # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # three fully connected layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # an affine operation: y = Wx + b
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):  # note: a 2D convolution layer expects input of shape batchsize*channel*height*width
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))  # max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # if the window is square, a single number is enough
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()

# The next four values are placeholders assumed for illustration; the original leaves them undefined.
input = torch.randn(1, 1, 32, 32)
target = torch.randn(1, 10)
criterion = nn.MSELoss()
num_iterations = 10

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
for i in range(num_iterations):
    optimizer.zero_grad()  # zero the gradient buffers; without this the gradients accumulate
    output = net(input)    # this is where dynamic graph building shows: you could pass other arguments to change the network structure
    loss = criterion(output, target)
    loss.backward()        # computes the gradients, i.e. fills in each parameter's .grad
    optimizer.step()       # does the update, i.e. param.data -= learning_rate * param.grad
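The reason for zeroing the gradient buffers in every iteration can be seen on a single tensor. This is a minimal sketch with an illustrative one-element `w`: calling `backward` twice without resetting adds the second gradient to the first.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()     # gradient of 3 * w is 3

(w * 3).sum().backward()   # no zeroing in between: the new gradient is added
second = w.grad.clone()    # 3 + 3

w.grad.zero_()             # reset by hand; optimizer.zero_grad() resets every parameter it manages
print(first.item(), second.item(), w.grad.item())  # 3.0 6.0 0.0
```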

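Miscellaneous

On gradients: only leaf tensors with requires_grad=True have their gradient stored in the .grad attribute. The gradients of other tensors are not kept in .grad (calling retain_grad makes a non-leaf tensor with requires_grad=True store its gradient in .grad as well).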
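A short sketch of the leaf versus non-leaf distinction, using illustrative tensors `x` (leaf) and `h` (an intermediate result):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf tensor
h = x * 3                                         # non-leaf: produced by an operation
h.retain_grad()                                   # ask autograd to keep h.grad
y = h.sum()
y.backward()

print(x.is_leaf, h.is_leaf)  # True False
print(x.grad.tolist())       # [3.0, 3.0]
print(h.grad.tolist())       # [1.0, 1.0]; would be None without retain_grad()
```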

# numpy to Tensor
import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)  # when a changes, b changes too: b shares a's memory rather than holding a deep copy
print(b)

a = np.ones(5)
b = torch.from_numpy(a)  # ndarray --> Tensor
a_ = b.numpy()           # Tensor --> ndarray
np.add(a_, 1, out=a_)    # also changes the value of b
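The sharing behaviour can be contrasted with an explicit copy. In this sketch, `torch.tensor(a)` is used as the copying constructor (it always copies its input), while `torch.from_numpy(a)` and `b.numpy()` share memory; the variable `c` is an addition for illustration.

```python
import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)   # shares memory with a
c = torch.tensor(a)       # always copies the data

np.add(a, 1, out=a)       # modify a in place
print(b.tolist())         # [2.0, 2.0, 2.0, 2.0, 2.0]
print(c.tolist())         # [1.0, 1.0, 1.0, 1.0, 1.0]

b.numpy()[0] = 7          # the ndarray view writes through to b and a
print(float(a[0]), b[0].item())  # 7.0 7.0
```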

# move Tensors to CUDA
if torch.cuda.is_available():
    x = x.to('cuda:0')
    y = y.to('cuda:0')
    x + y
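A common way to write this so the same script also runs on machines without a GPU is to choose the device once and reuse it. This is a sketch of that pattern, not the author's code; the names `device` and `z` are illustrative.

```python
import torch

# Pick the GPU when one is present, otherwise stay on the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

x = torch.ones(3).to(device)
y = torch.ones(3, device=device)  # or create the tensor on the device directly
z = x + y                         # both operands must live on the same device

print(z.device.type, z.cpu().tolist())
```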

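# tensor and numpy
import torch
import numpy as np

n1 = np.array([1., 2.]).astype(np.float32)
t1 = torch.FloatTensor(n1)
print(t1)
# the author notes that creating a tensor with torch.FloatTensor(n1) makes a deep copy;
# this legacy constructor may behave differently across versions, whereas torch.tensor(n1) always copies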