[ PyTorch ] A Summary of Practical Coding Experience

小小的行者

Last modified: 2023-04-25 16:47:27


First published: 2018-05-23 10:17:09

Article link: https://blog.csdn.net/jdzwanghao/article/details/80415654


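Resources

Tutorials
Official	| Official documentation | Official tutorials |
Chinese	| Site 01 | Site 02 |
Books	| 深度学习框架PyTorch：入门与实践 (Deep Learning Framework PyTorch: Getting Started and Practice) |
Forums
Official	| Official forum |

Experience shared by others
Zhihu	【他山之石】pytorch常见的坑汇总 (a collection of common PyTorch pitfalls)
Zhihu column: Pytorch日积月累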


Part I. Basic Concepts

I. Automatic Differentiation (Autograd)

Further reading: | Blog 01 |

1. Understanding the basic gradient code:

# ---- output with a single element

import torch

x = torch.ones(1, requires_grad=True) # x = 1
y = 2 * x ** 2  # y = 2*x^2 with x = 1; y has one element

gradients = torch.tensor([0.1], dtype=torch.float) # [0.1] is the weight applied to the derivative of each output element
y.backward(gradients) # dy/dx = d(2*x^2)/dx = 4x

print(x.grad)  # 0.1 * 4x evaluated at x = 1

[Output] >> tensor([ 0.4000])

# ---- output with several elements

x = torch.ones(3, requires_grad=True) # x = [1, 1, 1]
y = 2 * x ** 2  # y = 2*x^2 with x = [1, 1, 1]; y has three elements

gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float) # one weight per output element
y.backward(gradients) # dy/dx = 4x, scaled element-wise by the weights

print(x.grad)  # weights * 4x evaluated at x = [1, 1, 1]

[Output] >> tensor([ 0.4000,  4.0000,  0.0004])
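
The `gradients` argument is easiest to read as the vector v of a vector-Jacobian product: `y.backward(v)` accumulates v^T J into `x.grad`, where J is the Jacobian of y with respect to x. The following is a minimal sketch that checks this for the example above; the names `v`, `expected`, and `x2` are illustrative and not part of the original post.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = 2 * x ** 2

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)                      # accumulates v^T J into x.grad

# For an element-wise function the Jacobian is diagonal with entries 4x,
# so v^T J reduces to v * 4x.
expected = v * 4 * x.detach()
print(torch.allclose(x.grad, expected))

# Equivalent formulation: reduce y to a scalar first, then call backward().
x2 = torch.ones(3, requires_grad=True)
(v * (2 * x2 ** 2)).sum().backward()
print(torch.allclose(x2.grad, x.grad))
```

Both checks print True, which is why passing weights to `backward` and weighting the outputs before summing are interchangeable.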

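2. Gradients during automatic differentiation

After backward(), autograd keeps gradients only for leaf tensors; the gradients of intermediate (non-leaf) tensors are discarded. In the example below, z = x + y is a non-leaf tensor, so z.grad is None afterwards, although a hook registered on z can still observe its gradient as it flows through. References: | Link 1 | Link 2 |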

import torch

x = torch.Tensor([0, 1, 2, 3]).requires_grad_()
y = torch.Tensor([4, 5, 6, 7]).requires_grad_()
w = torch.Tensor([1, 2, 3, 4]).requires_grad_()
z = x+y

# ===================
def hook_fn(grad):
    print(grad)

z.register_hook(hook_fn)
# ===================

o = w.matmul(z)

print('=====Start backprop=====')
o.backward()
print('=====End backprop=====')

print('x.grad:', x.grad)
print('y.grad:', y.grad)
print('w.grad:', w.grad)
print('z.grad:', z.grad)

[Output]
=====Start backprop=====
tensor([1., 2., 3., 4.])
=====End backprop=====
x.grad: tensor([1., 2., 3., 4.])
y.grad: tensor([1., 2., 3., 4.])
w.grad: tensor([ 4.,  6.,  8., 10.])
z.grad: None
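
If the gradient of a non-leaf tensor is needed after the backward pass, one option besides a hook is `Tensor.retain_grad()`. A small sketch reusing the tensors from the example above:

```python
import torch

x = torch.tensor([0., 1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6., 7.], requires_grad=True)
w = torch.tensor([1., 2., 3., 4.], requires_grad=True)

z = x + y
z.retain_grad()        # ask autograd to keep the gradient of this non-leaf tensor

o = w.matmul(z)
o.backward()

print(z.is_leaf)       # z is produced by an operation, so it is not a leaf
print(z.grad)          # do/dz equals w
```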

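3. What the retain_graph argument of backward() does:

Reposted from: | Blog |

The retain_graph argument is rarely needed in everyday code, but there is one situation where it matters:

Suppose we have an input x, with y = x ** 2 and z = y * 4, and two outputs: output_1 = z.mean() and output_2 = z.sum(). We then call backward() on both outputs.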

In[3]: import torch
In[5]: x = torch.randn((1,4),dtype=torch.float32,requires_grad=True)
In[6]: y = x ** 2
In[7]: z = y * 4
In[8]: output1 = z.mean()
In[9]: output2 = z.sum()
In[10]: output1.backward()    # runs fine, but the intermediate buffers are freed afterwards, which breaks the next call
In[11]: output2.backward()    # this raises an error
Traceback (most recent call last):
  File "/home/prototype/anaconda3/envs/pytorch-env/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-11-32d5139229de>", line 1, in <module>
    output2.backward()
  File "/home/prototype/anaconda3/envs/pytorch-env/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/prototype/anaconda3/envs/pytorch-env/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

If we write it this way instead:

In[3]: import torch
  ...: x = torch.randn((1,4),dtype=torch.float32,requires_grad=True)
  ...: y = x ** 2
  ...: z = y * 4
  ...: output1 = z.mean()
  ...: output2 = z.sum()
  ...: output1.backward(retain_graph=True)   # keep the intermediate buffers after this backward pass
  ...: output2.backward()

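This argument is needed whenever two outputs share part of a graph. That is also why the Content Loss layer in the style-transfer example mentioned in the original post uses it: style transfer has both a Content Loss layer and a Style Loss layer, and the two share the parameters of one network while producing two separate losses. The first backward() therefore has to be called with retain_graph=True, so that the intermediate buffers the second backward() still needs are not freed.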

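In other words, if you have two losses:

# With two losses, run backward on the first one, then on the second
loss1.backward(retain_graph=True)
loss2.backward() # after this call all intermediate buffers are freed, ready for the next iteration
optimizer.step() # update the parameters

Seen this way the argument is easier to understand: the gradients of both calls accumulate in .grad, and the graph only has to survive until the last backward().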
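
Because gradients accumulate, two backward calls with `retain_graph=True` give the same result as one backward call on the sum of the losses. A minimal sketch of that equivalence, using the x, y, z example from above (the variable names `grad_two_calls` and `grad_one_call` are illustrative):

```python
import torch

x = torch.randn(1, 4, requires_grad=True)

# Variant A: two backward calls, keeping the graph alive for the second one.
z = (x ** 2) * 4
z.mean().backward(retain_graph=True)
z.sum().backward()
grad_two_calls = x.grad.clone()

# Variant B: add the losses and call backward once; no retain_graph needed.
x.grad = None
z = (x ** 2) * 4
(z.mean() + z.sum()).backward()
grad_one_call = x.grad.clone()

print(torch.allclose(grad_two_calls, grad_one_call))
```

When both losses are available at the same point in the code, summing them is usually the simpler choice, since the graph is traversed only once.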

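4. Writing your own differentiable function: extending autograd

The following is reposted from: link

See also: link

Most operations already support backpropagation through autograd. If you need to write a complex function that autograd cannot differentiate automatically, write a Function and implement both its forward pass and its backward pass. A Function corresponds to a rectangle in the computation graph: it receives arguments, computes, and returns a result. An example follows.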

from torch.autograd import Function
class MultiplyAdd(Function):
    
    @staticmethod
    def forward(ctx, w, x, b):
        print('type in forward', type(x))
        ctx.save_for_backward(w, x)  # save the tensors needed for the backward pass
        output = w*x +b
        return output
    
    @staticmethod
    def backward(ctx, grad_output):
        w, x = ctx.saved_tensors  # replaces the deprecated ctx.saved_variables
        print('type in backward',type(x))
        grad_w = grad_output * x
        grad_x = grad_output * w
        grad_b = grad_output * 1
        return grad_w, grad_x, grad_b

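Notes:

A custom Function must inherit from autograd.Function. It has no __init__ constructor, and forward and backward are both static methods.
The inputs and outputs of both forward and backward are Tensors (in PyTorch versions before 0.4, backward worked with Variables).
The outputs of backward correspond one-to-one to the inputs of forward, and the inputs of backward correspond one-to-one to the outputs of forward.
The grad_output argument of backward is the grad_tensors argument (formerly grad_variables) of t.autograd.backward.
If an input does not need a gradient, return None in its position. For example, a forward argument such as a flag named x_requires_grad clearly cannot be differentiated, so backward simply returns None for it.
The backward pass may need intermediate results of the forward pass. These must be saved explicitly, otherwise they are released once the forward pass ends.

A Function is used by calling Function.apply(tensor).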

How to call it:
ClassName.apply(arguments)
output.backward()

import torch as t
x = t.ones(1)
w = t.rand(1, requires_grad=True)
b = t.rand(1, requires_grad=True)
print('Starting forward pass')
z = MultiplyAdd.apply(w, x, b)
print('Starting backward pass')
z.backward()

# x does not require a gradient; backward still computes grad_x, but it is then discarded
x.grad, w.grad, b.grad

[Output]
Starting forward pass
type in forward <class 'torch.Tensor'>
Starting backward pass
type in backward <class 'torch.Tensor'>
(None, tensor([1.]), tensor([1.]))
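
A hand-written backward is easy to get wrong, so it can help to compare it against numerical gradients. The sketch below re-declares MultiplyAdd without the print statements and runs `torch.autograd.gradcheck` on it; the tolerance values are example choices, not recommendations from the original post.

```python
import torch
from torch.autograd import Function, gradcheck

class MultiplyAdd(Function):
    @staticmethod
    def forward(ctx, w, x, b):
        ctx.save_for_backward(w, x)
        return w * x + b

    @staticmethod
    def backward(ctx, grad_output):
        w, x = ctx.saved_tensors
        # One gradient per forward input, in the same order: w, x, b.
        return grad_output * x, grad_output * w, grad_output

# gradcheck compares analytic and finite-difference gradients;
# it expects double-precision inputs that require grad.
w = torch.rand(3, dtype=torch.double, requires_grad=True)
x = torch.rand(3, dtype=torch.double, requires_grad=True)
b = torch.rand(3, dtype=torch.double, requires_grad=True)

ok = gradcheck(MultiplyAdd.apply, (w, x, b), eps=1e-6, atol=1e-4)
print(ok)
```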

Part II. Practical Programming Tips

I. Data Processing

1. Converting between torch tensors, Variables, and numpy arrays.

- tensor⇒array

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
x = x.numpy()
print(x)

>>[[1 2 3]
 [4 5 6]]

- array⇒tensor

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
x = x.numpy()

x = torch.from_numpy(x)

print(x)

>>tensor([[ 1,  2,  3],
        [ 4,  5,  6]])

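- Variable=>np.array

# Convert a Variable to a numpy array.
# Since PyTorch 0.4, Variable is merged into Tensor, so x.detach().numpy() on a plain tensor does the same job.
x = torch.autograd.Variable(torch.FloatTensor(8,100,1,1))
x = x.data.numpy()

- np.array=>Variable

# Convert a numpy array to a Variable
import numpy as np
x = np.array([8, 3, 64, 64])
x = torch.from_numpy(x)
x = torch.autograd.Variable(x)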
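
One detail worth knowing about these conversions is whether memory is shared. The sketch below illustrates that `torch.from_numpy` shares memory with the array while `torch.tensor` copies, and that a tensor that requires grad has to be detached first; the variable names are illustrative.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # makes an independent copy

arr[0] = 100
print(shared)   # reflects the change made through arr
print(copied)   # unchanged

# A tensor that requires grad must be detached before conversion.
p = torch.ones(2, requires_grad=True)
p_np = p.detach().cpu().numpy()
print(p_np)
```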

2. Converting a label (a scalar) to a one-hot encoding:

Method 1:

Reference: Convert int into one-hot format - #4 by albanD - PyTorch Forums

import numpy as np
import torch

# In the original post, b came from a CIFAR-10 dataset object:
#     a, b = cifar_trans_with_labels[0]
# That dataset is not defined in this snippet, so the label value shown in the
# output below is used as a stand-in.
b = 6

batch_size = 1
nb_digits = 10

print('b is a scalar:', b) # b is a CIFAR-10 label, a scalar.
labels_onehot = torch.FloatTensor(batch_size, nb_digits)

labels = np.array([b]) # turn the scalar b into a 1-D numpy array.
print('labels_numpy:', labels)
print('labels_numpy_size:', labels.shape) # labels_numpy_size: (1,); one entry in the shape means one dimension.

labels = torch.from_numpy(labels) # convert to torch.Tensor
labels = labels.long() # labels_onehot.scatter_(1, labels, 1) below requires a long (int64) index.
labels = labels.view(1,-1)

print('labels_torchTensor_shape:', labels.shape)
print('labels_torchTensor_value:', labels)

labels_onehot.zero_()
labels_onehot.scatter_(1, labels, 1) # write 1 at the label position, giving the one-hot encoding.

print('labels_one-hot:', labels_onehot)
print('labels_one-hot_shape:', labels_onehot.shape)
print('labels_one-hot_tensortype:', labels_onehot.type())

[Output]

b is a scalar: 6
labels_numpy: [6]
labels_numpy_size: (1,)
labels_torchTensor_shape: torch.Size([1, 1])
labels_torchTensor_value: tensor([[ 6]])
labels_one-hot: tensor([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.]])
labels_one-hot_shape: torch.Size([1, 10])
labels_one-hot_tensortype: torch.FloatTensor

Method 2: the tensor.scatter_() function

import torch

log_probs = torch.FloatTensor(torch.rand([4,10]))  # shape: [batch_size, num_class]
targets = torch.FloatTensor([1,2,3,4])  # shape: [batch_size] 
targets = targets.long()

unsquee_targets = targets.unsqueeze(1).data.cpu()  # index which is used to fill '1' into right location in one-hot tensor of target
print(unsquee_targets)
targets = torch.zeros(log_probs.size()).scatter_(1, unsquee_targets, 1)
print(targets)

[Output]
tensor([[1],
        [2],
        [3],
        [4]])
tensor([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]])
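
Recent PyTorch versions also provide `torch.nn.functional.one_hot`, which was not used in the original post. The sketch below shows that it produces the same encoding as the scatter_-based construction, apart from the dtype (it returns int64).

```python
import torch
import torch.nn.functional as F

targets = torch.tensor([1, 2, 3, 4])           # class indices, dtype int64
onehot = F.one_hot(targets, num_classes=10)    # shape [4, 10], dtype int64

# Same result as the scatter_-based construction, after casting to float.
via_scatter = torch.zeros(4, 10).scatter_(1, targets.unsqueeze(1), 1)
print(onehot.shape, onehot.dtype)
print(torch.equal(onehot.float(), via_scatter))
```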

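3. Converting between torch.Tensor data types:

Reference: pytorch张量torch.Tensor类型的构建与相互转换以及torch.type()和torch.type_as()的用法 - pytorch中文网 (constructing and converting torch.Tensor types, and how to use type() and type_as())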
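
Since the section only links to an external article, here is a short sketch of the most common conversion calls. It is an illustration written for this summary, not an excerpt from the linked page.

```python
import torch

x = torch.tensor([1.7, -2.3])      # float32 by default
print(x.dtype)

xi = x.long()                      # int64; the fractional part is truncated
xd = x.double()                    # float64
xh = x.to(torch.float16)           # general form: .to(dtype)

ref = torch.zeros(1, dtype=torch.int32)
xr = x.type_as(ref)                # take the dtype from another tensor

print(xi, xd.dtype, xh.dtype, xr.dtype)
print(x.type())                    # legacy type string of the tensor
```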

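4. Understanding the shape of numpy arrays and torch.Tensor:

(1) Reading a shape:

import numpy as np
import torch

a = [
      [
          3,4,6
      ]
     ]
# Note: a is a list.
a_nparray = np.array(a)
print('a as a numpy array, a_nparray:',a_nparray)
print('shape of a_nparray:',a_nparray.shape)
print('A shape of (1, 3) means: (i) the number of entries is the number of dimensions; there are two entries, 1 and 3, so a_nparray is 2-D; '
      '(ii) each entry is the number of elements along that dimension.')

print('')

a_torchtensor = torch.from_numpy(a_nparray)
print('a as a torch.Tensor:',a_torchtensor)
print('shape of a as a torch.Tensor:',a_torchtensor.shape)
print('torch.Size([1, 3]) means: (i) the number of entries is the number of dimensions; there are two entries, 1 and 3, so a_torchtensor is 2-D; '
      '(ii) each entry is the number of elements along that dimension.')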

(2) 0-dimensional arrays and 0-dimensional tensors (i.e. scalars):

import torch
import numpy as np

a = 1
print('value of the scalar a:', a)
a_nparray = np.array(a)
print('shape of a as a numpy array:',a_nparray.shape)
print('() means a scalar, i.e. 0 dimensions')

a_torchtensor = torch.from_numpy(a_nparray)
print('a as a torch.Tensor:',a_torchtensor)
print('shape of a as a torch.Tensor:',a_torchtensor.shape)
print('torch.Size([]) means a scalar, i.e. a 0-dimensional tensor')

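(3) How to count the dimensions of a numpy array or torch.Tensor

import torch

x = torch.tensor([[1], [2], [3]])
print(x.shape)  # two entries in the shape (two levels of brackets), so x is 2-D

[Output]
torch.Size([3, 1])

(4) The relationship between axis and shape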
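
The original post leaves this heading without text. A sketch under the assumption that the intended point is the usual convention: axis (or dim) k refers to the k-th entry of the shape, and reducing along axis k removes that entry.

```python
import numpy as np
import torch

x = torch.arange(24).view(2, 3, 4)    # shape [2, 3, 4]

print(x.sum(dim=0).shape)                 # axis 0 (size 2) is reduced away
print(x.sum(dim=1).shape)                 # axis 1 (size 3) is reduced away
print(x.sum(dim=-1).shape)                # negative axes count from the end
print(x.sum(dim=1, keepdim=True).shape)   # keepdim leaves a size-1 entry in place

# numpy follows the same convention, with the keyword `axis`.
a = np.arange(24).reshape(2, 3, 4)
print(a.sum(axis=1).shape)
```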

5. Multiplication rules for torch.Tensor:

(1) Element-wise multiplication: c = A * B

Rule: elements at the same position of A and B are multiplied, so the two shapes must match (or be broadcastable to a common shape).

import torch

input = torch.Tensor([[1,2],
                      [1,2]]) # size: [2, 2]
print('input_shape: ', input.shape)

yaw = torch.Tensor( [ [10], [20] ] ) # size: [2, 1]
print('yaw_shape: ', yaw.shape)

yaw = yaw.view(yaw.size(0),1) # yaw keeps the shape [2, 1]
print('after yaw_view:', yaw)
print('after yaw_view shape:', yaw.shape)
yaw = yaw.expand_as(input)
print('after yaw_view_expand:', yaw)
print('after yaw_view_expand shape:', yaw.shape)

# output= yaw * input
output= input * yaw # elements at matching positions of the two tensors are multiplied.

print('output:', output)
print('output.shape', output.shape)

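(2) Einstein summation with torch.einsum(), which performs a specified product over specified shapes.

Official documentation link

import torch
q=torch.rand(4,3,8,128)
k=torch.rand(4,3,8,128)
q_dot_k = torch.einsum('bcid,bcjd->bcij', q, k)  # result shape: [4, 3, 8, 8]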
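
To make the index notation concrete: the equation sums over the shared index d for every combination of b, c, i, and j, which is the same as a batched matrix product of q with the transpose of k. A minimal sketch of that equivalence (the names `scores` and `scores_matmul` are illustrative):

```python
import torch

q = torch.rand(4, 3, 8, 128)
k = torch.rand(4, 3, 8, 128)

# 'bcid,bcjd->bcij': for each pair (b, c), sum over the shared index d.
scores = torch.einsum('bcid,bcjd->bcij', q, k)

# The same contraction written as a batched matrix product q @ k^T.
scores_matmul = q @ k.transpose(-1, -2)

print(scores.shape)
print(torch.allclose(scores, scores_matmul, atol=1e-4))
```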

6. Converting a list to a torch.Tensor.

import torch

aa = torch.rand(3,256,128)

bb = []
bb.append(aa)
bb.append(aa)

print(torch.stack(bb).shape)

[Output]
>>>  torch.Size([2, 3, 256, 128])
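
`torch.stack` is one of several ways to turn a list into a tensor. The sketch below contrasts it with `torch.cat` and `torch.tensor`; which one fits depends on whether a new dimension is wanted and on whether the list holds tensors or plain numbers.

```python
import torch

parts = [torch.rand(3, 4), torch.rand(3, 4)]

stacked = torch.stack(parts)                 # new leading dimension: [2, 3, 4]
stacked_last = torch.stack(parts, dim=-1)    # new trailing dimension: [3, 4, 2]
concatenated = torch.cat(parts)              # joins along existing dim 0: [6, 4]

# A list of plain Python numbers is converted with torch.tensor.
from_numbers = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(stacked.shape, stacked_last.shape, concatenated.shape, from_numbers.shape)
```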

7. Common ways to change a tensor's shape.

(1) [tensor].unsqueeze(dim)

Purpose: insert a dimension of size 1 at position dim of [tensor].

import torch

targets = torch.rand(128)
print(targets.shape)
targets_uns = targets.unsqueeze(1)  # insert a new dimension at index 1, i.e. as the second dimension
print(targets_uns.shape)


[Output]
>>> torch.Size([128])
>>> torch.Size([128, 1])

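(2) Repeating a tensor

torch.Tensor.repeat(*sizes)

Repeats the tensor along the specified dimensions. Unlike expand(), this function copies the tensor's data.

Parameters:

sizes (torch.Size or int...) - the number of repetitions along each dimension

x = torch.Tensor([1, 2, 3])
x.repeat(4, 2)
tensor([[1., 2., 3., 1., 2., 3.],
        [1., 2., 3., 1., 2., 3.],
        [1., 2., 3., 1., 2., 3.],
        [1., 2., 3., 1., 2., 3.]])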
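
The difference between expand() and repeat() mentioned above can be observed directly from strides and data pointers. A small sketch:

```python
import torch

x = torch.tensor([[1.], [2.]])        # shape [2, 1]

e = x.expand(2, 3)                    # view: no data is copied
r = x.repeat(1, 3)                    # copy: new storage

print(e.stride())                     # stride 0 along the expanded dimension
print(e.data_ptr() == x.data_ptr())   # e shares memory with x
print(r.data_ptr() == x.data_ptr())   # r owns its memory
print(torch.equal(e, r))              # the values are identical
```

Because an expanded tensor reuses the same memory for every repeated element, in-place writes to it are best avoided; repeat() is the safer choice when the result will be modified.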

(3) Splitting a tensor with chunk

import torch

a = torch.rand([1,1,3,3])
print(a)
print('='*70)
####################################################
###  .chunk(number of chunks, dimension to split) ###
####################################################
chunk_list = a.chunk(3,2)
for data in enumerate(chunk_list):
    print(data,'\n')

Output

tensor([[[[0.9416, 0.9135, 0.9924],
          [0.6235, 0.8031, 0.4345],
          [0.0790, 0.6187, 0.7088]]]])
======================================================================
(0, tensor([[[[0.9416, 0.9135, 0.9924]]]])) 
(1, tensor([[[[0.6235, 0.8031, 0.4345]]]])) 
(2, tensor([[[[0.0790, 0.6187, 0.7088]]]]))

(4) Extracting blocks with torch.nn.Unfold

Unfold extracts sliding local blocks, i.e. it implements the sliding-window operation behind local connectivity. Setting the stride equal to the kernel size splits a feature map into non-overlapping blocks.

import torch
import torch.nn as nn

### 1. Parameter ###
K=2 # kernel size
B=2 # batch size
C=2 # channel num
img_size=3 

### 2. Block Func ###
block_func = nn.Unfold( K, dilation=1, padding=0,stride=1)

### 3. Forward ###
inputs = torch.rand(B, C, img_size, img_size); print('\n')
out = block_func(inputs); print(out.shape) ## [B, C*k_h*k_w, L]; L is the number of k_h x k_w windows.
out = out.view(B, C, K*K, -1) ## [B, C, K*K, L]; within dim 1 the channel index varies slowest
out = out.permute(0,3,1,2) ## [B, L, C, K*K]
out = out.reshape(*out.shape[:3],K,K) ## [B, L, C, K, K]
print(out[1])

Here $L = \prod_d \left\lfloor\frac{\text{spatial\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor$ . For example, for an input of shape $[B,C,H_{in},W_{in}]$ we have $\left\{\begin{matrix} H_{in} = \text{spatial\_size}[0] \\ W_{in} = \text{spatial\_size}[1] \end{matrix}\right.$ and $L = H_{out} \cdot W_{out}$ .

Application

1. Convolution.

import torch

inp = torch.randn(1, 3, 10, 12)  ## [B,C_in,H,W]
w = torch.randn(2, 3, 4, 5) ## [C_out,C_in,k_h,k_w]
inp_unf = torch.nn.functional.unfold(inp, (4, 5)); print(inp_unf.shape) ## [B, C_in*k_h*k_w, L]; L is the number of k_h x k_w windows.

out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2); print(out_unf.shape)  ## [B, L, C_in*k_h*k_w] matmul [C_in*k_h*k_w, C_out], then transposed to [B, C_out, L]

out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
# or equivalently (and avoiding a copy),
# out = out_unf.view(1, 2, 7, 8)

(torch.nn.functional.conv2d(inp, w) - out).abs().max()

>>> tensor(1.9073e-06)
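
The claim that kernel size equal to stride yields a plain partition into blocks can be checked on a tiny input. The sketch below uses a 4x4 image with known values so the layout of the unfolded columns is visible; with H = W = 4, kernel 2, and stride 2, the formula gives L = 2 * 2 = 4.

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.).view(1, 1, 4, 4)    # one 4x4 single-channel image

# kernel_size == stride gives non-overlapping 2x2 blocks.
patches = F.unfold(x, kernel_size=2, stride=2)
print(patches.shape)          # [B, C*k_h*k_w, L] = [1, 4, 4]
print(patches[0, :, 0])       # top-left block, flattened row by row

# fold is the exact inverse here because no pixel belongs to two blocks.
restored = F.fold(patches, output_size=(4, 4), kernel_size=2, stride=2)
print(torch.equal(restored, x))
```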

8. Reading tensor values by index.

Method 1:

Reference blog: | link |

import torch
 
x = torch.linspace(1, 12, steps=12).view(3,4)
print(x)

indices = torch.LongTensor([0, 2])
y = torch.index_select(x, 0, indices)
print(y)
 
z = torch.index_select(x, 1, indices)
print(z)
 
z = torch.index_select(y, 1, indices)
print(z)
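----------------
Copyright notice: the code above comes from an original article by CSDN blogger "ShellCollector", licensed under CC 4.0 BY-SA. Reposts must include the link to the original source and this notice.
Original link: https://blog.csdn.net/jacke121/article/details/83044660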
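
The snippet above does not show its output. As a small sketch of what `index_select` returns, and of the equivalent indexing syntax:

```python
import torch

x = torch.linspace(1, 12, steps=12).view(3, 4)
indices = torch.tensor([0, 2])

rows = torch.index_select(x, 0, indices)   # rows 0 and 2
cols = torch.index_select(x, 1, indices)   # columns 0 and 2

print(rows)
print(cols)

# Indexing with an index tensor selects the same elements.
print(torch.equal(rows, x[indices]))
print(torch.equal(cols, x[:, indices]))
```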

Method 2 (the simple way):

import torch

a = torch.FloatTensor([11,22,33,44])
index_mat = torch.LongTensor(list(range(4))).repeat(4,1)
print(a[index_mat])

[Output]
tensor([[11., 22., 33., 44.],
        [11., 22., 33., 44.],
        [11., 22., 33., 44.],
        [11., 22., 33., 44.]])

Method 3:

torch.masked_select(input, mask, *, out=None) → Tensor

>>> x = torch.randn(3, 4)
>>> x
tensor([[ 0.3552, -2.3825, -0.8297,  0.3477],
        [-1.2035,  1.2252,  0.5002,  0.6248],
        [ 0.1307, -2.0608,  0.1244,  2.0139]])
>>> mask = x.ge(0.5)
>>> mask
tensor([[False, False, False, False],
        [False, True, True, True],
        [False, False, False, True]])
>>> torch.masked_select(x, mask)
tensor([ 1.2252,  0.5002,  0.6248,  2.0139])

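9. Assigning tensor values by index.

(1) [Tensor].scatter_(dim, index, src)

Purpose: write values into specified positions of [Tensor]. The values in src are written into [Tensor] along dimension dim, at the positions given by index. Reference blog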

x = torch.rand(2, 3)
print(x)
print('\n\n')
index = torch.LongTensor([[0, 1, 2],
                          [2, 0, 0]])

zeros = torch.zeros(3,4)
zeros.scatter_(0, index, x)
print(zeros)

[Output]
tensor([[0.2763, 0.7619, 0.8786],
        [0.1385, 0.4976, 0.7380]])

tensor([[0.2763, 0.4976, 0.7380, 0.0000],
        [0.0000, 0.7619, 0.0000, 0.0000],
        [0.1385, 0.0000, 0.8786, 0.0000]])
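
For dim=0 the rule behind this output is out[index[i][j]][j] = src[i][j]: the column is kept and the row is looked up in index. The sketch below spells that rule out as an explicit loop and compares it with scatter_; the names `out` and `manual` are illustrative.

```python
import torch

x = torch.rand(2, 3)
index = torch.tensor([[0, 1, 2],
                      [2, 0, 0]])

out = torch.zeros(3, 4).scatter_(0, index, x)

# For dim=0: out[index[i][j]][j] = x[i][j]
manual = torch.zeros(3, 4)
for i in range(index.size(0)):
    for j in range(index.size(1)):
        manual[index[i, j], j] = x[i, j]

print(torch.equal(out, manual))
```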


An application: one-hot encoding of labels.

import torch

pred = torch.rand(128,702)
logsoftmax= torch.nn.LogSoftmax(dim=1)
log_probs = logsoftmax(pred)

targets = torch.randint(0, 702, (128,))  # random class indices; torch.rand(128).long() would give all zeros

print(pred.shape)
print(log_probs.shape)
print(targets.shape)

zeros = torch.zeros(log_probs.size())
targets_uns = targets.unsqueeze(1)
print(targets_uns.shape)

targets = zeros.scatter_(1, targets_uns, 1)
print(targets.shape)


[Output]
>>> torch.Size([128, 702])
>>> torch.Size([128, 702])
>>> torch.Size([128])
>>> torch.Size([128, 1])
>>> torch.Size([128, 702])

(2) index_put_(indices, value)

a = torch.zeros([2, 3, 3])
index = [torch.LongTensor([0, 1]), torch.LongTensor([0, 0]), torch.LongTensor([0, 1])]
a = torch.index_put(a, index, torch.Tensor([1, 5]))
print(a)

Output:

tensor([[[1., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 5., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]])

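Explanation: the three index tensors hold, position by position, the coordinates along dimensions 0, 1, and 2. They therefore select the elements a[0, 0, 0] and a[1, 0, 1], which receive the values 1 and 5 respectively. Note that torch.index_put returns a new tensor, while a.index_put_(index, values) writes in place.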

(3) Using Python's zip() function (the simplest)

Reference: 为Numpy数组中的不同索引位置赋值 - 问答 - Python中文网 (assigning values at different index positions of a Numpy array)

import torch

ori_tensor = torch.zeros(4, 3)
print("Origin Tensor:\n",ori_tensor)

coordinate = [(0,0),(1,1),(2,0),(3,1)]
row,col = zip(*coordinate)
ori_tensor[row,col] = 1
print("\nFilled with value 1:\n",ori_tensor)

Output:

Origin Tensor:
 tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
Filled with value 1:
 tensor([[1., 0., 0.],
        [0., 1., 0.],
        [1., 0., 0.],
        [0., 1., 0.]])

10. Appending to a tensor, similar to append on a list

import torch

tensor_list = torch.zeros([0, 3])  # an empty tensor whose batch dimension has size 0
value = torch.FloatTensor([[1,2,3],[4,5,6]])
for iiii in range(10):
    tensor_list = torch.cat([tensor_list, value], dim=0)
print(tensor_list)
print(tensor_list.shape)
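
Each `torch.cat` call in the loop above allocates a new tensor and copies everything collected so far. When the pieces are all known by the end of the loop, an alternative is to collect them in a Python list and concatenate once. A sketch comparing both patterns (the names `grown`, `pieces`, and `joined` are illustrative):

```python
import torch

value = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

# Pattern from the text: grow the tensor step by step.
grown = torch.zeros([0, 3])
for _ in range(10):
    grown = torch.cat([grown, value], dim=0)

# Alternative: collect in a Python list and concatenate once at the end.
pieces = [value for _ in range(10)]
joined = torch.cat(pieces, dim=0)

print(joined.shape)
print(torch.equal(grown, joined))
```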

11. Deleting specified batch entries (rows) from a tensor.

Reference blog: link

import torch

# Suppose we have a feature tensor of shape [batch_size=5, feat_dim=3] and want to delete the 2nd and 4th features (indices 1 and 3)
feature = torch.FloatTensor([[1, 2, 3],
                             [4, 5, 6],
                             [7, 8, 9],
                             [10, 11, 12],
                             [13, 14, 15]])
index_list = list(torch.arange(feature.shape[0]))
remove_index_list = [1, 3]
for ii_remove in remove_index_list:
    index_list[ii_remove] = -6.6  # a valid index is never a negative float, so this value works as a placeholder.
while -6.6 in index_list:
    index_list.remove(-6.6)
removed_index_list = torch.LongTensor(index_list)
feature_ = feature[removed_index_list]
print(feature_)
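
The same deletion can be written with a boolean mask, which avoids the placeholder value and the list manipulation. This is an alternative sketch, not the method from the referenced blog:

```python
import torch

feature = torch.arange(1., 16.).view(5, 3)   # rows [1,2,3], [4,5,6], ..., [13,14,15]
remove_index_list = [1, 3]

keep = torch.ones(feature.size(0), dtype=torch.bool)
keep[remove_index_list] = False               # mark the rows to drop

feature_kept = feature[keep]
print(feature_kept)
```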

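12. Converting a batch of vectors into diagonal matrices.

Reference: Batch of diagonal matrix - PyTorch Forums

import torch

mat = torch.randn(3, 4)
res = torch.zeros(3, 4, 4)
# View res as a [3, 4] tensor that steps along each diagonal (stride 4 + 1), then copy mat into it.
res.as_strided(mat.size(), [res.stride(0), res.size(2) + 1]).copy_(mat)

In practice, however, GPU memory usage is about the same as with a for loop, and in terms of running time the for-loop version is even slightly faster.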
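
Newer PyTorch releases include `torch.diag_embed`, which builds the same batch of diagonal matrices without manual strides. The sketch below checks that it agrees with the strided-copy construction; no claim is made here about its speed relative to the other two approaches.

```python
import torch

mat = torch.randn(3, 4)

# Strided-copy construction from the text.
res = torch.zeros(3, 4, 4)
res.as_strided(mat.size(), [res.stride(0), res.size(2) + 1]).copy_(mat)

# Built-in equivalent: each row of mat becomes the diagonal of a 4x4 matrix.
res_builtin = torch.diag_embed(mat)

print(res_builtin.shape)
print(torch.equal(res, res_builtin))
```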

13. Converting a 1-D vector into an upper/lower triangular matrix

import torch

####################
###   Function   ###
####################
def to_triu_matrix(vector, size=3):
    assert sum(list(range(size+1)))==vector.shape[-1], 'The length of input vector is {}, which is not equal to the sum of [range(size+1)={}]'.format(vector.shape[-1], sum(list(range(size+1))))

    zeros_mat = torch.zeros([size, size]).to(vector.device)
    # print(zeros_mat.type())

    indices   = torch.triu_indices(size,size).t()
    tmp = []
    for idx in indices:
        tmp.append(idx)
    row, col = zip(*tmp)

    zeros_mat[row, col] = vector

    return zeros_mat

def to_tril_matrix(vector, size=3):
    assert sum(list(range(size + 1))) == vector.shape[-1], 'The length of input vector is {}, which is not equal to the sum of [range(size+1)={}]'.format(vector.shape[-1], sum(list(range(size + 1))))

    zeros_mat = torch.zeros([size, size]).to(vector.device)
    # print(zeros_mat.type())

    indices   = torch.tril_indices(size,size).t()
    tmp = []
    for idx in indices:
        tmp.append(idx)
    row, col = zip(*tmp)

    zeros_mat[row, col] = vector

    return zeros_mat

################
###   Main   ###
################
input_vec = torch.FloatTensor([1,2,3,4,5,6])

print('Convert to upper triangular matrix:\n',to_triu_matrix(input_vec, size=3))
print('\n')
print('Convert to lower triangular matrix:\n',to_tril_matrix(input_vec, size=3))

Convert to upper triangular matrix:
 tensor([[1., 2., 3.],
        [0., 4., 5.],
        [0., 0., 6.]])

Convert to lower triangular matrix:
 tensor([[1., 0., 0.],
        [2., 3., 0.],
        [4., 5., 6.]])

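# Further resources

1. PyTorch tensor dimension operations (concatenation, expanding, squeezing, transposing, repeating, ...).

2. Common operations on Tensors in PyTorch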

II. Using Models

1. Reading sub-module names and modules with model.named_children.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

for name, module in net.named_children():
    print('name:\t', name)
    print('module:\t', module)

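2. Convolution with a user-defined kernel

Reposted from: pytorch 自定义卷积核进行卷积操作_lyl771857509的博客-CSDN博客_自定义卷积核 (convolution with a custom kernel in PyTorch, lyl771857509's CSDN blog)

Uses torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)

import torch
import torch.nn as nn
import torch.nn.functional as F

class FreeDefineConV(nn.Module):
    def __init__(self, out_channel, in_channels):
        super(FreeDefineConV, self).__init__()
        kernel = [[-1.0, -2.0, -1.0],
                  [ 0.0,  0.0,  0.0],
                  [ 1.0,  2.0,  1.0]]
        # Copy the 3x3 kernel to every (out_channel, in_channel) pair; clone() makes the expanded view contiguous.
        kernel = torch.FloatTensor(kernel).expand(out_channel, in_channels,3,3).clone()
        self.weight = nn.Parameter(data=kernel, requires_grad=False)
    
    def forward(self, x):
        # The third positional argument of conv2d is bias, so stride and padding are passed by keyword.
        out = F.conv2d(x, self.weight, stride=1, padding=1)
        return out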

III. Running Models

1. Caching per-sample features. Use torch.no_grad() or out.detach() to skip gradient tracking and reduce the GPU memory the tensors occupy.

When a program needs to cache the feature of every sample, the first approach that comes to mind is usually the following.

import torch

bank = torch.zeros([100, 5]).cuda()
model = model.cuda()
model.eval()

for inputs, targets in dataloader:

    inputs = inputs.cuda() # inputs shape [4, 5]
    targets = targets.cuda() # targets shape [4]
    
    feat = model(inputs) # feat shape [4, 5]

    bank[targets] = feat

However, this approach easily leads to CUDA out of memory. The feat written into bank still carries its autograd history, so the computation graph of every batch, including its intermediate activations, stays alive as long as bank does. The fix is:

###### Option 1 ######

import torch

bank = torch.zeros([100, 5]).cuda()
model = model.cuda()
model.eval()

for inputs, targets in dataloader:

    inputs = inputs.cuda() # inputs shape [4, 5]
    targets = targets.cuda() # targets shape [4]
    
    feat = model(inputs) # feat shape [4, 5]

    bank[targets] = feat.detach() ### detach() cuts feat off from the graph, so bank stores only the values.


###### Option 2 ######
...
for .. in ..:
    ...
    with torch.no_grad(): ### run the forward pass under torch.no_grad() so that no graph is built
        feat = model(inputs)
    bank[targets] = feat
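
The effect can be observed on the CPU without a real model or data loader. In the sketch below, `nn.Linear(5, 5)` is a hypothetical stand-in for the model and the index values are arbitrary; the point is only whether the bank ends up attached to the autograd graph.

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 5)            # hypothetical stand-in for the real model
inputs = torch.randn(4, 5)
targets = torch.tensor([3, 7, 11, 42])

# Naive version: the bank becomes part of the autograd graph.
bank_naive = torch.zeros(100, 5)
bank_naive[targets] = model(inputs)
print(bank_naive.requires_grad)    # history is attached to the bank

# detach(): only the values are stored.
bank_detach = torch.zeros(100, 5)
bank_detach[targets] = model(inputs).detach()
print(bank_detach.requires_grad)

# no_grad(): no graph is built during the forward pass at all.
bank_nograd = torch.zeros(100, 5)
with torch.no_grad():
    bank_nograd[targets] = model(inputs)
print(bank_nograd.requires_grad)
```

Of the two fixes, `torch.no_grad()` around the forward pass also avoids storing intermediate activations during the forward computation itself, whereas `detach()` only drops the reference to the graph afterwards.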

IV. Surprising Findings

1. In forward, torch.nn.BatchNorm1d accepts an input of shape [ B, dim_main ] or [ B, dim_main, dim ], but the two cases are computed very differently!

import torch
import torch.nn as nn

class modelfunc(nn.Module):

    def __init__(self, class_num):
        super(modelfunc, self).__init__()
        self.bn = nn.BatchNorm1d(1)
        self.bn2 = nn.BatchNorm1d(2)

    def forward(self,x):
        x1 = self.bn(x)
        print('\n--- BN1 ---')
        print(x1)
        print('Input Shape:\t', x1.shape)

        x2 = self.bn2(x.squeeze())
        print('\n--- BN2 ---')
        print(x2)
        print('Input Shape:\t', x2.shape)

        return x

# Instantiate the model
model_object = modelfunc(3)

input = torch.FloatTensor([[2,1],[1,4]]).view(2, 1, 2)
print(input)
output = model_object(input)

[Experimental results]

tensor([[[2., 1.]],
        [[1., 4.]]])

--- BN1 ---
tensor([[[ 0.0000, -0.8165]],
        [[-0.8165,  1.6330]]], grad_fn=<NativeBatchNormBackward>)
Input Shape:	 torch.Size([2, 1, 2])

--- BN2 ---
tensor([[ 1.0000, -1.0000],
        [-1.0000,  1.0000]], grad_fn=<NativeBatchNormBackward>)
Input Shape:	 torch.Size([2, 2])

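Let us first see where the BN1 result comes from. BN1 has a single channel, so the statistics are taken over all four values 2, 1, 1, 4: the mean is 2 and the biased variance is (0 + 1 + 1 + 4) / 4 = 1.5. Ignoring the small eps, normalizing gives (2 - 2) / sqrt(1.5) = 0, (1 - 2) / sqrt(1.5) ≈ -0.8165, and (4 - 2) / sqrt(1.5) ≈ 1.6330.

The computed values match the output of the program.

Now the BN2 result. After squeeze() the input has shape [2, 2] and BN2 has two channels, so each column is normalized over the batch: column 0 holds 2 and 1 (mean 1.5, variance 0.25), giving 1 and -1; column 1 holds 1 and 4 (mean 2.5, variance 2.25), giving -1 and 1.

These values also match the output of the program.

So the reason for the difference is found. It follows from the way the BN paper treats convolutional layers.

Roughly, the idea is that for convolutional networks the spatial positions are also treated as part of the batch when applying BN. Concretely, for an input feature map [N, C, H, W], H and W are folded into the batch, so BN is effectively computed on [NHW, C].

By the same reasoning, since the input to BN1 has shape [N, C, L], L is treated as part of the batch, so for an input of shape [N, C, L] BN is effectively computed on [NL, C]. It is that simple.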
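
The statement that BatchNorm1d on [N, C, L] equals BatchNorm1d on [N*L, C] can be checked numerically. A minimal sketch with freshly constructed modules in training mode (so weight is 1 and bias is 0); the sizes are arbitrary example values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, L = 4, 3, 5
x = torch.randn(N, C, L)

bn_3d = nn.BatchNorm1d(C)      # fresh modules: weight = 1, bias = 0, training mode
bn_2d = nn.BatchNorm1d(C)

out_3d = bn_3d(x)                                   # input [N, C, L]

x_flat = x.permute(0, 2, 1).reshape(N * L, C)       # fold L into the batch: [N*L, C]
out_2d = bn_2d(x_flat)

# Bring the 3-D result into the same [N*L, C] layout and compare.
out_3d_flat = out_3d.permute(0, 2, 1).reshape(N * L, C)
print(torch.allclose(out_3d_flat, out_2d, atol=1e-5))
```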

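V. Interesting Functions

1. After defining a torch.autograd.Function, use its .apply method to give the function an alias.

import torch

# ---- GradientRescale ---- #
class GradientRescaleFunction(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input, weight):
        ctx.save_for_backward(input)
        output = input
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors  # saved_tensors is a tuple
        grad_input = 0.2 * grad_output  # rescale the incoming gradient by the constant 0.2

        return grad_input, None  # one gradient per forward input; weight gets None

##### Bind the apply method of the custom torch.autograd.Function to a shorter alias
gradient_func = GradientRescaleFunction.apply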
