Notes on miscellaneous issues encountered with PyTorch; the posts are in no particular order.
Official documentation:
ReLU — PyTorch 1.10.0 documentation
When inplace=True, the result of the computation overwrites the input tensor in place.
Let's look at the concrete behavior in PyTorch:
>>> import torch
>>> import torch.nn as nn
>>> conv1 = nn.Conv2d(3, 3, kernel_size=3)
>>> rl1 = nn.ReLU(inplace=True)
>>> rl2 = nn.ReLU()
>>> input = torch.randn(1,3,5,5)
>>> o1 = conv1(input)
>>> id(o1)
139670453299872
>>> o1
tensor([[[[-0.1162,  0.5905,  1.0601],
          [-0.1423,  0.7013,  0.1079],
          [ 0.1096, -0.3253, -0.6799]],

         [[ 0.3407,  0.5013, -0.2121],
          [-0.6805, -0.8362,  0.3360],
          [ 1.1606, -0.2564,  0.2965]],

         [[ 0.4317, -0.2480,  0.2381],
          [-0.0314, -0.0850,  0.1920],
          [-0.2762,  0.0338, -0.2298]]]], grad_fn=<ThnnConv2DBackward>)
>>> h1 = rl1(o1)
>>> id(h1)
139670453299872 # same id as o1, so h1 and o1 refer to the same object
>>> o1 # o1's values have changed: the inplace operation took effect
tensor([[[[0.0000, 0.5905, 1.0601],
          [0.0000, 0.7013, 0.1079],
          [0.1096, 0.0000, 0.0000]],

         [[0.3407, 0.5013, 0.0000],
          [0.0000, 0.0000, 0.3360],
          [1.1606, 0.0000, 0.2965]],

         [[0.4317, 0.0000, 0.2381],
          [0.0000, 0.0000, 0.1920],
          [0.0000, 0.0338, 0.0000]]]], grad_fn=<ReluBackward1>)
>>> h1
tensor([[[[0.0000, 0.5905, 1.0601],
          [0.0000, 0.7013, 0.1079],
          [0.1096, 0.0000, 0.0000]],

         [[0.3407, 0.5013, 0.0000],
          [0.0000, 0.0000, 0.3360],
          [1.1606, 0.0000, 0.2965]],

         [[0.4317, 0.0000, 0.2381],
          [0.0000, 0.0000, 0.1920],
          [0.0000, 0.0338, 0.0000]]]], grad_fn=<ReluBackward1>)
As the session above shows, with inplace=True the input o1 is modified directly.
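The same inplace behavior is also exposed by the functional API and by the trailing-underscore tensor method. A minimal sketch (variable names are illustrative) that checks the underlying storage with data_ptr():

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3)
ptr = x.data_ptr()

y = F.relu(x, inplace=True)   # writes into x's own storage
z = x.relu_()                 # tensor-method form, also in place
w = F.relu(x)                 # default inplace=False: a fresh allocation

print(y.data_ptr() == ptr)    # True, same storage as x
print(z.data_ptr() == ptr)    # True
print(w.data_ptr() == ptr)    # False, w is a new tensor
```

Comparing data_ptr() is a slightly stronger check than id(), since it looks at the actual memory buffer rather than the Python wrapper object.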
Next, the behavior with inplace=False:
>>> o1 = conv1(input)
>>> id(o1)
139670453299712
>>> o1
tensor([[[[-0.1162,  0.5905,  1.0601],
          [-0.1423,  0.7013,  0.1079],
          [ 0.1096, -0.3253, -0.6799]],

         [[ 0.3407,  0.5013, -0.2121],
          [-0.6805, -0.8362,  0.3360],
          [ 1.1606, -0.2564,  0.2965]],

         [[ 0.4317, -0.2480,  0.2381],
          [-0.0314, -0.0850,  0.1920],
          [-0.2762,  0.0338, -0.2298]]]], grad_fn=<ThnnConv2DBackward>)
>>> h1 = rl2(o1) # rl2 has inplace=False
>>> id(h1) # h1's id now differs from o1's
139670453330560
>>> o1 # o1's values are unchanged
tensor([[[[-0.1162,  0.5905,  1.0601],
          [-0.1423,  0.7013,  0.1079],
          [ 0.1096, -0.3253, -0.6799]],

         [[ 0.3407,  0.5013, -0.2121],
          [-0.6805, -0.8362,  0.3360],
          [ 1.1606, -0.2564,  0.2965]],

         [[ 0.4317, -0.2480,  0.2381],
          [-0.0314, -0.0850,  0.1920],
          [-0.2762,  0.0338, -0.2298]]]], grad_fn=<ThnnConv2DBackward>)
>>> h1 # h1 holds the result of the ReLU
tensor([[[[0.0000, 0.5905, 1.0601],
          [0.0000, 0.7013, 0.1079],
          [0.1096, 0.0000, 0.0000]],

         [[0.3407, 0.5013, 0.0000],
          [0.0000, 0.0000, 0.3360],
          [1.1606, 0.0000, 0.2965]],

         [[0.4317, 0.0000, 0.2381],
          [0.0000, 0.0000, 0.1920],
          [0.0000, 0.0338, 0.0000]]]], grad_fn=<ReluBackward0>)
With inplace=False the input is not modified; a new tensor is created instead, which matches expectations.
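Besides aliasing, inplace ops can also break autograd: if an earlier operation saved a tensor for its backward pass, modifying that tensor in place bumps its version counter, and backward() then raises a RuntimeError. A small sketch of this pitfall (sigmoid is used here because it saves its output for backward; this example is not from the session above):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.sigmoid(x)  # sigmoid saves its output y for the backward pass
torch.relu_(y)        # in-place ReLU modifies y and bumps its version counter

failed = False
try:
    y.sum().backward()
except RuntimeError:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation"
    failed = True
print(failed)         # True
```

This is why nn.ReLU(inplace=True) is safe only when autograd does not need the pre-ReLU values; the version-counter check catches the cases where it does.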
Inplace operations save memory, but they require care in multi-branch networks, for example:
conv1 = nn.Conv2d(3, 3, kernel_size=3)
conv2 = nn.Conv2d(3, 3, kernel_size=3)
rl1 = nn.ReLU(inplace=True)
...
x = conv1(x)
h1 = rl1(x)
h2 = conv2(x) # x may already have been overwritten by the inplace ReLU
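One safe rewrite of the snippet above (a minimal sketch; the shapes and names are illustrative) keeps the ReLU out of place, so both branches read the same x:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv1 = nn.Conv2d(3, 3, kernel_size=3)
conv2 = nn.Conv2d(3, 3, kernel_size=3)
rl1 = nn.ReLU(inplace=False)       # out of place: x keeps its pre-ReLU values

x = conv1(torch.randn(1, 3, 5, 5))
x_before = x.clone()

h1 = rl1(x)                        # returns a new tensor, x untouched
h2 = conv2(x)                      # both branches see the same x

print(torch.equal(x, x_before))    # True
```

If memory really matters and inplace=True must stay, an alternative is to reorder the statements so every consumer of x runs before the inplace op: h2 = conv2(x) first, then h1 = rl1(x).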