2021-04-15

Daft shiner

Article link: https://blog.csdn.net/weixin_46782905/article/details/115710701

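PyTorch Hook Practice

This post follows the implementations in the references, with my own understanding added.
Many thanks to the authors of references 1 and 2; almost the entire post is written on top of reference 1.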

Hook for Tensor

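Today I came across paper code that uses hooks, so I studied them and am sharing what I learned.
First, build the function $o = w \cdot (x + y)$ (a dot product, computed with matmul) and print the gradient of every variable.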

import torch

x = torch.Tensor([0, 1, 2, 3]).requires_grad_()
y = torch.Tensor([4, 5, 6, 7]).requires_grad_()
w = torch.Tensor([1, 2, 3, 4]).requires_grad_()
z = x + y
z.retain_grad()
o = w.matmul(z)  # o = w * (x + y)
o.retain_grad()
o.backward()
# print('x.requires_grad:', x.requires_grad)
# print('y.requires_grad:', y.requires_grad)
# print('z.requires_grad:', z.requires_grad)
# print('w.requires_grad:', w.requires_grad)
# print('o.requires_grad:', o.requires_grad)
print('x.grad:', x.grad)
print('y.grad:', y.grad)
print('z.grad:', z.grad)
print('w.grad:', w.grad)
print('o.grad:', o.grad)

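Output: without z.retain_grad() and o.retain_grad(), the gradients of these two variables are None (I did not know this before...). This is because z and o are intermediate (non-leaf) variables: although requires_grad is True for both, autograd does not keep their gradients after backpropagation. retain_grad() makes the gradient be kept. The individual gradient values are fairly easy to work out, so I will not go through them.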

x.grad: tensor([1., 2., 3., 4.])
y.grad: tensor([1., 2., 3., 4.])
z.grad: tensor([1., 2., 3., 4.])
w.grad: tensor([ 4.,  6.,  8., 10.])
o.grad: tensor(1.)
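
For readers who want to check the numbers above by hand, here is a short worked derivation. It only assumes that matmul on two 1-D tensors is a dot product; the index $i$ is notation introduced for this sketch.

```latex
\begin{align*}
z &= x + y = (4,\ 6,\ 8,\ 10), \qquad o = w \cdot z = \sum_{i} w_i z_i \\
\frac{\partial o}{\partial o} &= 1 \\
\frac{\partial o}{\partial z_i} &= w_i
  \;\Rightarrow\; \texttt{z.grad} = w = (1,\ 2,\ 3,\ 4) \\
\frac{\partial o}{\partial x_i} &= \frac{\partial o}{\partial z_i}\,\frac{\partial z_i}{\partial x_i} = w_i,
  \qquad \frac{\partial o}{\partial y_i} = w_i
  \;\Rightarrow\; \texttt{x.grad} = \texttt{y.grad} = (1,\ 2,\ 3,\ 4) \\
\frac{\partial o}{\partial w_i} &= z_i
  \;\Rightarrow\; \texttt{w.grad} = (4,\ 6,\ 8,\ 10)
\end{align*}
```

The gradients of x and y are obtained from the gradient of z through the chain rule, while the gradient of w depends only on the value of z.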

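However, retain_grad() increases memory usage, and a hook is an alternative to keeping the gradients of intermediate variables. During backpropagation, when the gradient reaches z, it is passed to hook_fn before it propagates further toward the inputs. If hook_fn returns None, the gradient is unchanged and propagation continues; if hook_fn returns a Tensor, that Tensor replaces z's original gradient for the rest of the backward pass.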

import torch


def hook_fn(grad):
    '''
    grad: gradient of the variable
    return: Tensor or None
    if the return value is None, the gradient is left unchanged
    if the return value is a Tensor, it replaces the gradient
    '''
    g = 2 * grad
    print(g)
    return g


x = torch.Tensor([0, 1, 2, 3]).requires_grad_()
y = torch.Tensor([4, 5, 6, 7]).requires_grad_()
w = torch.Tensor([1, 2, 3, 4]).requires_grad_()
z = x + y
z.retain_grad()
z.register_hook(hook_fn)
o = w.matmul(z)  # o = w * (x + y)
# o.retain_grad()
print('start')
o.backward()
print('end')
# print('x.requires_grad:', x.requires_grad)
# print('y.requires_grad:', y.requires_grad)
# print('z.requires_grad:', z.requires_grad)
# print('w.requires_grad:', w.requires_grad)
# print('o.requires_grad:', o.requires_grad)
print('x.grad:', x.grad)
print('y.grad:', y.grad)
print('z.grad:', z.grad)
print('w.grad:', w.grad)
print('o.grad:', o.grad)

Output:

start
tensor([2., 4., 6., 8.])
end
x.grad: tensor([2., 4., 6., 8.])
y.grad: tensor([2., 4., 6., 8.])
z.grad: tensor([1., 2., 3., 4.])
w.grad: tensor([ 4.,  6.,  8., 10.])
o.grad: None

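Because hook_fn(grad) returns twice the gradient, the gradients of x and y are both doubled. w is not affected, because its gradient is computed from the value of z and does not pass through z's gradient.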
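
A tensor hook can also be used purely to observe a gradient, without modifying it and without calling retain_grad(). The following is a minimal sketch under that assumption; the `grads` dictionary and the `save_grad` helper are hypothetical names introduced for illustration. The hook returns None, so the gradient passes through unchanged, and the handle returned by register_hook is used to detach the hook afterwards.

```python
import torch

grads = {}  # hypothetical container for recorded gradients


def save_grad(name):
    """Build a hook that stores a copy of the gradient under `name`."""
    def hook(grad):
        grads[name] = grad.detach().clone()
        # returning None leaves the gradient unchanged
    return hook


x = torch.tensor([0., 1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6., 7.], requires_grad=True)
w = torch.tensor([1., 2., 3., 4.], requires_grad=True)

z = x + y
handle = z.register_hook(save_grad("z"))  # register_hook returns a removable handle
o = w.matmul(z)
o.backward()
handle.remove()  # detach the hook once it is no longer needed

print(grads["z"])  # tensor([1., 2., 3., 4.])
print(x.grad)      # tensor([1., 2., 3., 4.])
```

Since this hook does not return a Tensor, x.grad keeps its original value rather than the doubled one from the example above.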

Hook for modules

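Unlike a Tensor, a module inside a network has no explicit variable name that can be accessed directly; it is wrapped inside the network. Usually we can only get the input and output of the whole network. For modules in the middle, it is hard to obtain the gradients of their inputs and outputs, and even their input and output values are not directly available, unless the network is designed so that forward returns the intermediate outputs, or the network is laboriously split by module name and reassembled so that the intermediate features are exposed. For this purpose PyTorch provides register_forward_hook and register_backward_hook, which capture the input and output features/gradients of intermediate modules during the forward/backward pass and make it much easier to inspect the information flow inside a model.
register_forward_hook captures the inputs and outputs of each module during the forward pass. Usage: module.register_forward_hook(hook_fn).
hook_fn takes three arguments: the module, the module's input, and the module's output. In this post it returns None.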

register_forward_hook

import torch
from torch import nn


class TestForHook(nn.Module):
    def __init__(self):
        super().__init__()

        self.linear_1 = nn.Linear(in_features=3, out_features=4)
        self.linear_2 = nn.Linear(in_features=4, out_features=1)
        self.relu = nn.ReLU()
        self.initialize()

    def forward(self, x):
        linear_1 = self.linear_1(x)
        linear_2 = self.linear_2(linear_1)
        relu = self.relu(linear_2)
        return relu

    def initialize(self):
        """ Fixed initialization, used to verify that the captured values are correct """
        self.linear_1.weight = torch.nn.Parameter(torch.FloatTensor([[1, 2, 3], [-4, -5, -6], [7, 8, 9], [-10, -11, -12]]))
        self.linear_1.bias = torch.nn.Parameter(torch.FloatTensor([1, 2, 3, 4]))
        self.linear_2.weight = torch.nn.Parameter(torch.FloatTensor([[1, 2, 3, 4]]))
        self.linear_2.bias = torch.nn.Parameter(torch.FloatTensor([1]))


features_in_hook = []
features_out_hook = []


def hook_forward(module, fea_in, fea_out):
    print(module)
    print('input:', fea_in)
    print('output:', fea_out)
    features_in_hook.append(fea_in)
    features_out_hook.append(fea_out)


x = torch.FloatTensor([[1, 1, 1]]).requires_grad_()
net = TestForHook()
net_children = net.children()  # iterator over the direct child modules
for child in net_children:
    child.register_forward_hook(hook=hook_forward)

out = net(x)
print(features_in_hook)
print(features_out_hook)

Output:

Linear(in_features=3, out_features=4, bias=True)
input: (tensor([[1., 1., 1.]], requires_grad=True),)
output: tensor([[  7., -13.,  27., -29.]], grad_fn=<AddmmBackward>)
Linear(in_features=4, out_features=1, bias=True)
input: (tensor([[  7., -13.,  27., -29.]], grad_fn=<AddmmBackward>),)
output: tensor([[-53.]], grad_fn=<AddmmBackward>)
ReLU()
input: (tensor([[-53.]], grad_fn=<AddmmBackward>),)
output: tensor([[0.]], grad_fn=<ReluBackward0>)
[(tensor([[1., 1., 1.]], requires_grad=True),), (tensor([[  7., -13.,  27., -29.]], grad_fn=<AddmmBackward>),), (tensor([[-53.]], grad_fn=<AddmmBackward>),)]
[tensor([[  7., -13.,  27., -29.]], grad_fn=<AddmmBackward>), tensor([[-53.]], grad_fn=<AddmmBackward>), tensor([[0.]], grad_fn=<ReluBackward0>)]

As shown, the hooks registered with register_forward_hook print the input and output of every layer.
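
In practice it is often more convenient to store the captured outputs by layer name than to append them to a list. The following is a minimal sketch of that idea, not the author's code: the `activations` dictionary, the `make_hook` helper, and the small nn.Sequential model are assumptions made for illustration. It also shows that register_forward_hook returns a handle whose remove() method detaches the hook.

```python
import torch
from torch import nn

# Hypothetical toy model; nn.Sequential names its children "0", "1", "2".
model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1))

activations = {}  # layer name -> captured output


def make_hook(name):
    """Build a forward hook that stores the module output under `name`."""
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook


# Register one hook per direct child and keep the handles for later removal.
handles = [
    child.register_forward_hook(make_hook(name))
    for name, child in model.named_children()
]

x = torch.ones(1, 3)
out = model(x)

for handle in handles:
    handle.remove()  # later forward passes no longer call the hooks

for name, value in activations.items():
    print(name, tuple(value.shape))
```

Storing detached copies keeps the captured features from holding on to the autograd graph, which is usually what is wanted when the features are only inspected.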

register_backward_hook

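Note: the TestForHook here is different from the TestForHook in the register_forward_hook section! Here relu is applied between the two linear layers.

import torch
from torch import nn


class TestForHook(nn.Module):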
    def __init__(self):
        super().__init__()

        self.linear_1 = nn.Linear(in_features=3, out_features=4)
        self.linear_2 = nn.Linear(in_features=4, out_features=1)
        self.relu = nn.ReLU()
        self.initialize()

    def forward(self, x):
        linear_1 = self.linear_1(x)
        relu = self.relu(linear_1)
        linear_2 = self.linear_2(relu)
        return linear_2

    def initialize(self):
        """ Fixed initialization, used to verify that the captured values are correct """
        self.linear_1.weight = torch.nn.Parameter(torch.FloatTensor([[1, 2, 3], [-4, -5, -6], [7, 8, 9], [-10, -11, -12]]))
        self.linear_1.bias = torch.nn.Parameter(torch.FloatTensor([1, 2, 3, 4]))
        self.linear_2.weight = torch.nn.Parameter(torch.FloatTensor([[1, 2, 3, 4]]))
        self.linear_2.bias = torch.nn.Parameter(torch.FloatTensor([1]))


features_in_hook = []
features_out_hook = []


def hook_backward(module, input_grad, output_grad):
    '''
    "Input" and "output" here refer to the input and output sides of the module in the forward pass
    '''
    print(module)
    print('input_grad:', input_grad)
    print('output_grad:', output_grad)
    features_in_hook.append(input_grad)
    features_out_hook.append(output_grad)


x = torch.FloatTensor([[1, 1, 1]]).requires_grad_()
net = TestForHook()
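net_children = net.children()  # iterator over the direct child modules
for child in net_children:
    # Note: newer PyTorch releases deprecate register_backward_hook in favor of
    # register_full_backward_hook, which reports input_grad differently from the output below.
    child.register_backward_hook(hook=hook_backward)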

out = net(x)
out.backward()

Output:

Linear(in_features=4, out_features=1, bias=True)
input_grad: (tensor([1.]), tensor([[1., 2., 3., 4.]]), tensor([[ 7.],
        [ 0.],
        [27.],
        [ 0.]]))
output_grad: (tensor([[1.]]),)
ReLU()
input_grad: (tensor([[1., 0., 3., 0.]]),)
output_grad: (tensor([[1., 2., 3., 4.]]),)
Linear(in_features=3, out_features=4, bias=True)
input_grad: (tensor([1., 0., 3., 0.]), tensor([[22., 26., 30.]]), tensor([[1., 0., 3., 0.],
        [1., 0., 3., 0.],
        [1., 0., 3., 0.]]))
output_grad: (tensor([[1., 0., 3., 0.]]),)

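Note in particular that the input side and output side here are those of the forward pass. For example, for the linear module $o = W * x + b$, the input side is W, x and b, and the output side is o.
For a linear module, input_grad is a 3-tuple, ordered as: the derivative with respect to bias, the derivative with respect to the input x, and the derivative with respect to the weight W (printed in transposed form in the output above).
Readers should derive the backward output by hand once; note that the gradient of relu is 0 for negative inputs.
My derivation, which may not be very rigorous (pardon the handwriting):
[Figure: the author's handwritten derivation of the backward pass]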
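
As a typeset cross-check of the printed gradients, the backward pass can be worked through as follows. This is a sketch that uses the fixed weights from initialize() and the input $x = (1, 1, 1)$; the symbols $h$, $a$, $W_1$, $b_1$, $W_2$, $b_2$ are notation introduced here for linear_1, relu and linear_2.

```latex
\begin{align*}
h &= W_1 x + b_1 = (7,\ -13,\ 27,\ -29), \qquad
a = \mathrm{relu}(h) = (7,\ 0,\ 27,\ 0), \qquad
o = W_2 a + b_2 = 89 \\[4pt]
\frac{\partial o}{\partial b_2} &= 1, \qquad
\frac{\partial o}{\partial a} = W_2 = (1,\ 2,\ 3,\ 4), \qquad
\frac{\partial o}{\partial W_2} = a = (7,\ 0,\ 27,\ 0) \\[4pt]
\frac{\partial o}{\partial h} &= \frac{\partial o}{\partial a} \odot \mathbf{1}[h > 0]
  = (1,\ 0,\ 3,\ 0) \\[4pt]
\frac{\partial o}{\partial b_1} &= \frac{\partial o}{\partial h} = (1,\ 0,\ 3,\ 0), \qquad
\frac{\partial o}{\partial x} = \frac{\partial o}{\partial h}\, W_1
  = 1 \cdot (1,\ 2,\ 3) + 3 \cdot (7,\ 8,\ 9) = (22,\ 26,\ 30) \\[4pt]
\frac{\partial o}{\partial W_1} &= \left(\frac{\partial o}{\partial h}\right)^{\!\top} x^{\top}
  = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 3 & 3 & 3 \\ 0 & 0 & 0 \end{pmatrix}
\end{align*}
```

These values agree with the printed tuples: for each linear module the three entries are the gradients with respect to the bias, the input and the weight, and the weight gradient appears transposed ($4 \times 1$ for linear_2 and $3 \times 4$ for linear_1).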
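Differences from hook_forward:

1. In hook_forward, input is x only and does not include W and b.

2. hook_backward may return a Tensor or None. It cannot modify its input arguments directly, but it can return a new input_grad, which is then propagated backward to the preceding module. (This seems like it could help against vanishing gradients.)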
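
To illustrate returning a new input gradient from a backward hook, here is a minimal sketch using register_full_backward_hook, which is available in PyTorch 1.8 and later. It is not the author's code: the single nn.Linear layer, the `recorded` dictionary and the `scale_grad_input` hook are assumptions made for illustration. With the full backward hook, grad_input holds only the gradient with respect to the module's input x, not the gradients of W and b.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)  # hypothetical single-layer example
recorded = {}            # hypothetical container for the observed gradients


def scale_grad_input(module, grad_input, grad_output):
    """Record the gradients, then return a doubled gradient w.r.t. the input."""
    recorded["grad_output"] = grad_output[0].clone()
    recorded["grad_input"] = grad_input[0].clone()
    # The returned tuple replaces grad_input for the rest of the backward pass.
    return tuple(g * 2 if g is not None else None for g in grad_input)


x = torch.ones(1, 3, requires_grad=True)
handle = layer.register_full_backward_hook(scale_grad_input)
layer(x).sum().backward()
handle.remove()

print("grad_output:", recorded["grad_output"])
print("grad_input :", recorded["grad_input"])
print("x.grad     :", x.grad)  # twice the recorded grad_input
```

The hook does not modify grad_input in place; it returns a new tuple, which matches the rule described in point 2.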
