Defining a New Autograd Function in PyTorch

l8947943

Article link: https://blog.csdn.net/l8947943/article/details/105633826

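To define your own differentiation rule in PyTorch, subclass torch.autograd.Function and override its static forward and backward methods. This gives you an operation whose gradient is computed by the code you write. This post follows the demo on the official PyTorch site.
Straight to the code: the class below defines a ReLU with a hand-written backward pass.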

import torch


class MyRelu(torch.autograd.Function):
    @staticmethod
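    def forward(ctx, input):
        # In the forward pass we receive a context object ctx and a tensor input,
        # and we must return a tensor containing the output.
        # ctx caches values for use in the backward pass. save_for_backward
        # accepts tensors only; non-tensor values can be stored as plain
        # attributes on ctx instead.
        ctx.save_for_backward(input)

        # input.clamp(min=0) clamps every value into [0, +inf),
        # e.g. input=[-1, -2, 3] becomes [0, 0, 3].
        # backward receives ctx plus one gradient argument per value returned here.
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        # Retrieve the input tensor saved on ctx during the forward pass
        input, = ctx.saved_tensors

        # grad_output holds the gradient of the loss with respect to the output;
        # clone it so the in-place masking below does not modify it.
        grad_input = grad_output.clone()

        # The ReLU rule: where the original input is below 0 the output is 0,
        # so the gradient at those positions is set to 0.
        grad_input[input < 0] = 0
        return grad_input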
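
The forward and backward contract above carries over to functions that take extra non-tensor arguments. As a minimal sketch, the hypothetical MyLeakyRelu below (not part of the original post) stores a Python float as an attribute on ctx, since save_for_backward is meant for tensors, and returns None from backward as the gradient of the non-tensor argument. The class name and the negative_slope parameter are assumptions made for illustration.

```python
import torch


class MyLeakyRelu(torch.autograd.Function):
    """Hypothetical leaky ReLU used only to illustrate the ctx mechanics."""

    @staticmethod
    def forward(ctx, input, negative_slope):
        # Tensors needed in backward go through save_for_backward.
        ctx.save_for_backward(input)
        # A plain Python value can be kept as an attribute on ctx.
        ctx.negative_slope = negative_slope
        return torch.where(input > 0, input, input * negative_slope)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # Local derivative: 1 where input > 0, negative_slope elsewhere.
        scale = torch.ones_like(input)
        scale[input <= 0] = ctx.negative_slope
        # backward returns one value per forward argument (after ctx);
        # the float argument gets None because it has no gradient.
        return grad_output * scale, None


x = torch.tensor([-2.0, 3.0], requires_grad=True)
y = MyLeakyRelu.apply(x, 0.1)
y.sum().backward()
print(y)
print(x.grad)
```

Returning None for the second argument keeps the number of returned gradients equal to the number of forward inputs, which autograd expects.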

Set up the input data and test

dtype = torch.float
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Use a torch Generator for reproducible random numbers. The generator must live
# on the same device (CPU or GPU) as the tensors it is used to create.
generator = torch.Generator(device).manual_seed(42)

# N is the batch size, H is the hidden dimension;
# D_in is the input dimension, D_out is the output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, device=device, dtype=dtype, generator=generator)
y = torch.randn(N, D_out, device=device, dtype=dtype, generator=generator)

w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True, generator=generator)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True, generator=generator)

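learning_rate = 1e-6
# A Function subclass is called through its apply method; bind it once
relu = MyRelu.apply
for t in range(500):
    # Forward pass through the custom function
    y_pred = relu(x.mm(w1)).mm(w2)
    # Compute the loss (sum of squared errors)
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())
    # Backward pass: fills w1.grad and w2.grad
    loss.backward()
    # Update the weights by gradient descent without tracking the update in autograd
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Reset the gradients, since backward() accumulates into .grad
        w1.grad.zero_()
        w2.grad.zero_()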
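
A falling loss suggests the custom backward is reasonable, but it does not prove the gradient is correct. As a minimal sketch of a more direct check (not part of the original post), the snippet below runs torch.autograd.gradcheck, which compares the hand-written backward against numerical finite differences, and then compares the result with the built-in torch.relu. The test values are an assumption chosen to stay away from 0, where ReLU is not differentiable. The class is repeated so the snippet runs on its own.

```python
import torch
from torch.autograd import gradcheck


class MyRelu(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


# gradcheck needs double precision; the values avoid the kink at 0.
x = torch.tensor([[-2.0, -0.5, 0.5], [1.5, -1.0, 3.0]],
                 dtype=torch.double, requires_grad=True)
ok = gradcheck(MyRelu.apply, (x,), eps=1e-6, atol=1e-4)

# Compare the analytic gradient with the one from the built-in ReLU.
a = x.detach().clone().requires_grad_(True)
b = x.detach().clone().requires_grad_(True)
MyRelu.apply(a).sum().backward()
torch.relu(b).sum().backward()
same = torch.equal(a.grad, b.grad)

print(ok, same)
```

The two implementations may differ exactly at input 0, where MyRelu lets the gradient through while the built-in uses 0, so the comparison is only meaningful away from that point.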