How do you use a custom activation function in PyTorch?
If the custom activation function is differentiable, you can simply write it as a plain Python function and call it, because PyTorch's autograd will differentiate it automatically.
If the custom activation function is not differentiable everywhere — for example a piecewise-differentiable function like ReLU — you need to write a class that subclasses torch.autograd.Function and define the forward and backward passes yourself.
PyTorch provides a tutorial on defining new autograd Functions: https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html. Using ReLU as the example, the tutorial shows what you need to implement in forward and backward.
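As a minimal sketch of the plain-function case (Swish is chosen here purely as an illustrative activation, not taken from the original post): since it is composed entirely of differentiable torch ops, autograd handles the backward pass with no extra code.

```python
import torch

def swish(x):
    # A differentiable custom activation written as an ordinary Python
    # function; autograd differentiates it automatically.
    return x * torch.sigmoid(x)

x = torch.randn(4, requires_grad=True)
out = swish(x).sum()
out.backward()  # autograd computes d(swish)/dx for us

print(x.grad)   # gradients appear on the input, same shape as x
```

No torch.autograd.Function subclass is needed in this case.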
import torch


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
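A useful sanity check after writing a custom backward is torch.autograd.gradcheck, which compares the hand-written gradient against numerical finite differences. The sketch below repeats the tutorial's MyReLU class so it is self-contained; double precision is used because gradcheck's tolerances assume it.

```python
import torch

class MyReLU(torch.autograd.Function):
    # Same custom ReLU as in the tutorial above.
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# gradcheck numerically verifies the analytic backward; it returns True
# on success and raises an error on a mismatch.
x = torch.randn(8, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(MyReLU.apply, (x,), eps=1e-6, atol=1e-4)
print(ok)
```

Note that random double-precision inputs are very unlikely to land exactly at 0, where ReLU is not differentiable and the numerical gradient would disagree.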
However, if the ReLU function is defined without using the correct