Autograd
Internally, autograd represents this graph as a graph of Function objects, which can be apply()-ed to compute the result of evaluating the graph. The .grad_fn attribute of each torch.Tensor is the entry point into this graph.
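A minimal sketch of this entry point (the shapes here are arbitrary):

import torch

x = torch.ones(2, 2, requires_grad=True)
y = (x + 3) * 2
print(y.grad_fn)                 # <MulBackward0 ...>: the graph node that produced y
print(y.grad_fn.next_functions)  # upstream Function nodes of the graph
print(x.grad_fn)                 # None: user-created leaf tensors have no grad_fn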
Concurrent Training on CPU
The gradient can be non-deterministic here!
import threading
import torch

# Define a train function to be used in different threads
def train_fn():
    x = torch.ones(5, 5, requires_grad=True)
    # forward
    y = (x + 3) * (x + 4) * 0.5
    # backward
    y.sum().backward()
    # potential optimizer update

# Users write their own threading code to drive train_fn
threads = []
for _ in range(10):
    p = threading.Thread(target=train_fn, args=())
    p.start()
    threads.append(p)
for p in threads:
    p.join()
In this setting, the optimizer's zero_grad() only needs to be called once.
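A sketch of why a single zero_grad() suffices (the shared-model setup below is an assumption, not from the original post): when threads share one model, their backward() calls all accumulate into the same .grad buffers, so one step()/zero_grad() pair after join() consumes all of them.

import threading
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(5, 1)                          # shared across threads
optimizer = optim.SGD(model.parameters(), lr=0.01)

def train_fn():
    x = torch.ones(4, 5)
    loss = model(x).sum()
    loss.backward()                              # accumulates into the shared .grad buffers

threads = [threading.Thread(target=train_fn) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

optimizer.step()       # uses the gradients summed over all threads
optimizer.zero_grad()  # one call clears everything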
Custom Layers and Backward
For custom operations, see the official API docs for torch.autograd.Function and torch.autograd.function._ContextMethodMixin (the ctx argument).
Example 1: binarize the input and use our own defined backward.
import torch
from torch.autograd import Function

class BinarizedF(Function):
    @staticmethod
    def forward(ctx, input_):          # note the use of ctx here
        ctx.save_for_backward(input_)  # stash the forward input for use in backward
        a = torch.ones_like(input_)
        b = -torch.ones_like(input_)
        output = torch.where(input_ >= 0, a, b)
        return output

    @staticmethod
    def backward(ctx, output_grad):
        input_, = ctx.saved_tensors
        input_abs = torch.abs(input_)
        ones = torch.ones_like(input_)
        zeros = torch.zeros_like(input_)
        # straight-through estimator: by the chain rule, multiply the incoming
        # gradient by the local (approximate) gradient, 1 where |input| <= 1
        input_grad = output_grad * torch.where(input_abs <= 1, ones, zeros)
        return input_grad
class Binarized(torch.nn.Module):
    def __init__(self):
        super(Binarized, self).__init__()

    def forward(self, input_):
        return BinarizedF.apply(input_)
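A quick usage check (a sketch; the tensor shape is arbitrary):

layer = Binarized()
x = torch.randn(4, requires_grad=True)
y = layer(x)           # values in {-1, +1}
y.sum().backward()
print(x.grad)          # 1 where |x| <= 1, else 0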
Example 2: custom linear layer.
# Inherit from Function
class LinearFunction(Function):

    # Note that both forward and backward are @staticmethods
    @staticmethod
    # bias is an optional argument
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        # This is a pattern that is very convenient - at the top of backward
        # unpack saved_tensors and initialize all gradients w.r.t. inputs to
        # None. Thanks to the fact that additional trailing Nones are
        # ignored, the return statement is simple even when the function has
        # optional inputs.
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # These needs_input_grad checks are optional and are there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)

        # these end up in the corresponding tensors' .grad attributes
        return grad_input, grad_weight, grad_bias
class Linear(nn.Module):
    def __init__(self, input_features, output_features, bias=True):
        super(Linear, self).__init__()
        self.input_features = input_features
        self.output_features = output_features

        # nn.Parameter is a special kind of Tensor, that will get
        # automatically registered as Module's parameter once it's assigned
        # as an attribute. Parameters and buffers need to be registered, or
        # they won't appear in .parameters() (doesn't apply to buffers), and
        # won't be converted when e.g. .cuda() is called. You can use
        # .register_buffer() to register buffers.
        # nn.Parameters require gradients by default.
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(output_features))
        else:
            # You should always register all possible parameters, but the
            # optional ones can be None if you want.
            self.register_parameter('bias', None)

        # Not a very smart way to initialize weights
        self.weight.data.uniform_(-0.1, 0.1)
        if self.bias is not None:
            self.bias.data.uniform_(-0.1, 0.1)

    def forward(self, input):
        # See the autograd section for explanation of what happens here.
        return LinearFunction.apply(input, self.weight, self.bias)
That is the recipe for a custom layer with its own backward: inherit from Function in torch.autograd and implement forward and backward, then wrap it in an nn.Module that implements forward; after that it behaves like any ordinary network layer.
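To verify a hand-written backward, torch.autograd.gradcheck compares it against numerical finite differences (a sketch; double precision is required for the check to be reliable):

from torch.autograd import gradcheck

# gradcheck needs double-precision inputs
linear = LinearFunction.apply
input = (torch.randn(20, 20, dtype=torch.double, requires_grad=True),
         torch.randn(30, 20, dtype=torch.double, requires_grad=True))
test = gradcheck(linear, input, eps=1e-6, atol=1e-4)
print(test)  # True if the analytic gradients match the numerical ones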
Monitoring Inputs and Outputs of Each Layer (the Hook Mechanism)
def printnorm(self, input, output):
    # input is a tuple of packed tensors
    # output is a Tensor whose data we are interested in
    print('input size : ', input[0].size())  # input[0] is the input tensor

# the hook fires every time forward is called
net.conv2.register_forward_hook(printnorm)

def printgrad(self, grad_input, grad_output):
    # grad_input and grad_output are tuples
    print('grad_input size : ', grad_input[0].size())
    print('grad_output norm : ', grad_output[0].norm())

# (newer PyTorch versions prefer register_full_backward_hook)
net.conv2.register_backward_hook(printgrad)
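The snippet above presupposes a net with a conv2 layer; a self-contained sketch (the tiny model is made up for illustration):

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3)
        self.conv2 = nn.Conv2d(4, 8, kernel_size=3)

    def forward(self, x):
        return self.conv2(self.conv1(x))

net = TinyNet()
fwd_handle = net.conv2.register_forward_hook(printnorm)
bwd_handle = net.conv2.register_backward_hook(printgrad)

out = net(torch.randn(1, 1, 8, 8))
out.sum().backward()   # triggers both hooks
fwd_handle.remove()    # hooks can be detached when no longer needed
bwd_handle.remove()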
Transfer Learning & Fine-tuning
During training you sometimes want to update only part of the parameters (stage-wise training). On requires_grad, the official docs say:

If there’s a single input to an operation that requires gradient, its output will also require gradient. Conversely, only if all inputs don’t require gradient, the output also won’t require it.

Setting requires_grad = False merely saves computation; whether a parameter is actually updated is decided by the optimizer.
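A two-line illustration of that propagation rule (a sketch):

a = torch.randn(3, requires_grad=True)
b = torch.randn(3)                # requires_grad=False
print((a + b).requires_grad)      # True: one input requires grad
print((b + b).requires_grad)      # False: no input requires grad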
Pay attention to how to freeze parameters inside a network: only add the parameters to be updated to the optimizer.
import torchvision
import torch.nn as nn
import torch.optim as optim

model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# replacing the head creates fresh parameters with requires_grad=True
model.fc = nn.Linear(512, 100)

# optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=0.02)
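When the frozen and trainable parameters are scattered across the network, a common pattern (a sketch, not from the original post) is to filter on requires_grad instead of naming a submodule:

# pass only the trainable parameters to the optimizer
trainable = filter(lambda p: p.requires_grad, model.parameters())
optimizer = optim.SGD(trainable, lr=0.02)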
Batch Accumulation
After backward() in PyTorch, each Tensor's gradient is stored in its .grad attribute. When the computation graph is rebuilt and backpropagated repeatedly, the gradients accumulate, so you can backpropagate over several batches first and only then let the optimizer update the parameters. This is also why zero_grad() has to clear the gradients explicitly:
def zero_grad(self):
    r"""Clears the gradients of all optimized :class:`torch.Tensor` s."""
    for group in self.param_groups:
        for p in group['params']:
            if p.grad is not None:
                p.grad.detach_()
                p.grad.zero_()
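A sketch of gradient accumulation built on this behavior (model, loader, criterion, and accum_steps are placeholders):

accum_steps = 4  # effective batch size = accum_steps * loader batch size
optimizer.zero_grad()
for i, (x, target) in enumerate(loader):
    loss = criterion(model(x), target) / accum_steps  # keep gradient scale comparable
    loss.backward()                                   # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()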
Gradient Operations
Gradient Penalty
Used in StyleGAN (see the paper): the discriminator's gradient is itself added to the total loss as a penalty term:
$\frac{\gamma}{2} E_{P_D(x)}\left[ \lVert \nabla D_{\phi}(x) \rVert^2 \right]$
In PyTorch this gradient, $\nabla D_{\phi}(x)$, can be computed with torch.autograd.grad. E.g.:
if r1_gamma != 0.0:
    real_loss = d_result_real.sum()
    real_grads = torch.autograd.grad(real_loss, reals,
                                     create_graph=True, retain_graph=True)[0]
    r1_penalty = torch.sum(real_grads.pow(2.0), dim=[1, 2, 3])
    loss = loss + r1_penalty * (r1_gamma * 0.5)
return loss.mean()
real_loss is $D_{\phi}(x)$ (summed over the batch), i.e. the loss used for backpropagation; $x$ is the input, i.e. the images. create_graph=True makes the computed gradient itself part of the graph, so the penalty term can in turn be backpropagated through.
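Note that for torch.autograd.grad(real_loss, reals, ...) to work, reals must require grad. A sketch of the assumed setup (D and real_images are hypothetical names):

# inputs must require grad so autograd can differentiate w.r.t. them
reals = real_images.detach().requires_grad_(True)
d_result_real = D(reals)  # D is the discriminator (assumed)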
Gradient Clipping
Gradient clipping prevents gradients that are too large from making training diverge or struggle to converge. Use torch.nn.utils.clip_grad_norm_ (see the API docs); note the trailing underscore: it is an in-place operation.
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
The first argument can be an iterable of Tensors (such as a list) or a single Tensor.
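Clipping operates on .grad, so it must run after backward() and before step(). A sketch of where it sits in a training loop (model, loader, criterion are placeholders):

for x, target in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()
    # rescale the gradients in place so their total norm is at most 1.0
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()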
Notes
Avoid writing values directly into a tensor (there is no gradient to compute for the assignment itself); it is better to create a constant tensor and combine it with the original via arithmetic, or to compose the target tensor from several others. That said, PyTorch's backward pass does support assignment operations.
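A sketch of the recommended pattern, composing with torch.where instead of assigning into the tensor:

import torch

x = torch.randn(3, 3, requires_grad=True)

# compose the result with arithmetic / torch.where instead of writing into x
y = torch.where(x < 0, torch.zeros_like(x), x)
y.sum().backward()
print(x.grad)  # 1.0 where x >= 0, 0.0 elsewhere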