Deep Learning: The Batch Normalization Layer

weixin_38498942

Article link: https://blog.csdn.net/weixin_38498942/article/details/108118542

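In this chapter we look at the batch normalization (batch norm) layer. We said earlier that feature scaling makes gradient descent easier. Here we extend that idea and normalize the activations of every fully connected or convolutional layer during training. This means that at training time we take a batch and compute its mean and standard deviation.
You can think of batch norm as a kind of adaptive (learnable) preprocessing block with trainable parameters, which also means we need to backpropagate through it.
Batch norm has the following advantages:
1. It improves gradient flow, which helps with very deep models (ResNet needs it)
2. It allows higher learning rates
3. It reduces the dependence on initialization
4. It provides some regularization (it even makes Dropout less important, but keep using Dropout)
5. As a rule of thumb, if you use Dropout + BatchNorm you do not need L2 regularization
It basically forces your activations (Conv, FC outputs) to have zero mean and unit standard deviation. For each batch of training data we apply the following normalization: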
[Figure: normalization formula; image not included]
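
The formula image is missing from this copy. Assuming the standard batch norm formulation, the normalization over a mini-batch of m activations x_1, ..., x_m can be written as below. The small constant \epsilon, added for numerical stability, is an assumption here and is not mentioned in the text.

```latex
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
```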

The operations are as follows:
[Figure: batch normalization steps; image not included]
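
Since the figure is not available, here is a minimal Python sketch of what the training-time forward pass usually looks like: batch mean, batch variance, normalization, then a learnable scale and shift. The names `batchnorm_forward`, `gamma`, `beta` and `eps` are illustrative assumptions and do not come from the article.

```python
import math

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm for a batch x given as a list of N rows of D features.

    gamma and beta are per-feature lists of length D (learnable scale and shift).
    eps is an assumed small constant that avoids division by zero.
    Returns (out, cache); cache keeps intermediate values for later use.
    """
    n, d = len(x), len(x[0])
    # Step 1: per-feature mean over the batch
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    # Step 2: per-feature (biased) variance over the batch
    var = [sum((row[j] - mean[j]) ** 2 for row in x) / n for j in range(d)]
    # Step 3: normalize to roughly zero mean and unit standard deviation
    inv_std = [1.0 / math.sqrt(v + eps) for v in var]
    x_hat = [[(row[j] - mean[j]) * inv_std[j] for j in range(d)] for row in x]
    # Step 4: scale and shift with the trainable parameters
    out = [[gamma[j] * xh[j] + beta[j] for j in range(d)] for xh in x_hat]
    cache = {"x_hat": x_hat, "gamma": gamma, "inv_std": inv_std,
             "mean": mean, "var": var}
    return out, cache
```

With `gamma` set to ones and `beta` to zeros, each output column has approximately zero mean and unit standard deviation; other values let the network learn a different scale and offset.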

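Where to use the batch norm layer
The batch norm layer is placed after a linear layer (i.e. FC, conv) and before the nonlinearity (ReLU).
There are actually two batch norm implementations: one for FC layers and one for conv layers (spatial batch norm). The good news is that spatial batch norm simply calls the normal batch norm after some reshaping of the input.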

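Test time
At prediction time, batch norm works differently.
The mean/std are not computed from the batch. Instead, during training we need to build an estimate of the mean/std of the whole dataset (the population) for each batch norm layer in the model.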
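
The text does not say how the population estimate is built. One common approach, assumed here, is an exponential moving average of the batch statistics that is updated at every training step and then used as fixed values at prediction time. The function names and the `momentum` parameter in this Python sketch are hypothetical.

```python
import math

def update_running_stats(running_mean, running_var, batch_mean, batch_var,
                         momentum=0.9):
    """Blend the current batch statistics into the running estimates.

    momentum is an assumed hyperparameter: values close to 1 change the
    estimates slowly, values close to 0 follow the latest batch.
    """
    new_mean = [momentum * rm + (1.0 - momentum) * bm
                for rm, bm in zip(running_mean, batch_mean)]
    new_var = [momentum * rv + (1.0 - momentum) * bv
               for rv, bv in zip(running_var, batch_var)]
    return new_mean, new_var

def batchnorm_inference(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Prediction-time batch norm: uses the stored estimates, not the batch."""
    d = len(x[0])
    inv_std = [1.0 / math.sqrt(running_var[j] + eps) for j in range(d)]
    return [[gamma[j] * (row[j] - running_mean[j]) * inv_std[j] + beta[j]
             for j in range(d)]
            for row in x]
```

Because the statistics are fixed at prediction time, the output for one sample no longer depends on which other samples happen to be in the same batch.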

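Backpropagation
As mentioned earlier, we need to know how to backpropagate through the batch norm layer. First, as with any other layer, we build its computational graph. Once that is done, we compute the derivative of each node with respect to its inputs.
Computational graph
To find the partial derivatives for backpropagation, it helps to visualize the algorithm as a computational graph:
[Figure: computational graph of the batch norm layer; image not included]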
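
The graph itself is not reproduced here. As a hedged sketch, walking such a graph backwards node by node and simplifying the result gives the compact expressions below. The function name and its argument layout are assumptions for illustration: `x_hat` is the normalized input and `inv_std` is 1 / sqrt(variance + eps), both saved from a training-time forward pass.

```python
def batchnorm_backward(dout, x_hat, gamma, inv_std):
    """Backward pass of batch norm for N rows of D features.

    dout    : upstream gradient, list of N rows of D values
    x_hat   : normalized activations from the forward pass (N x D)
    gamma   : per-feature scale (length D)
    inv_std : per-feature 1 / sqrt(var + eps) from the forward pass (length D)
    Returns (dx, dgamma, dbeta).
    """
    n, d = len(dout), len(dout[0])
    # beta is added to every row, so its gradient is the column sum of dout
    dbeta = [sum(dout[i][j] for i in range(n)) for j in range(d)]
    # gamma multiplies x_hat, so its gradient is the column sum of dout * x_hat
    dgamma = [sum(dout[i][j] * x_hat[i][j] for i in range(n)) for j in range(d)]
    dx = [[0.0] * d for _ in range(n)]
    for j in range(d):
        # Gradient with respect to the normalized activations
        dxhat = [dout[i][j] * gamma[j] for i in range(n)]
        s1 = sum(dxhat)                                   # path through the mean
        s2 = sum(dxhat[i] * x_hat[i][j] for i in range(n))  # path through the variance
        for i in range(n):
            dx[i][j] = inv_std[j] / n * (n * dxhat[i] - s1 - x_hat[i][j] * s2)
    return dx, dgamma, dbeta
```

A useful sanity check for any implementation of this kind is to compare `dx` against a numerical gradient obtained by perturbing each input element slightly.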

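Spatial batch norm
As mentioned earlier, spatial batch norm is used between the CONV and ReLU layers. To implement it we only need to call the normal batch norm, but with the input reshaped and permuted. Below is the MATLAB version of the forward and backward propagation of spatial batch norm. The code relies on a helper, reshape_row_major, which is not shown here.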

% It's just a call to the normal batchnorm, with some
% permute/reshape applied to the input signal
function [activations] = ForwardPropagation(obj, input, weights, bias)
    obj.previousInput = input;
    [H,W,C,N] = size(input);

    % Permute the dimensions to the following format:
    % (cols, channel, rows, batch)
    % In Python this was: x.transpose((0,2,3,1))
    % Python tensor format:
    % (batch(0), channel(1), rows(2), cols(3))
    % MATLAB tensor format:
    % (rows(1), cols(2), channel(3), batch(4))
    inputTransposed = permute(input,[2,3,1,4]);

    % Flatten the input (in Python the reshape is row-major)
    inputFlat = reshape_row_major(inputTransposed,[(numel(inputTransposed) / C),C]);

    % Call the forward propagation of the normal batchnorm
    activations = obj.normalBatchNorm.ForwardPropagation(inputFlat, weights, bias);

    % Reshape/transpose the signal back; in Python this was (N,H,W,C)
    activations_reshape = reshape_row_major(activations, [W,C,H,N]);
    % In Python this was transpose(0,3,1,2)
    activations = permute(activations_reshape,[3 1 2 4]);

    % Store values needed for backpropagation
    obj.activations = activations;
    obj.weights = weights;
    obj.biases = bias;
end

Now, for backpropagation, we just reshape and permute again:
function [gradient] = BackwardPropagation(obj, dout)
    % Observe that we use the same reshape/permute steps as in the
    % forward propagation
    dout = dout.input;
    [H,W,C,N] = size(dout);
    % In Python this was: x.transpose((0,2,3,1))
    dout_transp = permute(dout,[2,3,1,4]);

    % Flatten the input
    dout_flat = reshape_row_major(dout_transp,[(numel(dout_transp) / C),C]);

    % Call the backward propagation of the normal batchnorm
    gradDout.input = dout_flat;
    gradient = obj.normalBatchNorm.BackwardPropagation(gradDout);

    % Reshape/transpose the signal back; in Python this was (N,H,W,C)
    gradient.input = reshape_row_major(gradient.input, [W,C,H,N]);
    % In Python this was transpose(0,3,1,2)
    gradient.input = permute(gradient.input,[3 1 2 4]);
end
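
For comparison, the same idea can be sketched in plain Python using the (batch, channel, rows, cols) layout that the code comments attribute to the Python version. This is an illustration under assumptions, not the author's implementation: the helper names are hypothetical, and `batchnorm_fn` stands for any function that maps a list of rows with C features each to rows of the same shape (for example a normal batch norm forward pass with its parameters already bound).

```python
def nchw_to_rows(x):
    """Flatten a nested list of shape (N, C, H, W) into N*H*W rows of C values.

    Each row holds all channels of one pixel, so a normal batch norm applied
    to the rows computes one mean/std per channel.
    """
    n, c, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [[x[b][ch][i][j] for ch in range(c)]
            for b in range(n) for i in range(h) for j in range(w)]

def rows_to_nchw(rows, n, c, h, w):
    """Inverse of nchw_to_rows: rebuild the (N, C, H, W) nested list."""
    return [[[[rows[(b * h + i) * w + j][ch] for j in range(w)]
              for i in range(h)]
             for ch in range(c)]
            for b in range(n)]

def spatial_batchnorm_forward(x, batchnorm_fn):
    """Spatial batch norm as reshape -> normal batch norm -> reshape back."""
    n, c, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    flat = nchw_to_rows(x)
    out_flat = batchnorm_fn(flat)
    return rows_to_nchw(out_flat, n, c, h, w)
```

The backward pass follows the same pattern: flatten the upstream gradient with `nchw_to_rows`, call the normal batch norm backward pass, and reshape its input gradient with `rows_to_nchw`.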