PyTorch - BatchNorm2d

西笑生

Category: PyTorch; Tags: BatchNorm2d, PyTorch, deep learning

Article link: https://blog.csdn.net/flyfish1986/article/details/106626824

flyfish

Terminology

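The "normalization" in Batch Normalization is translated inconsistently in Chinese books:
In 《深入浅出PyTorch》 it is translated as 归一化.
In the "flower book" 《深度学习》 (Deep Learning) it is translated as 标准化 ("standardization").
In 《深度学习之美》 it is translated as 规范化.
In 《动手学深度学习》 (Dive into Deep Learning) it is translated as 归一化.
In 《深度学习卷积神经网络从入门到精通》 it is translated as 归一化.
Because 归一化 contains the character 一 ("one"), it is easily read as mapping data into [0, 1], whereas 标准化 suggests transforming data into a well-behaved distribution. Inconsistent translations easily lead people to discuss different concepts under the same name. For reference, see

Feature scaling (Feature_scaling)
Here I use the translation 标准化 (standardization).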
PyTorch provides
torch.nn.BatchNorm1d
torch.nn.BatchNorm2d
torch.nn.BatchNorm3d
Here BatchNorm2d is used as the example to show how its output is computed.
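As a quick sketch of how the three classes differ, the snippet below feeds each one an input of the rank it expects: (N, C) or (N, C, L) for BatchNorm1d, (N, C, H, W) for BatchNorm2d and (N, C, D, H, W) for BatchNorm3d. The sizes 4, 3, 2 and 5 are arbitrary. Each module returns a tensor of the same shape, and BatchNorm2d rejects an input that is not 4D.

```python
import torch
import torch.nn as nn

bn1 = nn.BatchNorm1d(3)   # expects (N, C) or (N, C, L)
bn2 = nn.BatchNorm2d(3)   # expects (N, C, H, W)
bn3 = nn.BatchNorm3d(3)   # expects (N, C, D, H, W)

out1a = bn1(torch.randn(4, 3))
out1b = bn1(torch.randn(4, 3, 5))
out2 = bn2(torch.randn(4, 3, 5, 5))
out3 = bn3(torch.randn(4, 3, 2, 5, 5))
for out in (out1a, out1b, out2, out3):
    print(tuple(out.shape))   # output shape equals input shape

# BatchNorm2d checks the input rank and raises ValueError for a 3D tensor
wrong_dim_error = None
try:
    bn2(torch.randn(3, 5, 5))
except ValueError as err:
    wrong_dim_error = str(err)
print(wrong_dim_error)
```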

BatchNorm2d Example 1: one channel

import torch
import torch.nn as nn
input = torch.arange(0, 16).view(1,1,4,4).float()
# tensor([[[[ 0.,  1.,  2.,  3.],
#           [ 4.,  5.,  6.,  7.],
#           [ 8.,  9., 10., 11.],
#           [12., 13., 14., 15.]]]])

# With Learnable Parameters
#m = nn.BatchNorm2d(1)
# Without Learnable Parameters (a new module is in training mode, so it uses batch statistics)
m = nn.BatchNorm2d(1, affine=False)
output = m(input)
print(output)

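Steps
The formula in the official documentation is
$\frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$
Breaking the formula down:
$\mathbf{Input}: \mathcal{B}=\{x_1,\cdots,x_m\}$, a mini-batch of m values. BatchNorm2d computes the statistics separately for each channel, so for one channel the m values are all N × H × W entries of that channel.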

Step 1: compute the mini-batch mean

$\mu_{\mathcal{B}} \leftarrow \frac{1}{m} \sum_{i=1}^m x_i$

Step 2: compute the mini-batch variance

$\sigma_{\mathcal{B}}^2 \leftarrow \frac{1}{m} \sum_{i=1}^m (x_i - \mu_{\mathcal{B}})^2$

Step 3: normalize

$\hat{x}_i \leftarrow \dfrac{x_i-\mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2+\epsilon}}$

Step 4: scale and shift

$y_i \leftarrow \gamma \hat{x}_i+\beta \equiv \mathrm{BN}_{\gamma, \beta}(x_i)$
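Plugging the 16 values of example 1 ($x_i = i-1$ for $i = 1,\dots,16$, so $m = 16$) into the four steps gives a worked check of the first output entry. Since affine=False, $\gamma = 1$ and $\beta = 0$:

$$\mu_{\mathcal{B}} = \frac{1}{16}\sum_{i=1}^{16} x_i = \frac{0+1+\cdots+15}{16} = \frac{120}{16} = 7.5$$

$$\sigma_{\mathcal{B}}^2 = \frac{1}{16}\sum_{i=1}^{16} (x_i - 7.5)^2 = \frac{340}{16} = 21.25$$

$$\hat{x}_1 = \frac{0 - 7.5}{\sqrt{21.25 + 10^{-5}}} \approx -1.6270, \qquad y_1 = 1 \cdot \hat{x}_1 + 0 \approx -1.6270$$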

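Because the example above has only one channel, the code can be shortened as follows. This is a shortcut; the three-channel example later shows the real per-channel computation. Only the first three steps are needed, since affine=False means there is no γ or β.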

# Step 1
first_mean = input.mean()  # mean of all values in input: tensor(7.5000)
# Step 2
second_variance = torch.pow(input - first_mean, exponent=2).mean()  # tensor(21.2500)
# Step 3
result = (input - first_mean) / (torch.sqrt(second_variance + 1e-05))

Both the nn.BatchNorm2d output and the manual result are

# tensor([[[[-1.6270, -1.4100, -1.1931, -0.9762],
#           [-0.7593, -0.5423, -0.3254, -0.1085],
#           [ 0.1085,  0.3254,  0.5423,  0.7593],
#           [ 0.9762,  1.1931,  1.4100,  1.6270]]]])
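The agreement can be checked directly with torch.allclose. The sketch below also shows that step 2 divides by m (the biased variance, equal to Tensor.var(unbiased=False)), not by m − 1 as Tensor.var() does by default. The tolerance 1e-5 is an assumption chosen to absorb float rounding.

```python
import torch
import torch.nn as nn

x = torch.arange(0, 16).view(1, 1, 4, 4).float()

# A freshly built module is in training mode, so it normalizes with batch statistics
m = nn.BatchNorm2d(1, affine=False)
output = m(x)

# Steps 1-3 by hand
mean = x.mean()
variance = ((x - mean) ** 2).mean()   # divides by m = 16 (biased variance)
result = (x - mean) / torch.sqrt(variance + 1e-05)

print(torch.allclose(output, result, atol=1e-5))   # True
# step-2 variance, Tensor.var(unbiased=False), default Tensor.var() (divides by m - 1)
print(variance.item(), x.var(unbiased=False).item(), x.var().item())
```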

Explanation of dim and keepdim

import torch
x=torch.arange(24.0).view(2,3,4)
print(x.shape)#torch.Size([2, 3, 4])

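# keepdim=True keeps the number of dimensions; the dimension given by dim becomes size 1.
# (Calling mean() with no dim reduces everything, e.g. the mean of ten numbers is a 0-dim scalar tensor.)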
print(torch.mean(x,dim=0,keepdim=True).shape)#torch.Size([1, 3, 4])
print(torch.mean(x,dim=1,keepdim=True).shape)#torch.Size([2, 1, 4])
print(torch.mean(x,dim=2,keepdim=True).shape)#torch.Size([2, 3, 1])

# keepdim=False is the default; the dimension given by dim is removed.
print(torch.mean(x,dim=0).shape)#torch.Size([3, 4])
print(torch.mean(x,dim=1).shape)#torch.Size([2, 4])
print(torch.mean(x,dim=2).shape)#torch.Size([2, 3])
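keepdim=True matters for BatchNorm2d because the reduced mean must still broadcast against the (N, C, H, W) input. A small sketch using the same 48-value tensor as example 2:

```python
import torch

x = torch.arange(48.0).view(4, 3, 2, 2)          # (N, C, H, W)

mean_keep = x.mean(dim=(0, 2, 3), keepdim=True)  # shape (1, 3, 1, 1)
mean_drop = x.mean(dim=(0, 2, 3))                # shape (3,)

# (1, 3, 1, 1) lines up with the C axis, so each channel has its own mean subtracted
centered = x - mean_keep
print(centered.mean(dim=(0, 2, 3)))              # every channel now has mean 0

# (3,) is aligned with the last axis W (size 2) instead, so broadcasting fails
broadcast_failed = False
try:
    x - mean_drop
except RuntimeError:
    broadcast_failed = True
print(broadcast_failed)                          # True
```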

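PyTorch's (N, C, H, W) dimension layout is intuitive when read from right to left:
H and W form a single-channel 2D image;
C such (H × W) planes form a multi-channel image, e.g. an RGB image when C = 3;
N such (C × H × W) images form the batch.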
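Read this way, BatchNorm2d with C channels computes C means and C variances, each over the N × H × W values that share a channel. The sketch below checks this by slicing out one channel at a time with x[:, c] and comparing against a single reduction over dims (0, 2, 3).

```python
import torch

x = torch.arange(0, 48).view(4, 3, 2, 2).float()   # N=4, C=3, H=W=2
N, C, H, W = x.shape

# One channel at a time: all N images, all H*W positions of channel c
per_channel_mean = torch.stack([x[:, c].mean() for c in range(C)])
per_channel_var = torch.stack([x[:, c].var(unbiased=False) for c in range(C)])

# The same statistics in one call by reducing over N, H and W
mean = x.mean(dim=(0, 2, 3))
var = ((x - mean.view(1, C, 1, 1)) ** 2).mean(dim=(0, 2, 3))

print(N * H * W)            # 16 values per channel
print(per_channel_mean)     # tensor([19.5000, 23.5000, 27.5000])
print(torch.allclose(per_channel_mean, mean), torch.allclose(per_channel_var, var))
```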

BatchNorm2d Example 2: three channels

import torch
import torch.nn as nn
input = torch.arange(0, 48).view(4,3,2,2).float()
# With Learnable Parameters
#m = nn.BatchNorm2d(3)
# Without Learnable Parameters
m = nn.BatchNorm2d(3, affine=False)
output = m(input)
print(output)

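The computation is as follows:
reduce over dimensions N, H and W and keep only C, so each channel gets its own mean and variance.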

import torch
input = torch.arange(0, 48).view(4,3,2,2).float()
eps=1e-5
mean = input.mean(dim=(0, 2, 3), keepdim=True)
var = torch.pow((input - mean) ,exponent= 2).mean(dim=(0, 2, 3), keepdim=True)
result = (input - mean) / torch.sqrt(var + eps)
print(result)

Both produce the same output.
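Example 2 used affine=False, so step 4 was skipped. With the default affine=True, BatchNorm2d stores γ as m.weight and β as m.bias, one value per channel (initialized to 1 and 0). The sketch below overwrites them with arbitrary illustrative values and checks that the output equals x̂ · γ + β, with γ and β reshaped to (1, C, 1, 1).

```python
import torch
import torch.nn as nn

x = torch.arange(0, 48).view(4, 3, 2, 2).float()
m = nn.BatchNorm2d(3)   # affine=True: weight is gamma, bias is beta
with torch.no_grad():
    m.weight.copy_(torch.tensor([1.0, 2.0, 3.0]))   # illustrative gamma values
    m.bias.copy_(torch.tensor([0.0, 0.5, -1.0]))    # illustrative beta values
output = m(x)

with torch.no_grad():
    eps = 1e-5
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)                              # steps 1-3
    manual = x_hat * m.weight.view(1, 3, 1, 1) + m.bias.view(1, 3, 1, 1)   # step 4

print(torch.allclose(output, manual, atol=1e-5))   # True
```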

Full parameter description

nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

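Applies Batch Normalization to an input made up of a mini-batch of data, which speeds up deep network training by reducing internal covariate shift.
1. num_features:
C, the number of features (channels) of the expected input, whose size is batch_size × num_features × height × width
(the input and output have the same shape)
Input shape: (N, C, H, W)
Output shape: (N, C, H, W)
2. eps: a value added to the denominator for numerical stability, so that it never approaches or equals 0. Default: 1e-5
3. momentum: the value used for the running_mean and running_var computation. It can be set to None to use a cumulative moving average (i.e. a simple average). Default: 0.1
4. affine: a boolean value; when set to True, this module has learnable affine parameters (γ and β). Default: True
5. track_running_stats: a boolean value; when set to True, this module tracks the running mean and variance; when set to False, it does not track such statistics and always uses batch statistics in both training and eval modes. Default: True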
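momentum and track_running_stats only matter for the running statistics used in eval mode. The sketch below checks one training step against the update rule from the PyTorch documentation, running = (1 − momentum) · running + momentum · batch_stat, with running_mean starting at 0 and running_var at 1. Note that running_var is updated with the unbiased batch variance (divide by m − 1), even though the normalization itself uses the biased one. After m.eval() the module normalizes with the stored running statistics instead of the batch statistics.

```python
import torch
import torch.nn as nn

x = torch.arange(0, 48).view(4, 3, 2, 2).float()
m = nn.BatchNorm2d(3, affine=False)   # momentum=0.1, track_running_stats=True
print(m.running_mean, m.running_var)  # initialized to zeros and ones

m.train()
m(x)   # one training-mode forward pass updates the buffers

batch_mean = x.mean(dim=(0, 2, 3))
batch_var_unbiased = x.var(dim=(0, 2, 3))   # default Tensor.var divides by m - 1
expected_mean = (1 - 0.1) * torch.zeros(3) + 0.1 * batch_mean
expected_var = (1 - 0.1) * torch.ones(3) + 0.1 * batch_var_unbiased
print(torch.allclose(m.running_mean, expected_mean),
      torch.allclose(m.running_var, expected_var))

m.eval()   # eval mode normalizes with the running statistics
out_eval = m(x)
manual_eval = (x - m.running_mean.view(1, 3, 1, 1)) / torch.sqrt(
    m.running_var.view(1, 3, 1, 1) + m.eps)
print(torch.allclose(out_eval, manual_eval, atol=1e-5))
```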

A plain implementation

import torch
def pure_batch_norm(X, gamma, beta, eps = 1e-5):
    if len(X.shape) not in (2, 4):
        raise ValueError('only supports dense or 2dconv')

    # dense
    if len(X.shape) == 2:
        # mini-batch mean
        mean = torch.mean(X, dim=0)
        # mini-batch variance
        variance = torch.mean((X - mean) ** 2, dim=0)
        # normalize
        X_hat = (X - mean) * 1.0 / torch.sqrt(variance + eps)
        # scale and shift
        out = gamma * X_hat + beta

    # 2d conv
    elif len(X.shape) == 4:
        # extract the dimensions
        N, C, H, W = X.shape
        # mini-batch mean
        mean = torch.mean(X, dim=(0, 2, 3))
        # mini-batch variance
        variance = torch.mean((X - mean.reshape((1, C, 1, 1))) ** 2, dim=(0, 2, 3))
        # normalize
        X_hat = (X - mean.reshape((1, C, 1, 1))) * 1.0 / torch.sqrt(variance.reshape((1, C, 1, 1)) + eps)
        # scale and shift
        out = gamma.reshape((1, C, 1, 1)) * X_hat + beta.reshape((1, C, 1, 1))

    return out

import torch
import torch.nn as nn
input = torch.arange(0, 48).view(4,3,2,2).float()

m = nn.BatchNorm2d(3, affine=False)
output = m(input)
print(output)

result = pure_batch_norm(input,
    gamma=torch.ones(3),   # gamma and beta need one entry per channel (3 here)
    beta=torch.zeros(3))

print(result)
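pure_batch_norm handles 2D and 4D inputs with separate branches. Since every case reduces over all dimensions except the channel dimension 1, a single branch is enough. The helper batch_norm_any below is a hypothetical name, not part of PyTorch. It is a sketch checked against nn.BatchNorm1d/2d/3d in training mode, using random, illustrative γ and β.

```python
import torch
import torch.nn as nn

def batch_norm_any(x, gamma, beta, eps=1e-5):
    """Hypothetical helper: training-mode batch norm for inputs shaped (N, C, *)."""
    dims = [d for d in range(x.dim()) if d != 1]          # every dim except C
    mean = x.mean(dim=dims, keepdim=True)
    var = ((x - mean) ** 2).mean(dim=dims, keepdim=True)  # biased variance
    shape = [1, -1] + [1] * (x.dim() - 2)                 # (1, C, 1, ..., 1)
    return (x - mean) / torch.sqrt(var + eps) * gamma.view(shape) + beta.view(shape)

torch.manual_seed(0)
cases = [
    (nn.BatchNorm1d(3), torch.randn(8, 3)),
    (nn.BatchNorm1d(3), torch.randn(8, 3, 5)),
    (nn.BatchNorm2d(3), torch.randn(8, 3, 4, 4)),
    (nn.BatchNorm3d(3), torch.randn(8, 3, 2, 4, 4)),
]
results = []
for bn, x in cases:
    with torch.no_grad():
        bn.weight.copy_(torch.randn(3))   # illustrative gamma
        bn.bias.copy_(torch.randn(3))     # illustrative beta
        expected = bn(x)                  # training mode: batch statistics
        got = batch_norm_any(x, bn.weight, bn.bias, eps=bn.eps)
    results.append(torch.allclose(got, expected, atol=1e-5))
print(results)   # [True, True, True, True]
```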
