torch.nn.BatchNorm1d and torch.nn.BatchNorm2d

In machine learning, data are usually normalized before model training so that they follow a consistent distribution. When training a deep neural network, we typically train on one batch at a time rather than on the entire dataset. Each batch has a different distribution, which gives rise to the internal covariate shift problem: the data distribution keeps changing during training, which makes learning harder for the subsequent layers. Batch Normalization forces the data back to a normal distribution with mean 0 and variance 1, which keeps the distribution consistent and helps avoid vanishing gradients, thereby speeding up the convergence of network training.
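To make this concrete, here is a minimal sketch (the feature count 4, batch size 32 and the shift/scale of the input are arbitrary) showing that each feature of the output has roughly zero mean and unit variance:

import torch

bn = torch.nn.BatchNorm1d(4)        # 4 features per sample
x = torch.randn(32, 4) * 5.0 + 3.0  # batch with mean ~3 and std ~5 per feature
y = bn(x)                           # training mode: normalize with the batch statistics

print(y.mean(dim=0))                 # close to 0 for every feature
print(y.var(dim=0, unbiased=False))  # close to 1 for every feature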
References:

 “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” 

PyTorch踩坑指南(1)nn.BatchNorm2d()函数 (白水煮蝎子, CSDN blog)

<1>torch.nn.BatchNorm1d

CLASS torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)

Parameters:

  • num_features (int) – number of features or channels C of the input. This is the feature dimension of each sample; the input is typically a matrix of shape $batch \times C$.

  • eps (float) – a value added to the denominator for numerical stability. Default: 1e-5. It keeps the denominator from being zero during normalization.

  • momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1. It controls how running_mean and running_var are updated (see the sketch after this list).

  • affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True. It controls whether the learnable affine parameters $\gamma$ and $\beta$ are used; $\gamma$ is initialized to 1 and $\beta$ to 0.

  • track_running_stats (bool) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics, in both training and eval modes. Default: True. If True, running_mean is initialized to 0 and running_var to 1; if False, running_mean and running_var are initialized to None and the statistics of the current batch are used for normalization in both training and eval modes.
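When momentum is set to None, the running statistics are updated as a cumulative (simple) average of the per-batch statistics instead of an exponential moving average. A minimal sketch of this behaviour (the feature count 4, batch size 16 and number of batches 3 are arbitrary):

import torch

m = torch.nn.BatchNorm1d(4, momentum=None)  # momentum=None -> cumulative moving average
batches = [torch.randn(16, 4) for _ in range(3)]
for b in batches:
    m(b)  # training mode: each forward pass updates the running statistics

# running_mean should equal the simple average of the three batch means
batch_means = torch.stack([b.mean(dim=0) for b in batches]).mean(dim=0)
print('running_mean:', m.running_mean)
print('mean of batch means:', batch_means)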

Shape:

  • Input: (N, C), where N is the batch size and C is the number of features or channels. The input is a matrix of shape $N \times C$.

  • Output: (N, C) (same shape as input).

Note: the affine parameters $\gamma$ and $\beta$ are learned through backpropagation, whereas running_mean and running_var are statistics accumulated during the forward pass.
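This split can be checked directly: the affine parameters show up as learnable parameters of the module, while the running statistics are registered as buffers. A small sketch:

import torch

m = torch.nn.BatchNorm1d(8)
print([name for name, _ in m.named_parameters()])   # ['weight', 'bias'], i.e. gamma and beta
print([name for name, _ in m.named_buffers()])      # ['running_mean', 'running_var', 'num_batches_tracked']
print(m.weight.requires_grad, m.bias.requires_grad) # True True  -> updated by backpropagation
print(m.running_mean.requires_grad)                 # False      -> updated in the forward pass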

Reference: BatchNorm1d — PyTorch 2.0 documentation

nn.BatchNorm1d (harry_tea, CSDN blog)

1. How the model's running mean and variance are updated, and how data are normalized

Note that track_running_stats only takes effect when the BatchNorm module is created; changing the attribute afterwards does not reinitialize the running_mean and running_var buffers, as the following example shows:

import torch
m = torch.nn.BatchNorm1d(8,momentum=0.1, affine=True, track_running_stats = True) # track_running_stats only takes effect at construction time
m.track_running_stats = False
print('running_mean:',m.running_mean) # initial values
print('running_var:',m.running_var )
print('track_running_stats:',m.track_running_stats )
m2 = torch.nn.BatchNorm1d(8,momentum=0.1, affine=True, track_running_stats = False) 
print('running_mean:',m2.running_mean) # initial values
print('running_var:',m2.running_var )
running_mean: tensor([0., 0., 0., 0., 0., 0., 0., 0.])
running_var: tensor([1., 1., 1., 1., 1., 1., 1., 1.])
track_running_stats: False
running_mean: None
running_var: None

Whether running_mean and running_var need to be updated depends on both training and track_running_stats, and the normalization behaviour also differs with these two settings.

(1) training = True, track_running_stats = True: the model is in the training phase. Every time it normalizes a batch, the model also updates running_mean and running_var, i.e. it tracks the mean and variance of each batch.

Update rule: $x_{new} = (1 - momentum) \times x_{old} + momentum \times x_{obser}$

where $x_{old}$ is the model's current mean or variance, $x_{obser}$ is the mean or variance of the observed batch, $x_{new}$ is the updated mean or variance, and $momentum$ is the update factor.

Normalization of the observed batch: $y = \dfrac{x - E[x]}{\sqrt{Var[x] + \varepsilon}} \times \gamma + \beta$

where $E[x]$ is the mean of the observed batch, $Var[x]$ is the variance of the observed batch, and $y$ is the normalized data of one channel; in other words, normalization uses the mean and variance of the current batch.

Note the unbiased estimate of the variance: $\sigma^2 = \dfrac{\sum_{i = 0}^{N - 1} (x_i - E(x))^2}{N - 1}$

and the biased estimate of the variance: $\sigma^2 = \dfrac{\sum_{i = 0}^{N - 1} (x_i - E(x))^2}{N}$

Note: running_mean and running_var are updated with the unbiased variance, whereas the batch itself is normalized with the biased variance. When $N$ is large, the two estimates are essentially the same.
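The two estimators can be compared directly with torch.var; a quick sketch on a single feature column (the column length 5 is arbitrary). The full walkthrough of case (1) below uses both estimators.

import torch

x = torch.randn(5)
n = x.numel()

var_unbiased = torch.var(x, unbiased=True)   # divides by N-1, used to update running_var
var_biased = torch.var(x, unbiased=False)    # divides by N,   used to normalize the batch

print(var_unbiased, ((x - x.mean())**2).sum() / (n - 1))  # identical values
print(var_biased,   ((x - x.mean())**2).sum() / n)        # identical values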

import torch
m = torch.nn.BatchNorm1d(8,momentum=0.1, affine=True, track_running_stats = True) # track_running_stats only takes effect at construction time
print('running_mean:',m.running_mean) # initial values
print('running_var:',m.running_var )
print('weight:',m.weight)
print('bias:',m.bias)

input = torch.randn(5, 8)
print('input:',input)
print('input[...,0]:',input[...,0]) # first column

obser_mean = torch.Tensor([input[...,i].mean() for i in range(8)]) # per-feature mean of the input
obser_var_unbiased = torch.Tensor([input[...,i].var() for i in range(8)]) # unbiased variance, equivalent to torch.var(input, dim=0, unbiased=True)
obser_var_biased = torch.Tensor([input[...,i].var(unbiased=False) for i in range(8)]) # biased variance, equivalent to torch.var(input, dim=0, unbiased=False)
print('obser_mean:',obser_mean)
print('obser_var_unbiased:',obser_var_unbiased)
print('obser_var_biased:',obser_var_biased)

obser_running_mean = (1-m.momentum)*m.running_mean + m.momentum*obser_mean
obser_running_var = (1-m.momentum)*m.running_var + m.momentum*obser_var_unbiased
output = m(input)
output_obser = (input[...,0] - obser_mean[0])/(pow(obser_var_biased[0] + m.eps,0.5)) # normalization
print('obser_running_mean:',obser_running_mean)
print('obser_running_var:',obser_running_var)
print('running_mean:',m.running_mean)
print('running_var:',m.running_var )
print('output[...,0]:',output[...,0]) # normalized data
print('output_obser:',output_obser)
running_mean: tensor([0., 0., 0., 0., 0., 0., 0., 0.])
running_var: tensor([1., 1., 1., 1., 1., 1., 1., 1.])
weight: Parameter containing:
tensor([1., 1., 1., 1., 1., 1., 1., 1.], requires_grad=True)
bias: Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)
input: tensor([[ 0.4901, -0.1794,  1.1301, -0.1901, -0.7794, -0.2863, -0.9673,  1.5712],
        [ 0.7150, -0.6555,  0.1724,  1.8487,  0.3064, -0.0863,  1.3970,  0.3117],
        [-1.4870, -1.0768, -0.8371,  1.7132,  0.9250,  0.6004, -0.2488,  0.8714],
        [-0.7459,  0.3344, -1.1203,  1.7061,  0.6755, -0.2490,  1.4969,  0.6247],
        [-0.2600,  2.0536,  0.5194, -1.4121, -1.3856, -0.2249,  0.0729, -0.4737]])
input[...,0]: tensor([ 0.4901,  0.7150, -1.4870, -0.7459, -0.2600])
obser_mean: tensor([-0.2576,  0.0953, -0.0271,  0.7332, -0.0516, -0.0492,  0.3501,  0.5810])
obser_var_unbiased: tensor([0.8138, 1.4763, 0.8822, 2.1516, 0.9799, 0.1376, 1.1456, 0.5629])
obser_var_biased: tensor([0.6510, 1.1810, 0.7058, 1.7213, 0.7839, 0.1101, 0.9165, 0.4503])
obser_running_mean: tensor([-0.0258,  0.0095, -0.0027,  0.0733, -0.0052, -0.0049,  0.0350,  0.0581])
obser_running_var: tensor([0.9814, 1.0476, 0.9882, 1.1152, 0.9980, 0.9138, 1.0146, 0.9563])
running_mean: tensor([-0.0258,  0.0095, -0.0027,  0.0733, -0.0052, -0.0049,  0.0350,  0.0581])
running_var: tensor([0.9814, 1.0476, 0.9882, 1.1152, 0.9980, 0.9138, 1.0146, 0.9563])
output[...,0]: tensor([ 0.9267,  1.2054, -1.5237, -0.6053, -0.0031],
       grad_fn=<SelectBackward0>)
output_obser: tensor([ 0.9267,  1.2054, -1.5237, -0.6053, -0.0031])

(2) training = False, track_running_stats = True: the model is in the test (eval) phase. Normalization uses the mean and variance stored in the model, and those statistics are not updated.

# test (eval) phase
m.eval()
print(m.training)
print(m.track_running_stats)
input = torch.randn(5, 8)

obser_mean = torch.mean(input, dim=0) # per-feature mean of the input
obser_var_biased = torch.var(input, dim=0, unbiased=False) # biased variance
print('obser_mean:',obser_mean)
print('obser_var_biased:',obser_var_biased)
print('running_mean:',m.running_mean)
print('running_var:',m.running_var )

output = m(input)
output_obser = (input[...,0] - obser_mean[0])/(pow(obser_var_biased[0] + m.eps,0.5)) # normalized with the batch statistics
output2_obser = (input[...,0] - m.running_mean[0])/(pow(m.running_var[0] + m.eps,0.5))
print('output[...,0]:',output[...,0])
print('output_obser:',output_obser)
print('output2_obser:',output2_obser)
False
True
obser_mean: tensor([-0.1944, -0.0978, -0.2910,  0.1413, -0.3192,  0.2840,  0.2374,  0.0797])
obser_var_biased: tensor([2.0099, 0.1910, 0.8365, 0.2942, 1.6348, 1.7859, 1.5220, 0.6940])
running_mean: tensor([ 0.0830,  0.0147,  0.0465,  0.0591,  0.0311,  0.0579, -0.0096,  0.0033])
running_var: tensor([1.0621, 0.9541, 0.9964, 1.1851, 0.9764, 1.0526, 1.0279, 0.9809])
output[...,0]: tensor([ 0.4747, -2.2079, -1.2251,  1.7868, -0.1744],
       grad_fn=<SelectBackward0>)
output_obser: tensor([ 0.5407, -1.4093, -0.6949,  1.4946,  0.0689])
output2_obser: tensor([ 0.4747, -2.2079, -1.2251,  1.7868, -0.1744])

(3) training = True or False, track_running_stats = False: whether the model is in the training or the test phase, normalization always uses the mean and variance of the current batch, and the model's running mean and variance are not updated.

import torch
m = torch.nn.BatchNorm1d(8,momentum=0.1, affine=True, track_running_stats = False) # track_running_stats only takes effect at construction time
print('running_mean:',m.running_mean) # initial values
print('running_var:',m.running_var )

input = torch.randn(5, 8)
obser_mean = torch.Tensor([input[...,i].mean() for i in range(8)]) # per-feature mean of the input
obser_var_biased = torch.Tensor([input[...,i].var(unbiased=False) for i in range(8)]) # biased variance
print('obser_mean:',obser_mean)
print('obser_var_biased:',obser_var_biased)
output = m(input)
output_obser = (input[...,0] - obser_mean[0])/(pow(obser_var_biased[0] + m.eps,0.5)) # normalization
print('running_mean:',m.running_mean)
print('running_var:',m.running_var )
print('output[...,0]:',output[...,0]) # normalized data
print('output_obser:',output_obser)


# test (eval) phase
m.eval()
print(m.training)
print(m.track_running_stats)
input = torch.randn(5, 8)

obser_mean = torch.mean(input, dim=0) # per-feature mean of the input
obser_var_biased = torch.var(input, dim=0, unbiased=False) # biased variance
print('obser_mean:',obser_mean)
print('obser_var_biased:',obser_var_biased)
print('running_mean:',m.running_mean)
print('running_var:',m.running_var )

output = m(input)
output_obser = (input[...,0] - obser_mean[0])/(pow(obser_var_biased[0] + m.eps,0.5)) # normalization
print('output[...,0]:',output[...,0])
print('output_obser:',output_obser)
running_mean: None
running_var: None
obser_mean: tensor([-0.4277,  0.2008, -0.3871,  0.4741,  0.5016,  0.6817, -0.2613,  0.0763])
obser_var_biased: tensor([0.3961, 0.7895, 1.1211, 0.2614, 0.2954, 0.4563, 1.8461, 0.7862])
running_mean: None
running_var: None
output[...,0]: tensor([ 0.6016,  1.0995, -1.0294,  0.6994, -1.3712],
       grad_fn=<SelectBackward0>)
output_obser: tensor([ 0.6016,  1.0995, -1.0294,  0.6994, -1.3712])
False
False
obser_mean: tensor([-0.7911, -0.0979,  0.5710, -0.8198,  0.3552, -0.0772,  0.7881,  0.7573])
obser_var_biased: tensor([2.0702, 1.2274, 1.2483, 0.5527, 0.3471, 0.2689, 1.0752, 0.7770])
running_mean: None
running_var: None
output[...,0]: tensor([ 0.1504,  0.6443, -0.4143,  1.2792, -1.6596],
       grad_fn=<SelectBackward0>)
output_obser: tensor([ 0.1504,  0.6443, -0.4143,  1.2792, -1.6596])

<2>torch.nn.BatchNorm2d

CLASS torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)

Parameters:

  • num_features (int) – C from an expected input of size (N, C, H, W)

Shape:

  • Input: (N, C, H, W)

  • Output: (N, C, H, W) (same shape as input)

BatchNorm2d differs from BatchNorm1d in the shape of the input. When computing the batch mean and variance, the statistics are taken over all elements of the same channel across the whole batch. For an input of shape $(N, C, H, W)$:

Per-channel mean: $\mu_c = \dfrac{\sum\limits_{n = 0}^{N - 1} \sum\limits_{h = 0}^{H - 1} \sum\limits_{w = 0}^{W - 1} x[n,c,h,w]}{N \times H \times W}, \quad c = 0, \cdots, C - 1$

Per-channel variance: $\sigma_c^2 = \dfrac{\sum\limits_{n = 0}^{N - 1} \sum\limits_{h = 0}^{H - 1} \sum\limits_{w = 0}^{W - 1} (x[n,c,h,w] - \mu_c)^2}{N \times H \times W}, \quad c = 0, \cdots, C - 1$

Each element $x[n,c,h,w]$ of channel $c$ is then normalized as: $\hat{x} = \dfrac{x - \mu_c}{\sqrt{\sigma_c^2 + \varepsilon}} \times \gamma + \beta$
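These per-channel statistics can also be computed directly with tensor reductions over the N, H and W dimensions; a small sketch (the shape (2, 3, 4, 4) is arbitrary, and the comparison relies on the default initialization $\gamma = 1$, $\beta = 0$):

import torch

x = torch.randn(2, 3, 4, 4)                   # (N, C, H, W)
mu_c = x.mean(dim=(0, 2, 3))                  # per-channel mean, shape (C,)
var_c = x.var(dim=(0, 2, 3), unbiased=False)  # per-channel biased variance, shape (C,)

x_hat = (x - mu_c[None, :, None, None]) / torch.sqrt(var_c[None, :, None, None] + 1e-5)

bn = torch.nn.BatchNorm2d(3)                    # training mode uses the batch statistics
print(torch.allclose(bn(x), x_hat, atol=1e-5))  # True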

BatchNorm2d updates its running mean and variance, and normalizes the data, in exactly the same way as BatchNorm1d.

import torch
m = torch.nn.BatchNorm2d(3, eps=0, momentum=0.5, affine=True, track_running_stats=True)
print('running_mean:',m.running_mean) # initial values
print('running_var:',m.running_var )
print('weight:',m.weight)
print('bias:',m.bias)

input = torch.randn(1, 3, 5, 5)
print('input[0][0]:',input[0][0]) # first sample, first channel

obser_mean = torch.Tensor([input[0][i].mean() for i in range(3)]) # per-channel mean of the input
obser_var_unbiased = torch.Tensor([input[0][i].var() for i in range(3)]) # unbiased variance, equivalent to torch.var(input[0][i], unbiased=True)
obser_var_biased = torch.Tensor([input[0][i].var(unbiased=False) for i in range(3)]) # biased variance, equivalent to torch.var(input[0][i], unbiased=False)
print('obser_mean:',obser_mean)
print('obser_var_unbiased:',obser_var_unbiased)
print('obser_var_biased:',obser_var_biased)
obser_running_mean = (1-m.momentum)*m.running_mean + m.momentum*obser_mean
obser_running_var = (1-m.momentum)*m.running_var + m.momentum*obser_var_unbiased
output = m(input)
print('obser_running_mean:',obser_running_mean)
print('obser_running_var:',obser_running_var)
print('running_mean:',m.running_mean)
print('running_var:',m.running_var )
output_obser = (input[0][0] - obser_mean[0])/(pow(obser_var_biased[0] + m.eps,0.5)) # normalization
print('output[0][0]:',output[0][0]) # normalized data
print('output_obser:',output_obser)
running_mean: tensor([0., 0., 0.])
running_var: tensor([1., 1., 1.])
weight: Parameter containing:
tensor([1., 1., 1.], requires_grad=True)
bias: Parameter containing:
tensor([0., 0., 0.], requires_grad=True)
input[0][0]: tensor([[-0.3550, -1.1596, -0.4947, -0.8188, -0.1722],
        [-1.3371,  0.9375,  0.5564,  2.3561, -0.5711],
        [ 0.3932,  2.6657,  0.3440, -0.9300,  0.1791],
        [-1.0307,  0.2115,  0.4953,  1.8088,  0.0496],
        [-1.0584,  0.4566, -0.1415,  1.2106,  0.4498]])
obser_mean: tensor([ 0.1618,  0.2137, -0.0836])
obser_var_unbiased: tensor([1.1056, 1.1707, 0.9300])
obser_var_biased: tensor([1.0613, 1.1239, 0.8928])
obser_running_mean: tensor([ 0.0809,  0.1068, -0.0418])
obser_running_var: tensor([1.0528, 1.0853, 0.9650])
running_mean: tensor([ 0.0809,  0.1068, -0.0418])
running_var: tensor([1.0528, 1.0853, 0.9650])
output[0][0]: tensor([[-0.5016, -1.2827, -0.6372, -0.9518, -0.3242],
        [-1.4549,  0.7529,  0.3830,  2.1300, -0.7114],
        [ 0.2246,  2.4305,  0.1768, -1.0598,  0.0168],
        [-1.1575,  0.0482,  0.3237,  1.5987, -0.1089],
        [-1.1844,  0.2861, -0.2944,  1.0180,  0.2795]],
       grad_fn=<SelectBackward0>)
output_obser: tensor([[-0.5016, -1.2827, -0.6372, -0.9518, -0.3242],
        [-1.4549,  0.7529,  0.3830,  2.1300, -0.7114],
        [ 0.2246,  2.4305,  0.1768, -1.0598,  0.0168],
        [-1.1575,  0.0482,  0.3237,  1.5987, -0.1089],
        [-1.1844,  0.2861, -0.2944,  1.0180,  0.2795]])

Reference: BatchNorm2d — PyTorch 2.0 documentation

详细解读nn.BatchNorm2d——批量标准化操作 (ChaoFeiLi, CSDN blog)
