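[Notes] BN (Batch Normalization) and regularization: BN normalizes each channel using the mean and variance computed for that channel over the whole batch; regularization gives up some accuracy on the training data to curb the overfitting caused by having too few samples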

Article link: https://blog.csdn.net/nyist_yangguang/article/details/118696864


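I still hope everyone looks at these topics with a critical eye. Some people point out, for example, that L0 is not a norm (strictly speaking it is not: counting nonzero entries does not scale with the vector, so it violates absolute homogeneity).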

BN: it has a mild regularizing effect and in many networks can largely take the place of Dropout.
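The regularizing effect is usually attributed to the noise in the batch statistics: in training mode, the output for one sample depends on which other samples share its batch. The sketch below illustrates that under assumed random inputs; the tensor names (`x`, `mates_a`, `mates_b`) are made up for the illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)                       # a fresh module is in training mode
x = torch.randn(1, 3, 4, 4)                  # one fixed sample
mates_a = torch.randn(7, 3, 4, 4)            # hypothetical batch companions
mates_b = torch.randn(7, 3, 4, 4) * 3 + 1    # companions with different statistics

# Training mode: statistics come from the current batch, so the output
# for x changes when its companions change.
out_a = bn(torch.cat([x, mates_a]))[0]
out_b = bn(torch.cat([x, mates_b]))[0]
train_gap = (out_a - out_b).abs().max().item()

# Eval mode: fixed running statistics are used, so the output for x
# no longer depends on the rest of the batch.
bn.eval()
out_eval_a = bn(torch.cat([x, mates_a]))[0]
out_eval_b = bn(torch.cat([x, mates_b]))[0]
eval_gap = (out_eval_a - out_eval_b).abs().max().item()

print(train_gap, eval_gap)
```

The first number is clearly nonzero and the second is (numerically) zero, which is why `model.eval()` matters at inference time.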

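Note that the trainable parameters $\gamma$ and $\beta$ each have as many entries as the tensors in the batch have channels. For R, G, B images that is 3. Every channel needs its own $\gamma$ and $\beta$, so each of the two is a vector of length 3 (in PyTorch a 1-D tensor of shape `[3]`, not a 1 × 3 column vector).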
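A quick check of those shapes, as a minimal sketch: in PyTorch, $\gamma$ is stored as `weight` and $\beta$ as `bias`. The reshaping to `(1, C, 1, 1)` below is only an illustration of how one value per channel broadcasts over an `(N, C, H, W)` tensor; it is not a claim about how the library implements it internally.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=3)    # 3 channels, e.g. an RGB image
gamma, beta = bn.weight, bn.bias       # one scale and one shift per channel
print(gamma.shape, beta.shape)         # torch.Size([3]) torch.Size([3])

# Stand-in for an already normalized batch of shape (N, C, H, W)
x_hat = torch.randn(2, 3, 5, 5)

# Reshape to (1, C, 1, 1) so each channel's gamma/beta broadcasts over
# the batch and spatial dimensions
y = gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)
print(y.shape)                         # torch.Size([2, 3, 5, 5])
```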

summary
OrderedDict([('Conv2d-1',
              OrderedDict([('input_shape', [-1, 3, 418, 418]),
                           ('output_shape', [-1, 64, 209, 209]),
                           ('trainable', True),
                           ('nb_params', tensor(9408))])),
             ('BatchNorm2d-2',
              OrderedDict([('input_shape', [-1, 64, 209, 209]),
                           ('output_shape', [-1, 64, 209, 209]),
                           ('trainable', True)]))])
len(summary)
2
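The summary does not state the convolution's hyperparameters. One configuration consistent with the numbers shown (9408 parameters, 418 → 209 spatial size) is a 7 × 7 kernel with stride 2, padding 3 and no bias, as in the first layer of torchvision's ResNet. That choice is an assumption for this sketch, not something the summary confirms.

```python
import torch
import torch.nn as nn

# ASSUMPTION: kernel_size/stride/padding/bias are guessed to match the summary
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
bn = nn.BatchNorm2d(64)

conv_params = sum(p.numel() for p in conv.parameters())   # 64 * 3 * 7 * 7
bn_params = sum(p.numel() for p in bn.parameters())       # 64 gammas + 64 betas

with torch.no_grad():
    out = bn(conv(torch.zeros(1, 3, 418, 418)))

print(conv_params)        # 9408
print(bn_params)          # 128
print(tuple(out.shape))   # (1, 64, 209, 209)
```

The BatchNorm2d layer keeps the shape unchanged and adds only 2 × 64 trainable values.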

module.weight.requires_grad
True

module.weight
Parameter containing:
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0',
       requires_grad=True)

module.weight.size()
torch.Size([64])

module.bias.size()
torch.Size([64])



module.bias 
Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       device='cuda:0', requires_grad=True)

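The ones and zeros printed above are the default initialization of $\gamma$ (`weight`) and $\beta$ (`bias`). Besides these two trainable tensors, the layer also keeps non-trainable running statistics that are updated on every training-mode forward pass. A minimal sketch on CPU, with an arbitrary random input chosen only for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(64)

# Defaults: gamma = 1, beta = 0, so the affine step starts as the identity
weight_is_ones = bool(torch.all(bn.weight == 1))
bias_is_zeros = bool(torch.all(bn.bias == 0))

# Only weight and bias are parameters; the running statistics are buffers
n_param_tensors = len(list(bn.parameters()))
buffer_names = [name for name, _ in bn.named_buffers()]
print(weight_is_ones, bias_is_zeros, n_param_tensors, buffer_names)

# One training-mode forward pass moves running_mean from 0 towards the
# batch mean by the default momentum of 0.1
x = torch.randn(8, 64, 4, 4) * 2 + 5
bn(x)
expected_running_mean = 0.1 * x.mean(dim=(0, 2, 3))
print(torch.allclose(bn.running_mean, expected_running_mean, atol=1e-5))
```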

Example:

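import torch
import torch.nn as nn

# Input of shape (N=2, C=3, H=1, W=2): two samples with three channels each
a = torch.tensor([[
                      [[1., 2]],
                      [[2, 1]],
                      [[2, 2]],
                  ],
                  [
                      [[2., 2]],
                      [[3, 1]],
                      [[1, 2]],
                  ]
                  ])
print(a, type(a), a.dtype)
print(a.shape)

# One gamma/beta pair per channel. A freshly built module is in training
# mode, so it normalizes with the statistics of this very batch.
b = nn.BatchNorm2d(3)
d = b(a)
print(d)

Output:
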
tensor([[[[1., 2.]],

         [[2., 1.]],

         [[2., 2.]]],


        [[[2., 2.]],

         [[3., 1.]],

         [[1., 2.]]]]) <class 'torch.Tensor'> torch.float32
torch.Size([2, 3, 1, 2])
tensor([[[[-1.7320,  0.5773]],

         [[ 0.3015, -0.9045]],

         [[ 0.5773,  0.5773]]],


        [[[ 0.5773,  0.5773]],

         [[ 1.5075, -0.9045]],

         [[-1.7320,  0.5773]]]], grad_fn=<NativeBatchNormBackward>)
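These numbers can be reproduced by hand. For each channel $c$, BN takes the mean $\mu_c$ and the biased variance $\sigma_c^2$ over the batch and spatial dimensions, then computes $\hat{x} = (x - \mu_c) / \sqrt{\sigma_c^2 + \epsilon}$ and $y = \gamma_c \hat{x} + \beta_c$. The sketch below repeats that calculation for the same input and compares it with the layer's output; since $\gamma = 1$ and $\beta = 0$ at initialization, the affine step is omitted.

```python
import torch
import torch.nn as nn

a = torch.tensor([[[[1., 2.]], [[2., 1.]], [[2., 2.]]],
                  [[[2., 2.]], [[3., 1.]], [[1., 2.]]]])   # shape (2, 3, 1, 2)

bn = nn.BatchNorm2d(3)

# Per-channel statistics over batch (0), height (2) and width (3)
mean = a.mean(dim=(0, 2, 3), keepdim=True)
var = ((a - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)   # biased variance

manual = (a - mean) / torch.sqrt(var + bn.eps)

print(mean.flatten())   # tensor([1.7500, 1.7500, 1.7500])
print(var.flatten())    # tensor([0.1875, 0.6875, 0.1875])
print(torch.allclose(manual, bn(a), atol=1e-5))   # True
```

For channel 0, the four values are 1, 2, 2, 2, so the first entry is $(1 - 1.75)/\sqrt{0.1875 + 10^{-5}} \approx -1.7320$, matching the printed tensor.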