PyTorch Learning Notes (28): BN, LN, IN, GN

Why Normalization

Internal Covariate Shift (ICS): the scale/distribution of the data shifts across layers, making training difficult.

Consider one hidden unit $H_{11}=\sum_{i=0}^{n} X_i W_{1i}$, with independent inputs and weights of zero mean and unit variance. Its variance grows linearly with the number of inputs $n$:

$$
D(H_{11})=\sum_{i=0}^{n} D(X_i) \cdot D(W_{1i}) = n \cdot (1 \cdot 1) = n,
\qquad
\operatorname{std}(H_{11})=\sqrt{D(H_{11})}=\sqrt{n}
$$

To keep the output variance at $D(H_{1})=n \cdot D(X) \cdot D(W)=1$, the weights must be scaled down:

$$
D(W)=\frac{1}{n} \Rightarrow \operatorname{std}(W)=\sqrt{\frac{1}{n}}
$$
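A minimal sketch (not from the original notes; the width and sample count are illustrative) that verifies D(H) = n * D(X) * D(W) numerically:

import torch

torch.manual_seed(1)
n = 256                                    # number of inputs to one hidden unit
x = torch.randn(10000, n)                  # D(X) = 1
w = torch.randn(n, 1)                      # D(W) = 1
print((x @ w).var().item())                # ~n, i.e. ~256: variance explodes with width

w_scaled = torch.randn(n, 1) / n ** 0.5    # D(W) = 1/n, i.e. std(W) = sqrt(1/n)
print((x @ w_scaled).var().item())         # ~1: variance preserved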

Common Normalization Methods

1. Batch Normalization (BN)
2. Layer Normalization (LN)
3. Instance Normalization (IN)
4. Group Normalization (GN)

What they have in common

$$
\widehat{x}_{i} \leftarrow \frac{x_{i}-\mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2}+\epsilon}}
$$

$$
y_{i} \leftarrow \gamma \widehat{x}_{i}+\beta \equiv \mathrm{N}_{\gamma, \beta}\left(x_{i}\right)
$$
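All four methods share exactly these two steps; only the dimensions used for the statistics differ. A minimal sketch in plain tensor ops (the function name and the dims argument are illustrative):

import torch

def normalize_affine(x, dims, gamma, beta, eps=1e-5):
    # Step 1: standardize with statistics computed over `dims`
    mu = x.mean(dim=dims, keepdim=True)
    var = x.var(dim=dims, keepdim=True, unbiased=False)
    x_hat = (x - mu) / torch.sqrt(var + eps)
    # Step 2: learnable affine transform (gamma and beta are trained parameters)
    return gamma * x_hat + beta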

Differences

How the mean and variance are computed. For an input of shape (N, C, H, W): BN takes the statistics over (N, H, W) per channel, LN over (C, H, W) per sample, IN over (H, W) per sample and channel, and GN over (C/G, H, W) per sample and group of channels.

1. Layer Normalization

Motivation: BN is not suitable for variable-length networks such as RNNs.
Idea: compute the mean and variance per layer, i.e. over all features of a single sample.

Notes:

1. There is no longer a running_mean or running_var.
2. gamma and beta are element-wise (one per feature element).

nn.LayerNorm

Main parameters:
normalized_shape: shape of the feature dimensions to normalize (must match the input's trailing dimensions)
eps: term added to the denominator for numerical stability
elementwise_affine: whether to apply a learnable affine transform
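As a sanity check (a sketch, not part of the original demo), nn.LayerNorm over the trailing dims can be reproduced with per-sample statistics:

import torch
import torch.nn as nn

x = torch.randn(2, 3, 2, 2)                      # B * C * H * W
ln = nn.LayerNorm(x.shape[1:], elementwise_affine=False)

mu = x.mean(dim=(1, 2, 3), keepdim=True)         # statistics per sample, over C, H, W
var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
manual = (x - mu) / torch.sqrt(var + ln.eps)

print(torch.allclose(ln(x), manual, atol=1e-6))  # True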

2. Instance Normalization

Motivation: BN is not suitable for image generation (Image Generation).
Idea: compute the mean and variance per instance (per channel), i.e. channel-wise within each sample.

nn.InstanceNorm2d (similarly nn.InstanceNorm1d / nn.InstanceNorm3d)

Main parameters:
num_features: number of features (channels) in a sample (most important)
eps: term added to the denominator for numerical stability
momentum: factor for the exponential moving average of the running mean/var
affine: whether to apply a learnable affine transform
track_running_stats: whether to track running statistics (training vs. evaluation behavior)
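Likewise (a sketch, not part of the original demo), nn.InstanceNorm2d matches statistics taken per sample and per channel:

import torch
import torch.nn as nn

x = torch.randn(3, 3, 2, 2)                          # B * C * H * W
inorm = nn.InstanceNorm2d(num_features=3)            # affine=False, track_running_stats=False by default

mu = x.mean(dim=(2, 3), keepdim=True)                # statistics per sample AND per channel
var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
manual = (x - mu) / torch.sqrt(var + inorm.eps)

print(torch.allclose(inorm(x), manual, atol=1e-5))   # True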

3. Group Normalization

Motivation: with a small batch, the statistics BN estimates are inaccurate.
Idea: when the batch does not provide enough data, borrow from the channels (compute statistics over groups of channels).

Notes:

1. There is no longer a running_mean or running_var.
2. gamma and beta are per-channel.

Application: large-model tasks trained with a small batch size.

nn.GroupNorm

Main parameters:
num_groups: number of groups, usually a power of 2; must evenly divide num_channels
num_channels: number of channels (features)
eps: term added to the denominator for numerical stability
affine: whether to apply a learnable affine transform
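And again as a sanity check (a sketch, not part of the original demo), nn.GroupNorm matches statistics taken per sample and per group of channels:

import torch
import torch.nn as nn

B, C, H, W, G = 2, 4, 2, 2, 2
x = torch.randn(B, C, H, W)
gn = nn.GroupNorm(num_groups=G, num_channels=C, affine=False)

xg = x.view(B, G, C // G, H, W)                   # split the C channels into G groups
mu = xg.mean(dim=(2, 3, 4), keepdim=True)         # statistics per sample AND per group
var = xg.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
manual = ((xg - mu) / torch.sqrt(var + gn.eps)).view(B, C, H, W)

print(torch.allclose(gn(x), manual, atol=1e-6))   # True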

Summary:

BN, LN, IN, and GN all aim to overcome Internal Covariate Shift (ICS).

Subtract, divide, multiply, add:

subtract the mean, divide by the standard deviation, multiply by γ, add β



# -*- coding: utf-8 -*-

import torch
import torch.nn as nn

torch.manual_seed(1)  # set the random seed for reproducibility

# ======================================== nn.LayerNorm
# flag = 1
flag = 0
if flag:
    batch_size = 2
    num_features = 3

    features_shape = (2, 2)
    # features_shape = (3, 4)

    feature_map = torch.ones(features_shape)  # 2D
    feature_maps = torch.stack([feature_map * (i + 1) for i in range(num_features)], dim=0)  # 3D
    feature_maps_bs = torch.stack([feature_maps for i in range(batch_size)], dim=0)  # 4D

    # feature_maps_bs shape is [2, 3, 2, 2], i.e. B * C * H * W
    ln = nn.LayerNorm(feature_maps_bs.size()[1:], elementwise_affine=True)
    # ln = nn.LayerNorm(feature_maps_bs.size()[1:], elementwise_affine=False)
    # ln = nn.LayerNorm([3, 2, 2])  # normalized_shape may be any trailing slice of the input shape
    # ln = nn.LayerNorm([3, 2])     # error: normalized_shape must match the trailing dimensions

    output = ln(feature_maps_bs)

    print("Layer Normalization")
    print(ln.weight.shape)
    print(feature_maps_bs[0, ...])
    print(output[0, ...])

# ======================================== nn.InstanceNorm2d
# flag = 1
flag = 0
if flag:

    batch_size = 3
    num_features = 3
    momentum = 0.3

    features_shape = (2, 2)

    feature_map = torch.ones(features_shape)    # 2D
    feature_maps = torch.stack([feature_map * (i + 1) for i in range(num_features)], dim=0)  # 3D
    feature_maps_bs = torch.stack([feature_maps for i in range(batch_size)], dim=0)  # 4D

    print("Instance Normalization")
    print("input data:\n{} shape is {}".format(feature_maps_bs, feature_maps_bs.shape))

    instance_n = nn.InstanceNorm2d(num_features=num_features, momentum=momentum)

    for i in range(1):
        outputs = instance_n(feature_maps_bs)

        print(outputs)
        # print("\niter:{}, running_mean.shape: {}".format(i, bn.running_mean.shape))
        # print("iter:{}, running_var.shape: {}".format(i, bn.running_var.shape))
        # print("iter:{}, weight.shape: {}".format(i, bn.weight.shape))
        # print("iter:{}, bias.shape: {}".format(i, bn.bias.shape))


# ======================================== nn.GroupNorm
flag = 1
# flag = 0
if flag:

    batch_size = 2
    num_features = 4
    # num_groups must evenly divide num_channels; it is usually set to a power of 2
    num_groups = 2   # num_groups = 3 would raise an error: 4 channels are not divisible by 3

    features_shape = (2, 2)

    feature_map = torch.ones(features_shape)    # 2D
    feature_maps = torch.stack([feature_map * (i + 1) for i in range(num_features)], dim=0)  # 3D
    feature_maps_bs = torch.stack([feature_maps * (i + 1) for i in range(batch_size)], dim=0)  # 4D
    # arguments: number of groups, number of channels
    gn = nn.GroupNorm(num_groups, num_features)
    outputs = gn(feature_maps_bs)

    print("Group Normalization")
    print(gn.weight.shape)
    print(outputs[0])
