1. Batch Normalization
Example 1: When the input data has shape [N, K], which typically corresponds to the output of a fully connected layer, the mean and variance are computed over the N samples separately for each of the K components. The data and parameters correspond as follows (example code below):
- Input x, shape [N, K]
- Output y, shape [N, K]
- Mean μ_B, shape [K]
- Variance σ_B², shape [K]
- Scale parameter γ, shape [K]
- Shift parameter β, shape [K]
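For each of the K components, the statistics listed above follow the standard batch normalization formulas (ε is a small constant added for numerical stability):

```latex
\mu_B = \frac{1}{N}\sum_{i=1}^{N} x_i ,\qquad
\sigma_B^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu_B\right)^2 ,\qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} ,\qquad
y_i = \gamma \hat{x}_i + \beta
```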
Code:
```python
import numpy as np
import paddle
from paddle.nn import BatchNorm1D

# Input data: N=3 samples, each with K=3 features
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).astype('float32')
bn = BatchNorm1D(num_features=3)
x = paddle.to_tensor(data)
y = bn(x)
print('output of BatchNorm1D Layer: \n {}'.format(y.numpy()))

# Manually verify the first feature (column 0): normalize its N samples
a = np.array([1, 4, 7])
a_mean = a.mean()
a_std = a.std()
b = (a - a_mean) / a_std
print('mean {}, std {}, \n output {}'.format(a_mean, a_std, b))
```
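As a cross-check (a minimal NumPy sketch, independent of Paddle), the same normalization can be applied to all K feature columns at once; ignoring the small ε term and with γ=1, β=0 at initialization, the result should match the layer output above:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).astype('float32')

# Per-feature statistics over the N axis (axis=0): shapes are [K]
mean = data.mean(axis=0)
std = data.std(axis=0)  # population std, matching batch norm's variance estimate

# Normalized output, shape [N, K]
out = (data - mean) / std
print(out)
```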
Example 2: When the input data has shape [N, C, H, W], which typically corresponds to the output of a convolutional layer, the computation is unrolled along the C dimension: for each channel, the mean and variance are computed over all N×H×W pixels across the N samples. The data and parameters correspond as follows (example code below):
- Input x, shape [N, C, H, W]
- Output y, shape [N, C, H, W]
- Mean μ_B, shape [C]
- Variance σ_B², shape [C]
- Scale parameter γ, shape [C]
- Shift parameter β, shape [C]
```python
import numpy as np
import paddle
from paddle.nn import BatchNorm2D

# Input data: N=2 samples, C=3 channels, H=W=3
np.random.seed(100)
data = np.random.rand(2, 3, 3, 3).astype('float32')
bn = BatchNorm2D(num_features=3)
x = paddle.to_tensor(data)
y = bn(x)
print('input of BatchNorm2D Layer: \n {}'.format(x.numpy()))
print('output of BatchNorm2D Layer: \n {}'.format(y.numpy()))

# Manually verify channel 0: normalize all N*H*W pixels of that channel
a = data[:, 0, :, :]
a_mean = a.mean()
a_std = a.std()
b = (a - a_mean) / a_std
print('channel 0 of input data: \n {}'.format(a))
print('mean {}, std {}, \n output: \n {}'.format(a_mean, a_std, b))
```
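The per-channel check above can also be done for all channels at once (a minimal NumPy sketch, independent of Paddle): reduce over the N, H, W axes, then broadcast the [C]-shaped statistics back over the input. With γ=1, β=0 and ε ignored, this should match the layer output:

```python
import numpy as np

np.random.seed(100)
data = np.random.rand(2, 3, 3, 3).astype('float32')

# Per-channel statistics over N, H, W (axes 0, 2, 3): shapes are [C]
mean = data.mean(axis=(0, 2, 3))
std = data.std(axis=(0, 2, 3))

# Broadcast back to [N, C, H, W]
out = (data - mean.reshape(1, -1, 1, 1)) / std.reshape(1, -1, 1, 1)
print(out.mean(axis=(0, 2, 3)))  # each channel's mean is now ~0
```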
2. Dropout
In the Paddle Dropout API, the mode parameter specifies how the neuron outputs are handled:

```python
paddle.nn.Dropout(p=0.5, axis=None, mode="upscale_in_train", name=None)
```
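The two modes differ in where the rescaling happens. With downscale_in_infer, training zeroes elements at their original scale and inference multiplies the whole input by (1 - p); with upscale_in_train, training divides the kept elements by (1 - p) and inference is the identity. A minimal NumPy sketch of the two conventions (the mask here is an assumed stand-in for the layer's internal random mask):

```python
import numpy as np

p = 0.5
rng = np.random.default_rng(0)
x = np.arange(1.0, 13.0).reshape(-1, 3)
mask = (rng.random(x.shape) >= p).astype(x.dtype)  # 1 = keep, 0 = drop

# downscale_in_infer: train keeps values as-is, infer scales down by (1 - p)
train_down = x * mask
infer_down = x * (1 - p)

# upscale_in_train: train scales kept values up by 1/(1 - p), infer is identity
train_up = x * mask / (1 - p)
infer_up = x

print(train_down)
print(train_up)
```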
```python
import paddle
import numpy as np

np.random.seed(100)
data1 = np.random.rand(2, 3, 3, 3).astype('float32')
data2 = np.arange(1, 13).reshape([-1, 3]).astype('float32')

# Apply both dropout modes to data1, first in train mode, then in eval mode
x1 = paddle.to_tensor(data1)
drop11 = paddle.nn.Dropout(p=0.5, mode='downscale_in_infer')
dropped_train11 = drop11(x1)
drop11.eval()  # switch to inference behavior
dropped_eval11 = drop11(x1)

drop12 = paddle.nn.Dropout(p=0.5, mode='upscale_in_train')
dropped_train12 = drop12(x1)
drop12.eval()
dropped_eval12 = drop12(x1)

# Repeat for data2
x2 = paddle.to_tensor(data2)
drop21 = paddle.nn.Dropout(p=0.5, mode='downscale_in_infer')
dropped_train21 = drop21(x2)
drop21.eval()
dropped_eval21 = drop21(x2)

drop22 = paddle.nn.Dropout(p=0.5, mode='upscale_in_train')
dropped_train22 = drop22(x2)
drop22.eval()
dropped_eval22 = drop22(x2)

print('x1 {}, \n dropped_train11 \n {}, \n dropped_eval11 \n {}'.format(data1, dropped_train11.numpy(), dropped_eval11.numpy()))
print('x1 {}, \n dropped_train12 \n {}, \n dropped_eval12 \n {}'.format(data1, dropped_train12.numpy(), dropped_eval12.numpy()))
print('x2 {}, \n dropped_train21 \n {}, \n dropped_eval21 \n {}'.format(data2, dropped_train21.numpy(), dropped_eval21.numpy()))
print('x2 {}, \n dropped_train22 \n {}, \n dropped_eval22 \n {}'.format(data2, dropped_train22.numpy(), dropped_eval22.numpy()))
```
From the output of the code above, you can see that after dropout some elements of the tensor become 0. This is exactly what dropout does: by randomly zeroing elements of the input, it weakens the co-adaptation between neurons and improves the model's ability to generalize.
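The reason upscale_in_train rescales kept values by 1/(1 - p) is to keep the expected activation unchanged between training and inference: E[mask · x / (1 - p)] = E[x]. A minimal NumPy sketch of this check on a large constant input (the mask is again an assumed stand-in for the layer's internal random mask):

```python
import numpy as np

p = 0.5
rng = np.random.default_rng(42)
x = np.ones(1_000_000)  # constant input, so the average is easy to read off
mask = (rng.random(x.size) >= p).astype(x.dtype)

dropped = x * mask / (1 - p)  # upscale_in_train behavior during training
print(x.mean(), dropped.mean())  # the two means are close
```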