In inference mode, a BN layer's output Y relates to its input X by Y = (X - running_mean) / sqrt(running_var + eps) * gamma + beta. Here gamma and beta are learnable parameters (in PyTorch they are named weight and bias) updated by backpropagation during training. running_mean and running_var, by contrast, are buffers: on each training forward pass, BN first computes the mean and var of the current batch X, then folds them into running_mean and running_var using the momentum factor. So during training the running statistics are updated once per forward pass; at test time, calling net.eval() freezes them, and the layer keeps using the values accumulated up to the last training forward pass throughout the entire evaluation phase.
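The two rules above (normalize with batch statistics, then update the running statistics with momentum) can be sketched for a single channel in plain Python. This is an illustration, not PyTorch's actual implementation; `bn_train_step` and the `state` dict are made-up names. One real detail is preserved: PyTorch normalizes with the biased batch variance but updates running_var with the unbiased one.

```python
import math

def bn_train_step(x, state, momentum=0.1, eps=1e-5):
    """One training-mode BN step over a 1-D batch x (single channel, sketch).

    Normalizes with the *batch* statistics, then updates the running
    statistics in-place with the momentum factor, mirroring BatchNorm's
    training-time semantics.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n   # biased variance, used to normalize
    y = [(v - mean) / math.sqrt(var + eps) * state["gamma"] + state["beta"]
         for v in x]
    # PyTorch folds the *unbiased* batch variance into running_var
    unbiased_var = var * n / (n - 1)
    state["running_mean"] = (1 - momentum) * state["running_mean"] + momentum * mean
    state["running_var"] = (1 - momentum) * state["running_var"] + momentum * unbiased_var
    return y

# Fresh BN state: running_mean=0, running_var=1, gamma=1, beta=0 (PyTorch defaults)
state = {"running_mean": 0.0, "running_var": 1.0, "gamma": 1.0, "beta": 0.0}
out = bn_train_step([1.0, 2.0, 3.0, 4.0], state)
```

After one step, running_mean moves 10% of the way toward the batch mean (0.9 * 0 + 0.1 * 2.5 = 0.25), while the output itself was normalized with the batch statistics, not the running ones.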
The num_batches_tracked buffer works as follows:
During training it counts the mini-batches the layer has forwarded: after each mini-batch, num_batches_tracked += 1.
If momentum is set to None, BatchNorm uses 1/num_batches_tracked as the update factor instead, which turns the exponential moving average into a cumulative (simple) average of the batch statistics (running mean and variance).
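A minimal sketch of that factor selection (the function name `update_running_mean` is invented for illustration; the real logic lives inside BatchNorm's forward):

```python
def update_running_mean(batch_means, momentum=None):
    """Sketch of how BatchNorm picks its update factor.

    With a fixed momentum, the running mean is an exponential moving
    average; with momentum=None, the factor 1/num_batches_tracked makes
    it a cumulative (simple) average of all batch means seen so far.
    """
    running_mean = 0.0
    num_batches_tracked = 0
    for m in batch_means:
        num_batches_tracked += 1  # incremented once per mini-batch
        factor = momentum if momentum is not None else 1.0 / num_batches_tracked
        running_mean = (1 - factor) * running_mean + factor * m
    return running_mean
```

With momentum=None, feeding batch means 1, 2, 3 yields exactly their average, 2; with momentum=0.1 the result is instead weighted toward recent batches.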
Model structure
UNet_Plain(
  (down_conv1): DownBlock(
    (double_conv): DoubleConv(
      (double_conv): Sequential(
        (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
      )
    )
Keys it reports (model.state_dict().keys()):
['down_conv1.double_conv.double_conv.0.weight', 'down_conv1.double_conv.double_conv.0.bias', 'down_conv1.double_conv.double_conv.1.weight', 'down_conv1.double_conv.double_conv.1.bias', 'down_conv1.double_conv.double_conv.1.running_mean', 'down_conv1.double_conv.double_conv.1.running_var', 'down_conv1.double_conv.double_conv.1.num_batches_tracked',
Here the index 1 corresponds to the BatchNorm2d layer in the model structure above; its weight and bias entries are the gamma and beta from the formula.
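These keys split into two kinds: learnable parameters (updated by the optimizer) and buffers (updated inside forward). The suffix-based partition below is only an illustration over the printed key list; on a real model you would use `model.named_parameters()` and `model.named_buffers()` instead.

```python
# state_dict keys from the model above (first BN block only, as printed)
keys = [
    "down_conv1.double_conv.double_conv.0.weight",
    "down_conv1.double_conv.double_conv.0.bias",
    "down_conv1.double_conv.double_conv.1.weight",               # gamma
    "down_conv1.double_conv.double_conv.1.bias",                 # beta
    "down_conv1.double_conv.double_conv.1.running_mean",         # buffer
    "down_conv1.double_conv.double_conv.1.running_var",          # buffer
    "down_conv1.double_conv.double_conv.1.num_batches_tracked",  # buffer
]

BUFFER_SUFFIXES = ("running_mean", "running_var", "num_batches_tracked")

# Parameters receive gradients; buffers are maintained by BN's forward pass.
params = [k for k in keys if not k.endswith(BUFFER_SUFFIXES)]
buffers = [k for k in keys if k.endswith(BUFFER_SUFFIXES)]
```

This also explains why num_batches_tracked shows up in the state_dict even though it is never trained: state_dict saves buffers alongside parameters so that evaluation-time statistics survive a save/load round trip.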