References
https://blog.csdn.net/weixin_39228381/article/details/107896863
https://blog.csdn.net/weixin_39228381/article/details/107939602
- BatchNorm normalizes along the batch direction (down each column of the batch, i.e. each feature across all samples):
import torch.nn as nn
import torch

if __name__ == '__main__':
    # 4 features; affine=False disables the learnable scale and shift
    norm = nn.BatchNorm1d(4, affine=False)
    # a batch of 2 samples with 4 features each
    inputs = torch.FloatTensor([[1, 2, 3, 4],
                                [5, 6, 7, 8]])
    print(inputs)
    output = norm(inputs)
    print(output)
    '''
    tensor([[-1.0000, -1.0000, -1.0000, -1.0000],
            [ 1.0000,  1.0000,  1.0000,  1.0000]])
    '''
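As a rough check, the BatchNorm output can be reproduced by hand. This is only a sketch: it assumes the layer is in training mode (the default for a freshly created module) and uses the default `eps=1e-5`; the names `manual` and `reference` are made up for illustration.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                       [5.0, 6.0, 7.0, 8.0]])

# Statistics are taken over dim 0: one mean/variance per feature column.
mean = inputs.mean(dim=0)                # tensor([3., 4., 5., 6.])
var = inputs.var(dim=0, unbiased=False)  # biased variance, divides by N

eps = 1e-5  # assumed: the default eps of nn.BatchNorm1d
manual = (inputs - mean) / torch.sqrt(var + eps)

# Compare with the built-in layer (training mode, no affine parameters).
reference = nn.BatchNorm1d(4, affine=False)(inputs)
print(torch.allclose(manual, reference, atol=1e-6))
```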
- LayerNorm normalizes along the row direction (across the features of each individual sample):
import torch.nn as nn
import torch

if __name__ == '__main__':
    # normalize over the last dimension, which has size 4
    norm = nn.LayerNorm(4)
    inputs = torch.FloatTensor([[1, 2, 3, 4],
                                [5, 6, 7, 8]])
    output = norm(inputs)
    print(output)
    '''
    tensor([[-1.3416, -0.4472,  0.4472,  1.3416],
            [-1.3416, -0.4472,  0.4472,  1.3416]],
           grad_fn=<NativeLayerNormBackward>)
    '''
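The LayerNorm output can be checked the same way, with the statistics taken per row instead of per column. Again a sketch under assumptions: default `eps=1e-5`, and the layer's learnable weight and bias still at their initial values of 1 and 0; `manual` and `reference` are illustrative names.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                       [5.0, 6.0, 7.0, 8.0]])

# Statistics are taken over the last dim: one mean/variance per sample (row).
mean = inputs.mean(dim=-1, keepdim=True)                # [[2.5], [6.5]]
var = inputs.var(dim=-1, keepdim=True, unbiased=False)  # biased: [[1.25], [1.25]]

eps = 1e-5  # assumed: the default eps of nn.LayerNorm
manual = (inputs - mean) / torch.sqrt(var + eps)

# Compare with the built-in layer (weight=1, bias=0 right after construction).
reference = nn.LayerNorm(4)(inputs)
print(torch.allclose(manual, reference, atol=1e-6))
```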
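- For the detailed calculation, see the links above. When working through it, pay attention to whether the sample variance is the unbiased or the biased estimate.
The two differ only in the denominator: the unbiased estimate divides by N-1, the biased estimate divides by N. Both outputs above match the biased estimate (dividing by N).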
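To make the difference concrete, here is a small standard-library sketch on the first feature column of the example batch, `[1, 5]`. The variable names are illustrative only.

```python
import math
import statistics

column = [1.0, 5.0]  # first feature column of the example batch

biased = statistics.pvariance(column)   # divides by N   -> 4.0
unbiased = statistics.variance(column)  # divides by N-1 -> 8.0

mean = statistics.mean(column)  # 3.0

# Normalizing with the biased variance gives the +/-1 seen in the BatchNorm output.
norm_biased = [(x - mean) / math.sqrt(biased) for x in column]
# Normalizing with the unbiased variance would give roughly +/-0.7071 instead.
norm_unbiased = [(x - mean) / math.sqrt(unbiased) for x in column]

print(norm_biased)
print(norm_unbiased)
```

With only two samples the gap is large; as N grows, N and N-1 become nearly identical and the two estimates converge.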