ML~BatchNormalization

Training a machine learning model is essentially learning the distribution of the data. We fit the model on training data and evaluate it on test data, but the two sets generally follow somewhat different distributions, so a model fit to the training distribution tends to be less accurate on the test set than on the training set. To improve generalization, we normalize all data using statistics computed from the training data. Moreover, during SGD, if the distribution differs from one mini-batch to the next, the network must keep adapting to each batch's data, which slows down training.

Batch normalization is the process of normalizing the outputs of hidden layers, which speeds up training.

In BN, the mean and variance are estimated from each mini-batch, so they contain noise.
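To make the normalization step concrete, here is a minimal NumPy sketch of the BN forward pass over a mini-batch (the function name `batch_norm_forward` and the random test data are illustrative, not from any particular library):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x: (batch_size, num_features) activations of a hidden layer.
    gamma, beta: learnable scale and shift, shape (num_features,).
    eps: small constant for numerical stability.
    """
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # restore expressive power

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # shifted, scaled activations
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0))  # ~0 per feature
print(y.std(axis=0))   # ~1 per feature
```

Note that `mu` and `var` here are computed from the current mini-batch only, which is exactly why BN statistics are noisy.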

Reference: 莫烦 (Morvan Zhou)'s BN tutorial.

The following quotes a Zhihu comment:
Why does batch normalization work?

(1) We know that normalizing input features can speed up learning; one intuition is that doing the same for hidden layers should also work.

(2) It mitigates the problem of covariate shift.

Suppose you have trained a cat-recognizing network on black cats but evaluate it on colored cats; the data distribution has changed (this is called covariate shift). Even if a true boundary separating cat from non-cat exists, you cannot expect to learn it from black cats alone, so you may need to retrain the network.

For a neural network, if the input distribution were constant, the output distribution of a given hidden layer would also be constant. But as the weights of that layer and the layers before it change during training, its output distribution changes too, which amounts to covariate shift from the perspective of the layers after it. Just as in the cat-recognizing example, the following layers need to re-adapt. To counter this, batch normalization forces the layer's outputs toward a zero-mean, unit-variance distribution. This lets later layers learn more independently of earlier ones and concentrate on their own task, which speeds up training.

(3) Batch normalization acts as a (slight) regularizer.

In batch normalization, the mean and variance are computed over a mini-batch, which contains only a limited number of samples, so these statistics are noisy. Much like dropout (which randomly multiplies activations by 0 or 1), this adds some noise to the hidden layers' activations.

This is only a minor side effect; don't rely on it as a regularizer.
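The noise in mini-batch statistics is easy to see directly. A small sketch (the batch size of 32 and the synthetic data are illustrative assumptions): even when all samples come from one fixed distribution, the per-batch mean fluctuates around the true mean, and smaller batches fluctuate more.

```python
import numpy as np

rng = np.random.default_rng(0)
# 320 mini-batches of 32 samples each, all drawn from N(0, 1)
data = rng.normal(loc=0.0, scale=1.0, size=(320 * 32,))

# One mean estimate per mini-batch of 32 samples:
batch_means = data.reshape(320, 32).mean(axis=1)

# The true mean is 0, but each batch's estimate deviates from it;
# the spread of these estimates is roughly 1/sqrt(32) ≈ 0.18.
print(batch_means.std())
```

This spread is the "noise" that BN injects into activations during training, and it shrinks as the batch size grows (roughly as 1/sqrt(batch_size)).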
