Deep Learning: Regularization (Part 9)

Parameter Tying and Parameter Sharing

Thus far, in this chapter, when we have discussed adding constraints or penalties to the parameters, we have always done so with respect to a fixed region or point. For example, L2 regularization (or weight decay) penalizes model parameters for deviating from the fixed value of zero.
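To recall how that works mechanically, here is a minimal NumPy sketch of weight decay; the model, data, and decay strength are all hypothetical placeholders, and the point is only that the penalty's gradient pulls every weight toward the fixed point zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model on toy data.
X = rng.normal(size=(32, 10))   # inputs
y = rng.normal(size=32)         # targets
w = rng.normal(size=10)         # model parameters
lam = 1e-2                      # weight-decay strength (assumed value)

def loss_and_grad(w):
    residual = X @ w - y
    task_loss = 0.5 * np.mean(residual ** 2)
    task_grad = X.T @ residual / len(y)
    # L2 penalty (lam / 2) * ||w||^2 regularizes w toward zero ...
    penalty = 0.5 * lam * np.sum(w ** 2)
    # ... because its gradient, lam * w, always shrinks w back toward the origin.
    return task_loss + penalty, task_grad + lam * w

loss, grad = loss_and_grad(w)
w = w - 0.1 * grad  # one gradient step; the lam * w term decays w
```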

  • However, sometimes we may need other ways to express our prior knowledge about suitable values of the model parameters.
  • Sometimes we might not know precisely what values the parameters should take but we know, from knowledge of the domain and model architecture, that there should be some dependencies between the model parameters.

A common type of dependency that we often want to express is that certain parameters should be close to one another. Consider the following scenario: we have two models performing the same classification task (with the same set of classes) but with somewhat different input distributions. Formally, we have model A with parameters $w^{(A)}$ and model B with parameters $w^{(B)}$. The two models map the input to two different, but related outputs: $y^{(A)} = f(w^{(A)}, x)$ and $y^{(B)} = g(w^{(B)}, x)$.
Let us imagine that the tasks are similar enough (perhaps with similar input and output distributions) that we believe the model parameters should be close to each other: $\forall i$, $w_i^{(A)}$ should be close to $w_i^{(B)}$. We can leverage this information through regularization. Specifically, we can use a parameter norm penalty of the form:

$$\Omega(w^{(A)}, w^{(B)}) = \lVert w^{(A)} - w^{(B)} \rVert_2^2$$
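A minimal sketch of this penalty, assuming two hypothetical same-shape weight vectors `w_a` and `w_b` and an assumed strength `alpha`:

```python
import numpy as np

def tying_penalty(w_a, w_b, alpha=0.1):
    """Omega = alpha * ||w_a - w_b||_2^2, plus its gradients."""
    diff = w_a - w_b
    penalty = alpha * np.sum(diff ** 2)
    grad_a = 2.0 * alpha * diff    # d(penalty)/d(w_a)
    grad_b = -grad_a               # d(penalty)/d(w_b)
    return penalty, grad_a, grad_b

rng = np.random.default_rng(1)
w_a = rng.normal(size=5)  # parameters of model A (hypothetical)
w_b = rng.normal(size=5)  # parameters of model B (hypothetical)

penalty, g_a, g_b = tying_penalty(w_a, w_b)
# In training, g_a and g_b are added to each model's task gradients,
# nudging corresponding parameters toward one another without forcing
# them to be exactly equal.
```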

While a parameter norm penalty is one way to regularize parameters to be close to one another, the more popular way is to use constraints: to force sets of parameters to be equal. This method of regularization is often referred to as parameter sharing, where we interpret the various models or model components as sharing a unique set of parameters.
A significant advantage of parameter sharing over regularizing the parameters to be close (via a norm penalty) is that only a subset of the parameters (the unique set) needs to be stored in memory. In certain models, such as the convolutional neural network, this can lead to a significant reduction in the memory footprint of the model.
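To make the contrast concrete, the following sketch (with hypothetical shapes) hard-ties two model components to one stored weight matrix, the constrained analogue of the soft penalty above:

```python
import numpy as np

rng = np.random.default_rng(2)

# One unique parameter set; both components reference it, enforcing the
# hard constraint w_A = w_B rather than a soft closeness penalty.
w_shared = rng.normal(size=(10, 4))

def component_a(x):
    return np.tanh(x @ w_shared)

def component_b(x):
    return np.maximum(0.0, x @ w_shared)  # same weights, different activation

x = rng.normal(size=(3, 10))
out_a, out_b = component_a(x), component_b(x)

# Memory holds a single (10, 4) matrix instead of two, and backpropagation
# would accumulate both components' gradients into this one array.
```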
Convolutional Neural Networks

By far the most popular and extensive use of parameter sharing occurs in convolutional neural networks (CNNs) applied to computer vision. Natural images have many statistical properties that are invariant to translation.
For example, a photo of a cat remains a photo of a cat if it is translated one pixel to the right. CNNs take this property into account by sharing parameters across multiple image locations. The same feature (a hidden unit with the same weights) is computed over different locations in the input.
Parameter sharing has allowed CNNs to dramatically lower the number of unique model parameters and to significantly increase network sizes without requiring a corresponding increase in training data. It remains one of the best examples of how to effectively incorporate domain knowledge into the network architecture.
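The savings are easy to quantify with a toy 1-D convolution (sizes are illustrative): the same small kernel is reused at every position, so the parameter count is independent of the input length, while a fully connected layer needs one weight per input-output pair:

```python
import numpy as np

rng = np.random.default_rng(3)

signal = rng.normal(size=100)  # toy 1-D "image"
kernel = rng.normal(size=3)    # 3 shared weights, reused at every location

def conv1d_valid(x, k):
    # Slide the *same* kernel across all positions: parameter sharing.
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

feature_map = conv1d_valid(signal, kernel)  # 98 outputs from 3 parameters

# A dense layer computing the same 100 -> 98 map would need 100 * 98 = 9800
# unique weights; the convolution needs 3, regardless of input length.
```

Translating the signal by one position simply translates the feature map, which is exactly the invariance the shared kernel encodes.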
