Deep Learning: Regularization (Part 5)

Noise Robustness

The discussion of dataset augmentation has already motivated applying noise to the inputs of a model as an augmentation strategy. For some models, the addition of noise with infinitesimal variance at the input of the model is equivalent to imposing a penalty on the norm of the weights.

  • In the general case, it is important to remember that noise injection can be much more powerful than simply shrinking the parameters, especially when the noise is added to the hidden units.
  • Another way that noise has been used in the service of regularizing models is by adding it to the weights. This technique has been used primarily in the context of recurrent neural networks (Jim et al., 1996; Graves, 2011). It can be interpreted as a stochastic implementation of Bayesian inference over the weights: the Bayesian treatment of learning considers the model weights to be uncertain and representable via a probability distribution that reflects this uncertainty.
  • Adding noise to the weights can also be interpreted as equivalent (under some assumptions) to a more traditional form of regularization, and it has been shown to be an effective regularization strategy in the context of recurrent neural networks; a minimal sketch of the idea appears right after this list.
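To make the weight-noise scheme concrete, here is a minimal NumPy sketch of a single training step in which a fresh perturbation $\epsilon_W \sim \mathcal{N}(0, \eta I)$ is drawn and added to the weights of a small MLP before the forward pass. The layer sizes, learning rate, and the `noisy_step` helper are hypothetical choices made for illustration only, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a one-hidden-layer MLP for scalar regression.
# eta is the weight-noise variance from the text (epsilon_W ~ N(0, eta * I)).
eta = 1e-2
lr = 0.05
W1 = rng.normal(scale=0.5, size=(16, 3))   # hidden x input
W2 = rng.normal(scale=0.5, size=(1, 16))   # output x hidden

def forward(W1, W2, x):
    h = np.tanh(W1 @ x)
    return (W2 @ h).item(), h

def noisy_step(W1, W2, x, y):
    # Sample a fresh weight perturbation for this presentation of the input.
    e1 = rng.normal(scale=np.sqrt(eta), size=W1.shape)
    e2 = rng.normal(scale=np.sqrt(eta), size=W2.shape)
    W1n, W2n = W1 + e1, W2 + e2

    # The squared error is measured through the perturbed network ...
    y_hat, h = forward(W1n, W2n, x)
    err = y_hat - y

    # ... and the gradient of 0.5 * err^2 is backpropagated through it,
    # but the update is applied to the clean weights.
    gW2 = err * h[None, :]
    gh = err * W2n.ravel()
    gW1 = ((1.0 - h ** 2) * gh)[:, None] * x[None, :]
    return W1 - lr * gW1, W2 - lr * gW2

x, y = rng.normal(size=3), 1.0
for _ in range(200):
    W1, W2 = noisy_step(W1, W2, x, y)
print(forward(W1, W2, x)[0])   # prediction moves toward y
```

Measuring the loss through the perturbed weights while updating the clean ones is one simple way to realize the scheme; in a recurrent network the analogous step would perturb the recurrent weight matrices as well.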

We study the regression setting, where we wish to train a function $\hat{y}(x)$ that maps a set of features $x$ to a scalar, using the least-squares cost function between the model predictions $\hat{y}(x)$ and the true values $y$:

$$J = \mathbb{E}_{p(x,y)}\left[(\hat{y}(x) - y)^2\right]$$

The training set consists of $m$ labeled examples $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$.

We now assume that with each input presentation we also include a random perturbation $\epsilon_W \sim \mathcal{N}(\epsilon;\, 0,\, \eta I)$ of the network weights. Let us imagine that we have a standard $l$-layer MLP. We denote the perturbed model as $\hat{y}_{\epsilon_W}(x)$. Despite the injection of noise, we are still interested in minimizing the squared error of the output of the network. The objective function thus becomes:

$$\tilde{J}_W = \mathbb{E}_{p(x,y,\epsilon_W)}\left[(\hat{y}_{\epsilon_W}(x) - y)^2\right] = \mathbb{E}_{p(x,y,\epsilon_W)}\left[\hat{y}_{\epsilon_W}^2(x) - 2y\,\hat{y}_{\epsilon_W}(x) + y^2\right]$$

For small $\eta$, the minimization of $\tilde{J}_W$, i.e., of $J$ with added weight noise of covariance $\eta I$, is equivalent to minimization of $J$ with an additional regularization term:
$$\eta\, \mathbb{E}_{p(x,y)}\left[\left\|\nabla_W \hat{y}(x)\right\|^2\right]$$

This form of regularization encourages the parameters to go to regions of parameter space where small perturbations of the weights have a relatively small influence on the output.
In other words, it pushes the model into regions where the model is relatively insensitive to small variations in the weights, finding points that are not merely minima, but minima surrounded by flat regions.
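For a linear model $\hat{y}(x) = w^\top x$ the correspondence is exact: perturbing $w$ with $\epsilon_W \sim \mathcal{N}(0, \eta I)$ inflates the expected squared error by exactly $\eta\|x\|^2 = \eta\|\nabla_w \hat{y}(x)\|^2$. The short sketch below checks this numerically by Monte Carlo; the dimensions, $\eta$, and sample count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 1e-3

# Hypothetical linear model y_hat(x) = w @ x and a single example (x, y).
w = rng.normal(size=5)
x = rng.normal(size=5)
y = 0.7

# Monte Carlo estimate of E_{eps_W}[(y_hat_{eps_W}(x) - y)^2]
# with eps_W ~ N(0, eta * I).
eps = rng.normal(scale=np.sqrt(eta), size=(200_000, 5))
noisy_loss = np.mean(((w + eps) @ x - y) ** 2)

# Clean squared error plus the regularizer eta * ||grad_w y_hat(x)||^2;
# for the linear model, grad_w y_hat(x) = x.
clean_loss = (w @ x - y) ** 2
penalty = eta * np.sum(x ** 2)

print(noisy_loss)             # approximately equal to ...
print(clean_loss + penalty)   # ... the penalized clean loss
```

For a nonlinear network the correspondence holds only approximately, for small $\eta$, as stated above.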

Injecting Noise at the Output Targets

Most datasets have some amount of mistakes in the y labels. It can be harmful to maximize log p(y | x) when y is a mistake. One way to prevent this is to explicitly model the noise on the labels.

  • For example, we can assume that for some small constant $\epsilon$, the training-set label $y$ is correct with probability $1-\epsilon$, and otherwise any of the other possible labels might be correct.
  • This assumption is easy to incorporate into the cost function analytically, rather than by explicitly drawing noise samples.
  • Label smoothing, for example, regularizes a model based on a softmax with $k$ output values by replacing the hard 0 and 1 classification targets with targets of $\epsilon/(k-1)$ and $1-\epsilon$, respectively. The standard cross-entropy loss may then be used with these soft targets (a minimal sketch follows this list). Maximum likelihood learning with a softmax classifier and hard targets may actually never converge: the softmax can never predict a probability of exactly 0 or exactly 1, so it will continue to learn larger and larger weights, making more extreme predictions forever.
  • It is possible to prevent this scenario using other regularization strategies like weight decay.
  • Label smoothing has the advantage of preventing the pursuit of hard probabilities without discouraging correct classification.
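As a concrete illustration of the soft targets mentioned above, here is a minimal NumPy sketch of label smoothing combined with the standard softmax cross-entropy: the hard one-hot target is replaced by $1-\epsilon$ for the labeled class and $\epsilon/(k-1)$ for each of the other classes. The function names and the example logits are hypothetical, chosen only for illustration.

```python
import numpy as np

def smoothed_targets(labels, k, eps=0.1):
    """Replace hard one-hot targets with 1 - eps for the labeled class
    and eps / (k - 1) for each of the remaining k - 1 classes."""
    t = np.full((len(labels), k), eps / (k - 1))
    t[np.arange(len(labels)), labels] = 1.0 - eps
    return t

def softmax_cross_entropy(logits, targets):
    # Cross-entropy between (possibly soft) targets and the softmax of the logits.
    z = logits - logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(targets * log_probs).sum(axis=1).mean()

logits = np.array([[4.0, 1.0, -2.0],
                   [0.5, 2.0,  0.0]])
labels = np.array([0, 1])

hard = np.eye(3)[labels]                       # hard one-hot targets
soft = smoothed_targets(labels, k=3, eps=0.1)  # smoothed targets
print(softmax_cross_entropy(logits, hard))
print(softmax_cross_entropy(logits, soft))
```

With the hard targets, the loss can only keep decreasing as the correct-class logits grow without bound; with the smoothed targets, the minimum is attained at finite logits, which is precisely how label smoothing prevents the pursuit of hard probabilities without discouraging correct classification.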