Understanding Deep Learning Requires Rethinking Generalization: An After-Read

The paper “Understanding Deep Learning Requires Rethinking Generalization” is aimed at making you realize that whatever you think of as the “cause” of generalization in deep neural networks is not the whole picture. The paper employs remarkably simple experiments to dismantle the classical theories of generalization.

The first experiment shatters (pun intended) the view that generalization in DNNs comes from the limited complexity of the hypothesis space/function class represented by the model. It proceeds as follows: instead of using the real labels of the data, you corrupt the labels, i.e., assign random labels in place of the actual ones, and then train the model on this data. The result is surprising: the model fits the data perfectly (100% train accuracy). This means the hypothesis space is rich enough to model essentially any labelling of the finite dataset, and hence has very high complexity.

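As a concrete illustration, here is a minimal sketch of the randomization test in PyTorch, assuming synthetic stand-ins for the data (the paper uses CIFAR-10 and ImageNet with Inception-style networks, AlexNet, and MLPs; the sizes, architecture, and hyperparameters below are illustrative only):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
n, d, k = 1024, 3 * 32 * 32, 10          # sample count, input dim, class count
x = torch.randn(n, d)                    # stand-in for flattened images
y_random = torch.randint(0, k, (n,))     # labels drawn uniformly at random

model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, k))
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(TensorDataset(x, y_random), batch_size=64, shuffle=True)

for epoch in range(200):                 # enough steps to drive train error toward zero
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()

train_acc = (model(x).argmax(dim=1) == y_random).float().mean().item()
print(f"train accuracy on random labels: {train_acc:.3f}")  # approaches 1.0
```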

So it is wrong to say that “generalization happens because the hypothesis space is restricted”, because it is not restricted.

The same experiment is bad news for those who attribute generalization to the (uniform) stability of the learning algorithm, too, because in the above experiment neither the learning algorithm nor the loss function was changed. Whereas the same learning algorithm achieves low generalization error on the original labels, the model trained on random labels yields high generalization error when asked to predict the random labels of unseen data.

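Continuing the illustrative sketch above: the training procedure is untouched, yet evaluating the random-label model on unseen points (which carry their own independent random labels) shows accuracy at chance level, hence a large generalization gap:

```python
# Fresh points with their own random labels stand in for "unseen data".
x_test = torch.randn(256, d)
y_test = torch.randint(0, k, (256,))
test_acc = (model(x_test).argmax(dim=1) == y_test).float().mean().item()
# test_acc hovers around chance (1/k = 0.1), so the gap is roughly train_acc - 0.1
print(f"test accuracy: {test_acc:.3f}  gap: {train_acc - test_acc:.3f}")
```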

Thus, generalization is not just a function of the stability of the learning algorithm.

This also dismantles the opinion that inductive biases in the model restrict the hypothesis space to “functions that generalize well”. CNNs fit the randomly labelled data (image input, class output) just as well as plain DNNs do. This means the function class represented by CNNs is no more restrictive than that of plain NNs. On the other hand, CNNs do generalize better than plain NNs when only part of the dataset is given random labels.

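A hedged sketch of the partial-corruption setup; `corrupt_labels` is a hypothetical helper, not from the paper's code. Sweeping the corruption fraction p from 0 to 1 and retraining at each level interpolates between the clean-label and fully random regimes:

```python
def corrupt_labels(y: torch.Tensor, p: float, num_classes: int) -> torch.Tensor:
    """Return a copy of y in which a random fraction p of entries is resampled uniformly."""
    y = y.clone()
    mask = torch.rand(len(y)) < p
    y[mask] = torch.randint(0, num_classes, (int(mask.sum()),))
    return y

# e.g. corrupt_labels(y_true, p=0.3, num_classes=10); retraining both a CNN
# and an MLP at each p lets you compare how gracefully each degrades.
```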

This suggests that inductive bias does influence generalization to some degree.

Well then, what does generalization actually depend on? Clearly, it depends on the data: merely changing the labels of the points makes the generalization error jump up! But in what way does it depend on the data? To answer this question, the paper gradually increases the amount of noise in the data (in the inputs rather than the labels*) and observes the generalization error, which increases with the noise level. This brings about an interesting observation: our system (model + learning algorithm + data) works in such a way that the model first prefers to learn the features that generalize to the real world, even if only traces of them are present, and only later does it try to memorize (noise, label) pairs.

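A sketch of the input-corruption sweep under the same illustrative assumptions; `corrupt_inputs` is a hypothetical helper that blends each input with Gaussian noise:

```python
def corrupt_inputs(x: torch.Tensor, alpha: float) -> torch.Tensor:
    """Blend inputs with Gaussian noise: alpha = 0 is clean data, alpha = 1 is pure noise."""
    noise = torch.randn_like(x) * x.std()
    return (1 - alpha) * x + alpha * noise

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    x_noisy = corrupt_inputs(x, alpha)
    # ... retrain the model from scratch on (x_noisy, true labels) at each level
    # and record test error; the recorded error rises with alpha.
```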

That is, the system prefers learning over memorization.

The next and final attack targets the opinion that generalization in deep learning is due to regularization. Even when all explicit regularizers (data augmentation, dropout, weight decay) and implicit ones (early stopping, batch normalization) are turned off, the model does not degenerate into memorizing all the data points; it still performs considerably well. Approaching the issue from the other direction, the authors show that turning regularization on does not prevent the model from overfitting the randomly labelled data.

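For concreteness, a hedged sketch of what “all regularization off” means in this setting (illustrative PyTorch configuration, not the paper's exact setup):

```python
# No BatchNorm or Dropout layers, no augmentation transforms on the inputs,
# weight_decay = 0, and a fixed epoch budget instead of early stopping.
model_plain = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, k))
opt_plain = torch.optim.SGD(model_plain.parameters(),
                            lr=0.01, momentum=0.9, weight_decay=0.0)
```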

This confirms that regularization cannot even restrict the hypothesis space enough to prevent the fitting of noise. So regularization is better seen as a hyperparameter useful for fine-tuning the model.

But one can now argue that the implicit regularization that SGD brings to the table was not turned off in the above experiments, and so all this good generalization behavior of DNNs is due to the regularization that comes with SGD. The paper shows that this view is inadequate as well. It proves that, for linear models, SGD biases the parameters toward solutions of lower norm, but experiments show that a lower norm does not always correlate with better generalization.

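A tiny sketch of that linear-model result, under illustrative assumptions: full-batch gradient descent (via torch's SGD optimizer) from zero initialization keeps the iterates in the row space of X, so the fitted parameters coincide with the minimum-l2-norm interpolant given by the pseudoinverse:

```python
import torch

torch.manual_seed(0)
X = torch.randn(20, 100)                  # underdetermined: 20 equations, 100 unknowns
y = torch.randn(20)
w = torch.zeros(100, requires_grad=True)  # zero init keeps iterates in the row space of X

opt = torch.optim.SGD([w], lr=0.01)
for _ in range(20000):                    # full-batch gradient descent on squared error
    opt.zero_grad()
    ((X @ w - y) ** 2).mean().backward()
    opt.step()

w_min_norm = torch.linalg.pinv(X) @ y     # explicit minimum-norm interpolant
print(torch.linalg.norm(w.detach() - w_min_norm).item())  # ~0: GD found the min-norm fit
```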

And thus, there is more to generalization than just the regularization that SGD brings.

The key takeaway from the paper is that it is no use modeling generalization behavior independently of the structure of the data. Generalization behavior in deep learning arises from how the model, the data, and the learning algorithm interact with one another, rather than from any single component alone.

*Similar results are obtained when the labels, instead of the inputs, are corrupted gradually.

Next, you should read: https://lilianweng.github.io/lil-log/2019/03/14/are-deep-neural-networks-dramatically-overfitted.html

Source: https://medium.com/da-labs/understanding-deep-learning-requires-rethinking-generalization-an-after-read-91c12fda650a
