Funes the Memorious
Borges published “Funes the Memorious” (originally “Funes el memorioso”) in 1942. The tale grew out of the author's insomnia. It tells the story of Ireneo Funes, who suffers from hypermnesia. After a horse-riding accident, Funes discovers that he can remember absolutely everything: the shape of the clouds at every moment of the day, the position of the light in every corner of the house, what he did minute by minute two months ago, and so on.
In this tale, Borges explores several aspects of our lives that require the “art of forgetting”. By remembering absolutely everything, Funes loses one of the most important features of the thinking process: generalization. Funes cannot understand how the term “dog” can group every dog when they are all clearly different. He can easily tell the small black dog with shiny eyes from the small black dog with a red dot in its left eye, but he cannot understand what makes a dog a dog.
Overfitting, or the loss of generalization
Funes's hypermnesia is more a misfortune than a gift. Without generalization, abstract thinking is impossible. And without abstract thinking, Funes is closer to a machine than to a human being. He goes in the opposite direction of what we hope to achieve with machine learning.
Overfitting is to machine learning what hypermnesia is to Funes. Overfitted models cannot distinguish between noisy observations and the underlying pattern. That is, they cannot generalize.
The figure below shows two binary classifiers (black and green lines). The overfitted classifier (green line) depends heavily on the training data and is very likely to perform poorly when new observations arrive.
How do I know I have an overfitted model? The telltale sign is much better performance on your training set than on your testing set.
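To make the training-versus-testing gap concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic data (points in [0, 1] whose true label is `x > 0.5`, with 20% of labels flipped as noise), and the names `make_data`, `memorizer`, `general_rule`, and `accuracy`. The `memorizer` is a 1-nearest-neighbour classifier that, like Funes, remembers every training point exactly.

```python
import random

def make_data(n, rng, noise=0.2):
    """Synthetic points: true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label  # a noisy observation
        data.append((x, label))
    return data

def memorizer(train):
    """1-nearest-neighbour: answer with the label of the closest remembered point."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def general_rule(x):
    """The underlying pattern, with no memory of individual points."""
    return x > 0.5

def accuracy(predict, data):
    """Fraction of points whose predicted label matches the observed label."""
    return sum(predict(x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(50, rng)
test = make_data(1000, rng)

funes = memorizer(train)
train_acc = accuracy(funes, train)       # every remembered point is recalled perfectly
test_acc = accuracy(funes, test)         # the memorized noise hurts on new points
rule_test_acc = accuracy(general_rule, test)

print(f"memorizer: train={train_acc:.2f} test={test_acc:.2f}")
print(f"general rule: test={rule_test_acc:.2f}")
```

The memorizer scores perfectly on the data it has seen because it reproduces the noise along with the pattern. The drop between its training and testing accuracy is the symptom described above.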
Then, how can I prevent overfitting?
- Use large enough datasets. If your dataset is too small, your model will simply learn it by heart and ignore any general rules.
- Always keep cross-validation in mind. Scoring the model on held-out folds shows how it behaves on data it has not seen.
- Regularization usually helps. Penalizing model complexity discourages the model from fitting noise.
- Ensembles of models can help with generalization.
- Early stopping. Models trained iteratively (CNNs, DNNs, RNNs, etc.) can keep improving on the training data while getting worse on unseen data. Stopping when the validation error stops improving can give you better results.
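Two of the items above, cross-validation and early stopping, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reference implementation: the data is synthetic (true label `x > 0.5`, 20% label noise), the model is a toy k-nearest-neighbours classifier whose `k` acts as a complexity knob, the validation losses are made up, and the helper names (`k_fold_indices`, `knn_predict`, `cross_val_accuracy`, `early_stopping_epoch`) are hypothetical.

```python
import random

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k disjoint folds of near-equal size."""
    return [list(range(i, n, k)) for i in range(k)]

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (use odd k to avoid ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(1 for _, label in nearest if label)
    return votes * 2 > len(nearest)

def cross_val_accuracy(data, k_neighbors, n_folds=5):
    """Average accuracy on held-out folds, each scored by a model fit on the rest."""
    scores = []
    for held_out in k_fold_indices(len(data), n_folds):
        held = set(held_out)
        train = [p for i, p in enumerate(data) if i not in held]
        valid = [data[i] for i in held_out]
        correct = sum(knn_predict(train, x, k_neighbors) == y for x, y in valid)
        scores.append(correct / len(valid))
    return sum(scores) / len(scores)

def early_stopping_epoch(val_losses, patience=2):
    """Return the 0-based epoch with the best validation loss, giving up after
    `patience` consecutive epochs without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Synthetic data: true label is x > 0.5, flipped with probability 0.2.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    label = x > 0.5
    if rng.random() < 0.2:
        label = not label
    data.append((x, label))

# Cross-validation: compare model complexities on data each model has not seen.
cv_scores = {k: cross_val_accuracy(data, k) for k in (1, 5, 15)}
best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, "-> chosen k:", best_k)

# Early stopping on made-up validation losses: the loss bottoms out at epoch 2.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
print("keep the model from epoch", early_stopping_epoch(val_losses))
```

With `k = 1` the classifier memorizes each training point, while larger `k` averages over neighbours and smooths out noisy labels. Cross-validation gives a way to choose between such settings without touching the test set.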
Hopefully, you will consider reading “Funes the Memorious” or any other Borges tale. And hopefully, you will think about Funes the next time you find an overfitted model.
Originally published at: https://jmtirado.net/what-can-borges-teach-you-about-overfitting/
Translated from: https://towardsdatascience.com/what-can-borges-teach-you-about-overfitting-e5ac2dd21217