Dropout: A Simple Way to Prevent Neural Networks from Overfitting

qq_43561737

Article tags: neural networks, deep learning, artificial intelligence

Article link: https://blog.csdn.net/qq_43561737/article/details/141152651

1. Abstract

Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different “thinned” networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Keywords: neural networks, regularization, model combination, deep learning

2. Introduction

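Deep neural networks contain multiple non-linear hidden layers, which makes them very expressive models that can learn highly complicated relationships between their inputs and outputs. With limited training data, however, many of these complicated relationships will be the result of sampling noise: they exist in the training set but not in real test data, even when the test data is drawn from the same distribution. This leads to overfitting, and many methods have been developed to reduce it. These include stopping training as soon as performance on a validation set starts to get worse, and introducing weight penalties of various kinds, such as L1 and L2 regularization and soft weight sharing.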
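The regularizers named above can be made concrete with a small sketch. The function names, the penalty coefficient `lam`, and the `patience` parameter below are illustrative assumptions rather than anything defined in the paper; the code only shows what an L1 penalty, an L2 penalty, and a simple early-stopping rule compute.

```python
def l1_penalty(weights, lam):
    """L1 weight penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 weight penalty: lam times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)


def early_stopping_epoch(val_losses, patience=1):
    """Return the epoch whose weights would be kept under early stopping.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs (hypothetical stopping rule).
    """
    best_epoch, best_loss, bad_epochs = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, bad_epochs = epoch, loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch


# Made-up numbers, purely for illustration
weights = [0.5, -2.0, 1.0]
l1 = l1_penalty(weights, lam=0.01)
l2 = l2_penalty(weights, lam=0.01)
stop_at = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7])
```

In practice either penalty is added to the training loss before computing gradients, while early stopping only decides which checkpoint to keep.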

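With large neural networks, the obvious idea of averaging the outputs of many separately trained networks is prohibitively expensive. Combining several models is most helpful when the individual models differ from one another, and to make neural network models different they should either have different architectures or be trained on different data. Training many different architectures is hard, because finding optimal hyperparameters for each architecture is a daunting task and training each large network requires a lot of computation. Moreover, large networks normally require large amounts of training data, and there may not be enough data available to train different networks on different subsets of it. Even if one could train many different large networks, using them all at test time is infeasible in applications where a fast response matters.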

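Dropout is a technique that addresses both of these issues. It prevents overfitting and provides an efficient way of approximately combining exponentially many different neural network architectures. The term "dropout" refers to dropping out units (hidden and visible) in a neural network. Dropping a unit out means temporarily removing it from the network, along with all of its incoming and outgoing connections.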
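The idea of dropping units during training, and of using a single unthinned network with smaller weights at test time as described in the abstract, can be sketched for one linear layer. This is a minimal illustration, not the authors' implementation: the names `dropout_mask`, `layer_forward`, and `p_retain` are assumptions, and scaling the weights by the retention probability is the rule from the paper that this excerpt does not spell out.

```python
import itertools
import random


def dropout_mask(n_units, p_retain, rng):
    """Sample a 0/1 mask; each unit is kept independently with probability p_retain."""
    return [1 if rng.random() < p_retain else 0 for _ in range(n_units)]


def layer_forward(x, weights, mask=None, weight_scale=1.0):
    """Linear layer over input units x with weights[i][j] from unit i to output j.

    A masked unit contributes nothing, which is equivalent to removing it
    together with its outgoing connections.
    """
    if mask is None:
        mask = [1] * len(x)
    n_out = len(weights[0])
    return [
        sum(mask[i] * x[i] * weight_scale * weights[i][j] for i in range(len(x)))
        for j in range(n_out)
    ]


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
weights = [[0.5, -1.0], [0.25, 0.5], [-0.5, 1.0], [1.0, 0.0]]
p_retain = 0.5  # assumed retention probability

# Training: every forward pass uses a freshly sampled "thinned" network.
train_out = layer_forward(x, weights, mask=dropout_mask(len(x), p_retain, rng))

# Test: keep every unit, but scale the outgoing weights by p_retain.
test_out = layer_forward(x, weights, weight_scale=p_retain)

# With p_retain = 0.5 all 2**4 masks are equally likely, so a plain average
# over every thinned network gives the expected training-time output.
all_masks = list(itertools.product([0, 1], repeat=len(x)))
thinned_outs = [layer_forward(x, weights, mask=m) for m in all_masks]
avg_out = [sum(o[j] for o in thinned_outs) / len(thinned_outs) for j in range(2)]
```

For this single linear layer the average over all thinned networks matches the scaled unthinned network exactly. Once non-linearities and multiple layers are involved, the scaled network is only an approximation of that average.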